
deepset is a startup that provides software developers with tools to build production-ready natural language processing (NLP) systems. It was founded in 2018 in Berlin by Milos Rusic, Malte Pietsch, and Timo Möller. deepset created and maintains the open source framework Haystack and offers a commercial software as a service (SaaS) platform, deepset Cloud.


History

In June 2018, Milos Rusic, Malte Pietsch, and Timo Möller co-founded deepset in Berlin, Germany. In the same year, the company served its first customers, who wanted to implement NLP services by tailoring BERT language models to their domains. In July 2019, the company released the initial version of the open source software FARM, followed in November 2019 by the initial version of the open source software Haystack.

Throughout 2020 and 2021, deepset published several applied research papers at EMNLP, COLING, and ACL, which are among the leading conferences in NLP. The 2020 contributions comprised the German language models GBERT and GELECTRA, and COVID-QA, a question answering dataset on the COVID-19 pandemic that was created in collaboration with Intel and annotated by biomedical experts. The 2021 contributions comprised GermanQuAD and GermanDPR, German models and datasets for question answering and passage retrieval; a semantic answer similarity metric; and an approach for multimodal retrieval over texts and tables that enables question answering on tabular data. Haystack contains implementations of all three of these contributions, making the research usable through the open source framework.

In November 2021, development of the FARM framework was discontinued and its main features were integrated into Haystack. In April 2022, the company announced its commercial SaaS offering, deepset Cloud. As of October 2022, the most popular finetuned language model created by deepset had been downloaded more than 7 million times.


Products and applications

Haystack is an end-to-end Python framework for building semantic search solutions. With its modular building blocks, software developers can implement pipelines for various search tasks over large document collections, such as question answering, document retrieval, or summarization. It integrates with Hugging Face Transformers, Elasticsearch, OpenSearch, and other tools. The framework has an active community on GitHub, where more than 140 people have contributed to its development, and on Meetup. Thousands of organizations use the framework, including Global 500 enterprises such as Airbus, as well as Infineon, Alcatel-Lucent Enterprise, BetterUp, Etalab, and Sooth.ai.

The deepset Cloud platform supports customers in building scalable NLP applications by covering the entire process of prototyping, experimentation, deployment, and monitoring. It is built on Haystack.

FARM was a framework for adapting representation models. One of its core concepts was the adaptive model, which combined a language model with an arbitrary number of prediction heads. FARM supported domain adaptation and finetuning of these models with advanced options such as gradient accumulation, cross-validation, and automatic mixed-precision training. Its main features were integrated into Haystack in November 2021, at which point its development was discontinued.
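The central idea behind Haystack, chaining modular building blocks into a search pipeline, can be illustrated with a minimal pure-Python sketch. It deliberately does not use Haystack's API. The classes `Document`, `OverlapRetriever`, `SentenceReader`, and `SimplePipeline`, along with their `run` and `add_step` methods, are hypothetical. The word-overlap scoring is only a toy stand-in for the neural retrievers and readers that a real question answering pipeline would use.

```python
# Hypothetical sketch of a modular retrieve-then-read pipeline.
# None of these names come from Haystack; scoring is simple word overlap.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass
class Document:
    id: str
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


class OverlapRetriever:
    """Toy retriever: keeps the top_k documents sharing the most words with the query."""

    def __init__(self, documents: List[Document], top_k: int = 2) -> None:
        self.documents = documents
        self.top_k = top_k

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        query_terms = tokenize(state["query"])
        scored = [(len(query_terms & tokenize(d.text)), d) for d in self.documents]
        # Stable sort: ties keep their original document order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        state["documents"] = [d for score, d in scored[: self.top_k] if score > 0]
        return state


class SentenceReader:
    """Toy reader: answers with the retrieved sentence that overlaps the query most."""

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        query_terms = tokenize(state["query"])
        best: Optional[str] = None
        best_score = 0
        for doc in state["documents"]:
            for sentence in doc.text.split(". "):
                score = len(query_terms & tokenize(sentence))
                if score > best_score:
                    best, best_score = sentence.strip(), score
        state["answer"] = best
        return state


class SimplePipeline:
    """Runs each step in order, passing a shared state dictionary along."""

    def __init__(self) -> None:
        self.steps: List[Any] = []

    def add_step(self, step: Any) -> "SimplePipeline":
        self.steps.append(step)
        return self

    def run(self, query: str) -> Dict[str, Any]:
        state: Dict[str, Any] = {"query": query}
        for step in self.steps:
            state = step.run(state)
        return state


docs = [
    Document("1", "deepset was founded in 2018 in Berlin. It maintains Haystack."),
    Document("2", "Haystack is a Python framework. It integrates with Elasticsearch."),
]
pipeline = SimplePipeline().add_step(OverlapRetriever(docs, top_k=1)).add_step(SentenceReader())
result = pipeline.run("Where was deepset founded?")
print(result["answer"])  # deepset was founded in 2018 in Berlin
```

Every step reads from and writes to a shared dictionary. This means one step, such as the retriever, can be replaced without changing the others, which is the kind of modularity described for Haystack.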


Funding

On April 28, 2022, deepset announced a $14 million Series A investment round led by GV, with participation from Harpoon Ventures, Acequia Capital, and a group of experienced commercial open source and machine learning founders, including Alex Ratner (Snorkel AI), Mustafa Suleyman (DeepMind), Spencer Kimball (Cockroach Labs), Jeff Hammerbacher (Cloudera), and Emil Eifrem (Neo4j). A previous pre-seed round of $1.6 million, on March 8, 2021, was led by System.One and Lunar Ventures, both of which also participated in the subsequent Series A round.

