
Databricks, Inc. is a global data, analytics, and artificial intelligence (AI) company, founded in 2013 by the original creators of Apache Spark. The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models. Databricks pioneered the data lakehouse, a data and AI platform that combines the capabilities of a data warehouse with a data lake, allowing organizations to manage and use both structured and unstructured data for traditional business analytics and AI workloads. The company also develops Delta Lake, an open-source project to bring reliability to data lakes for machine learning and other data science use cases.


History


2013-2021

Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in making Apache Spark, an open-source distributed computing framework built atop Scala. The company was founded by Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, Patrick Wendell, and Reynold Xin. In November 2017, the company was announced as a first-party service on Microsoft Azure via the Azure Databricks integration. In February 2021, together with Google Cloud, Databricks provided integration with the Google Kubernetes Engine and Google's BigQuery platform. At that time, the company said more than 5,000 organizations used its products. ''Fortune'' ranked Databricks as one of the "Best Large Workplaces for Millennials" in 2021.


2022-Present

In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML's generative AI technology to help customers better understand and use their own proprietary data. The firm was valued at $62 billion in December 2024, following a funding round that raised one of the largest amounts in history, equivalent to the largest single AI investment ever made. In early March 2025, Databricks announced it would invest $1 billion in San Francisco's downtown. Also in March 2025, Databricks partnered with Anthropic to make Anthropic's AI products available on the Databricks Data Intelligence Platform, in a five-year deal worth $100 million. Ali Ghodsi remains CEO of Databricks.


Acquisitions

In June 2020, Databricks bought Redash, an open-source tool for data visualization and building interactive dashboards. In 2021, it bought the German no-code company 8080 Labs, whose product, bamboolib, allowed data exploration without any coding. In May 2023, Databricks bought the data security group Okera, extending Databricks' data governance capabilities. In June 2023, it bought the open-source generative AI startup MosaicML for $1.4 billion. In October 2023, Databricks bought the data replication startup Arcion for $100 million. In what is believed to be its sixth acquisition, Databricks bought Tabular, a data-management system used by open-source AI, for over $1 billion. In 2025, Databricks acquired a serverless database startup, Neon, for around $1 billion.

In March 2023, in response to the popularity of OpenAI's ChatGPT, the company introduced an open-source language model, named Dolly after Dolly the sheep, that allowed developers to create chatbots. Dolly uses fewer parameters than ChatGPT to produce similar results, but Databricks had not released formal benchmark tests to show whether its bot actually matched the performance of ChatGPT. Databricks reported $1.6 billion in revenue for the 2023 fiscal year, more than doubling its previous level.


Funding

In September 2013, Databricks announced it had raised $13.9 million from Andreessen Horowitz and said it aimed to offer an alternative to Google's MapReduce system. Microsoft was a noted investor in Databricks in 2019, participating in the company's Series E for an unspecified amount. By February 2021, the company had raised $1.9 billion in funding, including a $1 billion Series G led by Franklin Templeton at a $28 billion post-money valuation that month. Other investors include Amazon Web Services, CapitalG (a growth equity firm under Alphabet Inc.) and Salesforce Ventures. In August 2021, Databricks finished its eighth round of funding by raising $1.6 billion at a valuation of $38 billion. In December 2024, Databricks announced a $10 billion financing at a valuation of $62 billion.


Products

Databricks develops and sells a cloud data platform using the marketing term "lakehouse", a portmanteau of "data warehouse" and "data lake". Databricks' Lakehouse is based on the open-source Apache Spark framework, which allows analytical queries against semi-structured data without a traditional database schema. In October 2022, Lakehouse received FedRAMP authorized status for use with the U.S. federal government and contractors.

The company has also created Delta Lake, MLflow and Koalas, open-source projects that span data engineering, data science and machine learning. In June 2020, Databricks launched Delta Engine, a fast query engine for Delta Lake, compatible with Apache Spark and MLflow. In November 2020, Databricks introduced Databricks SQL (previously called SQL Analytics) for running business intelligence and analytics reporting on top of data lakes. Analysts can query data sets with standard SQL or use connectors to integrate with business intelligence tools like Holistics, Tableau, Qlik, Sigma Computing, Looker, and ThoughtSpot. Databricks offers a platform for other workloads, including machine learning, data storage and processing, streaming analytics, and business intelligence.

In early 2024, Databricks released the Mosaic set of tools for customizing, fine-tuning and building AI systems. It includes AI Vector Search for building RAG models; AI Model Serving, a service for deploying, governing, querying and monitoring models fine-tuned or pre-deployed by Databricks; and AI Pretraining, a platform for enterprises to create their own LLMs. In March 2024, Databricks released DBRX, an open-source foundation model. It has a mixture-of-experts architecture and is built on the MegaBlocks open-source project. DBRX cost $10 million to create. At the time of launch, it was the fastest open-source LLM, based on commonly used industry benchmarks. It beat other models like Llama 2 at solving logic puzzles and answering general knowledge questions, among other tasks. Although it has 132 billion parameters, it uses only 36 billion, on average, to generate outputs. DBRX also serves as a foundation for companies to build or customize their own AI models. Companies can also use proprietary data to generate higher-quality outputs for specific use cases.

In addition to building the Databricks platform, the company has co-organized massive open online courses about Spark and a conference for the Spark community called the Data + AI Summit, formerly known as Spark Summit.
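
The mixture-of-experts behavior described for DBRX in this section, where only part of a model's parameters are used for any one output, can be illustrated with a toy router. The sketch below is not DBRX's architecture and does not use the MegaBlocks API; the expert count, the top-k value, the scalar "experts" and every function name are hypothetical, chosen only to show how top-k routing leaves most expert parameters idle for a given input.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that the selected experts sum to 1.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_scores, k):
    """Combine the outputs of the selected experts only; the rest never run."""
    routes = top_k_route(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


def active_fraction(shared_params, params_per_expert, num_experts, k):
    """Fraction of all parameters touched when k of num_experts are selected."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total


# Hypothetical toy setup: 8 scalar "experts", 2 active per input.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, used = moe_forward(1.0, experts, router_scores, TOP_K)
fraction = active_fraction(
    shared_params=10, params_per_expert=10, num_experts=NUM_EXPERTS, k=TOP_K
)
print("experts used:", used)
print("active parameter fraction:", round(fraction, 3))
```

In this toy configuration a third of the parameters are active per input. By the figures quoted above, the corresponding share for DBRX is roughly a quarter.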


Collaborations

In December 2024, Databricks, along with Wiz and Workday, decided to offer its products on top of AWS through the new "Buy with AWS" button. In June 2025, Databricks announced a strategic AI partnership with Google Cloud, aimed at integrating its Data Intelligence Platform more deeply with Google Cloud's services and accelerating generative AI adoption for shared customers.


Operations

Databricks is headquartered in San Francisco. It also has operations in Canada, the Netherlands, the United Kingdom, and elsewhere.

