DataStax

DataStax, Inc. is a company based in Santa Clara, California, that provides real-time data products for AI applications. Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company had roughly 800 customers in more than 50 countries.


History

DataStax was built on the open source NoSQL database Apache Cassandra. Cassandra was initially developed internally at Facebook to handle large data sets across multiple servers, and was released as an Apache open source project in 2008. In 2010, Jonathan Ellis and Matt Pfeil left Rackspace, where they had worked with Cassandra, to launch Riptano in Austin, Texas. Ellis and Pfeil later renamed the company DataStax and moved its headquarters to Santa Clara, California. The company went on to create its own enterprise version of Cassandra, a NoSQL database called DataStax Enterprise (DSE). In 2019, Chet Kapoor was named the company's new CEO, taking over from Billy Bosworth.

In May 2020, DataStax released Astra DB, a database-as-a-service (DBaaS) for Cassandra applications. In November 2020, DataStax released K8ssandra, an open source distribution of Cassandra on Kubernetes. In December 2020, DataStax released Stargate, an open source data API gateway.

After acquiring streaming event vendor Kesque in January 2021, the company launched Luna Streaming, a data streaming platform for Apache Pulsar. DataStax then rebuilt the Kesque technology into Astra Streaming. The Astra Streaming cloud service became generally available on June 29, 2022. With the release, the company added API-level support for the messaging tools Apache Kafka, RabbitMQ and Java Message Service, in addition to Apache Pulsar. Astra Streaming can connect to a larger data platform through DataStax's Astra DB cloud service.

Starting in 2023, DataStax began incorporating artificial intelligence and machine learning into its platform. In January 2023, the company acquired Kaskada, developer of a platform that helps organizations use data for AI applications. DataStax made the formerly proprietary Kaskada technology open source and integrated it into its Luna ML service, which was launched on May 4, 2023. With the acquisition, former Kaskada CEO Davor Bonaci was named DataStax chief technology officer and executive vice president. On May 24, 2023, DataStax announced a partnership with ThirdAI to bring large language models to DSE and Astra DB, to help developers build generative AI applications. In June 2023, the company announced the development of a GPT-based schema translator in its Astra Streaming cloud service. The Astra Streaming GPT Schema Translator uses generative AI to automatically generate schema mappings, enabling data integration and interoperability between multiple systems and data sources. On July 18, 2023, the company announced a partnership with Google to make semantic search available in its Astra DB cloud database for developers building generative AI applications. On September 13, 2023, DataStax launched the LangStream open source project, which works with Astra DB and supports vector databases including Milvus and Pinecone. LangStream enables developers to better work with streaming data sources, using Apache Kafka technology and generative AI to help build event-driven architectures. In November 2023, DataStax announced RAGStack, a simplified commercial offering for retrieval-augmented generation (RAG) based on LangChain and Astra DB vector search.

On February 25, 2025, IBM announced its intention to acquire DataStax.


Products


Astra DB

Astra DB is available on cloud services such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform. In February 2021, DataStax announced the serverless version of Astra DB, offering developers a pay-as-you-go model. In March 2022, DataStax added change data capture (CDC) capabilities to its Astra DB cloud service. Astra DB CDC is powered by Apache Pulsar, which allows developers to manage operational and streaming data in one place. DataStax leads the open-source Starlight project, which provides a compatibility layer for different protocols on top of Apache Pulsar.

On February 8, 2023, DataStax launched Astra Block, a cloud-based service based on the Ethereum blockchain to support building Web3 applications, available as part of Astra DB. Developers can use Astra Block to stream enhanced data from the Ethereum blockchain to build or scale Web3 experiences on Astra DB.

Astra DB supports open source LangChain technology, making it easier for developers to create generative AI applications.


DSE

Version 1.0 of DataStax Enterprise (DSE), released in October 2011, was the first commercial distribution of the Cassandra database, designed to provide real-time application performance and heavy analytics on the same physical infrastructure. It grew to include advanced security controls, graph database models, operational analytics and advanced search capabilities.

In April 2016, the company announced the release of DataStax Enterprise Graph, adding graph data model functionality to DSE. In March 2017, DataStax announced DSE 5.1, which included improved search capabilities, improved security controls, improvements to its graph data management and better operational analytics performance. DataStax also announced a shift in strategy, with an added focus on customer experience applications. Rather than introducing a new set of technologies, the company started to offer best-practice advice to users of its core DSE platform.

In April 2018, DataStax released DSE 6, a version focused on businesses using a hybrid cloud computing model. It was presented as offering the benefits of a distributed cloud database on any public cloud or on premises, with twice the responsiveness and the ability to handle twice the throughput. In December 2018, DataStax released DSE 6.7, which offered enterprise customers five key feature upgrades: improved analytics, geospatial search, improved data protection in the cloud, enhanced performance insights, and new developer integration tools, including an Apache Kafka connector and certified production Docker images. In April 2020, DataStax released DSE 6.8, offering enterprises new capabilities for bare-metal performance, support for more workloads, and a Kubernetes operator for Cassandra. DSE 7.0 was introduced in August 2023. It offers enhancements in cloud-native operations and generative AI capabilities, and includes vector search.


Funding and IPO

In September 2014, DataStax closed a Series E funding round, increasing the total investment in the company. On June 15, 2022, the company announced that it had raised additional funding. In 2020, Mergermarket reported that DataStax was preparing for an initial public offering that could launch in 2021. However, in June 2022, DataStax CEO Chet Kapoor said that the company would not rush into an IPO.


See also

* Apache Cassandra
* Wide-column store

