StreamSQL

StreamSQL is a query language that extends SQL with the ability to process real-time data streams. SQL is primarily intended for manipulating relations (also known as tables), which are finite bags of tuples (rows). StreamSQL adds the ability to manipulate streams, which are infinite sequences of tuples that are not all available at the same time. Because streams are infinite, operations over streams must be monotonic. Queries over streams are generally "continuous", executing for long periods of time and returning incremental results. The StreamSQL language is typically used in the context of a Data Stream Management System (DSMS), for applications including market data analytics, network monitoring, surveillance, e-fraud detection and prevention, clickstream analytics and real-time compliance (anti-money laundering, RegNMS, MiFID).

Other streaming and continuous variants of SQL include StreamSQL.io, Kafka KSQL, SQLStreamBuilder, WSO2 Stream Processor, SQLStreams, and SamzaSQL.
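The notion of a continuous query that returns incremental results can be illustrated with a small Python sketch. This is not StreamSQL syntax; the function name `running_count` and the tuple fields `symbol` and `price` are assumptions made for illustration. The generator consumes tuples one at a time and emits an updated result whenever a matching tuple arrives, so it never needs the whole (potentially infinite) stream at once, and earlier outputs are never retracted.

```python
from typing import Iterable, Iterator, Tuple


def running_count(stream: Iterable[dict], symbol: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical continuous query: emit a new count per matching tuple."""
    count = 0
    for tup in stream:
        if tup["symbol"] == symbol:
            count += 1
            # Incremental result: produced as soon as the tuple is seen.
            yield (symbol, count)


# A short finite list stands in for an unbounded stream here.
ticks = [
    {"symbol": "ABC", "price": 10.0},
    {"symbol": "XYZ", "price": 20.0},
    {"symbol": "ABC", "price": 10.5},
]
results = list(running_count(iter(ticks), "ABC"))
```

In a real DSMS the input would not terminate, and a consumer would read results from the generator as they are produced rather than collecting them into a list.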


Technical details

StreamSQL extends the type system of SQL to support streams in addition to tables. Several new operations are introduced to manipulate streams.

Selecting from a stream - A standard SELECT statement can be issued against a stream to calculate functions (using the target list) or filter out unwanted tuples (using a WHERE clause). The result will be a new stream.

Stream-Relation Join - A stream can be joined with a relation to produce a new stream. Each tuple on the stream is joined with the current value of the relation based on a predicate to produce 0 or more tuples.

Union and Merge - Two or more streams can be combined by unioning or merging them. Unioning combines tuples in strict FIFO order. Merging is more deterministic, combining streams according to a sort key.

Windowing and Aggregation - A stream can be windowed to create finite sets of tuples. For example, a window of size 5 minutes would contain all the tuples in a given 5 minute period. Window definitions can allow complex selections of messages, based on tuple field values. Once a finite batch of tuples is created, analytics such as count, average, max, etc., can be applied.

Windowing and Joining - A pair of streams can also be windowed and then joined together. Tuples within the join windows will combine to create resulting tuples if they fulfill the predicate.
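The semantics of several of these operations can be sketched with Python generators. This is an illustrative model only, not StreamSQL syntax or any DSMS API: all function names (`select_where`, `stream_relation_join`, `merge_by`, `tumbling_avg`) and tuple fields (`ts`, `symbol`, `price`, `name`) are hypothetical. The sketch assumes that tuples are dicts, that each stream arrives ordered by its timestamp, and that windows are fixed-size and non-overlapping.

```python
import heapq
from typing import Callable, Iterable, Iterator


def select_where(stream: Iterable[dict], predicate: Callable[[dict], bool],
                 project: Callable[[dict], dict]) -> Iterator[dict]:
    """SELECT ... WHERE over a stream: filter, then compute the target list."""
    for t in stream:
        if predicate(t):
            yield project(t)


def stream_relation_join(stream: Iterable[dict], relation: list,
                         key: str) -> Iterator[dict]:
    """Join each arriving tuple with the relation's rows at arrival time."""
    for t in stream:
        for row in relation:  # zero or more matches per stream tuple
            if row[key] == t[key]:
                yield {**row, **t}


def merge_by(key: str, *streams: Iterable[dict]) -> Iterator[dict]:
    """Merge streams that are each already sorted by `key`."""
    return heapq.merge(*streams, key=lambda t: t[key])


def tumbling_avg(stream: Iterable[dict], ts_field: str, value_field: str,
                 size: int) -> Iterator[dict]:
    """Average `value_field` over consecutive windows of `size` time units."""
    window_start, total, n = None, 0.0, 0
    for t in stream:
        start = (t[ts_field] // size) * size
        if window_start is not None and start != window_start:
            # A tuple from a later window closes the current one.
            yield {"window_start": window_start, "avg": total / n}
            total, n = 0.0, 0
        window_start = start
        total += t[value_field]
        n += 1
    if n:  # only reached when the input is finite, as in this demo
        yield {"window_start": window_start, "avg": total / n}


trades = [
    {"ts": 10, "symbol": "ABC", "price": 10.0},
    {"ts": 200, "symbol": "XYZ", "price": 20.0},
    {"ts": 290, "symbol": "ABC", "price": 12.0},
    {"ts": 310, "symbol": "ABC", "price": 15.0},
]
companies = [{"symbol": "ABC", "name": "ABC Corp"}]

abc_prices = list(select_where(trades, lambda t: t["symbol"] == "ABC",
                               lambda t: {"ts": t["ts"], "price": t["price"]}))
joined = list(stream_relation_join(trades, companies, "symbol"))
merged = list(merge_by("ts", [{"ts": 1}, {"ts": 4}], [{"ts": 2}, {"ts": 3}]))
windows = list(tumbling_avg(trades, "ts", "price", 300))  # 300 s = 5 minutes
```

Here the `XYZ` trade produces no joined tuple because the relation has no matching row, and the 5 minute window starting at 0 closes only when the tuple with `ts` 310 arrives. A production system would typically also close windows on timers or watermarks rather than waiting for the next tuple.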


History

StreamSQL is derived from academic research into event stream processing, closely related to complex event processing. Led by Michael Stonebraker, a team of 30 professors and students on the Aurora project worked collaboratively from 2001 through 2003 to develop the core principles behind StreamSQL. The Aurora project was superseded by the Borealis project. Borealis is a distributed multi-processor version of Aurora.