Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. Beam pipelines are defined using one of the provided SDKs and executed on one of Beam's supported runners (distributed processing back-ends), including Apache Flink, Apache Samza, Apache Spark, and Google Cloud Dataflow.
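
The separation between a pipeline definition and the runner that executes it can be illustrated with a small toy in plain Python. This is only a sketch of the idea and does not use the Beam SDK: the names Pipeline, EagerRunner, LazyRunner, split_words and count_words are hypothetical and chosen for illustration. The same pipeline object is handed to two different execution back-ends, which should produce the same result.

```python
from collections import Counter
from typing import Any, Callable, Iterable, List

# All names below are illustrative assumptions; none of them belong to the
# Apache Beam SDK. A step maps an iterable of inputs to an iterable of outputs.
Step = Callable[[Iterable[Any]], Iterable[Any]]


class Pipeline:
    """A runner-independent description of an ordered list of steps."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step) -> "Pipeline":
        # Record the step and return self so calls can be chained.
        self.steps.append(step)
        return self


def split_words(lines: Iterable[str]) -> Iterable[str]:
    # Emit every whitespace-separated word of every line.
    for line in lines:
        yield from line.split()


def count_words(words: Iterable[str]) -> Iterable[tuple]:
    # Aggregate the words into (word, count) pairs, sorted by word.
    yield from sorted(Counter(words).items())


class EagerRunner:
    """Executes step by step, materializing a list after each step."""

    def run(self, pipeline: Pipeline, source: Iterable[Any]) -> List[Any]:
        data = list(source)
        for step in pipeline.steps:
            data = list(step(data))
        return data


class LazyRunner:
    """Chains the steps as generators and pulls results only at the end."""

    def run(self, pipeline: Pipeline, source: Iterable[Any]) -> List[Any]:
        stream: Iterable[Any] = iter(source)
        for step in pipeline.steps:
            stream = step(stream)
        return list(stream)


# Define the pipeline once, then execute it on two different runners.
pipeline = Pipeline().apply(split_words).apply(count_words)
lines = ["to be or not to be", "to see or not to see"]

eager_result = EagerRunner().run(pipeline, lines)
lazy_result = LazyRunner().run(pipeline, lines)
print(eager_result == lazy_result)
```

In this toy the pipeline holds no execution logic of its own, so swapping the runner changes how the steps are evaluated without touching the definition. Real Beam runners distribute the work across machines, which this sketch does not attempt.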


History

Apache Beam is one implementation of the model described in the Dataflow model paper. The Dataflow model is based on previous work on distributed processing abstractions at Google, in particular on FlumeJava and MillWheel. In 2014, Google released an open SDK implementation of the Dataflow model, along with an environment to execute Dataflows locally (non-distributed) as well as in the Google Cloud Platform service.


Timeline

Apache Beam makes minor releases every 6 weeks.


See also

* List of Apache Software Foundation projects

