Apache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable.
Apache Apex was named a top-level project by The Apache Software Foundation on April 25, 2016. As of September 2019, it is no longer actively developed.
Overview
Apache Apex is developed under the Apache License 2.0. The project was driven by DataTorrent, a start-up company based in San Jose, California.
Apache Apex has two parts: Apex Core and Apex Malhar. Apex Core is the platform, or framework, for building distributed applications on Hadoop. The core platform is supplemented by Malhar, a library of connector and logic functions that enables rapid application development. Malhar's input and output operators provide templates for sources and sinks such as Alluxio, S3, HDFS, NFS, FTP, Kafka, ActiveMQ, RabbitMQ, JMS, Cassandra, MongoDB, Redis, HBase, CouchDB, generic JDBC, and other database connectors.
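The division of labour between input operators, logic functions, and output operators can be illustrated with a small conceptual sketch. This is not the Apex or Malhar API (those are Java libraries). Every class and function name below (`ListInput`, `MapLogic`, `ListOutput`, `run_pipeline`) is hypothetical and exists only to show how a source, a transformation, and a sink might be wired into a pipeline.

```python
from typing import Callable, Iterable, Iterator, List


class ListInput:
    """Hypothetical input operator: reads tuples from an in-memory source."""

    def __init__(self, records: Iterable[str]) -> None:
        self.records = records

    def emit(self) -> Iterator[str]:
        # Emit each record downstream, one tuple at a time.
        yield from self.records


class MapLogic:
    """Hypothetical logic operator: applies a function to every tuple."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self.fn = fn

    def process(self, stream: Iterator[str]) -> Iterator[str]:
        for item in stream:
            yield self.fn(item)


class ListOutput:
    """Hypothetical output operator: writes tuples to an in-memory sink."""

    def __init__(self) -> None:
        self.written: List[str] = []

    def consume(self, stream: Iterator[str]) -> None:
        for item in stream:
            self.written.append(item)


def run_pipeline(source: ListInput, logic: MapLogic, sink: ListOutput) -> List[str]:
    """Wire source -> logic -> sink and drain the stream once."""
    sink.consume(logic.process(source.emit()))
    return sink.written


if __name__ == "__main__":
    result = run_pipeline(ListInput(["click", "view"]), MapLogic(str.upper), ListOutput())
    print(result)
```

In a real deployment the in-memory source and sink would presumably be replaced by connectors to systems such as those listed above, while the logic step stays unchanged. That separation is the point of the sketch.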
History
DataTorrent had developed the platform since 2012 and then decided to open-source the core, which became Apache Apex. It entered incubation in August 2015 and became an Apache Software Foundation top-level project within eight months. DataTorrent itself shut down in May 2018.
As of September 2019, Apache Apex is no longer being developed.
Apex Big Data World
Apex Big Data World is a conference about Apache Apex. The first conferences took place in 2017, in Pune, India and Mountain View, California, USA.