Dataflow Language

In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term "datastream" instead of "dataflow" to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.


Considerations

Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential, procedural, control flow (indicating that the program chooses a specific path), or imperative programming. The program focuses on commands, in line with the von Neumann vision of sequential programming, where data is normally "at rest". In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like black boxes. An operation runs as soon as all of its inputs become valid. Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.


State

One of the key concepts in computer programming is the idea of state, essentially a snapshot of various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. Explicit parallelism is one of the main reasons for the poor performance of Enterprise Java Beans when building data-intensive, non-OLTP applications.

Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
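
The assembly-line picture can be sketched with Python generators. This is only an illustration under stated assumptions: the stage names (`source`, `scale`, `keep_even`) are hypothetical and do not come from any particular dataflow language. Each stage works on an item as soon as the stage upstream hands one over, and keeps no state beyond its inputs.

```python
from typing import Iterable, Iterator


def source(n: int) -> Iterator[int]:
    """First station: emits the raw items 0..n-1, one at a time."""
    for i in range(n):
        yield i


def scale(items: Iterable[int], factor: int) -> Iterator[int]:
    """Second station: multiplies each item as soon as it arrives."""
    for item in items:
        yield item * factor


def keep_even(items: Iterable[int]) -> Iterator[int]:
    """Third station: passes on only the even items."""
    for item in items:
        if item % 2 == 0:
            yield item


# Wire the stations together; data flows from source to keep_even.
result = list(keep_even(scale(source(5), 3)))
print(result)  # [0, 6, 12]
```

In this sketch the stages are interleaved on a single thread. Because no stage depends on anything except its input stream, a dataflow runtime would be free to place each one on its own processor.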


Representation

Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system which pipes data between small, single-purpose tools that receive, process, and return. Dataflow programs start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe.

In terms of encoding, a dataflow program might be implemented as a hash table, with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation become valid.

For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
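
The scan-and-fire loop described above can be sketched in a few lines of Python. This is a simplified, single-threaded illustration rather than the design of any specific runtime: the `Operation` layout and the `run_dataflow` function are hypothetical, and the dictionary here holds the currently valid values while the operation list is scanned directly.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: (names of inputs, name of output, function to run).
Operation = Tuple[Tuple[str, ...], str, Callable[..., object]]


def run_dataflow(operations: List[Operation],
                 initial: Dict[str, object]) -> Dict[str, object]:
    """Fire operations until none is ready; return the table of valid values."""
    values = dict(initial)      # valid data, keyed by unique name
    pending = list(operations)  # operations that have not fired yet
    while True:
        # Scan down the list for the first operation whose inputs are all valid.
        ready = next(
            (op for op in pending if all(name in values for name in op[0])),
            None,
        )
        if ready is None:
            return values       # nothing can fire: done, or waiting on missing input
        inputs, output, func = ready
        # Running the operation makes its output valid, which may enable others.
        values[output] = func(*(values[name] for name in inputs))
        pending.remove(ready)


ops: List[Operation] = [
    (("sum", "diff"), "product", lambda s, d: s * d),  # listed first, fires last
    (("a", "b"), "sum", lambda a, b: a + b),
    (("a", "b"), "diff", lambda a, b: a - b),
]
result = run_dataflow(ops, {"a": 5, "b": 3})
print(result["product"])  # 16
```

The order in which operations are listed does not decide the order in which they run; only the availability of inputs does. The `values` table and the `pending` list together are the whole program state, which is what a parallel runtime would need to share.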


Incremental updates

Some recent dataflow libraries such as Differential/Timely Dataflow have used incremental computing for much more efficient data processing.
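
The idea of incremental computing can be shown with a deliberately small Python sketch. It is not the API of Differential or Timely Dataflow; the `IncrementalSum` class is a hypothetical illustration of updating an output from the change in one input instead of recomputing it from all inputs.

```python
from typing import Dict


class IncrementalSum:
    """Keeps a running sum current by applying only the change in each input."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self.total: int = 0

    def update(self, key: str, value: int) -> None:
        old = self._values.get(key, 0)
        self._values[key] = value
        self.total += value - old  # apply the delta; no rescan of all inputs


acc = IncrementalSum()
acc.update("x", 10)
acc.update("y", 5)
acc.update("x", 12)  # only the difference (+2) is propagated
print(acc.total)     # 17
```

Each update costs the same regardless of how many inputs exist, whereas recomputing the sum from scratch would grow with the input size.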


History

A pioneer dataflow language was BLOck DIagram (BLODI), published in 1961 by John Larry Kelly, Jr., Carol Lochbaum and Victor A. Vyssotsky for specifying sampled data systems. A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick.

In a 1966 Ph.D. thesis, "The On-line Graphical Specification of Computer Procedures", Bert Sutherland created one of the first graphical dataflow programming frameworks in order to make parallel programming easier. Subsequent dataflow languages were often developed at the large supercomputer labs. POGOL, an otherwise conventional data-processing language developed at NSA, compiled large-scale applications composed of multiple file-to-file operations, e.g. merge, select, summarize, or transform, into efficient code that eliminated the creation of or writing to intermediate files to the greatest extent possible. SISAL, a popular dataflow language developed at Lawrence Livermore National Laboratory, looks like most statement-driven languages, but variables should be assigned once. This allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC, "Single Assignment C", which tries to remain as close to the popular C programming language as possible.

The United States Navy funded development of signal processing graph notation (SPGN) and ACOS starting in the early 1980s. This is in use on a number of platforms in the field today (Underwater Acoustic Data Processing, Y.T. Chan).

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos.

Data flow has been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play the role analogous to variables, fields, and parameters in Java-like programming languages.
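
The single-assignment rule mentioned above for SISAL is what makes inputs and outputs easy to identify. The following Python sketch illustrates why, under loud assumptions: the statement format and the `dependency_graph` function are hypothetical, and nothing here is SISAL syntax or a real compiler pass. When every variable is assigned exactly once, the producer of each value is unambiguous, so the dataflow graph can be read straight off the statements.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement form: (variable assigned, variables read).
Statement = Tuple[str, List[str]]


def dependency_graph(statements: List[Statement]) -> Dict[str, Set[str]]:
    """Map each assigned variable to the set of variables its statement reads.

    Raises ValueError if any variable is assigned more than once.
    """
    graph: Dict[str, Set[str]] = {}
    for target, reads in statements:
        if target in graph:
            raise ValueError(f"{target!r} assigned more than once")
        graph[target] = set(reads)
    return graph


# c = f(a, b); d = g(a); e = h(c, d)
program: List[Statement] = [("c", ["a", "b"]), ("d", ["a"]), ("e", ["c", "d"])]
graph = dependency_graph(program)
print(sorted(graph["e"]))  # ['c', 'd']
```

In this example `c` and `d` do not read each other, so a scheduler could compute them in either order or at the same time; `e` must wait for both.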


Languages

Dataflow programming languages include:
* Céu (programming language)
* ASCET
* AviSynth scripting language, for video processing
* BMDFM Binary Modular Dataflow Machine
* CAL
* Cuneiform, a functional workflow language
* CMS Pipelines
* Hume
* Joule
* Keysight VEE
* KNIME, a free and open-source data analytics, reporting and integration platform
* LabVIEW, G
* Linda
* Lucid
* Lustre
* Max/MSP
* Microsoft Visual Programming Language - A component of Microsoft Robotics Studio designed for robotics programming
* Nextflow: a workflow language
* Orange - An open-source, visual programming tool for data mining, statistical data analysis, and machine learning
* Oz, now also distributed since 1.4.0
* Pipeline Pilot
* Prograph
* Pure Data
* Quartz Composer - Designed by Apple; used for graphic animations and effects
* SAC Single assignment C
* SIGNAL (a dataflow-oriented synchronous language enabling multi-clock specifications)
* Simulink
* SISAL
* SystemVerilog - A hardware description language
* Verilog - A hardware description language absorbed into the SystemVerilog standard in 2009
* VisSim - A block diagram language for simulation of dynamic systems and automatic firmware generation
* VHDL - A hardware description language
* Wapice IOT-TICKET implements an unnamed visual dataflow programming language for IoT data analysis and reporting
* XEE (Starlight) XML engineering environment
* XProc


Libraries

* Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink, Google Dataflow etc.)
* Apache Flink: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
* Apache Spark
* SystemC: Library for C++, mainly aimed at hardware design
* TensorFlow: A machine-learning library based on dataflow programming


See also

* Actor model
* Data-driven programming
* Digital signal processing
* Event-driven programming
* Flow-based programming
* Functional reactive programming
* Glossary of reconfigurable computing
* High-performance reconfigurable computing
* Incremental computing
* Parallel programming model
* Partitioned global address space
* Pipeline (Unix)
* Quantum circuit
* Signal programming
* Stream processing
* Yahoo Pipes


External links


* Book: Dataflow and Reactive Programming Systems
* Basics of Dataflow Programming in F# and C#
* Dataflow Programming - Concept, Languages and Applications
* Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
* Handling huge loads without adding complexity. The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011