HOME

TheInfoList



OR:

In
Unix-like A Unix-like (sometimes referred to as UN*X, *nix or *NIX) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Uni ...
computer
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
s, a pipeline is a mechanism for
inter-process communication In computer science, interprocess communication (IPC) is the sharing of data between running Process (computing), processes in a computer system. Mechanisms for IPC may be provided by an operating system. Applications which use IPC are often cat ...
using message passing. A pipeline is a set of
process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management * Business process, activities that produce a specific s ...
es chained together by their
standard streams In computer programming, standard streams are preconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), ...
, so that the output text of each process ('' stdout'') is passed directly as input (''
stdin In computer programming, standard streams are preconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), ...
'') to the next one. The second process is started as the first process is still executing, and they are executed concurrently. The concept of pipelines was championed by
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is an American mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and de ...
at
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
's ancestral home of
Bell Labs Nokia Bell Labs, commonly referred to as ''Bell Labs'', is an American industrial research and development company owned by Finnish technology company Nokia. With headquarters located in Murray Hill, New Jersey, Murray Hill, New Jersey, the compa ...
, during the development of Unix, shaping its toolbox philosophy. It is named by analogy to a physical
pipeline A pipeline is a system of Pipe (fluid conveyance), pipes for long-distance transportation of a liquid or gas, typically to a market area for consumption. The latest data from 2014 gives a total of slightly less than of pipeline in 120 countries ...
. A key feature of these pipelines is their "hiding of internals". This in turn allows for more clarity and simplicity in the system. The pipes in the pipeline are
anonymous pipe In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication (IPC). An implementation is often integrated into the operating system's file IO subsystem. Typically a parent ...
s (as opposed to named pipes), where data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed. The standard
shell Shell may refer to: Architecture and design * Shell (structure), a thin structure ** Concrete shell, a thin shell of concrete, usually with no interior columns or exterior buttresses Science Biology * Seashell, a hard outer layer of a marine ani ...
syntax for
anonymous pipe In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication (IPC). An implementation is often integrated into the operating system's file IO subsystem. Typically a parent ...
s is to list multiple commands, separated by vertical bars ("
pipes Pipe(s), PIPE(S) or piping may refer to: Objects * Pipe (fluid conveyance), a hollow cylinder following certain dimension rules ** Piping, the use of pipes in industry * Smoking pipe ** Tobacco pipe * Half-pipe and quarter pipe, semi-circu ...
" in common Unix verbiage).


History

The pipeline concept was invented by
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is an American mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and de ...
and first described in the
man pages A man page (short for manual page) is a form of software documentation found on Unix and Unix-like operating systems. Topics covered include programs, system libraries, system calls, and sometimes local system details. The local host administ ...
of
Version 3 Unix Research Unix refers to the early versions of the Unix operating system for DEC PDP-7, PDP-11, VAX and Interdata 7/32 and 8/32 computers, developed in the Bell Labs Computing Sciences Research Center (CSRC). The term ''Research Unix'' first appe ...
. McIlroy noticed that much of the time command shells passed the output file from one program as input to another. The concept of pipelines was championed by
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is an American mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and de ...
at
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
's ancestral home of
Bell Labs Nokia Bell Labs, commonly referred to as ''Bell Labs'', is an American industrial research and development company owned by Finnish technology company Nokia. With headquarters located in Murray Hill, New Jersey, Murray Hill, New Jersey, the compa ...
, during the development of Unix, shaping its toolbox philosophy. His ideas were implemented in 1973 when ("in one feverish night", wrote McIlroy)
Ken Thompson Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B (programmi ...
added the pipe() system call and pipes to the
shell Shell may refer to: Architecture and design * Shell (structure), a thin structure ** Concrete shell, a thin shell of concrete, usually with no interior columns or exterior buttresses Science Biology * Seashell, a hard outer layer of a marine ani ...
and several utilities in Version 3 Unix. "The next day", McIlroy continued, "saw an unforgettable orgy of one-liners as everybody joined in the excitement of plumbing." McIlroy also credits Thompson with the , notation, which greatly simplified the description of pipe syntax in Version 4. Although developed independently, Unix pipes are related to, and were preceded by, the 'communication files' developed by Ken Lochner in the 1960s for the
Dartmouth Time-Sharing System The Dartmouth Time-Sharing System (DTSS) is a discontinued operating system first developed at Dartmouth College between 1963 and 1964. It was the first successful large-scale time-sharing system to be implemented, and was also the system for wh ...
.


Other operating systems

This feature of
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
was borrowed by other operating systems, such as
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few op ...
and the CMS Pipelines package on VM/CMS and MVS, and eventually came to be designated the pipes and filters design pattern of
software engineering Software engineering is a branch of both computer science and engineering focused on designing, developing, testing, and maintaining Application software, software applications. It involves applying engineering design process, engineering principl ...
.


Further concept development

In Tony Hoare's
communicating sequential processes In computer science, communicating sequential processes (CSP) is a formal language for describing patterns of interaction in concurrent systems. It is a member of the family of mathematical theories of concurrency known as process algebras, or p ...
(CSP), McIlroy's pipes are further developed.


Implementation

A pipeline mechanism is used for
inter-process communication In computer science, interprocess communication (IPC) is the sharing of data between running Process (computing), processes in a computer system. Mechanisms for IPC may be provided by an operating system. Applications which use IPC are often cat ...
using message passing. A pipeline is a set of
process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management * Business process, activities that produce a specific s ...
es chained together by their
standard streams In computer programming, standard streams are preconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), ...
, so that the output text of each process ('' stdout'') is passed directly as input (''
stdin In computer programming, standard streams are preconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), ...
'') to the next one. The second process is started as the first process is still executing, and they are executed concurrently. It is named by analogy to a physical
pipeline A pipeline is a system of Pipe (fluid conveyance), pipes for long-distance transportation of a liquid or gas, typically to a market area for consumption. The latest data from 2014 gives a total of slightly less than of pipeline in 120 countries ...
. A key feature of these pipelines is their "hiding of internals". This in turn allows for more clarity and simplicity in the system. In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: for example a sending program may produce 5000
bytes The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...
per
second The second (symbol: s) is a unit of time derived from the division of the day first into 24 hours, then to 60 minutes, and finally to 60 seconds each (24 × 60 × 60 = 86400). The current and formal definition in the International System of U ...
, and a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in the buffer. When the receiving program is ready to read data, the next program in the pipeline reads from the buffer. If the buffer is filled, the sending program is stopped (blocked) until at least some data is removed from the buffer by the receiver. In Linux, the size of the buffer is 65,536 bytes (64KiB). An open source third-party filter calle
bfr
is available to provide larger buffers if required.


Network pipes

Tools like netcat and socat can connect pipes to TCP/IP sockets.


Pipelines in command line interfaces

All widely used Unix shells have a special syntax construct for the creation of pipelines. In all usage one writes the commands in sequence, separated by the
ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
vertical bar The vertical bar, , is a glyph with various uses in mathematics, computing, and typography. It has many names, often related to particular meanings: Sheffer stroke (in logic), pipe, bar, or (literally, the word "or"), vbar, and others. Usage ...
character , (which, for this reason, is often called "pipe character"). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage). The pipeline uses
anonymous pipe In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication (IPC). An implementation is often integrated into the operating system's file IO subsystem. Typically a parent ...
s. For anonymous pipes, data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed; this differs from named pipes, where messages are passed to or from a pipe that is named by making it a file, and remains after the processes are completed. The standard
shell Shell may refer to: Architecture and design * Shell (structure), a thin structure ** Concrete shell, a thin shell of concrete, usually with no interior columns or exterior buttresses Science Biology * Seashell, a hard outer layer of a marine ani ...
syntax for
anonymous pipe In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication (IPC). An implementation is often integrated into the operating system's file IO subsystem. Typically a parent ...
s is to list multiple commands, separated by
vertical bar The vertical bar, , is a glyph with various uses in mathematics, computing, and typography. It has many names, often related to particular meanings: Sheffer stroke (in logic), pipe, bar, or (literally, the word "or"), vbar, and others. Usage ...
s ("pipes" in common Unix verbiage): command1 , command2 , command3 For example, to list files in the current directory (), retain only the lines of output containing the string (), and view the result in a scrolling page (), a user types the following into the command line of a terminal: ls -l , grep key , less The command ls -l is executed as a process, the output (stdout) of which is piped to the input (stdin) of the process for grep key; and likewise for the process for less. Each
process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management * Business process, activities that produce a specific s ...
takes input from the previous process and produces output for the next process via ''
standard streams In computer programming, standard streams are preconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), ...
''. Each , tells the shell to connect the standard output of the command on the left to the standard input of the command on the right by an
inter-process communication In computer science, interprocess communication (IPC) is the sharing of data between running Process (computing), processes in a computer system. Mechanisms for IPC may be provided by an operating system. Applications which use IPC are often cat ...
mechanism called an (anonymous) pipe, implemented in the operating system. Pipes are unidirectional; data flows through the pipeline from left to right.


Example

Below is an example of a pipeline that implements a kind of
spell checker In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic ...
for the
web Web most often refers to: * Spider web, a silken structure created by the animal * World Wide Web or the Web, an Internet-based hypertext system Web, WEB, or the Web may also refer to: Computing * WEB, a literate programming system created by ...
resource indicated by a
URL A uniform resource locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identi ...
. An explanation of what it does follows. curl 'https://en.wikipedia.org/wiki/Pipeline_(Unix)' , sed 's/ a-zA-Z /g' , tr 'A-Z ' 'a-z\n' , grep ' -z , sort -u , comm -23 - <(sort /usr/share/dict/words) , less # curl obtains the
HTML Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
contents of a web page (could use wget on some systems). # sed replaces all characters (from the web page's content) that are not spaces or letters, with spaces. (
Newline A newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or ...
s are preserved.) # tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line). #
grep grep is a command-line utility for searching plaintext datasets for lines that match a regular expression. Its name comes from the ed command g/re/p (global regular expression search and print), which has the same effect. grep was originally de ...
includes only lines that contain at least one lowercase
alphabetical Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is ...
character (removing any blank lines). # sort sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates. # comm finds lines in common between two files, -23 suppresses lines unique to the second file, and those that are common to both, leaving only those that are found only in the first file named. The - in place of a filename causes comm to use its standard input (from the pipe line in this case). sort /usr/share/dict/words sorts the contents of the words file alphabetically, as comm expects, and <( ... ) outputs the results to a temporary file (via process substitution), which comm reads. The result is a list of words (lines) that are not found in /usr/share/dict/words. # less allows the user to page through the results.


Error stream

By default, the standard error streams (" stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the
console Console may refer to: Computing and video games * System console, a physical device to operate a computer ** Virtual console, a user interface for multiple computer consoles on one device ** Command-line interface, a method of interacting with ...
. However, many shells have additional syntax for changing this behavior. In the csh shell, for instance, using , & instead of , signifies that the standard error stream should also be merged with the standard output and fed to the next process. The Bash shell can also merge standard error with , & since version 4.0 or using 2>&1, as well as redirect it to a different file.


Pipemill

In the most commonly used simple pipelines the shell connects a series of sub-processes via pipes, and executes external commands within each sub-process. Thus the shell itself is doing no direct processing of the data flowing through the pipeline. However, it's possible for the shell to perform processing directly, using a so-called mill or pipemill (since a while command is used to "mill" over the results from the initial command). This construct generally looks something like: command , while read -r var1 var2 ...; do # process each line, using variables as parsed into var1, var2, etc # (note that this may be a subshell: var1, var2 etc will not be available # after the while loop terminates; some shells, such as zsh and newer # versions of Korn shell, process the commands to the left of the pipe # operator in a subshell) done Such pipemill may not perform as intended if the body of the loop includes commands, such as cat and ssh, that read from
stdin In computer programming, standard streams are preconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), ...
: on the loop's first iteration, such a program (let's call it ''the drain'') will read the remaining output from command, and the loop will then terminate (with results depending on the specifics of the drain). There are a couple of possible ways to avoid this behavior. First, some drains support an option to disable reading from stdin (e.g. ssh -n). Alternatively, if the drain does not ''need'' to read any input from stdin to do something useful, it can be given < /dev/null as input. As all components of a pipe are run in parallel, a shell typically forks a subprocess (a subshell) to handle its contents, making it impossible to propagate variable changes to the outside shell environment. To remedy this issue, the "pipemill" can instead be fed from a here document containing a
command substitution In computing, command substitution is a facility that allows a Command-line interpreter, command to be run and its output to be pasted back on the command line as arguments to another command. Command substitution first appeared in the Bourne she ...
, which waits for the pipeline to finish running before milling through the contents. Alternatively, a named pipe or a process substitution can be used for parallel execution.
GNU bash In computing, Bash (short for "''Bourne Again SHell''") is an interactive command interpreter and command programming language developed for UNIX-like operating systems. Created in 1989 by Brian Fox for the GNU Project, it is supported by the Fre ...
also has a option to disable forking for the last pipe component.


Creating pipelines programmatically

Pipelines can be created under program control. The Unix pipe()
system call In computing, a system call (syscall) is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive ...
asks the operating system to construct a new
anonymous pipe In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication (IPC). An implementation is often integrated into the operating system's file IO subsystem. Typically a parent ...
object. This results in two new, opened file descriptors in the process: the read-only end of the pipe, and the write-only end. The pipe ends appear to be normal, anonymous
file descriptor In Unix and Unix-like computer operating systems, a file descriptor (FD, less frequently fildes) is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket. File descriptors typically h ...
s, except that they have no ability to seek. To avoid deadlock and exploit parallelism, the Unix process with one or more new pipes will then, generally, call fork() to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data. Alternatively, a process might create new threads and use the pipe to communicate between them. '' Named pipes'' may also be created using mkfifo() or
mknod In Unix-like operating systems, a device file, device node, or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These spec ...
()
and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with tee.


Popular culture

The robot in the icon for
Apple An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
's Automator, which also uses a pipeline concept to chain repetitive commands together, holds a pipe in homage to the original Unix concept.


See also

*
Everything is a file "Everything is a file" is an approach to interface design in Unix derivatives. While this turn of phrase does not as such figure as a Unix design principle or philosophy, it is a common way to analyse designs, and informs the design of new interfa ...
– describes one of the defining features of Unix; pipelines act on "files" in the Unix sense *
Anonymous pipe In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication (IPC). An implementation is often integrated into the operating system's file IO subsystem. Typically a parent ...
– a FIFO structure used for interprocess communication *
GStreamer GStreamer is a Pipeline (computing), pipeline-based multimedia framework that links together a wide variety of media processing systems to complete complex workflows. For instance, GStreamer can be used to build a system that reads files in one f ...
– a pipeline-based multimedia framework * CMS Pipelines * Iteratee * Named pipe – persistent pipes used for interprocess communication * Process substitution — shell syntax for connecting multiple pipes to a process * GNU parallel *
Pipeline (computing) In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time- ...
– other computer-related pipelines *
Redirection (computing) In computing, redirection is a form of interprocess communication, and is a function common to most command-line interpreters, including the various Unix shells that can redirect standard streams to user-specified locations. The concept of red ...
*
Tee (command) tee is shell command that copies data from standard input to one or more files in addition to standard output; duplicating the input to each output. The name derives from the tee pipe fitting even though the command duplicates the input into ...
– a general command for tapping data from a pipeline * XML pipeline – for processing of XML files * xargs


References

* Sal Soghoian on MacBreak Episode 3 "Enter the Automatrix"


External links


History of Unix pipe notation
*
Doug McIlroy's original 1964 memo
proposing the concept of a pipe for the first time * {{man, sh, pipe, SUS, create an interprocess channel

by The Linux Information Project (LINFO)
Unix Pipes – powerful and elegant programming paradigm (Softpanorama)

''Ad Hoc Data Analysis From The Unix Command Line'' at Wikibooks
– Shows how to use pipelines composed of simple filters to do complex data analysis.
Use And Abuse Of Pipes With Audio Data
– Gives an introduction to using and abusing pipes with netcat, nettee and fifos to play audio across a network.
stackoverflow.com
– A Q&A about bash pipeline handling. Inter-process communication Unix sv:Vertikalstreck#Datavetenskap