In software, an XML pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected.
For instance, given two transformations T1 and T2, the two can be connected so that an input XML document is transformed by T1 and then the output of T1 is fed as the input document to T2. Simple pipelines like the one described above are called "linear"; a single input document always goes through the same sequence of transformations to produce a single output document.
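A linear pipeline of this kind can be sketched in a few lines of Python. This is only an illustration, assuming the standard-library `xml.etree.ElementTree` module stands in for a real transformation engine; the steps `t1` and `t2` and the helper `run_linear` are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET


def t1(doc):
    """Hypothetical T1: rename every <item> element to <entry>."""
    for element in list(doc.iter("item")):
        element.tag = "entry"
    return doc


def t2(doc):
    """Hypothetical T2: store the number of <entry> children on the root."""
    doc.set("count", str(len(doc.findall("entry"))))
    return doc


def run_linear(source, steps):
    """Parse the input once, then feed each step's output to the next step."""
    doc = ET.fromstring(source)
    for step in steps:
        doc = step(doc)
    return ET.tostring(doc, encoding="unicode")


result = run_linear("<list><item>a</item><item>b</item></list>", [t1, t2])
print(result)
```

Because every step takes one document and returns one document, the order of the list passed to `run_linear` is the whole pipeline definition.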
Linear operations
Linear operations can be divided into at least two categories:
Micro-operations
They operate inside a document, at the level of individual elements and attributes:
* Rename - renames elements or attributes without modifying the content
* Replace - replaces elements or attributes
* Insert - adds a new data element to the output stream at a specified point
* Delete - removes an element or attribute (also known as pruning the input tree)
* Wrap - wraps elements with additional elements
* Reorder - changes the order of elements
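Three of the micro-operations listed above (rename, delete and wrap) might look as follows. This is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the function names are hypothetical and do not come from any pipeline standard.

```python
import xml.etree.ElementTree as ET


def rename(root, old, new):
    """Rename: change the tag of matching elements, leaving content untouched."""
    for element in list(root.iter(old)):
        element.tag = new


def delete(root, tag):
    """Delete: prune matching elements (ElementTree has no parent pointers,
    so each potential parent is visited and its children are checked)."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)


def wrap(root, tag, wrapper):
    """Wrap: put each matching element inside a new wrapper element."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == tag:
                box = ET.Element(wrapper)
                parent.remove(child)
                box.append(child)
                parent.insert(index, box)


doc = ET.fromstring("<book><title>XML</title><draft>x</draft><para>p</para></book>")
rename(doc, "title", "heading")
delete(doc, "draft")
wrap(doc, "para", "section")
output = ET.tostring(doc, encoding="unicode")
print(output)
```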
Document operations
They take the input document as a whole
* Identity transform - makes a verbatim copy of its input to the output
* Compare - takes two documents and compares them
* Transform - executes a transform on the input file using a specified XSLT file. Version 1.0 or 2.0 should be specified.
* Split - takes a single XML document and splits it into distinct documents
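The identity, compare and split operations can be approximated with the Python standard library. The sketch below is an assumption-laden illustration: it treats two documents as equal when their canonical XML (C14N) forms match, which is only one possible definition of "compare", and it requires Python 3.8 or later for `ET.canonicalize`. The function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def identity(doc):
    """Identity transform: return an independent, unchanged copy of the input."""
    return copy.deepcopy(doc)


def compare(first, second):
    """Compare: True when both documents have the same canonical form."""
    canonical = [
        ET.canonicalize(ET.tostring(doc, encoding="unicode"))
        for doc in (first, second)
    ]
    return canonical[0] == canonical[1]


def split(doc, tag):
    """Split: turn each matching child into a separate document."""
    return [copy.deepcopy(child) for child in doc.findall(tag)]


orders = ET.fromstring('<orders><order id="1"/><order id="2"/></orders>')
same = compare(orders, identity(orders))
parts = split(orders, "order")
print(same, [part.get("id") for part in parts])
```

Canonicalization makes the comparison insensitive to attribute order, so `<a x="1" y="2"/>` and `<a y="2" x="1"/>` compare as equal under this definition.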
Sequence operations
They are mainly introduced in XProc and help to handle a sequence of documents as a whole:
* Count - takes a sequence of documents and counts them
* Identity transform - makes a verbatim copy of its input sequence of documents to the output
* split-sequence - takes a sequence of documents as input and routes them to different outputs depending on matching rules
* wrap-sequence - takes a sequence of documents as input and wraps them into one or more documents
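The sequence operations above can be illustrated on a plain Python list of parsed documents. This sketch is loosely modelled on the descriptions in the list, not on the XProc step signatures themselves; the function names, the two-way routing in `split_sequence` and the predicate argument are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def count(docs):
    """Count: return how many documents the sequence contains."""
    return sum(1 for _ in docs)


def split_sequence(docs, test):
    """split-sequence: route each document to one of two outputs,
    depending on whether it satisfies the matching rule `test`."""
    matched, not_matched = [], []
    for doc in docs:
        (matched if test(doc) else not_matched).append(doc)
    return matched, not_matched


def wrap_sequence(docs, wrapper):
    """wrap-sequence: wrap the whole sequence in a single new document."""
    root = ET.Element(wrapper)
    root.extend(docs)
    return root


docs = [
    ET.fromstring(text)
    for text in ('<doc lang="en"/>', '<doc lang="fr"/>', '<doc lang="en"/>')
]
english, others = split_sequence(docs, lambda doc: doc.get("lang") == "en")
batch = wrap_sequence(others, "batch")
print(count(docs), count(english), batch.tag, len(batch))
```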
Non-linear
Non-linear operations on pipelines may include:
* Conditionals — where a given transformation is executed if a condition is met while another transformation is executed otherwise
* Loops — where a transformation is executed on each node of a node set selected from a document or a transformation is executed until a condition evaluates to false
* Tees — where a document is fed to multiple transformations potentially happening in parallel
* Aggregations — where multiple documents are aggregated into a single document
* Exception Handling — where failures in processing can result in an alternate pipeline being processed
Some standards also categorize transformations as macro (changes impacting an entire file) or micro (impacting only an element or attribute).
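Several of the non-linear constructs listed above (conditionals, tees, aggregations and exception handling) can be sketched as small combinators over step functions. This is an illustrative sketch only, assuming each step is a Python function that takes and returns an `xml.etree.ElementTree` element; all names here (`choose`, `tee`, `aggregate`, `try_catch`, `set_attr`, `failing`) are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def choose(test, when_true, otherwise):
    """Conditional: run one of two steps depending on a test of the document."""
    return lambda doc: when_true(doc) if test(doc) else otherwise(doc)


def tee(*branches):
    """Tee: give every branch its own copy of the document."""
    return lambda doc: [branch(copy.deepcopy(doc)) for branch in branches]


def aggregate(docs, wrapper):
    """Aggregation: combine several documents into a single document."""
    root = ET.Element(wrapper)
    root.extend(docs)
    return root


def try_catch(step, recovery):
    """Exception handling: run an alternate step when the main step fails."""
    def run(doc):
        try:
            return step(copy.deepcopy(doc))
        except Exception:
            return recovery(doc)
    return run


def set_attr(name, value):
    """Hypothetical step factory: set one attribute on the root element."""
    def step(doc):
        doc.set(name, value)
        return doc
    return step


def failing(doc):
    """Hypothetical step that always fails, to exercise try_catch."""
    raise ValueError("validation failed")


order = ET.fromstring('<order total="120"/>')
is_large = lambda doc: float(doc.get("total")) > 100
routed = choose(is_large, set_attr("tier", "large"), set_attr("tier", "small"))(order)
copies = tee(set_attr("target", "archive"), set_attr("target", "billing"))(routed)
merged = aggregate(copies, "results")
recovered = try_catch(failing, set_attr("status", "rejected"))(order)
print(routed.get("tier"), [doc.get("target") for doc in merged], recovered.get("status"))
```

The tee copies its input so that branches cannot interfere with each other, which is also what would make running them in parallel safe.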
XML pipeline languages
XML pipeline languages are used to define pipelines. A program written with an XML pipeline language is implemented by software known as an XML pipeline engine, which creates processes, connects them together and finally executes the pipeline. Existing XML pipeline languages include:
Standards
* XProc: An XML Pipeline Language is a W3C Recommendation for defining linear and non-linear XML pipelines.
Product-specific
* W3C XML Pipeline Definition Language is specified in a W3C Note.
* W3C XML Pipeline Language (XPL) Version 1.0 (Draft) is specified in a W3C Submission and a component of Orbeon Presentation Server OPS (now called Orbeon Forms). This specification provides an implementation of an earlier version of the language. XPL allows the declaration of complex pipelines with conditionals, loops, tees, aggregations, and sub-pipelines. XProc is roughly a superset of XPL.
* Cocoon sitemaps allow, among other functionality, the declaration of XML pipelines. Cocoon sitemaps are one of the earliest implementations of the concept of XML pipeline.
* smallx XML Pipelines are used by the smallx project.
* ServingXML defines a vocabulary for expressing flat-XML, XML-flat, flat-flat, and XML-XML transformations in pipelines.
* PolarLake Circuit Markup Language is used by PolarLake's runtime to define XML pipelines. Circuits are collections of paths through which fragments of XML stream (usually as SAX or DOM events). Components are placed on paths to interact with the stream (and/or the outside world) in a low latency process.
* xmlsh is a scripting language based on the Unix shells which natively supports XML and text pipelines.
* Stylus Studio XML Pipeline is a visual grammar which defines the following operations: Input, Output, XQuery, XSLT, Validate, XSL-FO to PDF, Convert To XML, Convert From XML, Choose, Warning, Stop.
Pipe granularity
Different XML Pipeline implementations support different granularity of flow.
* Document: Whole documents flow through the pipe as atomic units. A document can only be in one place at a time, though usually multiple documents may be in the pipe at once.
* Event: Element and text node events may flow through different paths. A document may be flowing through many components at the same time.
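The difference between the two granularities can be illustrated with Python's standard library. In this sketch, which is an assumption-based illustration rather than how any particular pipeline engine works, `document_level` parses the whole document before any processing starts, while `event_level` uses `ET.iterparse` to hand each `<entry>` element downstream as soon as its end tag has been read. The sample data and function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

SOURCE = b"<log><entry level='info'/><entry level='error'/><entry level='info'/></log>"


def document_level(data):
    """Document granularity: the whole tree is built before any step runs."""
    root = ET.fromstring(data)
    return [entry.get("level") for entry in root.iter("entry")]


def event_level(stream):
    """Event granularity: yield each entry as soon as its end tag is parsed,
    so downstream steps can start before the document has been fully read."""
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "entry":
            yield element.get("level")
            element.clear()  # release the processed element's contents


whole = document_level(SOURCE)
streamed = list(event_level(io.BytesIO(SOURCE)))
print(whole, streamed)
```

Both functions produce the same values; what differs is when the values become available and how much of the document must be held in memory.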
Standardization
Until May 2010, there was no widely used standard for XML pipeline languages. However, with the publication of the W3C XProc standard as a W3C Recommendation in May 2010, widespread adoption can be expected.
History
* 1972 Douglas McIlroy of Bell Laboratories adds the pipe operator to the UNIX command shell. This allows the output from one shell program to go directly into the input of another shell program without going to disk. This allowed programs such as the UNIX awk and sed to be specialized yet work together. For more details see Pipeline (Unix).
* 199 Sean McGrath developed a C++ toolkit for SGML processing.
* 1998 Stefano Mazzocchi releases the first version of Apache Cocoon, one of the first software programs to use XML pipelines.
* 199 PolarLake built XML Operating System, which included XML Pipelining.
* 2002 Notes submitted by Norman Walsh and Eve Maler from Sun Microsystems, as well as a W3C Submission submitted in 2005 by Erik Bruchez and Alessandro Vernet from Orbeon, were important steps toward spawning an actual standardization effort. While neither submission directly became a W3C recommendation, they were considered key sources of inspiration for the W3C XML Processing Working Group.
* September 2005 The W3C XML Processing Working Group started. The task of this working group was to create a specification for an XML pipelining language.
* August 2008 xmlsh, an XML pipeline language, was announced.
See also
* Apache Cocoon
* Identity transform
* NetKernel
* Pipeline (Unix)
* W3C recommendation
* XSLT
External links
Standards
Recommendations
XProc: An XML Pipeline Language W3C Recommendation 11 May 2010
Working drafts
W3C XML Processing Model Working Group
W3C XML Pipeline Definition Language Note
W3C XML Pipeline Language (XPL) Version 1.0 (Draft) Submission
Product specific
XProc tutorial and reference - Part of XML Developer's kit, no individual download
Managing Complex Document Generation through Pipelining
XML Pipeline Language (XPL) Documentation
SXPipe
PolarLake Reference data management
PolarLake XML circuits and reference data management
smallx
ServingXML - This program allows XML transforms to be chained together along with other operations on XML files such as validation and HTML Tidy.
IVI XML Pipeline Server - XML Pipeline Server is an implementation of the Stylus Studio XML Pipeline language
Norman Walsh's XProc web site - Norman Walsh is the chair of the W3C XProc standards committee.
yax - an XProc implementation, currently with command-line and Apache Ant interfaces
Yahoo! Pipes - lets users create multi-source data mashups in a web-based visual environment
xmlsh - A shell for manipulating XML based on the Unix shells. Supports in-process multithreaded XML and text processing pipelines.
How to implement XML Pipeline in XSLT
Calabash is an implementation of XProc
Calumet is an XProc implementation from EMC
QuiXProc is an XProc implementation from Innovimax