YAML is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax that intentionally differs from Standard Generalized Markup Language (SGML). It uses Python-style indentation to indicate nesting and does not require quotes around most string values (it also supports JSON style [...] and {...} mixed in the same file).
Custom data types are allowed, but YAML natively encodes scalars (such as strings, integers, and floats), lists, and associative arrays (also known as maps, dictionaries or hashes). These data types are based on the Perl programming language, though all commonly used high-level programming languages share very similar concepts. The colon-centered syntax, used for expressing key-value pairs, is inspired by electronic mail headers as defined in RFC 822, and the document separator (---) is borrowed from MIME. Escape sequences are reused from C, and whitespace wrapping for multi-line strings is inspired by HTML. Lists and hashes can contain nested lists and hashes, forming a tree structure; arbitrary graphs can be represented using YAML aliases (similar to XML in SOAP). YAML is intended to be read and written in streams, a feature inspired by SAX.
Support for reading and writing YAML is available for many programming languages. Some source-code editors such as Vim, Emacs, and various integrated development environments have features that make editing YAML easier, such as folding up nested structures or automatically highlighting syntax errors.
The official recommended filename extension for YAML files has been .yaml since 2006. In 2024, the MIME type application/yaml was finalized.
History and name
YAML (rhymes with ''camel'') was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean ''Yet Another Markup Language'', because it was released in an era that saw a proliferation of markup languages for presentation and connectivity (HTML, XML, SGML, etc.). Its initial name was intended as a tongue-in-cheek reference to the technology landscape, referencing its purpose as a markup language with the ''yet another'' construct, but it was then repurposed as ''YAML Ain't Markup Language'', a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
Versions
Design
Syntax
A cheat sheet and full specification are available at the official site. The following is a synopsis of the basic elements.
YAML accepts the entire Unicode character set, except for some control characters, and may be encoded in any one of UTF-8, UTF-16 or UTF-32. (Though UTF-32 is not mandatory, it is required for a parser to have JSON compatibility.)
* Whitespace indentation is used for denoting structure; however, tab characters are not allowed as part of that indentation.
* Comments begin with the number sign (#), can start anywhere on a line and continue until the end of the line. Comments must be separated from other tokens by whitespace characters. If # characters appear inside of a string, then they are number sign (#) literals.
* List members are denoted by a leading hyphen (-) with one member per line.
** A list can also be specified by enclosing text in square brackets ([...]) with each entry separated by a comma.
* An associative array entry is represented using colon space in the form ''key: value'' with one entry per line. YAML requires the colon be followed by a space so that URL-style strings, in which the colon is followed directly by other characters, can be represented without needing to be enclosed in quotes.
** A question mark followed by a space can be used in front of a key, in the form "? key" with ": value" on the following line, so that the key itself can be a list or another structure that starts with dashes or square brackets.
** An associative array can also be specified by text enclosed in curly braces ({...}), with keys separated from values by colon and the entries separated by commas (spaces are not required to retain compatibility with JSON).
* Strings (one type of scalar in YAML) are ordinarily unquoted, but may be enclosed in double-quotes (") or single-quotes (').
** Within double-quotes, special characters may be represented with C-style escape sequences starting with a backslash (\). According to the documentation the only octal escape supported is \0.
** Within single quotes the only supported escape sequence is a doubled single quote ('') denoting the single quote itself, as in 'don''t'.
* Block scalars are delimited with indentation, with optional modifiers to preserve (|) or fold (>) newlines.
* Multiple documents within a single stream are separated by three hyphens (---).
** Three periods (...) optionally end a document within a stream.
* Repeated nodes are initially denoted by an ampersand (&) and thereafter referenced with an asterisk (*).
* Nodes may be labeled with a type or tag using a double exclamation mark (!!) followed by a string, which can be expanded into a URI.
* YAML documents in a stream may be preceded by 'directives' composed of a percent sign (%) followed by a name and space-delimited parameters. Two directives are defined in YAML 1.1:
** The %YAML directive is used for identifying the version of YAML in a given document.
** The %TAG directive is used as a shortcut for URI prefixes. These shortcuts may then be used in node type tags.
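Several of the elements listed above can be combined in one short document. The following is only an illustrative sketch: all key names and values are invented for the example, and some loaders (for instance those that map YAML mappings onto hash tables with immutable keys) may reject the sequence used as a key.

```yaml
%YAML 1.2                          # directive naming the YAML version
---
# a comment on a line of its own
plain: no quotes needed            # trailing comment, separated by whitespace
hash_in_string: "colour #ff0000"   # inside quotes, # is a literal character
double: "tab:\t newline:\n nul:\0" # C-style escapes in double quotes
single: 'don''t'                   # '' stands for one single quote
flow_list: [a, b, c]               # list in square brackets
flow_map: {x: 1, y: 2}             # associative array in curly braces
? [complex, key]                   # explicit key introduced by "? "
: value for the complex key
...
```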
Basic components
Conventional block format uses a hyphen+space to begin a new item in a list.
--- # Favorite movies
- Casablanca
- North by Northwest
- The Man Who Wasn't There
Optional inline format is delimited by comma+space and enclosed in brackets (similar to JSON).
--- # Shopping list
[milk, pumpkin pie, eggs, juice]
Keys are separated from values by a colon+space. Indented blocks, common in YAML data files, use indentation and new lines to separate the key/value pairs. Inline blocks, common in YAML data streams, use comma+space to separate the key/value pairs between braces.
--- # Indented Block
name: John Smith
age: 33
--- # Inline Block
{name: John Smith, age: 33}
Strings do not require quotation marks. There are two ways to write multi-line strings, one preserving newlines (using the | character) and one that folds the newlines (using the > character), both followed by a newline character.
data: |
   There once was a tall man from Ealing
   Who got on a bus to Darjeeling
       It said on the door
       "Please don't sit on the floor"
   So he carefully sat on the ceiling
By default, the leading indentation (of the first line) and trailing whitespace are stripped, though other behavior can be explicitly specified.
data: >
   Wrapped text
   will be folded
   into a single
   paragraph

   Blank lines denote
   paragraph breaks
Folded text converts newlines to spaces and removes leading whitespace.
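The handling of trailing newlines and leading indentation mentioned above is controlled by optional indicators placed after | or >. The sketch below uses invented key names and shows the three chomping indicators and an explicit indentation indicator.

```yaml
strip: |-      # "-" removes the final newline
  text
clip: |        # default: exactly one final newline is kept
  text
keep: |+       # "+" keeps the final newline and any trailing blank lines
  text

indented: |2   # content indentation is declared as 2 spaces
    so the two extra leading spaces on this line are part of the value
```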
--- # The Smiths
- {name: John Smith, age: 33}
- name: Mary Smith
  age: 27
- [name, age]: [Rae Smith, 4]   # sequences as keys are supported
--- # People, by gender
men: [John Smith, Bill Jones]
women:
  - Mary Smith
  - Susan Williams
Objects and lists are important components in YAML and can be mixed. The first example is a list of key-value objects, all people from the Smith family. The second lists them by gender; it is a key-value object containing two lists.
Advanced components
Two features that distinguish YAML from the capabilities of other data-serialization languages are structures and data typing.
YAML structures enable storage of multiple documents within a single file, usage of references for repeated nodes, and usage of arbitrary nodes as keys.
For clarity, compactness, and avoiding data entry errors, YAML provides node anchors (using &) and references (using *). References to the anchor work for all data types (see the ship-to reference in the example below).
Below is an example of a queue in an instrument sequencer in which two steps are referenced without being fully described.
--- # Sequencer protocols for Laser eye surgery
- step: &id001                 # defines anchor label &id001
    instrument: Lasik 2000
    pulseEnergy: 5.4
    pulseDuration: 12
    repetition: 1000
    spotSize: 1mm
- step: &id002
    instrument: Lasik 2000
    pulseEnergy: 5.0
    pulseDuration: 10
    repetition: 500
    spotSize: 2mm
- Instrument1: *id001 # refers to the first step (with anchor &id001)
- Instrument2: *id002 # refers to the second step
Explicit data typing is seldom seen in the majority of YAML documents since YAML autodetects simple types. Data types can be divided into three categories: core, defined, and user-defined. Core are ones expected to exist in any parser (e.g. floats, ints, strings, lists, maps, ...). Many more advanced data types, such as binary data, are defined in the YAML specification but not supported in all implementations. Finally YAML defines a way to extend the data type definitions locally to accommodate user-defined classes, structures or primitives (e.g. quad-precision floats).
YAML autodetects the datatype of the entity, but sometimes one wants to cast the datatype explicitly. The most common situation is where a single-word string that looks like a number, Boolean or tag requires disambiguation by surrounding it with quotes or using an explicit datatype tag.
---
a: 123 # an integer
b: "123" # a string, disambiguated by quotes
c: 123.0 # a float
d: !!float 123 # also a float via explicit data type prefixed by (!!)
e: !!str 123 # a string, disambiguated by explicit type
f: !!str Yes # a string via explicit type
g: Yes # a Boolean True (yaml1.1), string "Yes" (yaml1.2)
h: Yes we have No bananas # a string, "Yes" and "No" disambiguated by context.
Not every implementation of YAML has every specification-defined data type. These built-in types use a double-exclamation sigil prefix (!!). Particularly interesting ones not shown here are sets, ordered maps, timestamps, and hexadecimal. Here is an example of base64-encoded binary data.
---
picture: !!binary |
  R0lGODdhDQAIAIAAAAAAANn
  Z2SwAAAAADQAIAAACF4SDGQ
  ar3xxbJ9p0qa7R0YxwzaFME
  1IAADs=
Many implementations of YAML can support user-defined data types for object serialization. Local data types are not universal data types but are defined in the application using the YAML parser library. Local data types use a single exclamation mark (!).
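As a rough illustration of the three tag forms, the fragment below uses a local tag, a tag shortened through a %TAG directive, and a built-in type. The tag names !point and !e!circle and the example.com prefix are placeholders chosen for this sketch; an application would have to register matching types with its parser for the first two to load.

```yaml
%TAG !e! tag:example.com,2000:app/
---
origin: !point {x: 0, y: 0}      # local tag, meaningful only to this application
shape: !e!circle {radius: 7}     # expands to tag:example.com,2000:app/circle
version: !!str 1.0               # built-in type: forces a string instead of a float
```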
Example
Data-structure hierarchy is maintained by outline indentation.
---
receipt:     Oz-Ware Purchase Invoice
date:        2012-08-06
customer:
    first_name:   Dorothy
    family_name:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8
      price:     133.7
      quantity:  1

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.
...
Notice that strings do not require enclosure in quotation marks. The specific number of spaces in the indentation is unimportant as long as parallel elements have the same left justification and the hierarchically nested elements are indented further. This sample document defines an associative array with 7 top level keys: one of the keys, "items", contains a 2-element list, each element of which is itself an associative array with differing keys. Relational data and redundancy removal are displayed: the "ship-to" associative array content is copied from the "bill-to" associative array's content as indicated by the anchor (&) and reference (*) labels. Optional blank lines can be added for readability. Multiple documents can exist in a single file/stream and are separated by ---. An optional ... can be used at the end of a file (useful for signaling an end in streamed communications without closing the pipe).
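A stream carrying more than one document might look like the following sketch, in which the key names and values are invented. Each --- starts a new document and each ... marks the end of one, so a reader can act on the first document before the second has arrived.

```yaml
--- # first document
event: start
time: 20:03:20
...
--- # second document
event: stop
time: 20:03:47
...
```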
Features
Indented delimiting
Because YAML primarily relies on outline indentation for structure, it is especially resistant to delimiter collision. YAML's insensitivity to quotation marks and braces in scalar values means one may embed XML, JSON or even YAML documents inside a YAML document by simply indenting it in a block literal (using | or >):
---
example: >
        HTML goes into YAML without modification
message: |
        <blockquote>
        <p>"Three is always greater than two,
           even for large values of two"</p>
        <p>--Author Unknown</p>
        </blockquote>
date: 2007-06-01
YAML may be placed in JSON by quoting and escaping all interior quotation marks. YAML may be placed in XML by escaping reserved characters (<, >, &, ', ") and converting whitespace, or by placing it in a CDATA section.
Non-hierarchical data models
Unlike JSON, which can only represent data in a hierarchical model with each child node having a single parent, YAML also offers a simple relational scheme that allows repeats of identical data to be referenced from two or more points in the tree rather than entered redundantly at those points. This is similar to the facility IDREF built into XML. The YAML parser then expands these references into the fully populated data structures they imply when read in, so whatever program is using the parser does not have to be aware of a relational encoding model, unlike XML processors, which do not expand references. This expansion can enhance readability while reducing data entry errors in configuration files or processing protocols where many parameters remain the same in a sequential series of records while only a few vary. For example, the "ship-to" and "bill-to" records in an invoice are nearly always the same data.
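The effect of this expansion can be sketched with two documents: the first as an author might write it, the second showing the structure the application receives. The anchor name &addr is arbitrary, and whether the loaded "ship-to" value is a copy or the very same object as "bill-to" depends on the parser.

```yaml
--- # as written: the address is entered once and anchored
bill-to: &addr
  city: East Centerville
  state: KS
ship-to: *addr
--- # as seen by the program after loading
bill-to:
  city: East Centerville
  state: KS
ship-to:
  city: East Centerville
  state: KS
```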
Practical considerations
YAML is line-oriented and thus it is often simple to convert the unstructured output of existing programs into YAML format while having them retain much of the look of the original document. Because there are no closing tags, braces, or quotation marks to balance, it is generally easy to generate well-formed YAML directly from distributed print statements within unsophisticated programs. Likewise, the whitespace delimiters facilitate quick-and-dirty filtering of YAML files using the line-oriented commands in grep, AWK, Perl, Ruby, and Python.
In particular, unlike markup languages, chunks of consecutive YAML lines tend to be well-formed YAML documents themselves. This makes it very easy to write parsers that do not have to process a document in its entirety (e.g. balancing opening and closing tags and navigating quoted and escaped characters) before they begin extracting specific records within. This property is particularly expedient when iterating in a single, stateless pass, over records in a file whose entire data structure is too large to hold in memory, or for which reconstituting the entire structure to extract one item would be prohibitively expensive.
Counterintuitively, although its indented delimiting might seem to complicate deeply nested hierarchies, YAML handles indents as small as a single space, and this may achieve better compression than markup languages. Additionally, extremely deep indentation can be avoided entirely by either: 1) reverting to "inline style" (i.e. JSON-like format) without the indentation; or 2) using relational anchors to unwind the hierarchy to a flat form that the YAML parser will transparently reconstitute into the full data structure.
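The first of these options can be illustrated as follows; the key names and the path are invented for the example. Both documents describe the same nested structure.

```yaml
--- # block style: each nesting level adds indentation
server:
  tls:
    cert:
      path: /etc/ssl/example.pem
--- # inline style: the same structure without indentation
server: {tls: {cert: {path: /etc/ssl/example.pem}}}
```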
Security
YAML is purely a data-representation language and thus has no executable commands. While validation and safe parsing are inherently possible in any data language, implementation is such a notorious pitfall that YAML's lack of an associated command language may be a relative security benefit.
However, YAML allows language-specific tags so that arbitrary local objects can be created by a parser that supports those tags. Any YAML parser that allows sophisticated object instantiation to be executed opens the potential for an injection attack. Perl parsers that allow loading of objects of arbitrary classes create so-called "blessed" values. Using these values may trigger unexpected behavior, e.g. if the class uses overloaded operators. This may lead to execution of arbitrary Perl code.
The situation is similar for Python or Ruby parsers. According to the PyYAML documentation:
Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists. [...]
PyYAML allows you to construct a Python object of any type. Even instances of Python classes can be constructed using the !!python/object tag.
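The difference can be seen in a document such as the sketch below, where myapp.Session stands for a hypothetical application class. A restricted loader such as yaml.safe_load accepts the first entry and refuses the second, while an unrestricted loader would attempt to build the object.

```yaml
# plain data: accepted by a restricted loader
user:
  name: dorothy
  admin: false

# language-specific tag: asks the loader to instantiate a Python class
session: !!python/object:myapp.Session   # myapp.Session is hypothetical
  token: abc123
```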
Data processing and representation
The YAML specification identifies an ''instance document'' as a "Presentation" or "character stream". The primary logical structures in a YAML instance document are scalars, sequences, and mappings.
[Additional, optional-use, logical structures are enumerated in the YAML types repository. The tagged types in the YAML types repository are optional and therefore not essential for conformant YAML processors. "The use of these tags is not mandatory."] The YAML specification also indicates some basic constraints that apply to these primary logical structures. For example, according to the specification, mapping keys do not have an order. In every case where node order is significant, a sequence must be used.
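The ordering constraint can be illustrated with a small sketch using invented names: the order of keys in the first mapping carries no meaning, while the sequence of single-pair mappings preserves the order of its entries.

```yaml
# key order in a mapping is not significant
settings: {b: 2, a: 1}

# where order matters, a sequence is used
steps:
  - fetch: source
  - build: release
  - deploy: staging
```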
Moreover, in defining conformance for YAML processors, the YAML specification defines two primary operations: ''dump'' and ''load''. All YAML-compliant processors must provide ''at least'' one of these operations, and may optionally provide both. Finally, the YAML specification defines an ''information model'' or "representation graph", which must be created during processing for both ''dump'' and ''load'' operations, although this representation need not be made available to the user through an API.
Comparison with other serialization formats
Comparison with JSON
JSON syntax is a basis of YAML version 1.2, which was promulgated with the express purpose of bringing YAML "into compliance with JSON as an official subset".
Though prior versions of YAML were not strictly compatible, the discrepancies were rarely noticeable, and most JSON documents can be parsed by some YAML parsers such as Syck. This is because JSON's semantic structure is equivalent to the optional "inline-style" of writing YAML. While extended hierarchies can be written in inline-style like JSON, this is not a recommended YAML style except when it aids clarity.
YAML has many additional features not present in JSON, including comments, extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order.
Because of its conciseness, JSON serialization and deserialization are much faster than YAML.
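The relationship between the two syntaxes can be sketched with a pair of documents holding the same invented data: the first is written in JSON syntax, which a YAML 1.2 parser reads as inline style, and the second uses block style with features JSON lacks.

```yaml
--- # JSON syntax, read as YAML inline style
{"name": "John Smith", "age": 33, "tags": ["a", "b"]}
--- # the same data in block style
name: John Smith   # comments and unquoted strings are allowed
age: 33
tags:
  - a
  - b
```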
Comparison with TOML
TOML was designed to be an advancement of the .ini file format. YAML's minimal use of indicator characters has been compared favorably to TOML's strict requirement of quotation marks and square brackets. YAML's use of significant indentation has been contrasted with the dot notation of TOML's key and table names to convey the same semantic structure. Opinions differ on which convention leads to more-readable configuration files.
Comparison with XML
YAML lacks the notion of tag attributes that are found in XML. Instead YAML has extensible type declarations (including class types for objects).
YAML itself does not have XML's language-defined document schema descriptors that allow, for example, a document to self-validate. However, there are several externally defined schema descriptor languages for YAML (e.g. Doctrine, Kwalify and Rx) that fulfill that role. Moreover, the semantics provided by YAML's language-defined type declarations in the YAML document itself frequently relax the need for a validator in simple, common situations. Additionally, YAXML, which represents YAML data structures in XML, allows XML schema importers and output mechanisms like XSLT to be applied to YAML.
Comparison of data-serialization formats provides a more comprehensive comparison of YAML with other serialization formats.
Software (emitters and parsers)
For fixed data structures, YAML files can simply be generated using ''print'' commands that write both the data and the YAML-specific decoration. To dump varying or complex hierarchical data, however, a dedicated YAML ''emitter'' is preferable. Similarly, simple YAML files (e.g. key-value pairs) are readily parsed with
regular expression
s. For more complex, or varying, data structures, a formal YAML ''parser'' is recommended.
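A minimal sketch of both shortcuts, assuming a flat mapping of simple scalar values: `emit_flat` writes the data with plain ''print'' calls, and `parse_flat` reads it back with a regular expression. Both function names and the sample keys are hypothetical. The pattern deliberately ignores quoting, nesting, lists, inline comments and multi-line scalars, which is exactly where a real parser becomes necessary.

```python
import io
import re


def emit_flat(settings, out):
    """Write a fixed, flat mapping as 'key: value' lines using print."""
    print("---", file=out)  # document start marker
    for key, value in settings.items():
        print(f"{key}: {value}", file=out)


# Matches lines such as "name: value". Whole-line comments, blank lines
# and the "---" marker do not match and are skipped.
KEY_VALUE = re.compile(r"^(?P<key>[A-Za-z_][\w-]*):[ \t]+(?P<value>.*?)[ \t]*$")


def parse_flat(text):
    """Collect 'key: value' pairs; every value is kept as a string."""
    result = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            result[match["key"]] = match["value"]
    return result


buffer = io.StringIO()
emit_flat({"name": "demo", "port": 8080}, buffer)
parsed = parse_flat(buffer.getvalue())
print(parsed)
```

Note that the regular expression returns every value as a string; any type conversion is left to the caller.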
YAML emitters and parsers exist for many popular languages. Most of them are written in the native language itself. Some are language bindings of the C library ''libyaml''; they may run faster. There used to be another C library, called ''Syck'', written and orphaned by
why the lucky stiff
: it is unmaintained, there is no authoritative source bundle, and the web site has been hijacked. Hence the only recommendable C library is ''libyaml''. It was originally developed by Kirill Simonov. In 2018, development was resumed by the new maintainers Ian Cordasco and Ingy döt Net.
C++ programmers have the choice between the C library ''libyaml'' and the C++ library ''libyaml-cpp''. Both have completely independent code bases and completely different
APIs. The library ''libyaml-cpp'' still has a major version number of 0, indicating that the API may change at any moment, as indeed happened after version 0.3. There is also a grammar-focused implementation written in C# that aims to support extensions for nested elements.
Some implementations of YAML, such as Perl's YAML.pm, will load an entire file (stream) and parse it ''en masse''. Other implementations, like PyYAML, are lazy and iterate over the next document only upon request. For very large files in which one plans to handle the documents independently, instantiating the entire file before processing may be prohibitive. Thus in YAML.pm, one must occasionally chunk a file into documents and parse those individually. YAML makes this easy, since it only requires splitting on the document end marker, which is defined as three periods at the start of a line followed by whitespace (and possibly a comment). This marker is forbidden in content.
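The chunking step described above can be sketched with a regular expression. This is an illustration under the stated definition of the marker, not the method used by any particular library; `split_documents` is a hypothetical helper, and streams that separate documents only with `---` would need different handling.

```python
import re

# A line consisting of "..." optionally followed by whitespace and a comment.
DOCUMENT_END = re.compile(r"^\.\.\.(?:[ \t]+#.*)?[ \t]*$", re.MULTILINE)


def split_documents(stream):
    """Split a YAML stream into raw document chunks at '...' markers."""
    chunks = DOCUMENT_END.split(stream)
    # Drop empty chunks (e.g. after the final marker) and trim newlines.
    return [chunk.strip("\n") for chunk in chunks if chunk.strip()]


stream = """\
---
id: 1
...
---
id: 2
... # end of second document
"""

documents = split_documents(stream)
print(len(documents))
```

Each chunk can then be handed to a parser on its own, so only one document needs to be held in memory at a time.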
Criticism
YAML has been criticized for its
significant whitespace
, confusing features, insecure defaults, and its complex and ambiguous specification:
* Configuration files can execute commands or load contents without the users realizing it.
* Editing large YAML files is difficult, as indentation errors can go unnoticed.
* Type autodetection is a source of errors. For example, unquoted values such as yes and no are converted to Booleans; software version numbers might be converted to floats.
* Truncated files are often interpreted as valid YAML due to the absence of terminators.
* The complexity of the standard has led to inconsistent implementations, making the language non-portable.
The perceived flaws and complexity of YAML have led to the emergence of stricter alternatives such as StrictYAML and NestedText.
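The type-autodetection criticism can be illustrated with a deliberately simplified resolver written in the spirit of YAML 1.1 implicit typing. It is a sketch, not the behavior of any specific parser: `resolve_plain_scalar` is a hypothetical helper, and real resolvers recognize many more forms (hexadecimal and octal integers, exponents, nulls, timestamps).

```python
import re

# Accepted spellings: lower case, Capitalized and UPPER CASE only.
def _spellings(*words):
    return {form for w in words for form in (w, w.capitalize(), w.upper())}


BOOL_TRUE = _spellings("yes", "true", "on")
BOOL_FALSE = _spellings("no", "false", "off")
INT = re.compile(r"^[-+]?[0-9]+$")
FLOAT = re.compile(r"^[-+]?[0-9]+\.[0-9]*$")


def resolve_plain_scalar(text):
    """Guess a Python value for an unquoted scalar (simplified rules)."""
    if text in BOOL_TRUE:
        return True
    if text in BOOL_FALSE:
        return False
    if INT.match(text):
        return int(text)
    if FLOAT.match(text):
        return float(text)
    return text  # anything else stays a string


country = resolve_plain_scalar("NO")      # intended as a country code
version = resolve_plain_scalar("1.10")    # intended as version "1.10"
release = resolve_plain_scalar("1.10.2")  # two dots: stays a string
print(country, version, release)
```

Under these assumptions the country code becomes a Boolean and the version loses its trailing zero, while the three-part version survives as text. Quoting the values in the YAML source avoids both conversions.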
See also
* Comparison of data-serialization formats
* Lightweight markup language
External links
* YAMLScript