History
Lightweight markup languages were originally used on text-only displays, which could not display characters in italics or bold.
Types
Lightweight markup languages can be categorized by their tag types. Like HTML (<b>bold</b>), some languages use named elements that share a common format for start and end tags (e.g. BBCode [b]bold[/b]), whereas proper lightweight markup languages are restricted to ASCII-only punctuation marks and other non-letter symbols for tags. Some also mix both styles (e.g. Textile bq.) or allow embedded HTML (e.g. Markdown), possibly extended with custom elements (e.g. MediaWiki).
Most languages distinguish between markup for lines or blocks and for shorter spans of text, but some only support inline markup.
Some markup languages are tailored for a specific purpose, such as documenting computer code (e.g. POD, reST, RD) or being converted to a certain output format (usually HTML or LaTeX) and nothing else; others are more general in application. This includes whether they are oriented toward textual presentation or toward data serialization.
Presentation-oriented languages include AsciiDoc, atx, BBCode, Creole, Crossmark, Epytext, Haml, JsonML, MakeDoc, Markdown, Org-mode, POD (Perl), reST (Python), RD (Ruby), SECST, Setext, SiSU, SPIP, Xupl, Texy!, Textile, txt2tags, UDO and Wikitext.
Data-serialization-oriented languages include Curl (homoiconic, but also reads JSON; every object serializes), JSON, and YAML.
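As a minimal sketch of what the data-serialization orientation means in practice, the following Python snippet round-trips a small record through JSON using the standard-library json module. The record and its field names are invented for illustration only.

```python
import json

# Hypothetical record; the keys and values are made up for this example.
record = {"title": "Example", "tags": ["markup", "plain text"], "level": 2}

# Serialize to human-readable JSON text, then parse it back.
text = json.dumps(record, indent=2)
restored = json.loads(text)
```

The serialized text stays readable in a plain-text editor, and parsing it yields a structure equal to the original record.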
Comparison of language features
Markdown's own syntax does not support class attributes or id attributes; however, since Markdown supports the inclusion of native HTML code, these features can be implemented using direct HTML. (Some extensions may support these features.)
txt2tags' own syntax does not support class attributes or id attributes; however, since txt2tags supports inclusion of native HTML code in tagged areas, these features can be implemented using direct HTML when saving to an HTML target.
Comparison of implementation features
Comparison of lightweight markup language syntax
Inline span syntax
Although usually documented as yielding italic and bold text, most lightweight markup processors output semantic HTML elements <em> and <strong> instead. Monospaced text may either result in semantic <code> or presentational <tt> elements. Few languages make a distinction, e.g. Textile, or allow the user to configure the output easily, e.g. Texy.
LMLs sometimes differ for multi-word markup, where some require the markup characters to replace the inter-word spaces (infix).
Some languages require a single character as prefix and suffix; others need doubled or even tripled ones, or support both with slightly different meanings, e.g. different levels of emphasis.
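The mapping from single and doubled markers to semantic elements can be sketched as follows. This assumes a Markdown-like convention in which single asterisks mark emphasis and doubled asterisks mark strong emphasis; the function name render_inline is hypothetical, and the sketch is not the algorithm of any particular processor.

```python
import html
import re

# Doubled markers are matched first so that **text** is not read as
# two adjacent single-marker spans.
STRONG = re.compile(r"\*\*(.+?)\*\*")
EM = re.compile(r"\*(.+?)\*")

def render_inline(text: str) -> str:
    """Convert *emphasis* and **strong** spans to semantic HTML (sketch)."""
    # Escape HTML special characters before inserting our own tags.
    escaped = html.escape(text, quote=False)
    escaped = STRONG.sub(r"<strong>\1</strong>", escaped)
    return EM.sub(r"<em>\1</em>", escaped)
```

Real processors handle many more cases, such as nesting, escaped markers and word boundaries, which this sketch deliberately ignores.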
Gemtext does not have any inline formatting; monospaced text (called preformatted text in the context of Gemtext) must have the opening and closing ``` on their own lines.
Emphasis syntax
In HTML, text is emphasized with the <em> and <strong> element types, whereas <i> and <b> traditionally mark up text to be italicized or bold-faced, respectively.
Microsoft Word and Outlook, and accordingly other word processors and mail clients that strive for a similar user experience, support the basic convention of using asterisks for boldface and underscores for italic style. While Word removes the characters, Outlook retains them.
Editorial syntax
In HTML, removed or deleted and inserted text is marked up with the <del> and <ins> element types, respectively. However, legacy element types <s> or <strike> and <u> are still also available for stricken and underlined spans of text.
AsciiDoc, ATX, Creole, MediaWiki, PmWiki, reST, Slack, Textile, Texy! and WhatsApp do not support dedicated markup for underlining text. Textile does, however, support insertion via the +inserted+ syntax.
AsciiDoc, ATX, Creole, MediaWiki, PmWiki, reST, Setext and Texy! do not support dedicated markup for striking through text.
Programming syntax
Quoted computer code is traditionally presented in typewriter-like fonts where each character occupies the same fixed width. HTML offers the semantic <code> and the deprecated, presentational <tt> element types for this task.
MediaWiki and Gemtext do not provide lightweight markup for inline code spans.
Heading syntax
Headings are usually available in up to six levels, but the top one is often reserved to contain the same as the document title, which may be set externally. Some documentation may associate levels with divisional types, e.g. part, chapter, section, article or paragraph.
Most LMLs follow one of two styles for headings, either Setext-like underlines or atx-like line markers ("atx, the true structured text format" by Aaron Swartz, 2002), or they support both.
Underlined headings
Level 1 Heading
===============
Level 2 Heading
---------------
Level 3 Heading
~~~~~~~~~~~~~~~
The first style uses underlines, i.e. repeated characters (e.g. equals =, hyphen - or tilde ~, usually at least two or four times) in the line below the heading text.
RST determines heading levels dynamically, which makes authoring more individual on the one hand, but complicates merges from external sources on the other hand.
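A minimal sketch of such dynamic level assignment is shown below. It assumes that the level of an underline character is fixed by the order in which the characters first appear in the document, and it ignores overlined headings and other details; the function name heading_levels and the set of accepted underline characters are choices made for this example.

```python
import re

# A line made of one punctuation character repeated at least twice.
UNDERLINE = re.compile(r"^([=\-~^#*+])\1+$")

def heading_levels(lines):
    """Return (level, title) pairs; levels follow first-use order (sketch)."""
    seen = []  # underline characters in order of first appearance
    result = []
    for title, underline in zip(lines, lines[1:]):
        match = UNDERLINE.match(underline)
        if not match or not title.strip() or UNDERLINE.match(title):
            continue
        if len(underline) < len(title.rstrip()):
            continue  # this sketch requires the underline to cover the title
        char = match.group(1)
        if char not in seen:
            seen.append(char)
        result.append((seen.index(char) + 1, title.strip()))
    return result
```

Because a hyphen underline may be level 1 in one file and level 2 in another, pasting sections from a different document can silently change their levels, which is the merge problem mentioned above.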
Prefixed headings
# Level 1 Heading
## Level 2 Heading ##
### Level 3 Heading ###
The second style is based on repeated markers (e.g. hash #, equals = or asterisk *) at the start of the heading itself, where the number of repetitions indicates the (sometimes inverse) heading level. Most languages also support the reduplication of the markers at the end of the line, but whereas some make them mandatory, others do not even expect their numbers to match.
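Parsing the prefixed style can be sketched with a single regular expression. The example assumes hash markers with one to six levels, requires whitespace after the opening run, and treats closing markers as optional without checking that their number matches; parse_prefixed_heading is a hypothetical helper name.

```python
import re

# One to six leading hashes, the title, then optional closing hashes
# whose count is deliberately not required to match the opening run.
ATX = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")

def parse_prefixed_heading(line):
    """Return (level, title) for a prefixed heading line, else None."""
    match = ATX.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)
```

A language that makes the closing markers mandatory, or that counts levels inversely, would need a different pattern.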
Org-mode supports indentation as a means of indicating the level.
BBCode does not support section headings at all.
POD and Textile choose the HTML convention of numbered heading levels instead.
Microsoft Word supports auto-formatting paragraphs as headings if they contain no more than a handful of words, have no period at the end, and the user hits the Enter key twice. For lower levels, the user may press the Tab key the corresponding number of times before entering the text, i.e. one through eight tabs for heading levels two through nine.
Link syntax
Hyperlinks can either be added inline, which may clutter the code because of long URLs, or with named alias or numbered id references to lines containing nothing but the address and related attributes, which often may be located anywhere in the document.
Most languages allow the author to specify text Text to be displayed instead of the plain address http://example.com, and some also provide methods to set a different link title Title, which may contain more information about the destination.
LMLs that are tailored for special setups, e.g. wikis or code documentation, may automatically generate named anchors (for headings, functions etc.) inside the document, link to related pages (possibly in a different namespace) or provide a textual search for linked keywords.
Most languages employ (double) square or angular brackets to surround links, but hardly any two languages are completely compatible. Many can automatically recognize and parse absolute URLs inside the text without further markup.
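As one concrete illustration, the sketch below converts Markdown-style inline links of the form [Text](http://example.com "Title") to HTML anchors. Other languages use different bracket conventions, so the pattern is an assumption specific to this example, and render_links is a hypothetical name.

```python
import html
import re

# [Text](address "Title"), where the quoted title is optional.
INLINE_LINK = re.compile(r'\[([^\]]+)\]\((\S+?)(?:\s+"([^"]*)")?\)')

def render_links(text: str) -> str:
    """Replace inline links with <a> elements (sketch).

    Text outside the links is passed through unchanged.
    """
    def replace(match):
        label, address, title = match.groups()
        attrs = f'href="{html.escape(address)}"'
        if title is not None:
            attrs += f' title="{html.escape(title)}"'
        return f"<a {attrs}>{html.escape(label, quote=False)}</a>"
    return INLINE_LINK.sub(replace, text)
```

Reference-style links, where the address lives on a separate line, would need a second pass that first collects the alias definitions.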
Gemtext and setext links must be on a line by themselves; they cannot be used inline.
Org-mode's normal link syntax does a text search of the file. Dedicated targets can also be defined with <<target>>.
List syntax
HTML requires an explicit element for the list, specifying its type, and one for each list item, but most lightweight markup languages need only different line prefixes for the bullet points or enumerated items. Some languages rely on indentation for nested lists, others use repeated parent list markers.
Microsoft Word automatically converts paragraphs that start with an asterisk *, hyphen-minus - or greater-than bracket > followed by a space or horizontal tabulator into bullet list items. It will also start an enumerated list for the digit 1 and the case-insensitive letters a (for alphabetic lists) or i (for roman numerals), if they are followed by a period ., a closing round parenthesis ), a greater-than sign > or a hyphen-minus - and a space or tab; in the case of the round parenthesis, an optional opening one ( before the list marker is also supported.
Languages differ on whether they support optional or mandatory digits in numbered list items, which kinds of enumerators they understand (e.g. decimal digit 1, roman numerals i or I, alphabetic letters a or A) and whether they keep explicit values in the output format. Some Markdown dialects, for instance, will respect a start value other than 1, but ignore any other explicit value.
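The behavior described for some Markdown dialects can be sketched as follows: only the number on the first item is carried into the output as a start value, and all later numbers are ignored. The helper name render_ordered_list and the accepted item pattern (digits followed by a period or closing parenthesis) are assumptions for this example.

```python
import html
import re

# Digits, then "." or ")", then whitespace and the item text.
ITEM = re.compile(r"^(\d+)[.)]\s+(.*)$")

def render_ordered_list(lines):
    """Render numbered lines as one HTML ordered list (sketch)."""
    items = []
    start = None
    for line in lines:
        match = ITEM.match(line)
        if match is None:
            continue  # non-item lines are skipped in this sketch
        if start is None:
            start = int(match.group(1))  # only the first number matters
        items.append(f"<li>{html.escape(match.group(2), quote=False)}</li>")
    if not items:
        return ""
    opening = "<ol>" if start == 1 else f'<ol start="{start}">'
    return "\n".join([opening, *items, "</ol>"])
```

With the input lines "3. three", "7. four" and "1. five", the output list starts at 3 and the explicit values 7 and 1 are discarded.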
Historical formats
The following lightweight markup languages, while similar to some of those already mentioned, have not yet been added to the comparison tables in this article:
* EtText: circa 2000
* Grutatext: circa 2002