Markup (computer programming)

Markup language refers to a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document or to enrich its content to facilitate automated processing. A markup language is a set of rules governing what markup information may be included in a document and how it is combined with the content of the document in a way that facilitates use by humans and computer programs. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which were traditionally written with a red pen or blue pencil on authors' manuscripts. Older markup languages, which typically focus on typography and presentation, include troff, TeX, and LaTeX.
Scribe and most modern markup languages, for example XML, identify document components (for example headings, paragraphs, and tables), with the expectation that technology such as stylesheets will be used to apply formatting or other processing. Some markup languages, such as the widely used HTML, have pre-defined presentation semantics, meaning that their specification prescribes some aspects of how to present the structured data on particular media. HTML, like DocBook, Open eBook, JATS, and many others, is based on the markup meta-languages SGML and XML. That is, SGML and XML allow designers to specify particular schemas, which determine which elements, attributes, and other features are permitted, and where.

One extremely important characteristic of most markup languages is that they allow intermingling markup with document content such as text and pictures. For example, if a few words in a sentence need to be emphasized, or identified as a proper name, defined term, or another special item, the markup may be inserted between the characters of the sentence. This is quite different structurally from traditional databases, where it is by definition impossible to have data that is within a record but not within any field. Furthermore, markup for human-readable texts must maintain order: it would not suffice to make each paragraph of a book into a "paragraph" record, where those records do not maintain order.
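The intermingling of markup and content can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module; the element names (s, term, name) and the helper function pieces are made up for this illustration and do not come from any particular schema. The point is that some of the sentence sits inside labeled elements while the rest sits between them, and that the order of the pieces carries meaning.

```python
import xml.etree.ElementTree as ET

# A sentence with markup inserted between its characters.
# The element names are hypothetical, chosen only for illustration.
sentence = ET.fromstring(
    "<s>The term <term>generic coding</term> was preferred by "
    "<name>William W. Tunnicliffe</name>.</s>"
)

def pieces(element):
    """Yield (label, text) pairs in document order.

    The label is None for text that is inside the sentence but not
    inside any child element, which is the case a record-and-field
    database has no place for.
    """
    if element.text:
        yield (None, element.text)
    for child in element:
        yield (child.tag, "".join(child.itertext()))
        if child.tail:
            yield (None, child.tail)

result = list(pieces(sentence))
for label, text in result:
    print(repr(label), repr(text))
```

Joining the text of all pieces in order gives back the original sentence, which is why the order of the pieces cannot be discarded.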


Etymology

The noun ''markup'' is derived from the traditional publishing practice called ''"marking up"'' a manuscript, which involves adding handwritten annotations in the form of conventional symbolic printer's instructions in the margins and the text of a paper or a printed manuscript. For centuries, this task was done primarily by skilled typographers known as "markup men" or "d markers" who marked up text to indicate what typeface, style, and size should be applied to each part, and then passed the manuscript to others for typesetting by hand or machine. The markup was also commonly applied by editors, proofreaders, publishers, and graphic designers, and indeed by document authors, all of whom might also mark other things, such as corrections, changes, etc.


Types of markup language

There are three main general categories of electronic markup, articulated in Coombs, Renear, and DeRose (1987), and Bray (2003).


Presentational markup

The kind of markup used by traditional word-processing systems: binary codes embedded within document text that produce the WYSIWYG ("what you see is what you get") effect. Such markup is usually hidden from human users, even authors and editors. Properly speaking, such systems use procedural and/or descriptive markup underneath but convert it to "present" to the user as geometric arrangements of type.


Procedural markup

Markup embedded in the text provides instructions for programs that process the text. Well-known examples include troff, TeX, and Markdown. It is assumed that software processes the text sequentially from beginning to end, following the instructions as encountered. Such text is often edited with the markup visible and directly manipulated by the author. Popular procedural markup systems usually include programming constructs, especially macros, allowing complex sets of instructions to be invoked by a simple name (and perhaps a few parameters). This is much faster, less error-prone, and more maintenance-friendly than re-stating the same or similar instructions in many places.
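A toy processor can make the sequential model and the role of macros concrete. The sketch below is not troff or TeX: the instruction names (.indent, .case), the macro names (title, body), and the $1 parameter syntax are all invented for this example. It reads lines from beginning to end, treats any line starting with a dot as an instruction, and expands a macro into the lower-level instructions it stands for.

```python
from collections import deque

# Hypothetical macros: each name stands for a list of lower-level
# instructions. "$1" is replaced by the first argument of the call.
MACROS = {
    "title": [".indent 0", ".case upper"],
    "body": [".indent $1", ".case asis"],
}

def render(source, macros=MACROS):
    """Process the text line by line, obeying instructions as encountered."""
    state = {"indent": 0, "case": "asis"}
    pending = deque(source.splitlines())
    out = []
    while pending:
        line = pending.popleft()
        if not line.startswith("."):
            # Ordinary text: format it according to the current state.
            text = line.upper() if state["case"] == "upper" else line
            out.append(" " * state["indent"] + text)
            continue
        name, *args = line[1:].split()
        if name in macros:
            # Expand the macro and process its instructions next.
            expanded = []
            for template in macros[name]:
                for i, arg in enumerate(args, start=1):
                    template = template.replace(f"${i}", arg)
                expanded.append(template)
            pending.extendleft(reversed(expanded))
        elif name == "indent":
            state["indent"] = int(args[0])
        elif name == "case":
            state["case"] = args[0]
        else:
            raise ValueError(f"unknown instruction: {line}")
    return "\n".join(out)

document = ".title\nMarkup\n.body 4\nplain text"
print(render(document))
```

Writing .title once per heading, instead of repeating .indent 0 and .case upper each time, is the saving in effort and maintenance that macros provide.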


Descriptive markup

Markup is specifically used to label parts of the document for what they are, rather than how they should be processed. Well-known systems that provide many such labels include LaTeX, HTML, and XML. The objective is to decouple the structure of the document from any particular treatment or rendition of it. Such markup is often described as "semantic". An example of a descriptive markup would be HTML's <cite> tag, which is used to label a citation. Descriptive markup, sometimes called ''logical markup'' or ''conceptual markup'', encourages authors to write in a way that describes the material conceptually, rather than visually.

There is a considerable blurring of the lines between the types of markup. In modern word-processing systems, presentational markup is often saved in descriptive-markup-oriented systems such as XML, and then processed procedurally by implementations. The programming in procedural-markup systems, such as TeX, may be used to create higher-level markup systems that are more descriptive in nature, such as LaTeX.

In recent years, a number of markup languages have been developed with ease of use as a key goal, and without input from standards organizations, aimed at allowing authors to create formatted text via web browsers, for example in wikis and in web forums. These are sometimes called lightweight markup languages. Markdown, BBCode, and the markup language used by Wikipedia are examples of such languages.
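The decoupling of structure from rendition that descriptive markup aims for can be sketched as follows. The document labels a citation and a defined term without saying how either should look; two separate style tables then give two different renditions of the same source. The element names, the style tables, and the render function are assumptions made for this example, and the HTML-like output is only a string, not a complete web page.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: the labels say what each part is, not how it looks.
DOC = "<para>See <cite>The TeXbook</cite> for <term>macro</term> details.</para>"

# Two independent, hypothetical "stylesheets": each maps a label to the
# strings placed before and after the labeled content.
TERMINAL_STYLE = {"cite": ("_", "_"), "term": ("*", "*")}
HTML_STYLE = {"cite": ("<i>", "</i>"), "term": ("<b>", "</b>")}

def render(element, style):
    """Render an element and its descendants using the given style table."""
    open_mark, close_mark = style.get(element.tag, ("", ""))
    parts = [open_mark, element.text or ""]
    for child in element:
        parts.append(render(child, style))
        parts.append(child.tail or "")
    parts.append(close_mark)
    return "".join(parts)

root = ET.fromstring(DOC)
print(render(root, TERMINAL_STYLE))  # See _The TeXbook_ for *macro* details.
print(render(root, HTML_STYLE))      # See <i>The TeXbook</i> for <b>macro</b> details.
```

Changing how citations look means editing one entry in a style table; the document itself is untouched.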


History of markup languages


GenCode

The first well-known public presentation of markup languages in computer text processing was made by William W. Tunnicliffe at a conference in 1967, although he preferred to call it ''generic coding.'' It can be seen as a response to the emergence of programs such as RUNOFF that each used their own control notations, often specific to the target typesetting device. In the 1970s, Tunnicliffe led the development of a standard called GenCode for the publishing industry and later was the first chairman of the International Organization for Standardization committee that created SGML, the first standard descriptive markup language.
Book designer Stanley Rice published speculation along similar lines in 1970.

Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed the theory and a working implementation of descriptive markup in actual use. However, IBM researcher Charles Goldfarb is more commonly seen today as the "father" of markup languages. Goldfarb hit upon the basic idea while working on a primitive document management system intended for law firms in 1969, and helped invent IBM GML later that same year. GML was first publicly disclosed in 1973.

In 1975, Goldfarb moved from Cambridge, Massachusetts to Silicon Valley and became a product planner at the IBM Almaden Research Center. There, he convinced IBM's executives to deploy GML commercially in 1978 as part of IBM's Document Composition Facility product, and it was widely used in business within a few years. SGML, which was based on both GML and GenCode, was an ISO project worked on by Goldfarb beginning in 1974. Goldfarb eventually became chair of the SGML committee. SGML was first released by ISO as the ISO 8879 standard in October 1986.


troff and nroff

Some early examples of computer markup languages available outside the publishing industry can be found in typesetting tools on Unix systems such as troff and nroff. In these systems, formatting commands were inserted into the document text so that typesetting software could format the text according to the editor's specifications. Getting a document printed correctly was an iterative process of trial and error. Availability of WYSIWYG ("what you see is what you get") publishing software supplanted much use of these languages among casual users, though serious publishing work still uses markup to specify the non-visual structure of texts, and WYSIWYG editors now usually save documents in a markup-language-based format.


TeX

Another major publishing standard is TeX, created and refined by Donald Knuth in the 1970s and '80s. TeX concentrated on the detailed layout of text and font descriptions to typeset mathematical books. This required Knuth to spend considerable time investigating the art of typesetting. TeX is mainly used in academia, where it is a ''de facto'' standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used in both the scientific community and the publishing industry.


Scribe, GML, and SGML

The first language to make a clean distinction between structure and presentation was Scribe, developed by Brian Reid and described in his doctoral thesis in 1980. Scribe was revolutionary in a number of ways, not least that it introduced the idea of styles separated from the marked-up document, and of a grammar controlling the usage of descriptive elements. Scribe influenced the development of Generalized Markup Language (later SGML), and is a direct ancestor to HTML and LaTeX.

In the early 1980s, the idea that markup should focus on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee.

SGML specified a syntax for including the markup in documents, as well as one for separately describing ''what'' tags were allowed, and ''where'' (the Document Type Definition (DTD), later known as a schema). This allowed authors to create and use any markup they wished, selecting tags that made the most sense to them and were named in their own natural languages, while also allowing automated verification. Thus, SGML is properly a meta-language, and many particular markup languages are derived from it. From the late '80s onward, most substantial new markup languages have been based on the SGML system, including for example TEI and DocBook. SGML was promulgated as an International Standard by the International Organization for Standardization, as ISO 8879, in 1986.

SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, many found it cumbersome and difficult to learn, a side effect of its design attempting to do too much and being too flexible. For example, SGML made end tags (or start-tags, or even both) optional in certain contexts, because its developers thought markup would be done manually by overworked support staff who would appreciate saving keystrokes.
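The idea of describing separately which tags are allowed and where, and then verifying documents automatically, can be sketched in a few lines. This is not DTD syntax and not an SGML parser: the table ALLOWED_CHILDREN, the element names, and the function violations are assumptions made for this illustration, and XML input parsed with Python's standard library stands in for SGML. A real DTD can also constrain order, repetition, and attributes; this sketch checks containment only.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each declared element maps to the set of
# child elements it may contain.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": {"emph"},
    "emph": set(),
}

def violations(element, allowed=ALLOWED_CHILDREN, path=""):
    """Return a list of messages for elements that break the content model."""
    here = f"{path}/{element.tag}"
    if element.tag not in allowed:
        return [f"{here}: element not declared"]
    found = []
    for child in element:
        # Undeclared children are reported by the recursive call below.
        if child.tag in allowed and child.tag not in allowed[element.tag]:
            found.append(f"{here}: <{child.tag}> not permitted here")
        found.extend(violations(child, allowed, here))
    return found

good = ET.fromstring(
    "<book><title>T</title><chapter><title>C</title>"
    "<para>x <emph>y</emph></para></chapter></book>"
)
bad = ET.fromstring("<book><para>loose paragraph</para></book>")

print(violations(good))  # []
print(violations(bad))   # ['/book: <para> not permitted here']
```

Because the content model lives in a table rather than in the checking code, a different set of tags can be verified by supplying a different table, which is the sense in which such a system is a meta-language.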


HTML

In 1989, computer scientist Sir Tim Berners-Lee wrote a memo proposing an Internet-based hypertext system, then specified HTML and wrote the browser and server software in the last part of 1990. The first publicly available description of HTML was a document called "HTML Tags", first mentioned on the Internet by Berners-Lee in late 1991. It describes 18 elements comprising the initial, relatively simple design of HTML. Except for the hyperlink tag, these were strongly influenced by SGMLguid, an in-house SGML-based documentation format at CERN, and were very similar to the sample schema in the SGML standard. Eleven of these elements still exist in HTML 4.

Berners-Lee considered HTML an SGML application. The Internet Engineering Task Force (IETF) formally defined it as such with the mid-1993 publication of the first proposal for an HTML specification, the "Hypertext Markup Language (HTML)" Internet-Draft by Berners-Lee and Dan Connolly, which included an SGML Document Type Definition to define the grammar. Many of the HTML text elements are found in the 1988 ISO technical report TR 9537 ''Techniques for using SGML'', which in turn covers the features of early text formatting languages such as that used by the RUNOFF command developed in the early 1960s for the CTSS (Compatible Time-Sharing System) operating system. These formatting commands were derived from those used by typesetters to manually format documents.
Steven DeRose argues that HTML's use of descriptive markup (and the influence of SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility that it enabled. HTML became the main markup language for creating web pages and other information that can be displayed in a web browser and is quite likely the most used markup language in the world today.


XML

XML (Extensible Markup Language) is a very widely used meta-markup language. XML was developed by the World Wide Web Consortium, in a committee created and chaired by Jon Bosak. The main purpose of XML was to simplify SGML by focusing on a particular problem: documents on the Internet. XML remains a meta-language like SGML, allowing users to create any tags needed (hence "extensible") and then to describe those tags and their permitted uses. XML adoption was helped because every XML document can be written in such a way that it is also an SGML document, so existing SGML users and software could switch to XML fairly easily. However, XML eliminated many of the more complex features of SGML to simplify implementation in environments such as documents and publications. It appeared to strike a happy medium between simplicity and flexibility, while supporting very robust schema definition and validation tools, and was rapidly adopted for many other uses. XML is now widely used for communicating data between applications, for serializing program data, for hardware communications protocols, for vector graphics, and for many other uses, as well as for documents.
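As a rough illustration of XML's use as a meta-language for serializing program data, the following Python sketch defines a small made-up vocabulary and round-trips it through the standard library's xml.etree.ElementTree module. The tag names (inventory, item, name, quantity), the sku attribute, and both helper functions are assumptions invented for this example; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: XML itself does not define any of these tag names.
DOCUMENT = """<inventory>
  <item sku="A-100"><name>Widget</name><quantity>12</quantity></item>
  <item sku="B-200"><name>Gadget</name><quantity>3</quantity></item>
</inventory>"""


def load_inventory(xml_text):
    """Parse the hypothetical <inventory> vocabulary into plain Python data."""
    root = ET.fromstring(xml_text)
    return {
        item.get("sku"): {
            "name": item.findtext("name"),
            "quantity": int(item.findtext("quantity")),
        }
        for item in root.iter("item")
    }


def dump_inventory(stock):
    """Serialize the Python data back into the same hypothetical vocabulary."""
    root = ET.Element("inventory")
    for sku, fields in stock.items():
        item = ET.SubElement(root, "item", sku=sku)
        ET.SubElement(item, "name").text = fields["name"]
        ET.SubElement(item, "quantity").text = str(fields["quantity"])
    return ET.tostring(root, encoding="unicode")


stock = load_inventory(DOCUMENT)
print(stock["A-100"]["quantity"])
print(dump_inventory(stock))
```

The parser needs no prior knowledge of the vocabulary; a schema or DTD would only be required to check that a document uses the tags in their permitted ways.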


XHTML

From January 2000 until HTML 5 was released, all W3C Recommendations for HTML were based on XML, using the abbreviation XHTML (Extensible HyperText Markup Language). The language specification requires that XHTML Web documents be ''well-formed'' XML documents. This allows for more rigorous and robust documents, by avoiding many syntax errors which historically led to incompatible browser behaviors, while still using document components that are familiar from HTML. One of the most noticeable differences between HTML and XHTML is the rule that ''all tags must be closed'': empty HTML tags such as <br> must either be ''closed'' with a regular end-tag, or replaced by a special form: <br /> (the space before the '/' on the end tag is optional, but frequently used because it enables some pre-XML Web browsers, and SGML parsers, to accept the tag). Another difference is that all attribute values in tags must be quoted. Both these differences are commonly criticized as verbose but also praised because they make it far easier to detect, localize, and repair errors. Finally, all tag and attribute names within the XHTML namespace must be lowercase to be valid. HTML, on the other hand, was case-insensitive.
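The two syntactic rules described above (closed tags and quoted attribute values) can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are made up for illustration. It tests XML well-formedness only, so it does not enforce XHTML-specific validity rules such as lowercase element names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML-style: the empty <br> tag is never closed.
html_style = "<p>line one<br>line two</p>"
# XHTML-style: the same tag written in the self-closing form.
xhtml_style = "<p>line one<br />line two</p>"
# HTML-style unquoted attribute value versus the quoted XHTML form.
unquoted = "<p class=intro>text</p>"
quoted = '<p class="intro">text</p>'

for sample in (html_style, xhtml_style, unquoted, quoted):
    print(is_well_formed(sample), sample)
```

Because the parser stops at the first violation and reports its position, errors of this kind are easy to locate, which is the benefit the stricter rules are praised for.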


Other XML-based applications

Many XML-based applications now exist, including the Resource Description Framework as RDF/XML, XForms, DocBook, SOAP, and the Web Ontology Language (OWL). For a partial list of these, see List of XML markup languages.


Features of markup languages

A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. This is not necessary; it is possible to isolate markup from text content, using pointers, offsets, IDs, or other methods to coordinate the two. Such "standoff markup" is typical for the internal representations that programs use to work with marked-up documents. However, embedded or "inline" markup is much more common elsewhere. Here, for example, is a small section of text marked up in HTML:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <h1>Mozilla is cool</h1>
    <img src="images/firefox-icon.png" alt="The Firefox logo: a flaming fox surrounding the Earth.">

    <p>At Mozilla, we’re a global community of</p>

    <ul>
      <li>technologists</li>
      <li>thinkers</li>
      <li>builders</li>
    </ul>

    <p>working together to keep the Internet alive and accessible, so people worldwide can be informed contributors and creators of the Web. We believe this act of human collaboration across an open platform is essential to individual growth and our collective future.</p>

    <p>Read the <a href="https://www.mozilla.org/en-US/about/manifesto/">Mozilla Manifesto</a> to learn even more about the values and principles that guide the pursuit of our mission.</p>
  </body>
</html>

The codes enclosed in angle-brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes h1, p, and em are examples of ''semantic'' markup, in that they describe the intended purpose or the meaning of the text they include. Specifically, h1 means "this is a first-level heading", p means "this is a paragraph", and em means "this is an emphasized word or phrase". A program interpreting such structural markup may apply its own rules or styles for presenting the various pieces of text, using different typefaces, boldness, font size, indentation, color, or other styles, as desired. For example, a tag such as "h1" (heading level 1) might be presented in a large bold sans-serif typeface in an article, or it might be underscored in a monospaced (typewriter-style) document; or it might simply not change the presentation at all.

In contrast, the i tag in HTML 4 is an example of ''presentational'' markup, which is generally used to specify a particular characteristic of the text without specifying the reason for that appearance. In this case, the i element dictates the use of an italic typeface. However, in HTML 5, this element has been repurposed with a more semantic usage: to denote a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text. For example, it is appropriate to use the i element to indicate a taxonomic designation or a phrase in another language. This change was made to keep the transition from HTML 4 to HTML 5 as smooth as possible, so that deprecated uses of presentational elements would preserve the most likely intended semantics.

The Text Encoding Initiative (TEI) has published extensive guidelines for how to encode texts of interest in the humanities and social sciences, developed through years of international cooperative work. These guidelines are used by projects encoding historical documents, the works of particular scholars, periods, genres, and so on.
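The point that a program may apply its own presentation rules to semantic markup such as h1 and em can be sketched in a few lines of Python using the standard html.parser module. The StyledRenderer class, the render function, and the three style tables are hypothetical names made up for this example, and the "styles" are plain string transformations standing in for real typography.

```python
from html.parser import HTMLParser


class StyledRenderer(HTMLParser):
    """Render text, applying a caller-supplied style to each open tag."""

    def __init__(self, styles):
        super().__init__()
        self.styles = styles   # maps tag name -> function(str) -> str
        self.stack = []        # tags currently open, outermost first
        self.output = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data
        # Apply the innermost tag's style first, then the enclosing ones.
        for tag in reversed(self.stack):
            style = self.styles.get(tag)
            if style is not None:
                text = style(text)
        self.output.append(text)


def render(markup, styles):
    renderer = StyledRenderer(styles)
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.output)


SOURCE = "<h1>Mozilla is <em>cool</em></h1>"

# Three different "style sheets" for the same semantic markup.
SHOUTING = {"h1": str.upper, "em": lambda s: f"*{s}*"}
QUIET = {"em": lambda s: f"_{s}_"}
UNSTYLED = {}

for styles in (SHOUTING, QUIET, UNSTYLED):
    print(render(SOURCE, styles))
```

The markup states only that some text is a heading and that one word is emphasized; each style table decides independently what that should look like, including the option of not changing the presentation at all.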


Language

While the idea of markup language originated with text documents, there is increasing use of markup languages in the presentation of other types of information, including playlists, vector graphics, web services, content syndication, and user interfaces. Most of these are XML applications because XML is a well-defined and extensible language. The use of XML has also led to the possibility of combining multiple markup languages into a single profile, like XHTML+SMIL and XHTML+MathML+SVG ("An XHTML + MathML + SVG Profile", W3C, August 9, 2002; retrieved 2021-08-16).
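Combining several XML vocabularies in one document relies on XML namespaces, which let a program tell which language each element belongs to. The following Python sketch, using the standard xml.etree.ElementTree module, counts the elements of a small compound document per namespace. The sample document and the helper names vocabulary_of and count_by_namespace are assumptions made for illustration; the sample is not claimed to be a valid instance of the W3C profile.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# A made-up compound document mixing three XML vocabularies.
COMPOUND = f"""<html xmlns="{XHTML}">
  <body>
    <p>Radius of a circle:</p>
    <math xmlns="{MATHML}"><mi>r</mi></math>
    <svg xmlns="{SVG}" width="20" height="20"><circle cx="10" cy="10" r="8"/></svg>
  </body>
</html>"""


def vocabulary_of(element):
    """Split ElementTree's '{namespace-uri}local-name' tag into its parts."""
    uri, _, local = element.tag[1:].partition("}")
    return uri, local


def count_by_namespace(xml_text):
    """Count how many elements belong to each namespace in the document."""
    counts = {}
    for element in ET.fromstring(xml_text).iter():
        uri, _ = vocabulary_of(element)
        counts[uri] = counts.get(uri, 0) + 1
    return counts


for uri, count in count_by_namespace(COMPOUND).items():
    print(count, uri)
```

A real processor would dispatch each subtree to the appropriate handler (text layout, formula rendering, or vector drawing) based on the same namespace information.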


See also

* Lightweight markup language
* Comparison of document markup languages
* Curl (programming language)
* HTML
* LaTeX
* List of markup languages
* Markdown
* Programming language
* Modelling language
* Plain text
* Formatted text
* ReStructuredText
* Style language
* Tag (markup)
* WYSIWYG
* XML

