
The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":
* Declarative: Markup should describe a document's structure and other attributes rather than specify the processing that needs to be performed, because it is less likely to conflict with future developments.
* Rigorous: Markup should be rigorously defined, so that the techniques available for processing rigorously defined objects like programs and databases can be used for processing documents as well.
DocBook SGML and LinuxDoc are examples of markup languages that were used with SGML tools.


Standard versions

SGML is an ISO standard: "ISO 8879:1986 Information processing – Text and office systems – Standard Generalized Markup Language (SGML)", of which there are three versions:
* Original ''SGML'', which was accepted in October 1986, followed by a minor Technical Corrigendum.
* ''SGML (ENR)'', in 1996, resulted from a Technical Corrigendum to add ''extended naming rules'' allowing arbitrary-language and -script markup.
* ''SGML (ENR+WWW or WebSGML)'', in 1998, resulted from a Technical Corrigendum to better support XML and WWW requirements.

SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC 1/SC 34 (ISO/IEC Joint Technical Committee 1, Subcommittee 34 – Document description and processing languages):
* SGML (ISO 8879) – Generalized markup language
** SGML was reworked in 1998 into XML, a successful profile of SGML. Full SGML is rarely found or used in new projects.
* DSSSL (ISO/IEC 10179) – Document processing and styling language based on Scheme.
** DSSSL was reworked into W3C XSLT and XSL-FO, which use an XML syntax. Nowadays, DSSSL is rarely used in new projects apart from Linux documentation.
* HyTime – Generalized hypertext and scheduling.
** HyTime was partially reworked into W3C XLink. HyTime is rarely used in new projects.

SGML is supported by various technical reports, in particular
* ISO/IEC TR 9573 – Information processing – SGML support facilities – Techniques for using SGML
** Part 13: Public entity sets for mathematics and science
*** In 2007, the W3C MathML working group agreed to assume the maintenance of these entity sets.


History

SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s. Goldfarb, editor of the international standard, coined the "GML" term using their surname initials. Goldfarb also wrote the definitive work on SGML syntax, "The SGML Handbook". The syntax of SGML, however, is closer to the COCOA format.

As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry. Many such documents must remain readable for several decades, a long time in the information technology field. SGML was also extensively applied by the military and by the aerospace, technical reference, and industrial publishing industries. The advent of the XML profile has made SGML suitable for widespread application for small-scale, general-purpose use.


Document validity

SGML (ENR+WWW) defines two kinds of validity. According to the revised Terms and Definitions of ISO 8879 (from the public draft):
"A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both. Note: A user may wish to enforce additional constraints on a document, such as whether a document instance is integrally-stored or free of entity references."
A type-valid SGML document is defined by the standard as
"An SGML document in which, for each document instance, there is an associated document type declaration (DTD) to whose DTD that instance conforms."
A tag-valid SGML document is defined by the standard as
"An SGML document, all of whose document instances are fully tagged. There need not be a document type declaration associated with any of the instances. Note: If there is a document type declaration, the instance can be parsed with or without reference to it."


Terminology

''Tag-validity'' was introduced in SGML (ENR+WWW) to support XML, which allows documents with no DOCTYPE declaration but which can be parsed without a grammar, or documents which have a DOCTYPE declaration that makes no XML Infoset contributions to the document. The standard calls this ''fully tagged''. ''Integrally stored'' reflects the XML requirement that elements end in the same entity in which they started. ''Reference-free'' reflects the HTML requirement that entity references are for special characters and do not contain markup.

SGML validity commentary, especially commentary that was made before 1997 or that is unaware of SGML (ENR+WWW), covers ''type-validity'' only. The SGML emphasis on validity supports the requirement for generalized markup that ''markup should be rigorous'' (ISO 8879 A.1).


Syntax

An SGML document may have three parts:
# the SGML Declaration,
# the Prologue, containing a DOCTYPE declaration with the various ''markup declarations'' that together make a Document Type Definition (DTD), and
# the instance itself, containing one top-most element and its contents.
An SGML document may be composed from many entities (discrete pieces of text). In SGML, the entities and element types used in the document may be specified with a DTD, while the different character sets, features, delimiter sets, and keywords are specified in the SGML Declaration to create the ''concrete syntax'' of the document.

Although full SGML allows implicit markup and some other kinds of tags, the XML specification (s4.3.1) states that the declarations, elements, comments, character references, and processing instructions of a document are all "indicated in the document by explicit markup".

For introductory information on a basic, modern SGML syntax, see XML. The following material concentrates on features not in XML and is not a comprehensive summary of SGML syntax.


Optional features

SGML generalizes and supports a wide range of markup languages as found in the mid-1980s. These ranged from terse Wiki-like syntaxes to RTF-like bracketed languages to HTML-like matching-tag languages. SGML did this by a relatively simple default ''reference concrete syntax'' augmented with a large number of optional features that could be enabled in the SGML Declaration. Not every SGML parser can necessarily process every SGML document. Because each processor's ''System Declaration'' can be compared to the document's ''SGML Declaration'', it is always possible to know whether a document is supported by a particular processor.

Many SGML features relate to markup minimization. Other features relate to concurrent (parallel) markup (CONCUR), to linking processing attributes (LINK), and to embedding SGML documents within SGML documents (SUBDOC).

The notion of customizable features was not appropriate for Web use, so one goal of XML was to minimize optional features. However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems.


Concrete and abstract syntaxes

The usual (default) SGML ''concrete syntax'' resembles this example, which is the default HTML concrete syntax:
 <QUOTE TYPE="example">typically something like <ITALICS>this</ITALICS></QUOTE>
SGML provides an ''abstract syntax'' that can be implemented in many different types of ''concrete syntax''. Although the markup norm is using angle brackets as start- and end-tag delimiters in an SGML document (per the standard-defined ''reference concrete syntax''), it is possible to use other characters, provided a suitable ''concrete syntax'' is defined in the document's SGML declaration. For example, an SGML interpreter might be programmed to parse GML, wherein the tags are delimited with a left colon and a right full stop, and an '':e'' prefix denotes an end tag:
 :xmp.Hello, world:exmp.
According to the reference syntax, letter-case (upper- or lower-) is not distinguished in tag names, thus the three tags: (i) <quote>, (ii) <QUOTE>, and (iii) <quOtE> are equivalent. (''NOTE:'' A concrete syntax might ''change'' this rule via the NAMECASE parameter in the NAMING section of the SGML Declaration.)
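The separation between abstract and concrete syntax described above can be illustrated with a minimal Python sketch. It is not an SGML parser: the `ConcreteSyntax` class, its field names, and the `tokenize` function are hypothetical names chosen for this illustration, attributes are ignored, and only three delimiter roles (start-tag open, end-tag open, tag close) are modeled. The same tokenizer reads both the reference-style and the GML-style example, and folds tag names to upper case so that letter-case is not distinguished.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteSyntax:
    """Hypothetical container for three delimiter roles of a concrete syntax."""
    stago: str               # start-tag open delimiter
    etago: str               # end-tag open delimiter
    tagc: str                # tag close delimiter
    fold_names: bool = True  # treat tag names case-insensitively


REFERENCE = ConcreteSyntax(stago="<", etago="</", tagc=">")
GML_LIKE = ConcreteSyntax(stago=":", etago=":e", tagc=".")


def tokenize(text, syntax):
    """Split text into ("start", name), ("end", name) and ("data", text) tokens."""
    name = r"([A-Za-z][A-Za-z0-9]*)"
    # The end-tag alternative is tried first because ETAGO begins with STAGO.
    pattern = re.compile(
        f"{re.escape(syntax.etago)}{name}{re.escape(syntax.tagc)}"
        f"|{re.escape(syntax.stago)}{name}{re.escape(syntax.tagc)}"
    )
    tokens = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            tokens.append(("data", text[pos:match.start()]))
        end_name, start_name = match.group(1), match.group(2)
        if end_name is not None:
            kind, tag_name = "end", end_name
        else:
            kind, tag_name = "start", start_name
        tokens.append((kind, tag_name.upper() if syntax.fold_names else tag_name))
        pos = match.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens
```

Calling `tokenize("<quote>Hello, world</QUOTE>", REFERENCE)` and `tokenize(":xmp.Hello, world:exmp.", GML_LIKE)` yields token lists of the same shape, differing only in the element name. The GML-like delimiters are ambiguous for start tags whose names begin with "e"; the sketch does not attempt to resolve that.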


Markup minimization

SGML has features for reducing the number of characters required to mark up a document, which must be enabled in the SGML Declaration. SGML processors need not support every available feature, thus allowing applications to tolerate many types of inadvertent markup omissions; however, SGML systems usually are intolerant of invalid structures. XML is intolerant of syntax omissions, and does not require a DTD for checking well-formedness.


OMITTAG

Both start tags and end tags may be omitted from a document instance, provided:
# the OMITTAG feature is enabled in the SGML Declaration,
# the DTD indicates that the tags are permitted to be omitted,
# (for start tags) the element has no associated required (#REQUIRED) attributes, and
# the tag can be unambiguously inferred by context.
For example, if OMITTAG YES is specified in the SGML Declaration (enabling the OMITTAG feature), and the DTD includes the following declarations:
 <!ELEMENT chapter - - (title, section+)>
 <!ELEMENT title o o (#PCDATA)>
 <!ELEMENT section - - (title, subsection+)>
then this excerpt:
 <chapter>Introduction to SGML
 <section>The SGML Declaration
 ...
which omits two <title> tags and two </title> tags, would represent valid markup. Omitting tags is optional; the same excerpt could be tagged like this:
 <chapter><title>Introduction to SGML</title>
 <section><title>The SGML Declaration</title>
 ...
and would still represent valid markup.

Note: The OMITTAG feature is unrelated to the tagging of elements whose declared content is EMPTY as defined in the DTD. Elements defined like this have no end tag, and specifying one in the document instance would result in invalid markup. In this regard they are syntactically different from XML empty elements.
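How a processor might infer omitted tags can be sketched in Python under strong simplifying assumptions. The rule tables below are hypothetical stand-ins for DTD declarations: they assume that a title element is the required first child of chapter and of section, that title may contain only character data, and that both of its tags are declared omissible. The function name and token format are likewise invented for this illustration. The sketch only implies title tags and does not validate content models.

```python
# Hypothetical toy rules standing in for DTD declarations (not a real DTD parser).
FIRST_CHILD = {"chapter": "title", "section": "title"}  # required first child
OMISSIBLE_START = {"title"}  # elements whose start tag may be omitted
OMISSIBLE_END = {"title"}    # elements whose end tag may be omitted
PCDATA_ONLY = {"title"}      # elements that cannot contain child elements


def infer_omitted_tags(tokens):
    """Return the token list with implied start and end tags made explicit."""
    out = []
    stack = []  # entries are [element name, number of child elements opened so far]

    def open_element(name):
        if stack:
            stack[-1][1] += 1
        stack.append([name, 0])
        out.append(("start", name))

    def close_element():
        out.append(("end", stack.pop()[0]))

    for kind, value in tokens:
        if kind == "data":
            if not value.strip():
                continue  # simplification: ignore whitespace between tags
            # Data where a required first child is expected implies its start tag.
            if stack and stack[-1][1] == 0:
                child = FIRST_CHILD.get(stack[-1][0])
                if child in OMISSIBLE_START:
                    open_element(child)
            out.append(("data", value))
        elif kind == "start":
            # A start tag cannot occur inside a character-data-only element,
            # so it implies that element's end tag.
            while stack and stack[-1][0] in PCDATA_ONLY and stack[-1][0] in OMISSIBLE_END:
                close_element()
            open_element(value)
        elif kind == "end":
            # An end tag for an outer element implies end tags for omissible inner ones.
            while stack and stack[-1][0] != value and stack[-1][0] in OMISSIBLE_END:
                close_element()
            if not stack or stack[-1][0] != value:
                raise ValueError(f"unexpected end tag: {value}")
            close_element()
    return out


minimized = [
    ("start", "chapter"), ("data", "Introduction to SGML"),
    ("start", "section"), ("data", "The SGML Declaration"),
    ("end", "section"), ("end", "chapter"),
]
explicit = infer_omitted_tags(minimized)
```

Feeding the already explicit token list back through `infer_omitted_tags` leaves it unchanged, which mirrors the point that omitting tags is optional.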


SHORTREF

Tags can be replaced with delimiter strings, for a terser markup, via the SHORTREF feature. This markup style is now associated with wiki markup, e.g. wherein two equals-signs (==), at the start of a line, are the "heading start-tag", and two equals-signs (==) after that are the "heading end-tag".
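The effect of such short references can be approximated with a small Python sketch. It is only an analogy: in SGML the mapping would be declared in the DTD and enabled in the SGML Declaration, whereas this sketch hard-codes one pattern. The function name and the `H2` element name are assumptions made for the illustration.

```python
import re

# A line that starts with "==" and ends with "==" is treated as a heading.
HEADING = re.compile(r"^==(.+?)==[ \t]*$", re.MULTILINE)


def expand_heading_shortrefs(text, element="H2"):
    """Replace ==Heading== lines with explicit start and end tags."""
    return HEADING.sub(
        lambda m: f"<{element}>{m.group(1).strip()}</{element}>", text
    )
```

For instance, `expand_heading_shortrefs("==History==\nSGML descended from GML.")` rewrites only the first line and leaves the body text untouched.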


SHORTTAG

SGML markup languages whose concrete syntax enables the SHORTTAG VALUE feature do not require attribute values containing only alphanumeric characters to be enclosed within quotation marks, either double " " (LIT) or single ' ' (LITA), so that the previous markup example could be written:
 <QUOTE TYPE=example>typically something like <ITALICS>this</></QUOTE>
One feature of SGML markup languages is the "presumptuous empty tagging", such that the empty end tag </> in <ITALICS>this</> "inherits" its value from the nearest previous full start tag, which, in this example, is <ITALICS> (in other words, it closes the most recently opened item). The expression is thus equivalent to <ITALICS>this</ITALICS>.


NET

Another feature is the ''NET'' (Null End Tag) construction: <ITALICS/this/, which is structurally equivalent to <ITALICS>this</ITALICS>.
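The equivalence stated above can be expressed as a one-line rewrite, sketched here in Python. The sketch assumes reference-syntax input with no attributes on the start tag and no nested markup or slashes inside the element content; it would misread XML-style empty-element tags, so it is not suitable for XML input. The function name is an assumption for illustration.

```python
import re

# <NAME/content/ : the first slash closes the start tag, the second is the null end tag.
NET_FORM = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")


def expand_net(text):
    """Rewrite <NAME/content/ as <NAME>content</NAME>."""
    return NET_FORM.sub(r"<\1>\2</\1>", text)
```

For example, `expand_net("<ITALICS/this/")` produces `<ITALICS>this</ITALICS>`, while text that is already fully tagged passes through unchanged.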


Other features

Additionally, the SHORTTAG NETENABL IMMEDNET feature allows shortening tags surrounding an empty text value, but forbids shortening full tags: a start tag immediately followed by its end tag can be written as a single tag that ends in two slashes, wherein the first slash ( / ) stands for the NET-enabling "start-tag close" (NESTC), and the second slash stands for the NET. NOTE: XML defines NESTC with a /, and NET with a > (angle bracket); hence the corresponding construct in XML ends in />.

The third feature is 'text on the same line', allowing a markup item to be ended with a line-end. It is especially useful for headings and such, and requires using either SHORTREF or DATATAG minimization. For example, if the DTD maps the short reference "&#RE;&#RS;" (a record end followed by a record start) to an entity whose replacement text is an end tag, and "&#RE;&#RS;" is a short-reference delimiter in the concrete syntax, then each line of text is closed by the line break that follows it, so that two untagged lines, "first line" and "second line", are equivalent to two explicitly tagged elements.


Formal characterization

SGML has many features that defied convenient description with the popular formal automata theory and the contemporary parser technology of the 1980s and the 1990s. The standard warns in Annex H that, although the model group notation was deliberately designed to resemble the regular expression notation of automata theory, no assumption should be made about the general applicability of automata to content models. A report on an early implementation of a parser for basic SGML, the Amsterdam SGML Parser, notes that the DTD grammar in SGML must conform to a notion of unambiguity which closely resembles the LL(1) conditions, and specifies various differences.

There appears to be no definitive classification of full SGML against a known class of formal grammar. Plausible classes may include tree-adjoining grammars and adaptive grammars.

XML is described as being generally parsable like a two-level grammar for non-validated XML and a Conway-style pipeline of coroutines (lexer, parser, validator) for valid XML. The SGML productions in the ISO standard are reported to be LL(3) or LL(4). XML-class subsets are reported to be expressible using a W-grammar. According to one paper, and probably considered at an ''information set'' or parse tree level rather than a character or delimiter level, the class of documents that conform to a given SGML document grammar forms an LL(1) language, although the SGML document grammars by themselves are not LL(1) grammars.

The SGML standard does not define SGML with formal data structures, such as
parse trees; however, an SGML document is constructed of a rooted directed acyclic graph (RDAG) of physical storage units known as "entities", which is parsed into an RDAG of structural units known as "elements". The physical graph is loosely characterized as an ''entity tree'', but entities might appear multiple times. Moreover, the structure graph is also loosely characterized as an ''element tree'', but the ID/IDREF markup allows arbitrary arcs.

The results of parsing can also be understood as a data tree in different notations, where the document is the root node, and entities in other notations (text, graphics) are child nodes. SGML provides apparatus for linking to and annotating external non-SGML entities.

The SGML standard describes parsing in terms of ''maps'' and ''recognition modes'' (s9.6.1). Each entity, and each element, can have an associated ''notation'' or ''declared content type'', which determines the kinds of references and tags which will be recognized in that entity and element. Also, each element can have an associated ''delimiter map'' (and ''short reference map''), which determines which characters are treated as delimiters in context.

The SGML standard characterizes parsing as a state machine switching between recognition modes. During parsing, there is a stack of maps that configure the scanner, while the tokenizer relates to the recognition modes.

Parsing involves traversing the dynamically-retrieved entity graph, finding/implying tags and the element structure, and validating those tags against the grammar. An unusual aspect of SGML is that the grammar (DTD) is used both passively, to ''recognize'' lexical structures, and actively, to ''generate'' missing structures and tags that the DTD has declared optional. End- and start-tags can be omitted, because they can be inferred. Loosely, a series of tags can be omitted only if there is a single, possible path in the grammar to imply them. It was this active use of grammars that made concrete SGML parsing difficult to formally characterize.

SGML uses the term ''validation'' for both recognition and generation. XML does not use the grammar (DTD) to change delimiter maps or to inform the parse modes, and does not allow tag omission; consequently, XML validation of elements is not active in the sense that SGML validation is active.

SGML ''without'' a DTD (e.g. simple XML) is a grammar or a language; SGML ''with'' a DTD is a metalanguage. SGML with an SGML declaration is, perhaps, a meta-metalanguage, since it is a metalanguage whose declaration mechanism ''is'' a metalanguage.

SGML has an abstract syntax implemented by many possible concrete syntaxes; however, this is not the same usage as in an abstract syntax tree and as in a concrete syntax tree. In the SGML usage, a concrete syntax is a set of specific delimiters, while the abstract syntax is the set of names for the delimiters. The XML Infoset corresponds more to the programming language notion of abstract syntax introduced by John McCarthy.
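The idea of a scanner that switches between recognition modes, mentioned above, can be sketched in Python. This is a deliberately reduced model with two modes only: a content mode in which start and end tags are recognized, and a CDATA mode in which only an end-tag open delimiter followed by a name start character is recognized. The `DECLARED_CONTENT` table, the `XMP` element, and the function name are assumptions made for the illustration; attributes, entities, and delimiter maps are left out.

```python
import re

# Hypothetical declared content types; in SGML these would come from the DTD.
DECLARED_CONTENT = {"XMP": "CDATA"}

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")
ETAGO_IN_CDATA = re.compile(r"</(?=[A-Za-z])")  # end-tag open before a name start


def scan(text):
    """Tokenize text, switching recognition mode according to declared content."""
    tokens = []
    pos = 0
    mode = "CON"
    while pos < len(text):
        if mode == "CDATA":
            # In CDATA mode, start tags are plain data; only an end tag ends the data.
            match = ETAGO_IN_CDATA.search(text, pos)
            end = match.start() if match else len(text)
            if end > pos:
                tokens.append(("data", text[pos:end]))
            pos = end
            mode = "CON"
            continue
        match = TAG.search(text, pos)
        if match is None:
            tokens.append(("data", text[pos:]))
            break
        if match.start() > pos:
            tokens.append(("data", text[pos:match.start()]))
        kind = "end" if match.group(1) else "start"
        name = match.group(2).upper()
        tokens.append((kind, name))
        pos = match.end()
        if kind == "start" and DECLARED_CONTENT.get(name) == "CDATA":
            mode = "CDATA"
    return tokens
```

With this table, `scan("<xmp><B> is a start tag</xmp>")` reports the inner `<B>` as character data, whereas `scan("<P>a <B>b</B></P>")` reports it as a start tag. The grammar thus changes what the scanner recognizes, which is the property the text describes.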


Derivatives


XML

The W3C XML (Extensible Markup Language) is a profile (subset) of SGML designed to make a parser easier to implement than a full SGML parser, primarily for use on the World Wide Web. In addition to disabling many SGML options present in the reference syntax (such as omitting tags and nested subdocuments), XML adds a number of further restrictions on the kinds of SGML syntax. For example, despite enabling SGML shortened tag forms, XML does not allow unclosed start or end tags. It also relies on many of the additions made by the WebSGML Annex. XML is currently more widely used than full SGML. XML has lightweight internationalization based on Unicode. Applications of XML include XHTML, XQuery, XSLT, XForms, XPointer, JSP, SVG, RSS, Atom, XML-RPC, RDF/XML, and SOAP.
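The effect of disabling tag omission can be observed with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed_xml` and the sample documents are assumptions made for illustration. It checks well-formedness only and does not involve a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# End tags for <item> are omitted, as an SGML DTD could permit.
omitted_end_tags = "<list><item>one<item>two</list>"

# The same content with every element explicitly closed.
explicit_end_tags = "<list><item>one</item><item>two</item></list>"

print(is_well_formed_xml(omitted_end_tags))
print(is_well_formed_xml(explicit_end_tags))
```

An SGML parser could accept the first document if the DTD declared the end tag of `item` omissible; an XML parser rejects it without consulting any DTD.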


HTML

While HTML (HyperText Markup Language) was developed partially independently and in parallel with SGML, its creator, Tim Berners-Lee, intended it to be an application of SGML. The design of HTML was therefore inspired by SGML tagging, but, since no clear expansion and parsing guidelines were established, most actual HTML documents are not valid SGML documents. Later, HTML was reformulated (version 2.0) to be more of an SGML application; however, the HTML markup language has many legacy- and exception-handling features that differ from SGML's requirements. HTML 4 is an SGML application that fully conforms to ISO 8879 – SGML. The charter for the 2006 revival of the World Wide Web Consortium HTML Working Group says, "the Group will not assume that an SGML parser is used for 'classic HTML'". Although HTML syntax closely resembles SGML syntax with the default ''reference concrete syntax'', HTML5 abandons any attempt to define HTML as an SGML application, explicitly defining its own parsing rules, which more closely match existing implementations and documents. It does, however, define an alternative XHTML serialization, which conforms to XML and therefore to SGML as well.
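The gap between real-world HTML and strict parsing can be illustrated with Python's standard `html.parser` module. This is only a sketch: `HTMLParser` is a lenient tokenizer, not an SGML parser and not an implementation of the HTML5 parsing algorithm, and the class name `TagRecorder` is an assumption made for this example.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Record the start and end tags that actually appear in the input."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


recorder = TagRecorder()
# Typical HTML with the </li> end tags omitted.
recorder.feed("<ul><li>one<li>two</ul>")
recorder.close()

print(recorder.start_tags)
print(recorder.end_tags)
```

The tokenizer reports two `li` start tags and no `li` end tags without raising an error. Deciding where each list item ends is left to whatever rules the consumer applies, which is the kind of behaviour HTML5 specifies explicitly instead of delegating it to an SGML DTD.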


OED

The second edition of the ''Oxford English Dictionary'' (OED) is entirely marked up with an SGML-based markup language using the LEXX text editor. The third edition is marked up as XML.


Others

Other document markup languages are partly related to SGML and XML, but because they cannot be parsed, validated, or otherwise processed using standard SGML and XML tools, they are not considered either SGML or XML languages; the Z Format markup language for typesetting and documentation is an example. Several modern programming languages support tags as primitive token types, or now support Unicode and regular expression pattern-matching. An example is the Scala programming language.


Applications

Document markup languages defined using SGML are called "applications" by the standard; many pre-XML SGML applications were the proprietary property of the organizations that developed them, and thus unavailable on the World Wide Web. The following is a list of pre-XML SGML applications.
* Text Encoding Initiative (TEI) is an academic consortium that designs, maintains, and develops technical standards for digital-format textual representation applications.
* DocBook is a markup language originally created as an SGML application, designed for authoring technical documentation; DocBook is currently an XML application.
* CALS (Continuous Acquisition and Life-cycle Support) is a US Department of Defense (DoD) initiative for electronically capturing military documents and for linking related data and information.
* HyTime defines a set of hypertext-oriented element types that allow SGML document authors to build hypertext and multimedia presentations.
* EDGAR (Electronic Data-Gathering, Analysis, and Retrieval) is a system that performs automated collection, validation, indexing, acceptance, and forwarding of submissions by companies and others who are legally required to file data and information forms with the US Securities and Exchange Commission (SEC).
* LinuxDoc. Documentation for Linux packages has used the LinuxDoc SGML DTD and the DocBook XML DTD.
* AAP DTD is a document type definition for scientific documents, defined by the Association of American Publishers.
* ISO 12083, a successor to AAP DTD, is an international SGML standard for document interchange between authors and publishers.
* SGMLguid was an early SGML document type definition created, developed and used at CERN.


Open-source implementations

Significant open-source implementations of SGML have included:
* ASP-SGML
* ARC-SGML, by Standard Generalized Markup Language Users' Group, 1991, C language
* SGMLS, by James Clark, 1993, C language
* Project YAO, by Yuan-ze Institute of Technology, Taiwan, with Charles Goldfarb, 1994, object
* SP, by James Clark, C++ language
SP and Jade, the associated DSSSL processors, are maintained by the OpenJade project, and are common parts of Linux distributions. A general archive of SGML software and materials resides at SUNET. The original HTML parser class, in Sun Microsystems' implementation of Java, is a limited-features SGML parser, using SGML terminology and concepts.


See also

* Organization for the Advancement of Structured Information Standards (OASIS)
* S-expression
* DSSSL – a Scheme-based processing language similar to XSL
* LaTeX
* List of general purpose markup languages
* Markup language
* SGML entity
* HyTime
* Tag omission
* XML


External links


* Overview of SGML Resources at W3C's website.
* SC34 Committee Records, Charles Babbage Institute – Collection on the development of SGML and other standards influential in the development of current XML tools; documents include early drafts of SGML administrative materials, documentation, working group papers, and standards for computer languages.
* SGML Syntax Summary by Charles Goldfarb, in SGML and HTML Explained, Martin Bryan (1997) (the original URL is broken at http://www.is-thought.co.uk/book/sgml-4.htm#Fig4-2)
* Wayne Wohler, IBM Corporation, 1994.
* [http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=16645 ISO/IEC 9070:1991 – Information technology – SGML support facilities – Registration procedures for public text owner identifiers]