XML Information Set (XML Infoset) is a W3C specification that defines an abstract data model of an XML document in terms of a set of ''information items''. The XML Infoset provides a standardized way to refer to the components of XML documents, serving as a foundation for XML-related standards and tools.
The XML Infoset identifies eleven different types of information items, including the document, elements, attributes, processing instructions, characters, and namespaces. Each information item has a set of named properties, which represent specific aspects of the XML document being modeled. For example, an element information item has properties such as the element's namespace name, local name, children, and attributes.
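The element properties named above can be approximated with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not the Infoset's own API: the helper `element_properties` and the example namespace URI are hypothetical, and ElementTree exposes only child elements, whereas the Infoset's children property also covers character, comment, and processing instruction items.

```python
import xml.etree.ElementTree as ET


def element_properties(elem):
    """Return a dict loosely mirroring a few element information item properties.

    Hypothetical helper for illustration only.
    """
    # ElementTree encodes a namespaced name as "{namespace-name}local-name".
    if elem.tag.startswith("{"):
        namespace_name, local_name = elem.tag[1:].split("}", 1)
    else:
        namespace_name, local_name = None, elem.tag
    return {
        "namespace name": namespace_name,
        "local name": local_name,
        # Only child elements are listed here; text is not included.
        "children": [child.tag for child in elem],
        # Namespace declarations (xmlns:...) are not reported as attributes.
        "attributes": dict(elem.attrib),
    }


root = ET.fromstring(
    '<b:book xmlns:b="http://example.org/books" id="42">'
    "<b:title>Infoset</b:title>"
    "</b:book>"
)
props = element_properties(root)
```

Here `props["local name"]` is `"book"` and `props["namespace name"]` is the URI bound to the `b` prefix; the prefix itself is a detail of the serialization.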
An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid according to a DTD or XML Schema in order to have an information set.
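The distinction between well-formedness and validity can be illustrated with a non-validating, namespace-aware parser such as the one behind Python's `xml.etree.ElementTree`. The sketch below assumes that a successful parse is a reasonable stand-in for "well-formed and namespace-conformant"; the function name `is_well_formed` and the sample documents are hypothetical, and the underlying parser may not enforce every namespace constraint.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Well-formed, but invalid against its own DTD (note must contain <to>).
invalid_but_well_formed = (
    "<!DOCTYPE note [<!ELEMENT note (to)> <!ELEMENT to (#PCDATA)>]>"
    "<note><from/></note>"
)

# Not well-formed: the <to> start tag is never closed.
mismatched_tags = "<note><to></note>"

# Well-formed as plain XML 1.0, but the prefix "x" is never declared.
unbound_prefix = "<x:note/>"

results = [
    is_well_formed(invalid_but_well_formed),
    is_well_formed(mismatched_tags),
    is_well_formed(unbound_prefix),
]
```

Only the first document is accepted: it violates its DTD, yet it still has an information set because validity is not required.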
XML was initially developed without a formal definition of its infoset. This conceptual foundation was only formalized by later work beginning in 1999, first published as a separate W3C Working Draft at the end of December that year. The Infoset Recommendation Second Edition was adopted on February 4, 2004.
The XML Information Set specification has become a cornerstone of the XML technology stack, enabling higher-level specifications such as XPath, XSLT, DOM, XQuery, and many others to describe their functionality in terms of the XML Infoset rather than the concrete XML syntax. This abstraction allows these technologies to operate on XML content regardless of its specific serialization format. If a 2.0 version of the XML standard is ever published, it is likely that this would absorb the Infoset recommendation as an integral part of that standard.
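As a rough illustration of this independence from concrete syntax, the following sketch parses two textually different serializations and compares the resulting trees. It assumes the `xml.etree.ElementTree` tree is an adequate approximation of the infoset for this purpose; the function `same_tree` is a hypothetical helper, not part of any standard.

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Recursively compare names, attributes, text, and child elements."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib  # dict comparison ignores attribute order
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


# Differences: quote style, attribute order, empty-element tag versus
# start/end tag pair, and a character reference versus a literal character.
first = ET.fromstring("<doc a=\"1\" b='2'><item/><t>A&#66;</t></doc>")
second = ET.fromstring('<doc b="2" a="1"><item></item><t>AB</t></doc>')
third = ET.fromstring('<doc b="2" a="1"><item></item><t>AC</t></doc>')

equal_pair = same_tree(first, second)
unequal_pair = same_tree(first, third)
```

The first two documents differ only in syntax and compare equal, while the third differs in actual character content.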
Information items
An information set can contain up to eleven different types of information items:
#The Document Information Item (always present)
#Element Information Items
#Attribute Information Items
#Processing Instruction Information Items
#Unexpanded Entity Reference Information Items
#Character Information Items
#Comment Information Items
#The Document Type Declaration Information Item
#Unparsed Entity Information Items
#Notation Information Items
#Namespace Information Items
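Several of these item types have close counterparts among DOM node types. The sketch below walks a small document with Python's `xml.dom.minidom` and records which kinds of items it encounters. The mapping table `ITEM_NAMES`, the function `item_kinds`, and the sample document are assumptions made for illustration; the DOM is not a one-to-one rendering of the Infoset (for example, a DOM text node groups many character information items).

```python
from xml.dom import Node, minidom

# Hypothetical mapping from DOM node types to the nearest information item.
ITEM_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.TEXT_NODE: "character",
    Node.COMMENT_NODE: "comment",
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
}


def item_kinds(node, found=None):
    """Collect the kinds of information items reachable from a DOM node."""
    found = set() if found is None else found
    found.add(ITEM_NAMES.get(node.nodeType, "other"))
    # Attributes are not child nodes in the DOM, so check them separately.
    if node.nodeType == Node.ELEMENT_NODE and node.attributes.length:
        found.add("attribute")
    for child in node.childNodes:
        item_kinds(child, found)
    return found


doc = minidom.parseString(
    '<?xml version="1.0"?><!DOCTYPE r><!-- note --><r a="1"><?pi data?>text</r>'
)
kinds = item_kinds(doc)
```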
Infoset augmentation
Infoset augmentation or infoset modification refers to the process of modifying the infoset during schema validation, for example by adding default attributes. The augmented infoset is called the post-schema-validation infoset, or PSVI.
Infoset augmentation is somewhat controversial, with claims that it is a violation of modularity and tends to cause interoperability problems, since applications get different information depending on whether or not validation has been performed.
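The effect of adding default attributes can be imitated without a schema processor. The sketch below is a hand-written simulation, not real schema validation: the `DEFAULTS` table stands in for defaults a schema might declare, and the `augment` function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, standing in for attribute defaults declared in a schema.
DEFAULTS = {"item": {"currency": "USD"}}


def augment(root, defaults):
    """Add missing default attributes in place, imitating infoset augmentation."""
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            # Explicitly specified attributes are left unchanged.
            elem.attrib.setdefault(name, value)
    return root


source = '<order><item price="5"/><item price="7" currency="EUR"/></order>'

plain = ET.fromstring(source)
augmented = augment(ET.fromstring(source), DEFAULTS)

plain_currencies = [item.get("currency") for item in plain]
augmented_currencies = [item.get("currency") for item in augmented]
```

An application reading `plain` sees no currency on the first item, while one reading `augmented` sees `USD`, which is the kind of divergence the interoperability objection refers to.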
Infoset augmentation is supported by XML Schema but not RELAX NG.
Serialization
Typically, an XML Information Set is serialized as XML. There are also serialization formats for Binary XML, CSV, and JSON.<ref>Apache CXF JSON Support</ref>
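A non-XML serialization can be sketched by mapping each element to a JSON object. The layout below (keys `name`, `attributes`, `children`, `text`) is an arbitrary assumption chosen for illustration and is not the mapping used by Apache CXF or any other particular tool; it also drops text that follows a child element, so it is lossy for mixed content.

```python
import json
import xml.etree.ElementTree as ET


def to_jsonable(elem):
    """Convert an element subtree into plain dicts and lists (hypothetical layout)."""
    return {
        "name": elem.tag,
        "attributes": dict(elem.attrib),
        "children": [to_jsonable(child) for child in elem],
        # Only the text before the first child is kept in this sketch.
        "text": elem.text or "",
    }


root = ET.fromstring('<a x="1"><b>hi</b></a>')
serialized = json.dumps(to_jsonable(root))
round_tripped = json.loads(serialized)
```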
See also
XML Information Set instances:
* Document Object Model
* XPath data model
* SXML
References