Office MathML

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations, as well as specific formats for material such as mathematical formulas, graphics, bibliographies etc. The formats were developed by Microsoft and first appeared in Microsoft Office 2007. They were standardized between December 2006 and November 2008, first by the Ecma International consortium, where they became ECMA-376, and subsequently, after a contentious standardization process, by the ISO/IEC's Joint Technical Committee 1, where they became ISO/IEC 29500:2008.


Container

Office Open XML documents are stored in Open Packaging Conventions (OPC) packages, which are ZIP files containing XML and other data files, along with a specification of the relationships between them. Depending on the type of the document, the packages have different internal directory structures and names. An application uses the relationships files to locate individual sections (files), each of which has accompanying metadata, in particular MIME metadata.

A basic package contains an XML file called [Content_Types].xml at the root, along with three directories: _rels, docProps, and a directory specific to the document type (for example, in a .docx word processing package, there would be a word directory). The word directory contains the document.xml file, which is the core content of the document.

* [Content_Types].xml: This file provides MIME type information for parts of the package, using defaults for certain file extensions and overrides for parts specified by IRI.
* _rels: This directory contains relationships for the files within the package. To find the relationships for a specific file, look for the _rels directory that is a sibling of the file, and then for a file that has the original file name with .rels appended to it. For example, if the content types file had any relationships, there would be a file called [Content_Types].xml.rels inside the _rels directory.
* _rels/.rels: This file is where the package relationships are located, and applications look here first. Viewed in a text editor, it lists each of these relationships. In a minimal document containing only the basic document.xml file, the relationships detailed are metadata and document.xml.
* docProps/core.xml: This file contains the core properties for any Office Open XML document.
* word/document.xml: This file is the main part for any Word document.
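The _rels naming rule above is mechanical, so it can be written as a small function. The following is a minimal Python sketch using only the standard library; the function name rels_part_for and the placeholder part contents are hypothetical, and the in-memory ZIP is only a skeleton of the directory layout, not a document an office application would open.

```python
import io
import posixpath
import zipfile


def rels_part_for(part_name: str) -> str:
    """Return the path of the relationships part for a part in the package.

    Relationships live in a _rels directory that is a sibling of the part, in a
    file named after the part with ".rels" appended. An empty part name stands
    for the package itself, which yields "_rels/.rels".
    """
    directory, filename = posixpath.split(part_name.lstrip("/"))
    return posixpath.join(directory, "_rels", filename + ".rels")


# Hypothetical skeleton package: the part bodies are placeholders, not valid OOXML.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    for name in (
        "[Content_Types].xml",
        "_rels/.rels",
        "docProps/core.xml",
        "word/document.xml",
        "word/_rels/document.xml.rels",
    ):
        package.writestr(name, "<placeholder/>")

with zipfile.ZipFile(buffer) as package:
    names = set(package.namelist())
    for part in ("", "word/document.xml", "docProps/core.xml"):
        rels = rels_part_for(part)
        status = "present" if rels in names else "absent"
        print(f"{part or '(package)'} -> {rels} ({status})")
```

Running it prints that _rels/.rels and word/_rels/document.xml.rels are present, while docProps/_rels/core.xml.rels is absent, since a part without outgoing relationships simply has no .rels file.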


Relationships

An example relationship file is word/_rels/document.xml.rels. Images referenced in the document can be found in such a relationship file by looking for all relationships of type http://schemas.microsoft.com/office/2006/relationships/image. To change the image used, edit the relationship.

Hyperlinks work the same way: in inline hyperlink markup, the uniform resource locator (URL) is stored in the Target attribute of the Relationship referenced through the relationship Id (for example, "rId2"). Linked images, templates, and other items are referenced in the same way. Pictures can be embedded or linked using a tag that holds the reference to the image file.

All references are managed via relationships. For example, a document.xml has a relationship to the image. There is a _rels directory in the same directory as document.xml, and inside _rels is a file called document.xml.rels. This file contains a relationship definition with a type, an ID and a location. The ID is the identifier referenced in the XML document. The type is a reference schema definition for the media type, and the location is either an internal location within the ZIP package or an external location defined with a URL.
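As a sketch of how a consumer might resolve these references, the Python below parses a relationships part and looks up targets by Id and by type. The sample content is hypothetical, and the relationship type URIs under schemas.openxmlformats.org are an assumption that may differ from the type URI quoted above; the image lookup therefore matches on the "/image" suffix rather than one exact URI. Internal targets are resolved relative to the directory of the source part (here, word).

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespace of OPC relationship parts.
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

# Hypothetical contents of word/_rels/document.xml.rels.
DOCUMENT_RELS = f"""<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId1" Type="{REL_TYPE_BASE}styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="{REL_TYPE_BASE}hyperlink"
                Target="https://www.example.com/" TargetMode="External"/>
  <Relationship Id="rId3" Type="{REL_TYPE_BASE}image" Target="media/image1.png"/>
</Relationships>"""


def load_relationships(rels_xml: str) -> dict:
    """Map each relationship Id to a (type, target, target mode) tuple."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): (rel.get("Type"), rel.get("Target"), rel.get("TargetMode", "Internal"))
        for rel in root.findall(f"{{{PKG_REL_NS}}}Relationship")
    }


def resolve_internal(source_dir: str, target: str) -> str:
    """Resolve an internal target relative to the directory of the source part."""
    return posixpath.normpath(posixpath.join(source_dir, target)).lstrip("/")


def image_parts(rels: dict, source_dir: str) -> list:
    """Package paths of internal targets whose relationship type ends in /image."""
    # Matching on the suffix avoids depending on a single exact type URI.
    return sorted(
        resolve_internal(source_dir, target)
        for rel_type, target, mode in rels.values()
        if rel_type.endswith("/image") and mode == "Internal"
    )


rels = load_relationships(DOCUMENT_RELS)
print(rels["rId2"][1])            # https://www.example.com/
print(image_parts(rels, "word"))  # ['word/media/image1.png']
```

Swapping the picture then amounts to changing the Target of rId3; the markup in document.xml keeps referring to the same Id.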


Document properties

Office Open XML uses the Dublin Core Metadata Element Set and DCMI Metadata Terms to store document properties. Dublin Core is a standard for cross-domain information resource description and is defined in ISO 15836:2003.

An example document properties file (docProps/core.xml) that uses Dublin Core metadata records the following values:
* title: Office Open XML
* subject: File format and structure
* creator: Wikipedia
* keywords: Office Open XML, Metadata, Dublin Core
* description: Office Open XML uses ISO 15836:2003
* last modified by: Wikipedia
* revision: 1
* created: 2008-06-19T20:00:00Z
* modified: 2008-06-19T20:42:00Z
* category: Document file format
* content status: Final
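The following Python sketch wraps the example values in core-properties markup and reads them back. The XML is a reconstruction: the assignment of each value to a dc:, dcterms: or cp: element, and the namespace URIs, are assumptions based on the common layout of docProps/core.xml rather than text preserved in this article. The helper names are hypothetical.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Reconstructed docProps/core.xml; element names and namespaces are assumptions.
CORE_XML = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Office Open XML</dc:title>
  <dc:subject>File format and structure</dc:subject>
  <dc:creator>Wikipedia</dc:creator>
  <cp:keywords>Office Open XML, Metadata, Dublin Core</cp:keywords>
  <dc:description>Office Open XML uses ISO 15836:2003</dc:description>
  <cp:lastModifiedBy>Wikipedia</cp:lastModifiedBy>
  <cp:revision>1</cp:revision>
  <dcterms:created xsi:type="dcterms:W3CDTF">2008-06-19T20:00:00Z</dcterms:created>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2008-06-19T20:42:00Z</dcterms:modified>
  <cp:category>Document file format</cp:category>
  <cp:contentStatus>Final</cp:contentStatus>
</cp:coreProperties>"""


def read_core_properties(xml_text: str) -> dict:
    """Return {local element name: text} for the children of cp:coreProperties."""
    root = ET.fromstring(xml_text)
    return {child.tag.split("}", 1)[1]: (child.text or "").strip() for child in root}


def parse_w3cdtf(value: str) -> datetime:
    """Parse a UTC W3CDTF timestamp such as 2008-06-19T20:00:00Z."""
    # datetime.fromisoformat() before Python 3.11 does not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


props = read_core_properties(CORE_XML)
edit_time = parse_w3cdtf(props["modified"]) - parse_w3cdtf(props["created"])
print(props["title"], "/", props["creator"], "/", edit_time)
# Office Open XML / Wikipedia / 0:42:00
```

Keying by local name is a simplification that works here because no two properties share a local name across namespaces.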


Document markup languages

An Office Open XML file may contain several documents encoded in specialized markup languages corresponding to applications within the Microsoft Office product line. Office Open XML defines multiple vocabularies using 27 namespaces and 89 schema modules.

The primary markup languages are:
* WordprocessingML for word processing
* SpreadsheetML for spreadsheets
* PresentationML for presentations

Shared markup language materials include:
* Office Math Markup Language (OMML)
* DrawingML, used for vector drawing, charts, and, for example, text art (additionally, though deprecated, VML is supported for drawing)
* Extended properties
* Custom properties
* Variant Types
* Custom XML data properties
* Bibliography

In addition to the above markup languages, custom XML schemas can be used to extend Office Open XML.
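Because each vocabulary has its own namespace, a consumer can tell which markup language a part uses from the namespace of its root element. This Python sketch assumes the main namespace URIs of the transitional vocabularies listed in the table below; ISO/IEC 29500 Strict files use different namespace URIs, which it does not cover, and the function name vocabulary_of is hypothetical.

```python
import xml.etree.ElementTree as ET

# Main namespace URIs of the transitional vocabularies (assumed; Strict uses other URIs).
VOCABULARIES = {
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main": "WordprocessingML",
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main": "SpreadsheetML",
    "http://schemas.openxmlformats.org/presentationml/2006/main": "PresentationML",
    "http://schemas.openxmlformats.org/drawingml/2006/main": "DrawingML",
    "http://schemas.openxmlformats.org/officeDocument/2006/math": "Office MathML (OMML)",
}


def vocabulary_of(xml_text: str) -> str:
    """Name the markup language of a part from the namespace of its root element."""
    tag = ET.fromstring(xml_text).tag  # ElementTree form: "{namespace-uri}localname"
    if not tag.startswith("{"):
        return "unknown (no namespace)"
    namespace = tag[1:].split("}", 1)[0]
    return VOCABULARIES.get(namespace, "unknown")


print(vocabulary_of(
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"/>'
))  # WordprocessingML
```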


Design approach

Patrick Durusau, the editor of ODF, has viewed the markup style of OOXML and ODF as representing two sides of a debate: the "element side" and the "attribute side". He notes that OOXML represents "the element side of this approach" and singles out the KeepNext element as an example. In contrast, he notes that ODF would use the single attribute fo:keep-next, rather than an element, for the same semantic.

The XML Schema of Office Open XML emphasizes reducing load time and improving parsing speed. In a test with applications current in April 2007, XML-based office documents were slower to load than binary formats. To enhance performance, Office Open XML uses very short element names for common elements, and spreadsheets save dates as index numbers (counting from 1900 or from 1904). In order to be systematic and generic, Office Open XML typically uses separate child elements for data and metadata (element names ending in Pr, for "properties") rather than multiple attributes, which allows properties to be structured. Office Open XML does not use mixed content; instead, it uses elements to put a series of text runs (element name r) into paragraphs (element name p). The result is terse and highly nested, in contrast to HTML, for example, which is fairly flat, designed for humans to write in text editors, and more congenial for humans to read.

The naming of elements and attributes within the text has attracted some criticism. There are three different syntaxes in OOXML (ECMA-376) for specifying the color and alignment of text, depending on whether the document is a text document, a spreadsheet, or a presentation. Rob Weir (an IBM employee and co-chair of the OASIS OpenDocument Format TC) asks "What is the engineering justification for this horror?". He contrasts this with OpenDocument: "ODF uses the W3C's XSL-FO vocabulary for text styling, and uses this vocabulary consistently".

Some have argued the design is based too closely on Microsoft applications. In August 2007, the Linux Foundation published a blog post calling upon ISO National Bodies to vote "No, with comments" during the International Standardization of OOXML. It said, "OOXML is a direct port of a single vendor's binary document formats. It avoids the re-use of relevant existing international standards (e.g. several cryptographic algorithms, VML, etc.). There are literally hundreds of technical flaws that should be addressed before standardizing OOXML including continued use of binary code tied to platform specific features, propagating bugs in MS-Office into the standard, proprietary units, references to proprietary/confidential tags, unclear IP and patent rights, and much more".

The version of the standard submitted to JTC 1 was 6546 pages long. The need for, and appropriateness of, such length has been questioned. Google stated that "the ODF standard, which achieves the same goal, is only 867 pages".
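Durusau's element-versus-attribute contrast can be made concrete by reading the same "keep with next paragraph" property from both styles of markup. In this Python sketch the snippets are illustrative assumptions: on the OOXML side a w:keepNext child element inside the paragraph properties (w:pPr), on the ODF side a fo:keep-next attribute on style:paragraph-properties. The namespace URIs and function names are likewise assumptions, not quotations from either standard.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Element side (OOXML): the property is a child element of the paragraph properties.
OOXML_PPR = f'<w:pPr xmlns:w="{W}"><w:keepNext/></w:pPr>'
# Attribute side (ODF): the same property is a single attribute.
ODF_PROPS = (
    f'<style:paragraph-properties xmlns:style="{STYLE}" xmlns:fo="{FO}" '
    'fo:keep-next="always"/>'
)


def ooxml_keep_next(ppr_xml: str) -> bool:
    """True if a w:pPr fragment switches on keep-with-next."""
    keep_next = ET.fromstring(ppr_xml).find(f"{{{W}}}keepNext")
    if keep_next is None:
        return False
    # On/off element: its presence means "on" unless w:val explicitly turns it off.
    return keep_next.get(f"{{{W}}}val", "true") not in ("false", "0", "off")


def odf_keep_next(props_xml: str) -> bool:
    """True if a style:paragraph-properties fragment sets fo:keep-next="always"."""
    return ET.fromstring(props_xml).get(f"{{{FO}}}keep-next", "auto") == "always"


print(ooxml_keep_next(OOXML_PPR), odf_keep_next(ODF_PROPS))  # True True
```

The element form needs a lookup for the child plus a check of an optional value attribute, while the attribute form is a single lookup; the element form, in turn, leaves room for structured sub-properties.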


WordprocessingML (WML)

Word processing documents use the XML vocabulary known as WordprocessingML normatively defined by the schema wml.xsd which accompanies the standard. This vocabulary is defined in clause 11 of Part 1.
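The paragraph-and-run nesting of WordprocessingML (text runs r inside paragraphs p, with properties in separate Pr children) can be walked with the standard library. This Python sketch assumes the transitional main namespace; the sample markup and the function name paragraph_texts are hypothetical, and it ignores tabs, breaks, fields and other run content that a real text extractor would need to handle.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical body fragment: w:p paragraphs contain w:r runs, which contain w:t text.
BODY = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:jc w:val="center"/></w:pPr>
    <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Office </w:t></w:r>
    <w:r><w:t>Open XML</w:t></w:r>
  </w:p>
  <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
</w:body>"""


def paragraph_texts(body_xml: str) -> list:
    """Return the plain text of each top-level paragraph, in document order."""
    body = ET.fromstring(body_xml)
    return [
        # Properties (w:pPr, w:rPr) are separate child elements, so only w:t is read.
        "".join(t.text or "" for t in paragraph.findall("w:r/w:t", NS))
        for paragraph in body.findall("w:p", NS)
    ]


print(paragraph_texts(BODY))  # ['Office Open XML', 'Second paragraph']
```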


SpreadsheetML (SML)

Spreadsheet documents use the XML vocabulary known as SpreadsheetML, normatively defined by the schema sml.xsd which accompanies the standard. This vocabulary is described in clause 12 of Part 1. Each worksheet in a spreadsheet is represented by its own XML document.

The representation of date and time values in SpreadsheetML has attracted some criticism. ECMA-376 1st edition does not conform to ISO 8601:2004 "Representation of Dates and Times". It requires that implementations replicate a Lotus 1-2-3 bug that erroneously treats 1900 as a leap year. Products complying with ECMA-376 would therefore be required to have the WEEKDAY() spreadsheet function assign incorrect days of the week to some dates, and would also miscalculate the number of days between certain dates. ECMA-376 2nd edition (ISO/IEC 29500) allows the use of ISO 8601:2004 "Representation of Dates and Times" in addition to the Lotus 1-2-3 bug-compatible form.
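The Lotus 1-2-3 compatibility rule is easiest to see when converting serial day numbers back to calendar dates. The following Python sketch assumes whole-day serials (fractional time-of-day parts are ignored) and treats serial 60 as the nonexistent 29 February 1900; the function name serial_to_date and the choice to raise an error for serial 60 are illustrative, not taken from the standard.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-day spreadsheet serial number to a calendar date."""
    if date1904:
        # 1904 date system: serial 0 is 1 January 1904.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("the 1900 date system starts at serial 1")
    if serial == 60:
        # Lotus 1-2-3 compatibility: serial 60 stands for 29 February 1900, which never existed.
        raise ValueError("serial 60 is the fictitious 29 February 1900")
    if serial < 60:
        # Serials 1..59 are 1 January .. 28 February 1900.
        return date(1899, 12, 31) + timedelta(days=serial)
    # From serial 61 onward the phantom leap day has been counted, so the epoch shifts by one.
    return date(1899, 12, 30) + timedelta(days=serial)


print(serial_to_date(1), serial_to_date(1).strftime("%A"))  # 1900-01-01 Monday
print(serial_to_date(61))                                   # 1900-03-01
print(serial_to_date(45292))                                # 2024-01-01
print(serial_to_date(43830, date1904=True))                 # 2024-01-01
```

Because the 1900 system counts one day that never existed, any weekday or day-difference computed by simple arithmetic on serials is off by one whenever a serial below 60 is involved; the sketch sidesteps this by using two epochs. The two systems differ by 1,462 days for dates after February 1900.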


Office MathML (OMML)

Office Math Markup Language is a mathematical markup language which can be embedded in WordprocessingML, with intrinsic support for including word processing markup like revision markings, footnotes, comments, images and elaborate formatting and styles. The OMML format is different from the World Wide Web Consortium (W3C) MathML recommendation, which does not support those office features, but it is partially compatible through XSL Transformations; tools are provided with the office suite and are used automatically via clipboard transformations. For example, OMML can express the fraction π/2 (written \frac{\pi}{2} in LaTeX).

Some have queried the need for Office MathML (OMML), instead advocating the use of MathML, a W3C recommendation for the "inclusion of mathematical expressions in Web pages" and "machine to machine communication". Murray Sargent has answered some of these issues in a blog post, which details some of the philosophical differences between the two formats.
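To illustrate the structure of OMML, the following Python sketch encodes the fraction π/2 and translates it to LaTeX. The markup is an assumption based on common OMML usage: an m:f fraction element holding m:num and m:den, each containing a math run (m:r) with text (m:t), in the namespace shown. The converter is hypothetical and handles only runs and fractions; it is not the XSL transformation shipped with the office suite.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
NS = {"m": M}

# Assumed OMML for pi/2: m:f holds a numerator (m:num) and a denominator (m:den).
OMML = f"""<m:oMathPara xmlns:m="{M}">
  <m:oMath>
    <m:f>
      <m:num><m:r><m:t>π</m:t></m:r></m:num>
      <m:den><m:r><m:t>2</m:t></m:r></m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>"""

SYMBOLS = {"π": r"\pi"}


def to_latex(element: ET.Element) -> str:
    """Translate a tiny OMML subset (text runs and fractions) into LaTeX."""
    local = element.tag.split("}", 1)[1]
    if local == "t":
        return "".join(SYMBOLS.get(ch, ch) for ch in (element.text or ""))
    if local == "f":
        num = to_latex(element.find("m:num", NS))
        den = to_latex(element.find("m:den", NS))
        return rf"\frac{{{num}}}{{{den}}}"
    # Containers such as m:oMathPara, m:oMath, m:num, m:den and m:r: recurse into children.
    return "".join(to_latex(child) for child in element)


print(to_latex(ET.fromstring(OMML)))  # \frac{\pi}{2}
```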


DrawingML

DrawingML is the vector graphics markup language used in Office Open XML documents. Its major features are the graphics rendering of text elements, graphical vector-based shape elements, graphical tables and charts. The DrawingML table is the third table model in Office Open XML (next to the table models in WordprocessingML and SpreadsheetML); it is optimized for graphical effects, and its main use is in presentations created with PresentationML markup. DrawingML contains graphics effects (like shadows and reflection) that can be applied to the different graphical elements used in DrawingML. DrawingML can also create 3D effects, for instance to show the graphical elements through a flexible camera viewpoint. It is possible to create separate DrawingML theme parts in an Office Open XML package; these themes can then be applied to graphical elements throughout the package. DrawingML is unrelated to other vector graphics formats such as SVG. These can be converted to DrawingML to be included natively in an Office Open XML document. This is a different approach from that of the OpenDocument format, which uses a subset of SVG and includes vector graphics as separate files.

A DrawingML graphic's dimensions are specified in English Metric Units (EMUs). They are so called because they allow an exact common representation of dimensions originally in either English or metric units. An EMU is defined as 1/360,000 of a centimeter, and thus there are 914,400 EMUs per inch and 12,700 EMUs per point, which prevents round-off in calculations. Rick Jelliffe favors EMUs as a rational solution to a particular set of design criteria. Some have criticised the use of DrawingML (and the transitional-use-only VML) instead of the W3C recommendation SVG. VML did not become a W3C recommendation.
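The EMU definitions above can be checked with exact rational arithmetic: 2.54 cm × 360,000 = 914,400 EMUs per inch, and 914,400 / 72 = 12,700 EMUs per point. This Python sketch converts lengths to EMUs using fractions.Fraction so that no binary floating-point round-off occurs; the function name to_emu, the unit keys, and the choice to reject lengths that are not a whole number of EMUs are illustrative assumptions.

```python
from fractions import Fraction

EMU_PER_CM = 360_000
EMU_PER_INCH = 914_400  # 2.54 cm x 360,000
EMU_PER_POINT = 12_700  # 914,400 / 72 points per inch

# Exact conversion factors; Fraction avoids binary floating-point round-off.
FACTORS = {
    "cm": Fraction(EMU_PER_CM),
    "mm": Fraction(EMU_PER_CM, 10),
    "in": Fraction(EMU_PER_INCH),
    "pt": Fraction(EMU_PER_POINT),
}


def to_emu(value, unit: str) -> int:
    """Convert a length given as an int or decimal string (e.g. "2.54") to EMUs."""
    emu = Fraction(value) * FACTORS[unit]
    if emu.denominator != 1:
        raise ValueError(f"{value} {unit} is not a whole number of EMUs")
    return int(emu)


assert Fraction("2.54") * EMU_PER_CM == EMU_PER_INCH
assert EMU_PER_POINT * 72 == EMU_PER_INCH
print(to_emu("2.54", "cm"), to_emu(1, "in"), to_emu(12, "pt"))  # 914400 914400 152400
```

Passing decimal strings rather than floats matters: Fraction("2.54") is exactly 127/50, whereas the float 2.54 is only an approximation.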


Foreign resources


Non-XML content

OOXML documents are typically composed of other resources in addition to XML content (graphics, video, etc.). Some have criticised the choice of permitted formats for such resources: ECMA-376 1st edition specifies "Embedded Object Alternate Image Requests Types" and "Clipboard Format Types", which refer to Windows Metafiles or Enhanced Metafiles, both of which are proprietary formats that have hard-coded dependencies on Windows itself. The critics state the standard should instead have referenced the platform-neutral standard ISO/IEC 8632 "Computer Graphics Metafile".


Foreign markup

The Standard provides three mechanisms to allow foreign markup to be embedded within content for editing purposes:
* Smart tags
* Custom XML markup
* Structured Document Tags

These are defined in clause 17.5 of Part 1.


Compatibility settings

Versions of Office Open XML contain what are termed "compatibility settings". These are contained in Part 4 ("Markup Language Reference") of ECMA-376 1st Edition, but during standardization were moved to become a new part (also called Part 4) of ISO/IEC 29500:2008 ("Transitional Migration Features"). These settings (including elements with names such as autoSpaceLikeWord95, footnoteLayoutLikeWW8, lineWrapLikeWord6, mwSmallCaps, shapeLayoutLikeWW8, suppressTopSpacingWP, truncateFontHeightsLikeWP6, uiCompat97To2003, useWord2002TableStyleRules, useWord97LineBreakRules, wpJustification and wpSpaceWidth) were the focus of some controversy during the standardisation of DIS 29500. As a result, new text was added to ISO/IEC 29500 to document them.

An article in Free Software Magazine has criticized the markup used for these settings. Office Open XML uses distinctly named elements for each compatibility setting, each of which is declared in the schema. The repertoire of settings is thus limited: for new compatibility settings to be added, new elements may need to be declared, "potentially creating thousands of them, each having nothing to do with interoperability".
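The criticism is easier to follow with an example of what such markup looks like to a consumer. This Python sketch assumes the settings sit as empty child elements of a w:compat element in a word/settings.xml part; that placement, the sample fragment, and the function name compat_settings are assumptions for illustration. The setting names are taken from the list above.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical word/settings.xml fragment using setting names quoted in the text.
SETTINGS_XML = f"""<w:settings xmlns:w="{W}">
  <w:compat>
    <w:footnoteLayoutLikeWW8/>
    <w:lineWrapLikeWord6/>
    <w:useWord2002TableStyleRules/>
  </w:compat>
</w:settings>"""


def compat_settings(settings_xml: str) -> list:
    """List the compatibility-setting elements present in a settings part."""
    compat = ET.fromstring(settings_xml).find(f"{{{W}}}compat")
    if compat is None:
        return []
    # Each setting is a distinct, schema-declared element, so the local name is the setting.
    return [child.tag.split("}", 1)[1] for child in compat]


print(compat_settings(SETTINGS_XML))
# ['footnoteLayoutLikeWW8', 'lineWrapLikeWord6', 'useWord2002TableStyleRules']
```

A consumer can only act on names it already knows, so each new setting implies a new element in the schema, which is the growth the Free Software Magazine article objects to.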


Extensibility

The standard provides two types of extensibility mechanism: Markup Compatibility and Extensibility (MCE), defined in Part 3 (ISO/IEC 29500-3:2008), and Extension Lists, defined in clause 18.2.10 of Part 1.
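One idea behind Markup Compatibility and Extensibility is that a document can declare some namespaces "ignorable", so an older consumer may skip elements it does not understand instead of rejecting the file. The following Python sketch illustrates only that idea under simplifying assumptions: the markup-compatibility namespace URI shown, an mc:Ignorable attribute on the root element listing prefixes, and hypothetical urn:example namespaces. It does not implement mc:AlternateContent, mc:ProcessContent, ignorable attributes, or nested mc:Ignorable declarations, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical document: urn:example:v2 is a newer vocabulary marked as ignorable.
DOC = f"""<root xmlns="urn:example:v1" xmlns:mc="{MC}"
      xmlns:v2="urn:example:v2" mc:Ignorable="v2">
  <item>understood by every consumer</item>
  <v2:newItem>only meaningful to newer consumers</v2:newItem>
</root>"""


def declared_prefixes(xml_text: str) -> dict:
    """Collect prefix -> namespace URI declarations (last declaration wins)."""
    prefixes = {}
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(stream, events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def drop_ignorable(xml_text: str, understood: set) -> ET.Element:
    """Remove elements from ignorable namespaces this consumer does not understand.

    Simplified: only mc:Ignorable on the root element is honoured.
    """
    prefixes = declared_prefixes(xml_text)
    root = ET.fromstring(xml_text)
    ignorable = {prefixes[p] for p in root.get(f"{{{MC}}}Ignorable", "").split()}
    ignorable -= understood

    def prune(element: ET.Element) -> None:
        for child in list(element):
            namespace = child.tag[1:].split("}", 1)[0] if child.tag.startswith("{") else ""
            if namespace in ignorable:
                element.remove(child)
            else:
                prune(child)

    prune(root)
    return root


old = drop_ignorable(DOC, understood=set())
new = drop_ignorable(DOC, understood={"urn:example:v2"})
print(len(old), len(new))  # 1 2
```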

