Document file format
   HOME

TheInfoList



OR:

A document file format is a
text Text may refer to: Written word * Text (literary theory), any object that can be read, including: **Religious text, a writing that a religious tradition considers to be sacred **Text, a verse or passage from scripture used in expository preachin ...
or
binary file A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document fil ...
format for storing
document A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content. The word originates from the Latin ''Documentum'', which denotes a "teaching" o ...
s on a storage media, especially for use by
computer A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...
s. There currently exist a multitude of incompatible document file formats. Examples of XML-based
open Open or OPEN may refer to: Music * Open (band), Australian pop/rock band * The Open (band), English indie rock band * Open (Blues Image album), ''Open'' (Blues Image album), 1969 * Open (Gotthard album), ''Open'' (Gotthard album), 1999 * Open (C ...
standards are DocBook,
XHTML Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated. While HTML, prior ...
, and, more recently, the ISO/
IEC The International Electrotechnical Commission (IEC; in French: ''Commission électrotechnique internationale'') is an international standards organization that prepares and publishes international standards for all electrical, electronic and r ...
standards
OpenDocument The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was develope ...
(ISO 26300:2006) and
Office Open XML Office Open XML (also informally known as OOXML) is a ZIP (file format), zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized th ...
(ISO 29500:2008). In 1993, the
ITU-T The ITU Telecommunication Standardization Sector (ITU-T) is one of the three sectors (divisions or units) of the International Telecommunication Union (ITU). It is responsible for coordinating standards for telecommunications and Information Co ...
tried to establish a standard for document file formats, known as the Open Document Architecture (ODA) which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.
Page description language In digital printing, a page description language (PDL) is a computer language that describes the appearance of a printed page in a higher level than an actual output bitmap (or generally raster graphics). An overlapping term is printer control la ...
s such as
PostScript PostScript (PS) is a page description language in the electronic publishing and desktop publishing realm. It is a dynamically typed, concatenative programming language. It was created at Adobe Systems by John Warnock, Charles Geschke, Do ...
and
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
have become the ''
de facto ''De facto'' ( ; , "in fact") describes practices that exist in reality, whether or not they are officially recognized by laws or other formal norms. It is commonly used to refer to what happens in practice, in contrast with '' de jure'' ("by l ...
'' standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/
IEC The International Electrotechnical Commission (IEC; in French: ''Commission électrotechnique internationale'') is an international standards organization that prepares and publishes international standards for all electrical, electronic and r ...
standards for PDF began to be published, including the specification for PDF itself, ISO-32000.
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaS ...
is the most used and open international standard and it is also used as document file format. It has also become ISO/
IEC The International Electrotechnical Commission (IEC; in French: ''Commission électrotechnique internationale'') is an international standards organization that prepares and publishes international standards for all electrical, electronic and r ...
standard (ISO 15445:2000). The default binary file format used by
Microsoft Word Microsoft Word is a word processor, word processing software developed by Microsoft. It was first released on October 25, 1983, under the name ''Multi-Tool Word'' for Xenix systems. Subsequent versions were later written for several other pla ...
( .doc) has become widespread ''
de facto ''De facto'' ( ; , "in fact") describes practices that exist in reality, whether or not they are officially recognized by laws or other formal norms. It is commonly used to refer to what happens in practice, in contrast with '' de jure'' ("by l ...
'' standard for office documents, but it is a
proprietary format A proprietary file format is a file format of a company, organization, or individual that contains data that is ordered and stored according to a particular encoding-scheme, designed by the company or organization to be secret, such that the decodi ...
and is not always fully supported by other word processors.


Common document file formats

*
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
,
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of e ...
plain text In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limit ...
formats *
Amigaguide AmigaGuide is a hypertext document file format designed for the Amiga. Files are stored in ASCII so it is possible to read and edit a file without the need for special software. Since Workbench 2.1 an Amiga Guide system for O.S. inline help files ...
* .doc for
Microsoft Word Microsoft Word is a word processor, word processing software developed by Microsoft. It was first released on October 25, 1983, under the name ''Multi-Tool Word'' for Xenix systems. Subsequent versions were later written for several other pla ...
— Structural binary format developed by Microsoft (specifications available since 2008 under the
Open Specification Promise The Microsoft Open Specification Promise (or OSP) is a promise by Microsoft, published in September 2006, to not assert its patents, in certain conditions, against implementations of a certain list of specifications. The OSP is not a licence, but ...
) *
DjVu DjVu ( , like French " déjà vu") is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as im ...
— file format designed primarily to store scanned documents * DocBook — an XML format for technical documentation *
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaS ...
(.html, .htm), (open standard, ISO from 2000), in combination with possible image files referred to. * FictionBook (.fb2) — open XML-based e-book format *
Markdown Markdown is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber and Aaron Swartz created Markdown in 2004 as a markup language that is appealing to human readers in its source code form. Markdown i ...
(.md) — markup language for creating formatted text using plain text *
Office Open XML Office Open XML (also informally known as OOXML) is a ZIP (file format), zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized th ...
— .docx (XML-based standard for office documents) *
OpenDocument The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was develope ...
— .odt (XML-based standard for office documents) * OpenOffice.org XML — .sxw (open, XML-based format for office documents) * OXPS — Open XML Paper Specification (Windows 8.1 and above, older version is XPS used in Windows 7) * PalmDoc
handheld A mobile device (or handheld computer) is a computer small enough to hold and operate in the hand. Mobile devices typically have a flat LCD or OLED screen, a touchscreen interface, and digital or physical buttons. They may also have a physical ...
document format * .pages for
Pages Page most commonly refers to: * Page (paper), one side of a leaf of paper, as in a book Page, PAGE, pages, or paging may also refer to: Roles * Page (assistance occupation), a professional occupation * Page (servant), traditionally a young mal ...
*
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
— Open standard for document exchange. ISO standards include
PDF/X PDF/X is a subset of the PDF ISO standard. The purpose of PDF/X is to facilitate graphics exchange, and it therefore has a series of printing-related requirements which do not apply to standard PDF files. For example, in PDF/X-1a all fonts need ...
(eXchange),
PDF/A PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features unsuitable for long-term archivi ...
(Archive),
PDF/E ISO 24517-1:2008 is an ISO Standard published in 2008. * Document management—Engineering document format using PDF—Part 1: Use of PDF 1.6 (PDF/E-1) This standard defines a format (PDF/E) for the creation of documents used in geospatial, c ...
(Engineering),
ISO 32000 Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
(PDF),
PDF/UA PDF/UA (PDF/Universal Accessibility), formally ISO 14289, is an International Organization for Standardization (ISO) standard for accessible PDF technology. A technical specification intended for developers implementing PDF writing and processing ...
(Accessibility) and
PDF/VT PDF/VT is an international standard published by ISO in August 2010 as ISO 16612-2. It defines the use of PDF as an exchange format optimized for variable and transactional printing. Built on top of PDF/X-4, it is the first variable-data printi ...
(Variable data and transactional printing). PDF is readable on almost every platform with free or open source readers. Open source PDF creators are also available. *
PostScript PostScript (PS) is a page description language in the electronic publishing and desktop publishing realm. It is a dynamically typed, concatenative programming language. It was created at Adobe Systems by John Warnock, Charles Geschke, Do ...
— .ps * Rich Text Format (RTF) — meta data format being developed by Microsoft since 1987 for Microsoft products and
cross-platform In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software ...
document interchange * SYmbolic LinK (SYLK) *
Scalable Vector Graphics Scalable Vector Graphics (SVG) is an XML-based vector image format for defining two-dimensional graphics, having support for interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium sinc ...
(SVG) - Graphics format primarily for vector-based images. * TeX — Open-source typesetting program and format. First successful mathematical notation language. * TEI — XML format for digital publication *
Troff troff (), short for "typesetter roff", is the major component of a document processing system developed by Bell Labs for the Unix operating system. troff and the related nroff were both developed from the original roff. While nroff was inte ...
*
Uniform Office Format Uniform Office Format (UOF; Chinese 标文通, literally "standard text general"), sometimes known as Unified Office Format, is an open standard for office applications developed in China. It includes word processing, presentation, and spreadsh ...
— Chinese standard * WordPerfect (.wpd, .wp, .wp7, .doc) (Note: possible confusion with Word format extension)


See also

*
List of document file formats This is a list of file formats used by computers, organized by type. Filename extension it is usually noted in parentheses if they differ from the file format name or abbreviation. Many operating systems do not limit filenames to one extension s ...
*
List of document markup languages The following is a list of document markup languages. You may also find the List of markup languages of interest. Well-known document markup languages * HyperText Markup Language (HTML) – the original markup language that was defined as a part ...
* Comparison of document markup languages *
Open format An open file format is a file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. Open file format is licensed with open li ...
*
Word Processor A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features. Early word processors were stand-alone devices dedicated to the function, but current ...
*
Desktop Publishing Desktop publishing (DTP) is the creation of documents using page layout software on a personal ("desktop") computer. It was first used almost exclusively for print publications, but now it also assists in the creation of various forms of online ...
*
LaTeX Latex is an emulsion (stable dispersion) of polymer microparticles in water. Latexes are found in nature, but synthetic latexes are common as well. In nature, latex is found as a milky fluid found in 10% of all flowering plants (angiosperms ...


References


External links


Lost in Translation: Interoperability Issues for Open Standards - ODF and OOXML as Examples
{{Office document file formats Computer file formats