A document file format is a
text
Text may refer to:
Written word
* Text (literary theory)
In literary theory, a text is any object that can be "read", whether this object is a work of literature, a street sign, an arrangement of buildings on a city block, or styles of clothi ...
or
binary file
A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files ...
format for storing
document
A document is a writing, written, drawing, drawn, presented, or memorialized representation of thought, often the manifestation of nonfiction, non-fictional, as well as fictional, content. The word originates from the Latin ', which denotes ...
s on a
storage media, especially for use by
computer
A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...
s.
There currently exists a multitude of incompatible document file formats.
Examples of
XML
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
-based
open
Open or OPEN may refer to:
Music
* Open (band), Australian pop/rock band
* The Open (band), English indie rock band
* ''Open'' (Blues Image album), 1969
* ''Open'' (Gerd Dudek, Buschi Niebergall, and Edward Vesala album), 1979
* ''Open'' (Go ...
standards are
DocBook
DocBook is a Semantics (computer science), semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of docume ...
,
XHTML
Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages which mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.
While HTML, pr ...
, and, more recently, the
ISO
The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries.
Me ...
/
IEC standards
OpenDocument
The Open Document Format for Office Applications (ODF), also known as OpenDocument, standardized as ISO 26300, is an open file format for word processor, word processing documents, spreadsheets, Presentation program, presentations and ...
(ISO 26300:2006) and
Office Open XML (ISO 29500:2008).
In 1993, the
ITU-T
The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) is one of the three Sectors (branches) of the International Telecommunication Union (ITU). It is responsible for coordinating Standardization, standards fo ...
tried to establish a standard for document file formats, known as the
Open Document Architecture (ODA) which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.
Page description languages such as
PostScript
PostScript (PS) is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it c ...
and
PDF
Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe Inc., Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, computer hardware, ...
have become the ''
de facto'' standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of
ISO
The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries.
Me ...
/
IEC standards for PDF began to be published, including the specification for PDF itself,
ISO-32000.
HTML
Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
is the most used and open international standard and it is also used as document file format. It has also become
ISO
The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries.
Me ...
/
IEC standard (ISO 15445:2000).
The default binary file format used by
Microsoft Word
Microsoft Word is a word processor program, word processing program developed by Microsoft. It was first released on October 25, 1983, under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platf ...
(
.doc) has become a widespread ''
de facto'' standard for office documents, but it is a
proprietary format and is not always fully supported by other word processors.
Common document file formats
Below is a list of some of the more common document file formats, common
file extensions used by the formats in parentheses:
*
ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
,
UTF-8
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8.
UTF-8 supports all 1,112,0 ...
(.txt, others) — any of some plain text encodings that may have differing
line endings depending on what system they were created or edited on
*
Amigaguide (.guide) — a
hypertext
Hypertext is E-text, text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typic ...
document format designed for the
Amiga
Amiga is a family of personal computers produced by Commodore International, Commodore from 1985 until the company's bankruptcy in 1994, with production by others afterward. The original model is one of a number of mid-1980s computers with 16-b ...
that is used to document Amiga programs
*
Microsoft Word
Microsoft Word is a word processor program, word processing program developed by Microsoft. It was first released on October 25, 1983, under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platf ...
(.doc, .docx) — structural binary (
.doc) and XML-based text formats (
.docx) developed primarily by Microsoft, both of which are subject to the
Microsoft Open Specification Promise and are used to store
word processing A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features.
Word processor (electronic device), Early word processors were stand-alone devices dedicate ...
documents
*
DjVu (.djv, .djvu) — a file format designed primarily to store scanned documents, especially ones containing a mixture of text, line drawings, and images
*
DocBook
DocBook is a Semantics (computer science), semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of docume ...
(.dbk, .xml) — an XML-based format intended for writing technical documentation
*
HTML
Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
(.html, .htm) — an
ad hoc
''Ad hoc'' is a List of Latin phrases, Latin phrase meaning literally for this. In English language, English, it typically signifies a solution designed for a specific purpose, problem, or task rather than a Generalization, generalized solution ...
hypertext document format originally created for the
World Wide Web
The World Wide Web (WWW or simply the Web) is an information system that enables Content (media), content sharing over the Internet through user-friendly ways meant to appeal to users beyond Information technology, IT specialists and hobbyis ...
, initially developed as an
open standard
An open standard is a standard that is openly accessible and usable by anyone. It is also a common prerequisite that open standards use an open license that provides for extensibility. Typically, anybody can participate in their development due to ...
by the
W3C
The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working together in ...
and currently being developed as one by the
WHATWG
The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML and related technologies. The WHATWG was founded by individuals from Apple Inc., the Mozilla Foundation and Opera Software, ...
*
FictionBook (.fb2, .fb3) — an open, XML-based
e-book
An ebook (short for electronic book), also spelled as e-book or eBook, is a book publication made available in electronic form, consisting of text, images, or both, readable on the flat-panel display of computers or other electronic devices. Al ...
format that originated and gained popularity in Russia
*
Markdown (.md) — a simple, plain text markup language with several different implementations that is popular on blogs and content management systems
*
OpenDocument
The Open Document Format for Office Applications (ODF), also known as OpenDocument, standardized as ISO 26300, is an open file format for word processor, word processing documents, spreadsheets, Presentation program, presentations and ...
(.odt, .fodt) — an open, XML-based standard for office documents, including word processing documents, spreadsheets, presentations, and graphics
*
OpenOffice.org XML (.sxw, .sxc, .sxd, .sxi, others) — an open, XML-based format for office documents including word processing documents, spreadsheets, presentations, graphics, and formulas
*
Open XML Paper Specification
Open XML Paper Specification (also referred to as OpenXPS) is an open specification for a page description language and a fixed-document format. Microsoft developed it as the XML Paper Specification (XPS). In June 2009, Ecma International adop ...
(.xps, .oxps) — an
XAML-based page description format designed by Microsoft (.xps), intended to compete with the Portable Document Format (PDF) and was later standardized by
Ecma International
Ecma International () is a Nonprofit organization, nonprofit standards organization for information and communication systems. It acquired its current name in 1994, when the European Computer Manufacturers Association (ECMA) changed its name to ...
as ECMA-388 (.oxps)
* PalmDOC (.pdb) — a special version of the
PDB record database format used by
Palm OS
Palm OS (also known as Garnet OS) is a discontinued mobile operating system initially developed by Palm, Inc., for personal digital assistants (PDAs) in 1996. Palm OS was designed for ease of use with a touchscreen-based graphical user interface. ...
used to store e-books and other text documents for handheld devices
*
Pages (.pages) — a document file format used to store word processing documents for
Apple
An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
's Pages app, as a part of its
iWork office suite
*
Portable Document Format
Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating syste ...
(.pdf) — a now standardized (ISO 32000), open format based on
PostScript
PostScript (PS) is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it c ...
, developed by
Adobe in 1992, that is able to store documents, forms,
rich media, and graphics (PDF and
PDF/UA) for document exchange (
PDF/X and
PDF/VT), archival (
PDF/A
PDF/A is an International Organization for Standardization, ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archive, archiving and long-term digital preservation, preservation of electronic documents. PDF ...
), and engineering (
PDF/E)
*
PostScript
PostScript (PS) is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it c ...
(.ps) — a page description and programming language designed by Adobe for use with printing, display systems, and storing documents
*
Rich Text Format
)
As an example, the following RTF code
would be rendered as follows:
This is some bold text.
Character encoding
A standard RTF file can only consist of 7-bit ASCII characters, but can use escape sequences to encode other characters. ...
(.rtf) — a
proprietary document format developed by Microsoft for
cross-platform
Within computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several Computing platform, computing platforms. Some ...
document interchange with Microsoft products
*
Symbolic Link
In computing, a symbolic link (also symlink or soft link) is a file whose purpose is to point to a file or directory (called the "target") by specifying a path thereto.
Symbolic links are supported by POSIX and by most Unix-like operating syste ...
(.slk) — a plain text ASCII format created by Microsoft in the 1980s to exchange data between spreadsheet applications
*
Scalable Vector Graphics (.svg) — an XML-based
vector image format for defining two-dimensional graphics that has support for animations and interactive content
*
TeX
Tex, TeX, TEX, may refer to:
People and fictional characters
* Tex (nickname), a list of people and fictional characters with the nickname
* Tex Earnhardt (1930–2020), U.S. businessman
* Joe Tex (1933–1982), stage name of American soul singer ...
(.tex) — a plain text format for describing complex types and page layouts that is often used for mathematical, technical, and academic publications
*
Text Encoding Initiative
The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and conference series, and ma ...
(.xml) — a primarily XML-based format for semantically marking up text, used primarily in the field of
digital humanities
Digital humanities (DH) is an area of scholarly activity at the intersection of computing or Information technology, digital technologies and the disciplines of the humanities. It includes the systematic use of digital resources in the humanitie ...
to provide detailed representations of the components and concepts that make up a document
*
troff
troff (), short for "typesetter roff", is the major component of a document processing system developed by Bell Labs for the Unix operating system. troff and the related nroff were both developed from the original roff (software), roff.
Whil ...
(.tmac, .man, others) — short for "typesetter roff", a typesetting markup language developed by
Bell Labs
Nokia Bell Labs, commonly referred to as ''Bell Labs'', is an American industrial research and development company owned by Finnish technology company Nokia. With headquarters located in Murray Hill, New Jersey, Murray Hill, New Jersey, the compa ...
from the original
roff program for
Unix
Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
*
Uniform Office Format (.uof, .uot, .uos, .uop) — a standardized, XML-based open format designed for use with office applications developed in China, with support for word processing documents, presentations, and spreadsheets
*
WordPerfect (.wpd, .wp, .wp7) — a proprietary format now owned by
Alludo
Cascade Parent Limited, trade name, doing business as Alludo ( ), is a Canadian software company headquartered in Ottawa, Ontario, specializing in graphics processing. Formerly called the Corel Corporation ( ; from the abbreviation "Cowpland Rese ...
used to store and represent word processing documents
See also
*
List of document file formats
*
List of document markup languages
The following is a list of document markup languages. You may also find the List of markup languages of interest.
Well-known document markup languages
* HyperText Markup Language (HTML) – an ad hoc markup language that was originally created f ...
*
Comparison of document markup languages
*
Open format
An open file format is a file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. An open file format is licensed with a ...
*
Word Processor A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features.
Early word processors were stand-alone devices dedicated to the function, but current word ...
*
Desktop Publishing
Desktop publishing (DTP) is the creation of documents using dedicated software on a personal ("desktop") computer. It was first used almost exclusively for print publications, but now it also assists in the creation of various forms of online co ...
*
LaTeX
Latex is an emulsion (stable dispersion) of polymer microparticles in water. Latices are found in nature, but synthetic latices are common as well.
In nature, latex is found as a wikt:milky, milky fluid, which is present in 10% of all floweri ...
References
External links
Lost in Translation: Interoperability Issues for Open Standards - ODF and OOXML as Examples
{{Office document file formats
Computer file formats