EPUB is an e-book file format that uses the ".epub" file extension. The term is short for ''electronic publication'' and is sometimes styled ''ePub''. EPUB is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers. EPUB is a technical standard published by the International Digital Publishing Forum (IDPF). It became an official standard of the IDPF in September 2007, superseding the older Open eBook (OEB) standard.
The Book Industry Study Group endorses EPUB 3 as the format of choice for packaging content and has stated that the global book publishing industry should rally around a single standard. The EPUB format is implemented as an archive file consisting of XHTML files carrying the content, along with images and other supporting files. EPUB is the most widely supported vendor-independent XML-based e-book format; that is, it is supported by almost all hardware readers.
History
A successor to the Open eBook Publication Structure, EPUB 2.0 was approved in October 2007, with a maintenance update (2.0.1) approved in September 2010.
The EPUB 3.0 specification became effective in October 2011 and was superseded by a minor maintenance update (3.0.1) in June 2014. New major features include support for precise layout or specialized formatting (Fixed Layout Documents), such as for comic books, and MathML support. The current version of EPUB is 3.2, effective May 8, 2019. The text of the format specification was reorganized and cleaned up; the format supports remotely hosted resources and new font formats (WOFF 2.0 and SFNT) and makes more direct use of HTML and CSS.
In May 2016, IDPF members approved a merger with the World Wide Web Consortium (W3C), "to fully align the publishing industry and core Web technology".
Version 2.0.1
EPUB 2.0 was approved in October 2007, with a maintenance update (2.0.1) intended to clarify and correct errata in the specifications being approved in September 2010. EPUB version 2.0.1 consists of three specifications:
* ''Open Publication Structure'' (OPS) 2.0.1, contains the formatting of its content.
* ''Open Packaging Format'' (OPF) 2.0.1, describes the structure of the .epub file in XML.
* ''Open Container Format'' (OCF) 2.0.1, collects all files as a ZIP archive.
EPUB uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document, and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a zip file as a packaging format.
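The bundling step can be sketched in Python with the standard zipfile module. This is a minimal illustration, not a complete packager: the function name build_epub and the path OEBPS/content.opf are assumptions chosen for the example, and the caller is assumed to supply a valid package file and content documents. The sketch follows the OCF convention that the mimetype entry comes first in the archive and is stored uncompressed.

```python
import io
import zipfile

# container.xml points reading systems at the package (.opf) file.
# The path OEBPS/content.opf is an assumption made for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files: dict[str, bytes]) -> bytes:
    """Bundle prepared package files into an in-memory EPUB (ZIP) archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored without compression.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        archive.writestr(
            "META-INF/container.xml",
            CONTAINER_XML,
            compress_type=zipfile.ZIP_DEFLATED,
        )
        # Everything else (OPF, NCX, XHTML, CSS, images) may be compressed.
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Calling build_epub({"OEBPS/content.opf": opf_bytes, "OEBPS/chapter1.xhtml": chapter_bytes}) with hypothetical inputs returns the archive bytes, which can then be written to a file with the .epub extension.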
Open Publication Structure 2.0.1
An EPUB file uses XHTML 1.1 (or DTBook) to construct the content of a book as of version 2.0.1. This is different from previous versions (OEBPS 1.2 and earlier), which used a subset of XHTML. There are, however, a few restrictions on certain elements. The mimetype for XHTML documents in EPUB is application/xhtml+xml.
Styling and layout are performed using a subset of CSS 2.0, referred to as ''OPS Style Sheets''. This specialized syntax requires that reading systems support only a portion of CSS properties and adds a few custom properties. Custom properties include oeb-page-head, oeb-page-foot, and oeb-column-number. Font embedding can be accomplished using the @font-face rule, together with listing the font file in the OPF's manifest (see below). The mimetype for CSS documents in EPUB is text/css.
EPUB also requires that PNG, JPEG, GIF, and SVG images be supported using the mimetypes image/png, image/jpeg, image/gif, and image/svg+xml. Other media types are allowed, but creators must include alternative renditions using supported types. For a table of all required mimetypes, see Section 1.3.7 of the specification.
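The fallback rule can be illustrated with a small Python check. This is a sketch under stated assumptions: the set below holds only the media types named in this article, not the full table from Section 1.3.7 of the specification, and the function name needs_fallback is hypothetical.

```python
# Media types named in this article; the specification's own table is longer.
SUPPORTED_MEDIA_TYPES = {
    "application/xhtml+xml",     # XHTML content documents
    "text/css",                  # style sheets
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/svg+xml",
    "application/x-dtbncx+xml",  # NCX table of contents
}


def needs_fallback(media_type: str) -> bool:
    """Return True if a resource of this type needs an alternative rendition."""
    return media_type.strip().lower() not in SUPPORTED_MEDIA_TYPES
```

For instance, needs_fallback("image/tiff") is true under this list, so a TIFF image would need an alternative rendition in one of the supported types.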
Unicode is required, and content producers must use either UTF-8 or UTF-16 encoding. This is to support international and multilingual books. However, reading systems are not required to provide the fonts necessary to display every Unicode character, though they are required to display at least a placeholder for characters that cannot be displayed fully.
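A reading system has to decide which of the two encodings a content document uses. The following Python sketch shows one simple approach: look for a UTF-16 byte order mark and otherwise treat the bytes as UTF-8. The function name decode_content is hypothetical, and the sketch assumes UTF-16 documents carry a byte order mark; it does not inspect the XML declaration.

```python
import codecs


def decode_content(raw: bytes) -> str:
    """Decode a content document as UTF-16 (if a BOM is present) or UTF-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
        return raw.decode("utf-16")
    # "utf-8-sig" accepts plain UTF-8 and strips an optional UTF-8 BOM.
    return raw.decode("utf-8-sig")
```

Input in any other encoding will usually raise UnicodeDecodeError, which a validator could report as a violation of the encoding requirement.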
An example skeleton of an XHTML file for EPUB looks like this:
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <title>Pride and Prejudice</title>
   </head>
   <body>
     ...
   </body>
 </html>
Open Packaging Format 2.0.1
The OPF specification's purpose is to "[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication". This is accomplished by two XML files with the extensions .opf and .ncx.
; .opf file
The OPF file, traditionally named content.opf, houses the EPUB book's metadata, file manifest, and linear reading order. This file has a root element package and four child elements: metadata, manifest, spine, and guide. Furthermore, the package node must have the unique-identifier attribute. The .opf file's mimetype is application/oebps-package+xml.
The metadata element contains all the metadata information for a particular EPUB file. Three metadata tags are required (though many more are available): title, language, and identifier. The title tag contains the title of the book; language contains the language of the book's contents in RFC 3066 format or its successors, such as the newer RFC 4646; and identifier contains a unique identifier for the book, such as its ISBN or a URL. The identifier's id attribute should equal the unique-identifier attribute from the package element.
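The metadata rules above can be checked mechanically. The Python sketch below uses the standard xml.etree.ElementTree module; the sample document, the id value BookId, and the function name check_metadata are assumptions made for illustration, and the required tags are assumed to be written as Dublin Core elements (dc:title, dc:language, dc:identifier).

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample package; "BookId" is an arbitrary id chosen for the sketch.
SAMPLE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">123456789X</dc:identifier>
  </metadata>
</package>
"""


def check_metadata(opf_xml: str) -> list[str]:
    """Return a list of problems found in the OPF metadata (empty if none)."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["missing metadata element"]
    problems = []
    # The three required metadata tags.
    for tag in ("title", "language", "identifier"):
        if metadata.find(f"dc:{tag}", NS) is None:
            problems.append(f"missing dc:{tag}")
    # The package's unique-identifier must name the id of some dc:identifier.
    uid = root.get("unique-identifier")
    identifier_ids = {el.get("id") for el in metadata.findall("dc:identifier", NS)}
    if uid is None:
        problems.append("package lacks unique-identifier attribute")
    elif uid not in identifier_ids:
        problems.append(f"no dc:identifier with id={uid!r}")
    return problems
```

An empty list from check_metadata(SAMPLE_OPF) means the three required tags are present and the identifiers match.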
The manifest element lists all the files contained in the package. Each file is represented by an item element, which has the attributes id, href, and media-type. All XHTML content documents, stylesheets, images or other media, embedded fonts, and the NCX file should be listed here. Only the .opf file itself, container.xml, and the mimetype file should not be included.
The spine element lists all the XHTML content documents in their linear reading order. Also, any content document that can be reached through linking or the table of contents must be listed as well. The toc attribute of spine must contain the id of the NCX file listed in the manifest. Each itemref element's idref is set to the id of its respective content document.
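The relationship between spine and manifest can be sketched in Python: each itemref's idref is looked up among the manifest items to obtain the file to display next. The sample package and its file names are hypothetical, and reading_order is a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package fragment with two content documents and an NCX file.
SAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="appendix" href="appendix.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
    <itemref idref="appendix"/>
  </spine>
</package>
"""


def reading_order(opf_xml: str) -> list[str]:
    """Resolve the spine's itemref entries to manifest hrefs, in order."""
    root = ET.fromstring(opf_xml)
    # Map each manifest item's id to its href.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Follow the spine; an idref missing from the manifest raises KeyError.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

The NCX file appears in the manifest but not in the reading order, because it is referenced through the spine's toc attribute rather than an itemref.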
The guide element is an optional element for the purpose of identifying fundamental structural components of the book. Each reference element has the attributes type, title, href. Files referenced in href must be listed in the manifest, and are allowed to have an element identifier (e.g. #figures in the example).
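An example OPF file:
 <metadata>
   <dc:title>Pride and Prejudice</dc:title>
   <dc:language>en</dc:language>
   <dc:identifier>123456789X</dc:identifier>
   <dc:creator>Jane Austen</dc:creator>
 </metadata>
 ...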
; .ncx file
The NCX file (Navigation Control file for XML), traditionally named toc.ncx, contains the hierarchical table of contents for the EPUB file. The specification for NCX was developed for Digital Talking Book (DTB), is maintained by the DAISY Consortium, and is not a part of the EPUB specification. The NCX file has a mimetype of application/x-dtbncx+xml.
Of note here is that the values for the docTitle, docAuthor, and meta name="dtb:uid" elem