LaTeXML is a free, public domain software package which converts LaTeX documents to XML, HTML, EPUB, JATS and TEI.
Workflow
LaTeXML's primary output format is an XML representation of (La)TeX's document model. A postprocessor can convert these XML documents into other structured formats. Common use cases create HTML with mathematical formulas as images, or XHTML, HTML5, and EPUB with formulas as MathML. Compared to other LaTeX-to-XML processors, LaTeXML aims to conserve the semantic structures of the LaTeX markup. This makes it a good basis for semantic services like math search.
Conversion times range from 30 milliseconds for a single formula (in the LaTeXML daemon) to minutes for book-size documents.
History
LaTeXML was started in the context of the Digital Library of Mathematical Functions at NIST, where LaTeX documents needed to be prepared for publication on the Web. The system has been under active development for over a decade and has attracted a small but dedicated community of developers and users centered on Bruce Miller, the original project author.
The current released version is LaTeXML 0.8.7. It was released in December 2022, and development remains active on the public repository.
Notable usage
LaTeXML was used to convert 90% (60% without errors) of 530,000 documents from the arXiv to XML. As a result of this ongoing effort to enhance coverage, LaTeXML supports a large range of LaTeX packages. The ACL 2014 conference used LaTeXML to convert submitted papers to XML. This followed existing work on converting the ACL Anthology papers to high-quality semantic markup for further analysis. Since February 2013, LaTeXML has been used to render the web pages of the peer-produced mathematics website PlanetMath. In July 2015, it was adopted by Authorea for their advanced LaTeX support. In 2018, the second data release of the European Space Agency's Gaia project was realized via LaTeXML.
In February 2022, arXiv announced an experimental service based on LaTeXML, offering 1.78 million documents as HTML5. A LaTeXML developer claimed successful conversion of 74% of arXiv, with 97% of articles "at least partially viewable". As of the start of 2024, that experiment has been promoted to arXiv's main article pages.
Implementation
The core of LaTeXML is a Perl reimplementation of TeX's parsing and digestion algorithm coupled with a customizable XML emitter. To conserve the semantic structures in the LaTeX markup, LaTeXML needs XML bindings for all LaTeX packages with high-level macro definitions. The LaTeXML distribution currently provides XML bindings for over 200 commonly used LaTeX packages such as AMSTeX, Babel and PGF/TikZ (the last of which has only experimental support).
The LaTeXML conversion consists of two stages:
* the first one parses LaTeX and converts that into a LaTeX-near XML document type, and
* the second (post-processing) transforms the XML into one of the standardized structured output formats.
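The two stages above can be sketched as a small Python wrapper around the `latexml` and `latexmlpost` command-line tools shipped with the LaTeXML distribution. This is only an illustrative sketch: the helper names `build_commands` and `convert` are hypothetical, the file names are placeholders, and it assumes both tools are installed and on the `PATH`.

```python
import shutil
import subprocess
from pathlib import Path

# Output formats handled by this sketch, mapped to a file extension.
EXTENSIONS = {"html5": ".html", "xhtml": ".xhtml"}


def build_commands(tex_path, out_format="html5"):
    """Return the command lines for stage 1 (LaTeX to XML) and stage 2 (post-processing)."""
    tex = Path(tex_path)
    xml = tex.with_suffix(".xml")
    out = tex.with_suffix(EXTENSIONS[out_format])
    # Stage 1: parse the LaTeX source into LaTeXML's own XML document type.
    stage1 = ["latexml", f"--destination={xml}", str(tex)]
    # Stage 2: transform that XML into the requested output format.
    stage2 = ["latexmlpost", f"--format={out_format}",
              f"--destination={out}", str(xml)]
    return stage1, stage2


def convert(tex_path, out_format="html5"):
    """Run both stages in order; raise if a tool is missing or a stage fails."""
    for tool in ("latexml", "latexmlpost"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for cmd in build_commands(tex_path, out_format):
        subprocess.run(cmd, check=True)
```

Calling `convert("paper.tex")` would, under these assumptions, write `paper.xml` and then `paper.html` next to the source file. Keeping command construction separate from execution makes it possible to inspect the two stages without running them.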
LaTeXML 0.8 added daemon functionality which enabled multiple conversions and easy embedding into web services.
LaTeXML 0.8.7 was the first version emitting the "MathML Core" markup language for mathematical syntax, new in MathML 4.
See also
* pdfTeX
External links
* Official Homepage for LaTeXML
* LaTeXML source code
* LaTeXML web server, services, and demos