The Journal Article Tag Suite (JATS) is an XML format used to describe scientific literature published online. It is a technical standard developed by the National Information Standards Organization (NISO) and approved by the American National Standards Institute with the code Z39.96-2012.
The NISO project was a continuation of the work done by NLM/NCBI, and popularized by the NLM's PubMed Central as a ''de facto'' standard for archiving and interchange of scientific open-access journals and their contents in XML.
With the NISO standardization the NLM initiative has gained a wider reach, and several other repositories, such as SciELO and Redalyc, adopted the XML formatting for scientific articles.
The JATS provides a set of XML elements and attributes for describing the textual and graphical content of journal articles, both research and non-research, as well as some non-article material such as letters, editorials, and book and product reviews.
JATS allows for descriptions of the full article content or just the article header metadata.
History
Since its introduction, NCBI's NLM Archiving and Interchange DTD suite has become the ''de facto'' standard for journal article markup in scholarly publishing. With the introduction of NISO JATS, it has been elevated to a true standard.
Even without public data interchange, NISO JATS adoption offers publishers advantages in streamlining production workflows and optimizing system interoperability.
Timeline
; NLM JATS
: NLM JATS, version 1
: NLM JATS, version 2
: NLM JATS, version 3
; NISO JATS
: NISO JATS, version 1.0
: NISO JATS, version 1.1
: NISO JATS, version 1.2
: NISO JATS, version 1.3
Technical scope
By design, this is a model for journal articles, such as the typical research article found in an STM journal, and not a model for complete journals.
Tag sets
There are three tag sets:
; Journal ''Archiving'' and Interchange
: "The most permissive of the Tag Sets," primarily intended for the capture and archiving of extant journal data.
; Journal ''Publishing''
: "A moderately prescriptive Tag Set," intended for general use in journal production and publication.
: Formally this model is a subset of the ''Archiving'' model. This is the most frequently used JATS variant.
; Article ''Authoring''
: "The most prescriptive [tightest and smallest] of the Tag Sets," intended for the relatively lightweight creation of journal articles valid to JATS.
: Formally this model is a subset of the ''Publishing'' model.
Document type definitions (also released in the form of RELAX NG and XML schema) define each set and incorporate other standards such as MathML and XHTML Tables (although not in the XHTML namespace).
Document structure
The JATS ''Publishing'' set defines a document that is a top-level component of a journal such as an article, a book or product review, or a letter to the editor. Each such document is composed of front matter (required) and up to three optional parts. These must appear in the following order:
; Front matter
: The article front matter contains the metadata for the article (also called article header information), for example, the article title, the journal in which it appears, the date and issue of publication for that issue of that journal, a copyright statement, etc. Both article-level and issue-level metadata (in the element <article-meta>) and journal-level metadata (in the element <journal-meta>) may be captured.
; Body (of the article)
: The body of the article is the main textual and graphic content of the article. This usually consists of paragraphs and sections, which may themselves contain figures, tables, sidebars (boxed text), etc. The body of the article is optional to accommodate those repositories that just keep article header information and do not tag the textual content.
; Back matter
: If present, the article back matter contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references.
; Floating material
: A publisher may choose to place all the floating objects in an article and its back matter (such as tables, figures, boxed text sidebars, etc.) into a separate container element outside the narrative flow for convenience of processing.
Following the front, body, back, and floating material, there may be either one or more responses to the article or one or more subordinate articles.
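The split between journal-level and article-level metadata in the front matter can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample record is invented, the helper names `read_front_matter` and `_text` are hypothetical, and the element paths used (`journal-meta`, `journal-title`, `article-meta`, `title-group/article-title`, `volume`, `issue`) are assumed to match the tag set in use.

```python
import xml.etree.ElementTree as ET

# Invented header-only record: front matter present, no body.
SAMPLE = """<article dtd-version="1.1">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <volume>12</volume>
      <issue>3</issue>
    </article-meta>
  </front>
</article>"""


def _text(parent, path):
    """Return the stripped text found at path under parent, or None."""
    if parent is None:
        return None
    node = parent.find(path)
    if node is None:
        return None
    return "".join(node.itertext()).strip()


def read_front_matter(xml_text):
    """Collect a few journal-level and article-level fields from <front>."""
    root = ET.fromstring(xml_text)
    front = root.find("front")
    journal_meta = front.find("journal-meta") if front is not None else None
    article_meta = front.find("article-meta") if front is not None else None
    return {
        # Journal-level metadata
        "journal_title": _text(journal_meta, ".//journal-title"),
        # Article-level and issue-level metadata
        "article_title": _text(article_meta, "title-group/article-title"),
        "volume": _text(article_meta, "volume"),
        "issue": _text(article_meta, "issue"),
        # The body is optional, so header-only records are legitimate.
        "has_body": root.find("body") is not None,
    }


if __name__ == "__main__":
    print(read_front_matter(SAMPLE))
```

Because the body is optional, a repository that stores only header information can run the same code on its records; `has_body` simply reports `False`.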
Example
This is the minimal structure of an article:
...
...
...
The DOCTYPE header is optional, a legacy from SGML and DTD-oriented validators. The dtd-version attribute can be used even without a DTD header.
The root element article is common to every version of JATS and to the "JATS family", such as the NLM DTDs. The validation rules for the front, body and back tags depend on the JATS version, but all versions have a similar structure, with good compatibility over a range of years. The evolution of the schema preserves overall stability.
Less commonly, "front only" and "front and back only" variations are also used for purposes other than full-content representation. The general article composition (as a DTD content expression) is
(front, body?, back?, floats-group?, (sub-article* | response*))
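As a rough illustration of this composition rule, the Python sketch below checks only the order of the top-level children of an article. It is not a substitute for DTD, RELAX NG or XML Schema validation. The function names are hypothetical, and the regular expression assumes, as the text on document structure describes, that sub-articles and responses are alternatives rather than a sequence.

```python
import re
import xml.etree.ElementTree as ET

# Top-level content model, written over a space-separated list of tag names.
# Assumption: trailing sub-articles and responses are mutually exclusive.
_ARTICLE_MODEL = re.compile(
    r"front( body)?( back)?( floats-group)?(( sub-article)*|( response)*)"
)


def top_level_parts(xml_text):
    """Return the tag names of the direct children of the root <article>."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        raise ValueError("root element is %r, expected 'article'" % root.tag)
    return [child.tag for child in root]


def has_valid_top_level_order(xml_text):
    """True if the children of <article> follow the composition rule."""
    sequence = " ".join(top_level_parts(xml_text))
    return _ARTICLE_MODEL.fullmatch(sequence) is not None


if __name__ == "__main__":
    full = "<article><front/><body/><back/></article>"
    front_only = "<article><front/></article>"
    print(has_valid_top_level_order(full))
    print(has_valid_top_level_order(front_only))
```

The check accepts the "front only" and "front and back only" variants as well as the full article, and rejects documents whose parts are out of order.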
Tools
There is a variety of tools to create, edit, convert and transform JATS.
They range from simple forms to complete conversion automation:
Conversion to JATS
These tools take a scientific document as input and, with some human support, produce JATS output.
* OpenOffice (LibreOffice) and MS Word documents to JATS:
** Typeset: provides an automated set of converters for MS Word to JATS XML.
** ''OxGarage'' (http://www.oucs.ox.ac.uk/oxgarage/): can convert documents from various formats into "National Library of Medicine (NLM) DTD 3.0".
**''meTypeset'': meTypeset "is a fork of the OxGarage stack" "to convert from Microsoft Word .docx format to NLM/JATS-XML".
**''eXtyles'': automates time-consuming aspects of document editing in Microsoft Word and exports to JATS XML (as well as many other DTDs).
* Markdown to JATS: Pandoc 2.0 can convert a number of input formats to JATS.
* PDF to JATS: this is a very difficult problem to solve. Success depends on how well structured your PDFs are and, for batch conversion, how consistently structured your PDFs are.
** Shabash Merops
** Typeset's PDF to JATS XML Converter
** The ''Public Knowledge Project'' is developing a pipeline for converting PDF to JATS. It will include use of ''pdfx''.
** CERMINE (Content ExtRactor and MINEr)
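For the Markdown route in the list above, a conversion can be scripted around the Pandoc command line. The Python sketch below is an illustration only: the function names and file names are hypothetical, and it assumes a Pandoc release with the JATS writer (2.0 or later) is installed and available on the PATH.

```python
import shutil
import subprocess


def pandoc_jats_command(source, target):
    """Build the Pandoc command line for a Markdown to JATS conversion."""
    return [
        "pandoc", source,
        "--from", "markdown",
        "--to", "jats",
        "--standalone",      # emit a complete document, not a fragment
        "--output", target,
    ]


def convert_markdown_to_jats(source, target):
    """Run Pandoc; raises if Pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    subprocess.run(pandoc_jats_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(pandoc_jats_command("paper.md", "paper.xml")))
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact call before running it in a batch workflow.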
Conversion from JATS
These tools take JATS as input and produce another kind of document as output.
* from JATS to HTML
** JATS Preview Stylesheets (canonical XSLT conversion); see the classic (2013) converter.
** ''eLife Lens'' converts NLM XML to JSON for display using HTML and JavaScript.
* from JATS to PDF: some JATS Preview Stylesheets, XSLT + XSL-FO conversion.
* from JATS to EPUB.
* Generic (from JATS DTD): ''DtdAnalyzer'' compares JATS with other DTDs and helps to create an XML representation of a DTD, generate XSLT and Schematron, and build other tools.
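The idea behind a JATS to HTML conversion can be shown with a very small Python sketch. It is not the JATS Preview Stylesheets and does not use XSLT: it is a hand-written mapping, under the assumption that only the article title, sections, section titles and paragraphs matter, and every other element is ignored. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def _plain_text(elem):
    """Flatten an element to escaped text, dropping inline markup."""
    return escape("".join(elem.itertext()).strip())


def _render(elem):
    """Map a handful of JATS body elements to HTML lines."""
    if elem.tag == "p":
        return ["<p>%s</p>" % _plain_text(elem)]
    if elem.tag == "title":
        return ["<h2>%s</h2>" % _plain_text(elem)]
    if elem.tag == "sec":
        lines = ["<section>"]
        for child in elem:
            lines.extend(_render(child))
        lines.append("</section>")
        return lines
    return []  # anything else is skipped in this sketch


def jats_to_html(jats_xml):
    """Return a minimal HTML fragment for a JATS article."""
    root = ET.fromstring(jats_xml)
    lines = []
    title = root.find("front/article-meta/title-group/article-title")
    if title is not None:
        lines.append("<h1>%s</h1>" % _plain_text(title))
    body = root.find("body")
    if body is not None:
        for child in body:
            lines.extend(_render(child))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "<article><front><article-meta><title-group>"
        "<article-title>A &amp; B</article-title>"
        "</title-group></article-meta></front>"
        "<body><sec><title>Intro</title>"
        "<p>Hello <italic>world</italic>.</p></sec></body></article>"
    )
    print(jats_to_html(sample))
```

A production converter would also handle figures, tables, references and inline formatting, which is the work the stylesheet-based tools listed above take on.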
Editors
* Typeset provides a WYSIWYM editor for scholarly articles. It supports XML export in the NISO JATS and NLM JATS standards and is mostly used by journals and publishers looking to convert author-submitted MS Word files to XML, PDF, HTML and EPUB.
*JATS Framework for oXygen XML Editor: users of oXygen XML Editor and oXygen XML Author can now install support for current versions of NISO JATS (and as a bonus, NLM BITS). Based on an identifier given in a DOCTYPE declaration, oXygen will detect that you are editing a JATS document and provide stylesheets and utilities.
* FontoXML for JATS: WYSIWYS editor for editing and reviewing JATS content.
* PubRef "Pipeline": browser-based realtime-preview JATS editor.
* ''Annotum'': a WordPress theme that contains WYSIWYG authoring in JATS (Kipling subset), peer-review and editorial management, and publishing.
* JATS edition for web-based XML editor Xeditor.
* ''Texture Editor'' of the Substance Consortium. The first online "born to JATS" editor.
* Libero Editor, developed by eLife, describes itself as 'A user-friendly editing interface designed for publishing staff and authors for the production of high-quality JATS XML.'
Preview
Tools that render JATS as HTML, usually on the fly.
* JATS Preview Stylesheets: the JATS Preview Stylesheets are a series of .xsl, .xpl, .css, and .sch files that will create .html or .pdf versions of valid NISO Z39.96-2012 JATS 1.0 files. They are primarily intended for internal use by publishers and as a basis for customization.
* Typeset: generates HTML from JATS XML in one click, and can also generate custom HTML based on the requirements of the journal.
* ''PubReader'' – "The PubReader view is an alternative web presentation ... Designed particularly for enhancing readability on tablet and other small screen devices, PubReader can also be used on desktops and laptops and from multiple web browsers".
Customization
* Jatsdoc - Produces documentation for any particular JATS customization. Jatsdoc is integrated with NCBI's ''DtdAnalyzer''.
JATS central repositories
As NISO JATS became the ''de facto'' and ''de jure'' standard for open access journals, the scientific community has adopted the JATS repositories as a kind of legal deposit, sometimes deemed more valuable than the traditional digital libraries where only a PDF version is stored. Open knowledge needs richer, structured formats such as JATS: the PDF and JATS versions must be certified as having the "same content", with the set "PDF+JATS" forming the unit of legal deposit.
List of ''JATS repositories'' and their contents:
* PubMed Central (approximate figures):
** US PubMed Central: in 2016 ~3.8 million articles
** Europe PubMed Central: in 2016 ~3.7 million articles
* SciELO: in 2016 ~0.6 million articles
These repositories do overlap and the same article can be held by more than one repository.
Alternatives and semantic
There were some efforts and experiments using RDF conversion in 2012, with no impact on the JATS community.
Later, around 2016, in the Semantic Web context, the SchemaOrg initiative defined the class ScholarlyArticle, which received a better reception. It is an initial "JATS-like standardization" for RDF contexts of use.
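To make the relationship between the two vocabularies concrete, the Python sketch below maps two JATS header fields to a ScholarlyArticle description in JSON-LD. It is a hypothetical mapping written for illustration, not an official crosswalk: the choice of `headline` for the article title and of `isPartOf` with a `Periodical` for the journal is an assumption, as is the function name.

```python
import json
import xml.etree.ElementTree as ET


def jats_to_scholarly_article(jats_xml):
    """Build a small ScholarlyArticle dictionary from JATS front matter."""
    root = ET.fromstring(jats_xml)

    def text(path):
        node = root.find(path)
        if node is None:
            return None
        return "".join(node.itertext()).strip()

    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": text("front/article-meta/title-group/article-title"),
    }
    journal = text("front/journal-meta/journal-title-group/journal-title")
    if journal:
        # Assumed mapping: the journal becomes the containing Periodical.
        doc["isPartOf"] = {"@type": "Periodical", "name": journal}
    return doc


if __name__ == "__main__":
    sample = (
        "<article><front>"
        "<journal-meta><journal-title-group>"
        "<journal-title>Example Journal</journal-title>"
        "</journal-title-group></journal-meta>"
        "<article-meta><title-group>"
        "<article-title>An example article</article-title>"
        "</title-group></article-meta>"
        "</front></article>"
    )
    print(json.dumps(jats_to_scholarly_article(sample), indent=2))
```

Only metadata is carried over here; the structured full text that JATS captures in the body has no counterpart in this simple description.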
See also
Related to
* IMRAD (Introduction, Methods, Results, and Discussion)
* NISO
* Open science data
* Scientific literature
* Semantic publishing
* Separation of presentation and content
* XML
Used by (digital preservation)
* PubMed Central
* SciELO
Used by (publishing)
* Elsevier
* NPG
* Open Journal Systems
* PLOS
Similar to
* DocBook
* Text Encoding Initiative
* SchemaOrg
* XHTML
References
Further reading
* Sharp, Molly (4 June 2013). "Structured Documents for Science: JATS XML as Canonical Content Format". ''PLOS Tech''. http://blogs.plos.org/tech/structured-documents-for-science-jats-xml-as-canonical-content-format/
External links
* NLM Journal Article Tag Suite: NCBI's information and documentation site.
* NISO JATS Version 1.1:
** Archiving and Interchange tag library
** Publishing tag library
** Article Authoring tag library
* Styles and customization:
** SciELO Publishing Schema (SPS): SciELO's customization.
** ISO Standards Tag Set (ISOSTS) as a customization of NISO JATS: http://www.iso.org/schema/isosts/v1.0/doc/
** NISO Book Interchange Tag Suite (BITS), based on JATS.
** TextureJATS, a minimal coherent subset of JATS.
* JATS open community:
** "JATS for Reuse" (JATS4R) community, validator
** SchemaOrg community, ScholarlyArticle
** PeerJ's XML-JATS to HTML5-ScholarlyArticle