PubMed Central (PMC) is a free digital repository that archives open access full-text scholarly articles that have been published in biomedical and life sciences journals. As one of the major research databases developed by the National Center for Biotechnology Information (NCBI), PubMed Central is more than a document repository. Submissions to PMC are indexed and formatted for enhanced metadata, medical ontology, and unique identifiers which enrich the XML structured data for each article. Content within PMC can be linked to other NCBI databases and accessed via Entrez search and retrieval systems, further enhancing the public's ability to discover, read and build upon its biomedical knowledge.
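As a rough illustration of programmatic access through Entrez, the sketch below builds a request URL for the NCBI E-utilities EFetch endpoint. The helper name `pmc_efetch_url` is hypothetical, and passing only the numeric part of the identifier is an assumption made here for illustration; the sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pmc_efetch_url(pmcid: str) -> str:
    """Build an EFetch URL asking for the XML record of one PMC article.

    Hypothetical helper for illustration; it only builds the URL.
    """
    # Assumption: the numeric part of the PMCID is sent as the record id.
    numeric_id = pmcid.upper().removeprefix("PMC")
    if not (numeric_id.isascii() and numeric_id.isdigit()):
        raise ValueError(f"not a PMCID: {pmcid!r}")
    query = urlencode({"db": "pmc", "id": numeric_id, "retmode": "xml"})
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


if __name__ == "__main__":
    print(pmc_efetch_url("PMC1852221"))
```

The resulting URL could be fetched with any HTTP client; rate limits and API keys are outside the scope of this sketch.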
PubMed Central is distinct from PubMed. PubMed Central is a free digital archive of full articles, accessible to anyone from anywhere via a web browser (with varying provisions for reuse). PubMed, by contrast, is a searchable database of biomedical citations and abstracts; the full-text article resides elsewhere (in print or online, free or behind a subscriber paywall).
History
PubMed Central began as E-biomed, initially proposed in May 1999 by then-NIH director Harold Varmus. The idea came to him "abruptly" in December 1998, inspired by the early use of arXiv for preprints, after a presentation from Pat Brown of Stanford and David Lipman, director of NCBI.
The goal of E-biomed was to provide free access to all biomedical research. Papers submitted to E-biomed could take one of two routes: immediate publication as a preprint, or a traditional peer review process. The peer review process was to resemble contemporary overlay journals, with an external editorial board retaining control over the process of reviewing, curating, and listing papers which would otherwise be freely accessible on the central E-biomed server. Varmus intended to realize the new possibilities presented by communicating scientific results digitally, imagining continuous conversation about published work, versioned documents, and enriched "layered" formats allowing for multiple levels of detail.
The proposal to create a central index of biomedical research was a radical departure from prevailing publishing norms. Prior to the internet, publication indexes operated largely like ISBNs: allocated by registration agencies to secondary publishers. The idea that anyone could own their own address space via a domain name and create their own indexing system was wholly new. Major commercial publishers had begun experimenting with an indexing system for scientific papers shared across publishers as early as 1993, and were spurred to action following the E-biomed proposal. At the October 1999 STM Annual Frankfurt Conference, several publishers led by Springer-Verlag reached a hurried conference room consensus to launch their competitor prototype:
At the Board meeting of the STM association, held the afternoon of Monday, October 11, before the fair's Wednesday opening, discussion focused on an emerging U.S. National Library of Medicine (NLM) initiative called E-Biomed (later PubMed Central) that had been proposed by Harold Varmus of the National Institutes of Health in the spring of 1999. Varmus envisioned a digital archive of journals, accessible free of charge and with the added value of reference linking. "Our consensus was that publishers should be the ones doing the linking," said Bob Campbell, who chaired the meeting. "Since we were 'higher up the stream,' so to speak, we should be able to link our articles ahead of the NLM as part of the process of producing them. Stefan von Holtzbrinck then set the ball rolling by offering to link Nature publications with anyone else's. We decided to issue an announcement of a broad STM reference linking initiative. It was, of course, a strategic move only, since we had neither plan nor prototype."
A small group led by Arnoud de Kemp of Springer-Verlag met in an adjacent room immediately following the Board meeting to draft the announcement, which was distributed to all attendees of the STM annual meeting the following day and published in an STM membership publication. [...] The potential benefit of the service that would become CrossRef was immediately apparent. Organizations such as AIP and IOP (Institute of Physics) had begun to link to each other's publications, and the impossibility of replicating such one-off arrangements across the industry was obvious. As Tim Ingoldsby later put it, "All those linking agreements were going to kill us."
Under pressure from vigorous lobbying by commercial publishers and scientific societies who feared lost profits, NIH officials announced a revised PubMed Central proposal in August 1999. PMC would receive submissions from publishers, rather than from authors as in E-biomed. Publications were allowed time-embargoed paywalls of up to one year. PMC would only allow peer-reviewed work, with no preprints. The then-unnamed publisher-led linking system shortly thereafter became CrossRef and the larger DOI system. Varmus, Brown, and others including Michael Eisen went on to found the Public Library of Science (PLoS) in 2001, reaching the conclusion "that if we really want to change the publication of scientific research, we must do the publishing ourselves."
Adoption
Launched in February 2000, the repository has grown rapidly, both because the NIH Public Access Policy is designed to make all research funded by the National Institutes of Health (NIH) freely accessible to anyone and because many publishers work cooperatively with the NIH to provide free access to their works. In late 2007, the Consolidated Appropriations Act of 2008 (H.R. 2764) was signed into law; it included a provision requiring the NIH to modify its policies and require that complete electronic copies of peer-reviewed research and findings from NIH-funded research be included in PubMed Central. These articles must be included within 12 months of publication. This was the first time the US government had required an agency to provide open access to research, and it was an evolution from the 2005 policy, in which the NIH asked researchers to voluntarily add their research to PubMed Central.
A UK version of the PubMed Central system, UK PubMed Central (UKPMC), was developed by the Wellcome Trust and the British Library as part of a nine-strong group of UK research funders. This system went live in January 2007. On 1 November 2012, it became Europe PubMed Central. The Canadian member of the PubMed Central International network, PubMed Central Canada, was launched in October 2009.
The National Library of Medicine's "NLM Journal Publishing Tag Set" journal article markup language is freely available. The Association of Learned and Professional Society Publishers comments that "it is likely to become the standard for preparing scholarly content for both books and journals". A related DTD is available for books. The Library of Congress and the British Library have announced support for the NLM DTD. It has also been popular with journal service providers.
With the release of public access plans for many agencies beyond NIH, PMC is in the process of becoming the repository for a wider variety of articles. This includes NASA content, with the interface branded as "PubSpace".
The PMC archive contains over 5.2 million articles, with contributions coming from publishers or authors depositing their manuscripts into the repository per the NIH Public Access Policy. Earlier data shows that author-initiated deposits exceeded 103,000 papers in the 12 months from January 2013 to January 2014. PMC identifies about 4,000 journals which participate in some capacity to deposit their published content into the PMC repository. Some publishers delay the release of their articles on PubMed Central for a set time after publication, referred to as an "embargo period", ranging from a few months to a few years depending on the journal. (Embargoes of six to twelve months are the most common.) PubMed Central is a key example of "systematic external distribution by a third party", which some publishers purport to prohibit through private contracts.
Technology
Articles are sent to PubMed Central by publishers in XML or SGML, using a variety of article DTDs. Older and larger publishers may have their own established in-house DTDs, but many publishers use the NLM Journal Publishing DTD (see above).
Received articles are converted via XSLT to the very similar NLM Archiving and Interchange DTD. This process may reveal errors that are reported back to the publisher for correction. Graphics are also converted to standard formats and sizes. The original and converted forms are archived. The converted form is moved into a relational database, along with associated files for graphics, multimedia, or other associated data. Many publishers also provide PDFs of their articles, and these are made available without change.
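The error-reporting step of such an intake pipeline can be pictured with a small check over a converted article. The sketch below is not PMC's actual process (which the text describes as XSLT-based); it uses Python's standard `xml.etree.ElementTree` on a simplified, made-up fragment with NLM/JATS-style element names, and the function name `check_article` and the choice of required fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up article fragment using NLM/JATS-style element names.
SAMPLE = """\
<article>
  <front>
    <journal-meta>
      <journal-title>Example Journal</journal-title>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="pmc">1852221</article-id>
      <title-group>
        <article-title>A made-up article title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body><p>Made-up body text.</p></body>
</article>
"""


def check_article(xml_text):
    """Return (metadata, errors) for one article document.

    Hypothetical check: the document must be well-formed XML and carry
    a journal title, an article title, and a PMC article id.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return {}, [f"not well-formed XML: {exc}"]
    meta = {
        "journal": root.findtext("front/journal-meta/journal-title"),
        "title": root.findtext("front/article-meta/title-group/article-title"),
        "pmc_id": root.findtext(
            "front/article-meta/article-id[@pub-id-type='pmc']"
        ),
    }
    # Collect one error per missing or empty field, to send to the publisher.
    errors = [f"missing {field}" for field, value in meta.items() if not value]
    return meta, errors


if __name__ == "__main__":
    metadata, problems = check_article(SAMPLE)
    print(metadata, problems)
```

A real pipeline would validate against the full DTD; this sketch only shows the shape of the "convert, check, report back" loop.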
Bibliographic citations are parsed and automatically linked to the relevant abstracts in PubMed, articles in PubMed Central, and resources on publishers' Web sites. PubMed links also lead to PubMed Central. Unresolvable references, such as to journals or particular articles not yet available at one of these sources, are tracked in the database and automatically come "live" when the resources become available.
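One way to picture references that come "live" later is a resolver that parks unresolved citations and activates them when the target is registered. The following is a minimal in-memory sketch under that assumption; the class `LinkResolver`, its method names, and the keys and URLs in the usage example are all hypothetical and say nothing about how PMC's database is actually organised.

```python
from collections import defaultdict


class LinkResolver:
    """Hypothetical in-memory model of deferred citation linking."""

    def __init__(self):
        self.available = {}                # resource key -> URL
        self.pending = defaultdict(list)   # resource key -> waiting article ids
        self.links = defaultdict(dict)     # article id -> {resource key: URL}

    def cite(self, citing_id, key):
        """Record a citation; link it immediately if the target is known."""
        if key in self.available:
            self.links[citing_id][key] = self.available[key]
            return True
        # Target not available yet: remember who is waiting for it.
        self.pending[key].append(citing_id)
        return False

    def publish(self, key, url):
        """Register a resource and activate every citation waiting on it."""
        self.available[key] = url
        waiting = self.pending.pop(key, [])
        for citing_id in waiting:
            self.links[citing_id][key] = url
        return waiting


if __name__ == "__main__":
    resolver = LinkResolver()
    resolver.cite("article-1", "journal-x:2001:12")        # unresolved for now
    resolver.publish("journal-x:2001:12", "https://example.org/x/12")
    print(resolver.links["article-1"])
```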
An in-house indexing system provides search capability, and is aware of biological and medical terminology, such as generic vs. proprietary drug names, and alternate names for organisms, diseases and anatomical parts.
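Terminology awareness of this kind is often approximated by expanding a query with known synonyms before matching. The sketch below illustrates that idea only; the tiny synonym table, the function names, and the substring matching are assumptions for illustration and do not describe PMC's in-house index.

```python
# Tiny illustrative synonym table: generic and proprietary drug names,
# and alternate organism names. Not taken from any PMC vocabulary.
SYNONYMS = [
    {"acetaminophen", "paracetamol", "tylenol"},
    {"ibuprofen", "advil"},
    {"escherichia coli", "e. coli"},
]


def expand(term):
    """Return the term together with any known synonyms, lower-cased."""
    lowered = term.lower()
    for group in SYNONYMS:
        if lowered in group:
            return set(group)
    return {lowered}


def search(term, documents):
    """Return ids of documents whose text mentions the term or a synonym."""
    variants = expand(term)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(variant in text.lower() for variant in variants)
    ]


if __name__ == "__main__":
    docs = {
        "a": "Paracetamol dosing in children",
        "b": "Tylenol overdose case report",
        "c": "Ibuprofen and renal function",
    }
    print(search("acetaminophen", docs))
```

A production index would rely on a curated vocabulary and tokenised matching rather than a hard-coded table and substring tests.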
When a user accesses a journal issue, a table of contents is automatically generated by retrieving all articles, letters, editorials, etc. for that issue. When an actual item such as an article is reached, PubMed Central converts the NLM markup to HTML for delivery, and provides links to related data objects. This is feasible because the variety of incoming data has first been converted to standard DTDs and graphic formats.
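Because every stored article uses the same element vocabulary, rendering can be a uniform mapping from markup elements to HTML. The sketch below shows that idea with a handful of NLM/JATS-style elements; the mapping table and the function `to_html` are assumptions for illustration, and the recursive walk stands in for whatever transformation PMC actually uses.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed mapping from a few NLM/JATS-style elements to HTML elements.
TAG_MAP = {
    "body": "div",
    "sec": "section",
    "title": "h2",
    "p": "p",
    "italic": "em",
    "bold": "strong",
}


def to_html(elem):
    """Recursively render one XML element as an HTML string."""
    tag = TAG_MAP.get(elem.tag, "span")  # unknown elements fall back to <span>
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text following a child element belongs to the parent's content.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


if __name__ == "__main__":
    source = (
        "<sec><title>Results</title>"
        "<p>Growth of <italic>E. coli</italic> was &lt; 5%.</p></sec>"
    )
    print(to_html(ET.fromstring(source)))
```

The single mapping table is the point of the design: once all incoming formats are normalised, one renderer serves every journal.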
In a separate submission stream, NIH-funded authors may deposit articles into PubMed Central using the NIH Manuscript Submission (NIHMS) system. Articles submitted this way typically go through XML markup so that they can be converted to the NLM DTD.
Reception
Reactions to PubMed Central among the scholarly publishing community range from genuine enthusiasm by some to cautious concern by others.
While PMC is a welcome partner to open access publishers in its ability to augment the discovery and dissemination of biomedical knowledge, that same ability causes others to worry about traffic being diverted from the published version of record, the economic consequences of less readership, as well as the effect on maintaining a community of scholars within learned societies.
A 2013 analysis found strong evidence that public repositories of published articles were responsible for "drawing significant numbers of readers away from journal websites" and that "the effect of PMC is growing over time".
Libraries, universities, open access supporters, consumer health advocacy groups, and patient rights organizations have applauded PubMed Central, and hope to see similar public access repositories developed by other federal funding agencies so as to freely share any research publications that were the result of taxpayer support.
The Antelman study of open access publishing found that in philosophy, political science, electrical and electronic engineering and mathematics, open access papers had a greater research impact. A randomised trial found an increase in content downloads of open access papers, with no citation advantage over subscription access one year after publication.
The NIH policy and open access repository work inspired a 2013 presidential directive, which has sparked action in other federal agencies as well.
In March 2020, PubMed Central accelerated its deposit procedures for the full text of publications on coronavirus. The NLM did so upon request from the White House Office of Science and Technology Policy and international scientists to improve access for scientists, healthcare providers, data mining innovators, AI healthcare researchers, and the general public.
PMCID
The PMCID (PubMed Central identifier), also known as the PMC reference number, is a bibliographic identifier for the PubMed Central open access database, much like the PMID is the bibliographic identifier for the PubMed database. The two identifiers are distinct, however. A PMCID consists of "PMC" followed by a string of digits. The format is:
* PMCID: PMC1852221
Authors applying for NIH awards must include the PMCID in their application.
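The format described above ("PMC" followed by digits) lends itself to a simple pattern check, for example when screening an application's reference list. The sketch below is a minimal illustration: the function names are hypothetical, the pattern assumes an upper-case prefix with at least one digit, and the PMID in the usage line is a made-up number.

```python
import re

# "PMC" followed by one or more digits, as described in the text.
PMCID_RE = re.compile(r"PMC[0-9]+")


def is_pmcid(value):
    """Return True if the whole string has the PMCID shape."""
    return PMCID_RE.fullmatch(value) is not None


def find_pmcids(text):
    """Return every PMCID-shaped token found in free text, in order."""
    return re.findall(r"\bPMC[0-9]+\b", text)


if __name__ == "__main__":
    print(is_pmcid("PMC1852221"))
    # The PMID below is a made-up number used only to show the contrast.
    print(find_pmcids("See PMC1852221 and PMID 12345678."))
```

A match only confirms the shape of the identifier, not that the article exists in PMC.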
See also
* Europe PubMed Central
* JATS (technology)
* MEDLINE, an international literature database of life sciences and biomedical information
* PMID (PubMed Identifier)
* PubMed Central Canada
* Redalyc (similar project focused on Latin America)
* SciELO (similar service)