arXiv (pronounced "archive", since the X represents the Greek letter chi ⟨χ⟩) is an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, but not peer review. It consists of scientific papers in the fields of mathematics, physics, astronomy, electrical engineering, computer science, quantitative biology, statistics, mathematical finance and economics, which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv repository before publication in a peer-reviewed journal. Some publishers also grant permission for authors to archive the peer-reviewed postprint. Begun on August 14, 1991, arXiv.org passed the half-million-article milestone on October 3, 2008, and had hit a million by the end of 2014. As of April 2021, the submission rate is about 16,000 articles per month.
History
arXiv was made possible by the compact TeX file format, which allowed scientific papers to be easily transmitted over the Internet and rendered client-side. Around 1990, Joanne Cohn began emailing physics preprints to colleagues as TeX files, but the number of papers being sent soon filled mailboxes to capacity. Paul Ginsparg recognized the need for central storage, and in August 1991 he created a central repository mailbox stored at the Los Alamos National Laboratory (LANL), which could be accessed from any computer. Additional modes of access were soon added: FTP in 1991, Gopher in 1992, and the World Wide Web in 1993. The term e-print was quickly adopted to describe the articles.
It began as a physics archive, called the LANL preprint archive, but soon expanded to include astronomy, mathematics, computer science, quantitative biology and, most recently, statistics. Its original domain name was xxx.lanl.gov. Due to LANL's lack of interest in the rapidly expanding technology, in 2001 Ginsparg changed institutions to Cornell University and changed the name of the repository to arXiv.org. It is now hosted principally by Cornell, with five mirrors around the world.
arXiv was an early adopter and promoter of preprints. Its success in sharing preprints was one of the precipitating factors that led to the later movement in scientific publishing known as open access.
Mathematicians and scientists regularly upload their papers to arXiv.org for worldwide access and sometimes for reviews before they are published in peer-reviewed journals. Ginsparg was awarded a MacArthur Fellowship in 2002 for his establishment of arXiv.
The annual budget for arXiv was approximately $826,000 for 2013 to 2017, funded jointly by Cornell University Library, the Simons Foundation (in both gift and challenge grant forms) and annual fee income from member institutions. This model arose in 2010, when Cornell sought to broaden the financial funding of the project by asking institutions to make annual voluntary contributions based on the amount of download usage by each institution. Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, the annual fees are set in four tiers from $1,000 to $4,400. Cornell's goal is to raise at least $504,000 per year through membership fees generated by approximately 220 institutions.
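As a rough consistency check on the membership figures above, assuming for simplicity that the goal is met exactly and spread evenly over all of the approximately 220 institutions, the implied average fee is:

```latex
\frac{\$504{,}000}{220\ \text{institutions}} \approx \$2{,}291\ \text{per institution per year}
```

This average lies inside the stated $1,000 to $4,400 tier range, so the goal is compatible with most members paying a middle-tier fee.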
In September 2011, Cornell University Library took overall administrative and financial responsibility for arXiv's operation and development. Ginsparg was quoted in the Chronicle of Higher Education as saying it "was supposed to be a three-hour tour, not a life sentence". However, Ginsparg remains on arXiv's Scientific Advisory Board and its Physics Advisory Committee.
Moderation process and endorsement
Although arXiv is not peer reviewed, moderators for each area review the submissions. They may recategorize any that are deemed off-topic, and they may reject submissions that are not scientific papers, or sometimes reject submissions for undisclosed reasons. The lists of moderators for many sections of arXiv are publicly available, but moderators for most of the physics sections remain unlisted.
Additionally, an "endorsement" system was introduced in 2004 as part of an effort to ensure that content is relevant and of interest to current research in the specified disciplines. Under the system, for categories that use it, an author must be endorsed by an established arXiv author before being allowed to submit papers to those categories. Endorsers are not asked to review the paper for errors, but to check whether the paper is appropriate for the intended subject area. New authors from recognized academic institutions generally receive automatic endorsement, which in practice means that they do not need to deal with the endorsement system at all. However, the endorsement system has attracted criticism for allegedly restricting scientific inquiry.
A majority of the e-prints are also submitted to journals for publication, but some work, including some very influential papers, remains in e-print form only and is never published in a peer-reviewed journal. A well-known example of the latter is an outline of a proof of Thurston's geometrization conjecture, including the Poincaré conjecture as a particular case, uploaded by Grigori Perelman in November 2002. Perelman appears content to forgo the traditional peer-reviewed journal process, stating: "If anybody is interested in my way of solving the problem, it's all there let them go and read about it". Despite this non-traditional method of publication, other mathematicians recognized this work by offering the Fields Medal and Clay Mathematics Millennium Prizes to Perelman, both of which he refused.
While arXiv does contain some dubious e-prints, such as those claiming to refute famous theorems or to prove famous conjectures such as Fermat's Last Theorem using only high-school mathematics, a 2002 article in Notices of the American Mathematical Society described them as "surprisingly rare". arXiv generally reclassifies these works, for example into "General mathematics", rather than deleting them; however, some authors have voiced concern over the lack of transparency in the arXiv screening process.
Submission formats
Papers can be submitted in any of several formats, including LaTeX and PDF printed from a word processor other than TeX or LaTeX. The submission is rejected by the arXiv software if generating the final PDF file fails, if any image file is too large, or if the total size of the submission is too large. arXiv now allows one to store and modify an incomplete submission, and only finalize the submission when ready. The time stamp on the article is set when the submission is finalized.
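The rejection rules above amount to three independent checks, with the time stamp assigned only when a stored draft is finalized. The Python sketch below models that workflow under stated assumptions: the `Submission` class, the `rejection_reasons` and `finalize` helpers, and the byte limits passed in are all hypothetical illustrations, since the actual arXiv size limits and software are not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Submission:
    """Hypothetical model of a stored, possibly incomplete submission."""
    pdf_build_ok: bool            # did generating the final PDF succeed?
    image_sizes: list[int]        # size of each image file, in bytes
    total_size: int               # size of the whole submission, in bytes
    finalized_at: Optional[datetime] = None  # stays None while still a draft


def rejection_reasons(sub: Submission, max_image_bytes: int,
                      max_total_bytes: int) -> list[str]:
    """Return every reason the submission would be rejected (empty if none)."""
    reasons = []
    if not sub.pdf_build_ok:
        reasons.append("PDF generation failed")
    if any(size > max_image_bytes for size in sub.image_sizes):
        reasons.append("image file too large")
    if sub.total_size > max_total_bytes:
        reasons.append("submission too large")
    return reasons


def finalize(sub: Submission, max_image_bytes: int, max_total_bytes: int,
             now: datetime) -> list[str]:
    """Finalize the draft; the time stamp is set only if every check passes."""
    reasons = rejection_reasons(sub, max_image_bytes, max_total_bytes)
    if not reasons:
        sub.finalized_at = now
    return reasons
```

Until `finalize` succeeds, the draft can be modified freely and carries no time stamp, which mirrors the store-then-finalize workflow described above.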
Access
The standard access route is through the arXiv.org website or one of several mirrors. Other interfaces and access routes have also been created by unaffiliated organisations.
Metadata for arXiv is made available through OAI-PMH, the standard for open access repositories. Content is therefore indexed by the major consumers of such data, such as BASE, CORE and Unpaywall. As of 2020, the Unpaywall dump links over 500,000 arXiv URLs as the open access version of a work found in CrossRef data from the publishers, making arXiv a top 10 global host of green open access.
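OAI-PMH is a plain HTTP protocol: a harvester sends requests such as `verb=ListRecords` with a `metadataPrefix` (for example `oai_dc`, Dublin Core) and pages through large result sets using a `resumptionToken`. The standard-library Python sketch below builds such request URLs and extracts titles from one response page. The base URL is an assumption (check arXiv's current OAI-PMH documentation before use), `list_records_url` and `parse_list_records` are hypothetical helper names, and the sample response is a trimmed, hypothetical example rather than real arXiv output.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: arXiv's OAI-PMH base URL; verify against current arXiv docs.
OAI_BASE_URL = "https://export.arxiv.org/oai2"

# Standard OAI-PMH 2.0 and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is sent,
    only the verb may accompany it.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = []
    for rec in root.iterfind(".//oai:record", NS):
        title_el = rec.find(".//dc:title", NS)
        if title_el is not None:
            titles.append(title_el.text)
    token_el = root.find(".//oai:resumptionToken", NS)
    # A missing or empty resumptionToken marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Hypothetical, trimmed response page (not real arXiv data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A real harvester would fetch each URL (for example with `urllib.request.urlopen`), pass the body to `parse_list_records`, and repeat with the returned token until it is `None`, while respecting the provider's flow control (OAI-PMH allows servers to answer HTTP 503 with a `Retry-After` header).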
Finally, researchers can select sub-fields and receive daily e-mailings or RSS feeds of all submissions in them.
Copyright status of files
Files on arXiv can have a number of different copyright statuses:
#Some are public domain, in which case they will have a statement saying so.
#Some are available under either the Creative Commons 4.0 Attribution-ShareAlike license or the Creative Commons 4.0 Attribution-Noncommercial-ShareAlike license.
#Some are copyrighted by the publisher, but the author has the right to distribute them and has given arXiv a non-exclusive, irrevocable license to distribute them.
#Most are copyrighted by the author, and arXiv has only a non-exclusive, irrevocable license to distribute them.
See also
* List of preprint repositories
* List of academic databases and search engines
* List of academic journals by preprint policy