The SAO/NASA Astrophysics Data System (ADS) is a digital library portal for researchers on astronomy and physics, operated for NASA by the Smithsonian Astrophysical Observatory. ADS maintains three bibliographic collections containing over 15 million records, including all arXiv e-prints. Abstracts and full-text of major astronomy and physics publications are indexed and searchable through the portal.
Historical context
Johann Friedrich Weidler published the first comprehensive history of astronomy in 1741 and the first astronomical bibliography in 1755. This was an effort to archive and classify earlier astronomical knowledge and works.
This effort was continued by
Jérôme de La Lande who published his ''Bibliographie astronomique'' in 1803, a work that covered the period from 480 BCE to the year of publication.
The ''Bibliographie générale de l’astronomie'' (Volumes I and II), published by J.C. Houzeau and A. Lancaster, followed between 1882 and 1889.
As the number of astronomers and astronomical publications grew, bibliographical efforts became institutional tasks, first at the
Observatoire Royal de Belgique, where the ''Bibliography of Astronomy'' was published from 1881 to 1898, and then at the
Astronomisches Rechen-Institut in Heidelberg, where the yearly ''Astronomischer Jahresbericht'' was published from 1899 to 1968. After 1968, this was replaced by the yearly
''Astronomy and Astrophysics Abstracts'' book series, which continued until the end of the 20th century.
History
The first suggestion of a digital database of journal paper abstracts was made at a conference on ''Astronomy from Large Data-Bases'' held in
Garching bei München in 1987.
An initial version of ADS, with a database consisting of 40 papers, was created as a
proof of concept in 1988. The ADS Abstract Service became available for general use via proprietary network software in April 1993, and it was connected to
SIMBAD a few months later. In early 1994 the ADS web-based service was launched, which effectively quadrupled the number of active users in the five weeks following its introduction.
In 2011 the ADS launched ADS Labs Streamlined Search, which introduced facets for query refinement and selection. In 2013, ADS Labs 2.0 introduced a new search engine, full-text search functionality, scalable facets, and an API. In 2015, the new ADS, code-named Bumblebee, was released as ADS-beta. The ADS-beta system features a micro-services API and client-side dynamic page loading served on a cloud platform. In May 2018 the beta label was dropped and Bumblebee became the default ADS interface, with some legacy features (ADS Classic) remaining available. Development continues to the present day, with an extensible API that enables users to build their own utilities on top of the ADS bibliographic record.
The ADS service is distributed worldwide with twelve
mirror sites in twelve countries and with the database synchronized by weekly updates using
rsync, a mirroring utility which allows updates to only the portions of the database which have changed. All updates are triggered centrally, but they initiate scripts at the mirror sites which "pull" updated data from the main ADS servers.
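The pull-style update described above can be sketched as a small helper that assembles an rsync command line. This is only an illustration: the host name, module and paths are hypothetical, and the actual ADS mirroring scripts are not described in the text. The flags used are standard rsync options.

```python
import shlex

def build_pull_command(source: str, destination: str, dry_run: bool = False) -> list[str]:
    """Build an rsync command that pulls only changed data from a main server."""
    command = [
        "rsync",
        "--archive",   # preserve permissions, timestamps and directory structure
        "--compress",  # compress data in transit
        "--delete",    # remove local files that no longer exist on the source
    ]
    if dry_run:
        command.append("--dry-run")  # report what would change without copying
    command += [source, destination]
    return command

# Hypothetical server and paths, for illustration only.
cmd = build_pull_command(
    "main.example.org::ads/abstracts/", "/data/ads/abstracts/", dry_run=True
)
print(shlex.join(cmd))
```

A mirror site would run such a command when triggered by the central server, so that the data flows from the main site to the mirror while the mirror remains the party opening the connection.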
Data in the system
At first, the journal articles available via ADS were exclusively scanned bitmaps created from the paper journals, with the abstracts created using optical character recognition software. Some of these scanned articles up to around 1995 are available for free by agreement with the journal publishers, with some dating from as far back as the early 19th century. As online editions of journals became widespread, abstracts were increasingly loaded into ADS directly.
Papers are indexed within the database by their bibliographic record, which contains the details of the journal they were published in and various associated metadata, such as author lists, references and citations. Originally this data was stored in ASCII format, but eventually the limitations of this encouraged the database maintainers to migrate all records to an XML (Extensible Markup Language) format in 2000. Bibliographic records are now stored as an XML element with sub-elements for the various metadata.
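As a rough illustration of a record stored as one XML element with sub-elements, the sketch below builds and re-parses such a record with Python's standard library. The element names, the attribute name and the identifier value are all hypothetical; the actual ADS schema is not given in the text.

```python
import xml.etree.ElementTree as ET

def make_record(identifier, title, journal, authors):
    """Build one bibliographic record as an XML element with sub-elements."""
    record = ET.Element("record", attrib={"id": identifier})
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "journal").text = journal
    for name in authors:
        # One sub-element per author keeps the author list ordered.
        ET.SubElement(record, "author").text = name
    return record

# Hypothetical values, for illustration only.
record = make_record(
    "example-0001", "An example title", "Example Journal", ["Davis, M", "Smith, J"]
)
xml_text = ET.tostring(record, encoding="unicode")
parsed = ET.fromstring(xml_text)
print([author.text for author in parsed.findall("author")])
```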
Scanned articles are stored in
TIFF format at both medium and high
resolution. The TIFF files are converted on demand into GIF files, for on-screen viewing, and
PDF or
PostScript files for printing. The generated files are then
cached to eliminate needlessly frequent regenerations for popular articles. As of 2000, ADS contained 250
GB of scans, which consisted of 1,128,955 article pages comprising 138,789 articles. By 2005 this had grown to 650 GB and was expected to grow further to about 900 GB by 2007.
As of 2005, no further figures had been published.
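The convert-on-demand and caching behaviour described above can be sketched with a memoizing wrapper. The converter here is a stand-in that returns placeholder bytes rather than a real TIFF conversion, and the function names and cache size are assumptions made for the example.

```python
from functools import lru_cache

conversion_count = 0

def convert_tiff(page_id: str, output_format: str) -> bytes:
    """Stand-in for a real image converter; returns placeholder bytes."""
    global conversion_count
    conversion_count += 1
    return f"{output_format}:{page_id}".encode()

@lru_cache(maxsize=1024)
def get_page(page_id: str, output_format: str) -> bytes:
    """Return the converted page, converting only on a cache miss."""
    return convert_tiff(page_id, output_format)

get_page("page-0001", "gif")  # converted
get_page("page-0001", "gif")  # served from the cache
get_page("page-0001", "pdf")  # different format, converted
print(conversion_count)
```

Keying the cache on both the page and the output format means a popular article is converted once per format rather than once per request.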
The database initially contained only astronomical references, but has now grown to incorporate three databases, covering astronomy references (including planetary sciences and solar physics), physics references (including instrumentation and geosciences), and preprints of scientific papers from arXiv. The astronomy database is by far the most advanced and its use accounts for about 85% of the total ADS usage. Articles are assigned to the different databases according to the subject rather than the journal they are published in, so that articles from any one journal might appear in all three subject databases. The separation of the databases allows searching in each discipline to be tailored, so that words can automatically be given different weight functions in different database searches, depending on how common they are in the relevant field.
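The text does not say which weight functions ADS uses. As one plausible stand-in, the sketch below applies inverse document frequency, where a word that appears in most documents of a collection receives a low weight there. The tiny document collections are invented for the example.

```python
import math
from collections import Counter

def term_weights(documents: list[list[str]]) -> dict[str, float]:
    """Weight each word by log(N / document frequency): rarer words score higher."""
    n_docs = len(documents)
    doc_freq = Counter(word for doc in documents for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

# Invented collections: "star" is common in one field, "laser" in the other.
astronomy = [["star", "spectrum"], ["star", "galaxy"], ["star", "laser"]]
physics = [["laser", "cavity"], ["laser", "star"], ["laser", "spectrum"]]

astro_w = term_weights(astronomy)
phys_w = term_weights(physics)
print(astro_w["star"], astro_w["laser"])
print(phys_w["star"], phys_w["laser"])
```

Because the weights are computed per collection, the same query word can be nearly uninformative in one database and highly selective in another.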
Data in the preprint archive is updated daily from arXiv which is the dominant repository of physics and astronomy preprints. The advent of preprint servers has, like ADS, had a significant impact on the rate of astronomical research, as papers are often made available from preprint servers weeks or months before they are published in the journals. The incorporation of preprints from arXiv into ADS means that the search engine can return the most current research available, with the caveat that preprints may not have been peer-reviewed or
proofread to the required standard for publication in the main journals. The database of ADS links preprints with subsequently published articles wherever possible, so that citation and reference searches will return links to the journal article where the preprint was cited.
Software and hardware
The software runs on a system that was written specifically for the ADS, allowing for extensive customization for astronomical needs that would not have been possible with general purpose database software. The scripts are designed to be as platform independent as possible, given the need to facilitate mirroring on different systems around the world, although the growing use of Linux as the operating system of choice within astronomy has led to increasing optimization of the scripts for installation on that platform.
The main ADS server is located at the Center for Astrophysics Harvard & Smithsonian in Cambridge, Massachusetts, and is a dual 64-bit x86 Intel server with two quad-core 3.0 GHz CPUs and 32 GB of RAM, running the CentOS 5.4 Linux distribution.
As of 2022, there are mirrors located in China, Chile, France, Germany, Japan, Russia, the United Kingdom, and Ukraine.
Indexing
As of 2005, ADS received abstracts or tables of contents from almost two hundred journal sources. The service may receive data referring to the same article from multiple sources, and creates one bibliographic reference based on the most accurate data from each source. The common use of TeX and LaTeX by almost all scientific journals greatly facilitates the incorporation of bibliographic data into the system in a standardized format, and importing HTML-coded web-based articles is also simple. ADS utilizes Python and Perl scripts for importing, processing and standardizing bibliographic data.
The apparently mundane task of converting author names into a standard ''Surname, Initial'' format is actually one of the more difficult to automate, due to the wide variety of naming conventions around the world and the possibility that a name such as Davis could be a first name, middle name or surname. The accurate conversion of names requires a detailed knowledge of the names of authors active in astronomy, and ADS maintains an extensive database of author names, which is also used in searching the database (see below).
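A deliberately naive conversion to ''Surname, Initial'' form shows where the difficulty lies. The rule below (last word is the surname unless a comma says otherwise) is an assumption made for this sketch and is not the ADS method, which relies on a curated database of author names.

```python
def to_surname_initial(name: str) -> str:
    """Convert 'Given [Middle] Surname' or 'Surname, Given' to 'Surname, I'."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split(" ")
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = given[0].upper() if given else ""
    return f"{surname}, {initial}" if initial else surname

# Hypothetical names: "Davis" as surname, then as first name.
print(to_surname_initial("John Davis"))    # Davis, J
print(to_surname_initial("Davis, John"))   # Davis, J
print(to_surname_initial("Davis Smith"))   # Smith, D
```

The rule breaks on multi-word surnames (for example, "Johannes van der Waals" becomes "Waals, J") and on conventions where the family name comes first, which is why knowledge of the actual authors is needed.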
For electronic articles, a list of the references given at the end of the article is easily extracted. For scanned articles, reference extraction relies on OCR. The reference database can then be "inverted" to list the citations for each paper in the database. Citation lists have been used in the past to identify popular articles missing from the database; mostly these were from before 1975 and have now been added to the system.
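Inverting a reference table into a citation table can be sketched in a few lines. The paper identifiers are invented; the last step mirrors the use mentioned above, where cited papers that are absent from the database stand out.

```python
from collections import defaultdict

def invert_references(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn 'paper -> papers it cites' into 'paper -> papers citing it'."""
    citations = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            citations[cited].append(citing)
    return {paper: sorted(citing) for paper, citing in citations.items()}

# Invented identifiers; "paperX" is cited but has no record of its own.
references = {
    "paperA": ["paperC", "paperX"],
    "paperB": ["paperA", "paperC", "paperX"],
    "paperC": [],
}
citations = invert_references(references)
missing = sorted(paper for paper in citations if paper not in references)
print(citations["paperC"])  # ['paperA', 'paperB']
print(missing)              # ['paperX']
```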
Coverage
The database now contains over fifteen million articles. In the case of the major journals of astronomy (''Astrophysical Journal'', ''Astronomical Journal'', ''Astronomy and Astrophysics'', ''Publications of the Astronomical Society of the Pacific'' and the ''Monthly Notices of the Royal Astronomical Society''), coverage is complete, with all issues indexed from number 1 to the present. These journals account for about two-thirds of the papers in the database, with the rest consisting of papers published in over 100 other journals from around the world, as well as in conference proceedings.
While the database contains the complete contents of all the major journals and many minor ones as well, its coverage of references and citations is much less complete. References in and citations of articles in the major journals are fairly complete, but references such as "private communication", "in press" or "in preparation" cannot be matched, and author errors in reference listings also introduce potential errors. Astronomical papers may cite and be cited by articles in journals which fall outside the scope of ADS, such as chemistry, mathematics or biology journals.
Search engine
Since its inception, the ADS has developed a highly complex search engine to query the abstract and object databases. The search engine is tailor-made for searching astronomical abstracts, and the engine and its user interface assume that the user is well-versed in astronomy and able to interpret search results which are designed to return more than just the most relevant papers. The database can be queried for author names, astronomical object names, title words, and words in the abstract text, and results can be filtered according to a number of criteria. It works by first gathering synonyms and simplifying search terms as described in the following sections, and then generating an "inverted file", which is a list of all the documents matching each search term. The user-selected logic and filters are then applied to this inverted list to generate the final search results.
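The "inverted file" idea can be illustrated with a minimal word-to-documents index, to which AND or OR logic is then applied. The documents and function names are invented for the example, and the real engine is considerably more elaborate.

```python
from collections import defaultdict

def build_inverted_file(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], terms: list[str], logic: str = "AND") -> set[str]:
    """Combine the posting sets of the terms with AND (intersection) or OR (union)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if logic == "AND":
        return set.intersection(*postings)
    return set.union(*postings)

# Invented abstracts, for illustration only.
documents = {
    "d1": "planetary nebula radius",
    "d2": "nebula expansion velocity",
    "d3": "stellar abundance",
}
index = build_inverted_file(documents)
print(search(index, ["nebula", "radius"]))               # {'d1'}
print(sorted(search(index, ["radius", "velocity"], "OR")))  # ['d1', 'd2']
```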
Author name queries
The system indexes author names by surname and initials, and accounts for possible variations in the spelling of names using a list of variations. Such variations are common for names that include accents such as umlauts and for transliterations from Arabic or Cyrillic script. An example of an entry in the author synonym list is:
:''AFANASJEV, V''
:''AFANAS’EV, V''
:''AFANAS’IEV, V''
:''AFANASEV, V''
:''AFANASYEV, V''
:''AFANS’IEV, V''
:''AFANSEV, V''
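A synonym entry like the one above can be used to expand an author query into all known spellings. The lookup structure and function name below are assumptions made for this sketch; only the list of variants comes from the example entry.

```python
# One synonym group, copied from the example entry above.
SYNONYM_GROUPS = [
    [
        "AFANASJEV, V",
        "AFANAS’EV, V",
        "AFANAS’IEV, V",
        "AFANASEV, V",
        "AFANASYEV, V",
        "AFANS’IEV, V",
        "AFANSEV, V",
    ],
]

# Map every variant to its whole group for constant-time lookup.
VARIANTS = {name: group for group in SYNONYM_GROUPS for name in group}

def expand_author(query: str) -> list[str]:
    """Return every spelling to search for; unknown names map to themselves."""
    key = query.upper()
    return VARIANTS.get(key, [key])

print(len(expand_author("Afanasev, V")))
print(expand_author("Smith, J"))
```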
Object name searches
The capability to search for papers on specific astronomical objects is one of ADS's most powerful tools. The system uses data from SIMBAD, the NASA/IPAC Extragalactic Database, the International Astronomical Union Circulars and the Lunar and Planetary Institute to identify papers referring to a given object, and can also search by object position, listing papers which concern objects within a 10 arcminute radius of a given Right Ascension and Declination. These databases combine the many catalogue designations an object might have, so that a search for the Pleiades will also find papers which list the famous open cluster in Taurus under any of its other catalogue designations or popular names, such as M45, the Seven Sisters or Melotte 22.
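A positional search of this kind reduces to computing the angular separation between two sky positions and comparing it with the radius. The sketch below uses the haversine formula, which is a standard choice but not necessarily what ADS uses; the object names and coordinates are invented.

```python
import math

def angular_separation_arcmin(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions, in arcminutes."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 60

def within_radius(target, objects, radius_arcmin=10.0):
    """Names of objects lying within the radius of the target (RA, Dec) in degrees."""
    ra0, dec0 = target
    return [name for name, (ra, dec) in objects.items()
            if angular_separation_arcmin(ra0, dec0, ra, dec) <= radius_arcmin]

# Invented positions in degrees: one object 5 arcminutes away, one several degrees away.
objects = {"near": (10.0, 20.0 + 5 / 60), "far": (13.0, 20.0)}
print(within_radius((10.0, 20.0), objects))  # ['near']
```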
Title and abstract searches
The search engine first filters search terms in several ways. An M followed by a space or hyphen has the space or hyphen removed, so that searching for Messier catalogue objects is simplified and user inputs of M45, M 45 or M-45 all result in the same query being executed; similarly, NGC designations and common search terms such as Shoemaker Levy and T Tauri are stripped of spaces. Unimportant words such as AT, OR and TO are stripped out, although in some cases case sensitivity is maintained, so that while "and" is ignored, "And" is converted to "Andromeda", and "Her" is converted to "Hercules", but "her" is ignored.
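These filtering rules can be approximated as follows. The sketch covers only the M and NGC prefixes and a handful of words taken from the examples above; the word lists and the regular expression are assumptions, not the ADS rule set.

```python
import re

STOP_WORDS = {"at", "or", "to", "and", "her"}
# Capitalized forms that are constellation abbreviations rather than stop words.
CASE_SENSITIVE = {"And": "Andromeda", "Her": "Hercules"}

def preprocess(query: str) -> list[str]:
    """Normalize catalogue designations and drop unimportant words."""
    # Join catalogue prefixes to their numbers: "M 45" and "M-45" become "M45".
    query = re.sub(r"\b(M|NGC)[ -](\d+)\b", r"\1\2", query)
    terms = []
    for word in query.split():
        if word in CASE_SENSITIVE:
            terms.append(CASE_SENSITIVE[word])
        elif word.lower() in STOP_WORDS:
            continue  # unimportant word, ignored
        else:
            terms.append(word)
    return terms

print(preprocess("M 45"))                       # ['M45']
print(preprocess("stars in And"))               # ['stars', 'in', 'Andromeda']
print(preprocess("galaxies and her clusters"))  # ['galaxies', 'clusters']
```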
Synonym replacement
Once search terms have been preprocessed, the database is queried with the revised search term, as well as synonyms for it. As well as simple synonym replacement such as searching for both plural and singular forms, ADS also searches for a large number of specifically astronomical synonyms. For example, spectrograph and spectroscope have basically the same meaning, and in an astronomical context metallicity and abundance are also synonymous. ADS's synonym list was created manually, by grouping the list of words in the database according to similar meanings.
As well as English language synonyms, ADS also searches for English translations of foreign search terms and vice versa, so that a search for the French word ''soleil'' retrieves references to the Sun, and papers in languages other than English can be returned by English search terms.
Synonym replacement can be disabled if required, so that a rare term which is a synonym of a much more common term (such as 'dateline' rather than 'date') can be searched for specifically.
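Synonym expansion with an off switch can be sketched as a lookup over manually grouped words. The groups below are taken from the examples in this section; the data structure and the `use_synonyms` parameter name are assumptions.

```python
# Manually grouped synonyms, using the examples given in the text.
SYNONYMS = [
    {"spectrograph", "spectroscope"},
    {"metallicity", "abundance"},
    {"sun", "soleil"},
    {"date", "dateline"},
]

def expand_term(term: str, use_synonyms: bool = True) -> set[str]:
    """Return the term plus its synonyms, or only the term if expansion is disabled."""
    term = term.lower()
    if use_synonyms:
        for group in SYNONYMS:
            if term in group:
                return set(group)
    return {term}

print(sorted(expand_term("soleil")))                        # ['soleil', 'sun']
print(sorted(expand_term("dateline", use_synonyms=False)))  # ['dateline']
```

Disabling expansion is what lets a search for the rare term avoid being swamped by matches for its much more common synonym.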
Selection logic
The search engine allows selection logic both within fields and between fields. Search terms in each field can be combined with OR, AND, simple logic or Boolean logic, and the user can specify which fields must be matched in the search results. This allows complex searches to be built; for example, the user could search for papers concerning NGC 6543 OR NGC 7009, with the paper titles containing (radius OR velocity) AND NOT (abundance OR temperature).
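The example query can be written out as set operations, with OR within the object field, Boolean logic within the title field, and AND between the two fields. The paper records are invented, and the field names are assumptions for this sketch.

```python
# Invented records, for illustration only.
papers = [
    {"id": "p1", "objects": {"NGC 6543"}, "title": "Expansion velocity of a planetary nebula"},
    {"id": "p2", "objects": {"NGC 7009"}, "title": "Radius and temperature of the central star"},
    {"id": "p3", "objects": {"NGC 1234"}, "title": "Radius of an unrelated galaxy"},
]

def select(papers):
    """Objects: NGC 6543 OR NGC 7009. Title: (radius OR velocity) AND NOT (abundance OR temperature)."""
    wanted_objects = {"NGC 6543", "NGC 7009"}
    results = []
    for paper in papers:
        words = set(paper["title"].lower().split())
        object_match = bool(paper["objects"] & wanted_objects)
        title_match = (bool(words & {"radius", "velocity"})
                       and not words & {"abundance", "temperature"})
        if object_match and title_match:  # both fields must match
            results.append(paper["id"])
    return results

print(select(papers))  # ['p1']
```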
Result filtering
Search results can be filtered according to a number of criteria, including specifying a range of years such as "1945 to 1975", "2000 to the present day" or "before 1900", and what type of publication the article appears in. Non-peer-reviewed articles such as conference proceedings can be excluded or specifically searched for, and specific journals can be included in or excluded from the search.
Search results
Although it was conceived as a means of accessing abstracts and papers, ADS provides a substantial amount of ancillary information along with search results. For each abstract returned, links are provided to other papers in the database which are referenced, and which cite the paper, and a link is provided to a preprint, where one exists. The system also generates a link to "also-read" articles – that is, those which have been most commonly accessed by those reading the article. In this way, an ADS user can determine which papers are of most interest to astronomers who are interested in the subject of a given paper.
Also returned are links to the
SIMBAD and/or
NASA/IPAC Extragalactic Database object name databases, via which a user can quickly find out basic observational data about the objects analyzed in a paper, and find further papers on those objects.
Impact on astronomy
ADS is almost universally used as a research tool among astronomers, and there are several studies that have estimated quantitatively how much more efficient ADS has made astronomy; one estimated that ADS increased the efficiency of astronomical research by 333 full-time equivalent research years per year,
and another found that in 2002 its effect was equivalent to 736 full-time researchers, or all the astronomical research done in France.
ADS has allowed literature searches that would previously have taken days or weeks to carry out to be completed in seconds, and it is estimated that ADS has increased the readership and use of the astronomical literature by a factor of about three since its inception.
In monetary terms, this increase in efficiency represents a considerable amount. There are about 12,000 active astronomical researchers worldwide, so ADS is the equivalent of about 5% of the working population of astronomers. The global astronomical research budget is estimated at between US$4,000 million and US$5,000 million, so the value of ADS to astronomy would be about US$200–250 million annually. Its operating budget is a small fraction of this amount.
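The arithmetic behind these figures can be checked directly. The sketch below uses only numbers quoted in this section; the variable names are illustrative, and the 5% share is the rounded figure used in the text rather than a separately derived value.

```python
from fractions import Fraction

active_researchers = 12_000
# Full-time-equivalent estimates from the two studies quoted above.
fte_estimates = {"first study": 333, "2002 study": 736}
for label, fte in fte_estimates.items():
    share = Fraction(fte, active_researchers)
    print(f"{label}: {float(share):.1%} of active researchers")

ads_share = Fraction(5, 100)        # "about 5%" as stated in the text
budget_range_musd = (4_000, 5_000)  # global research budget, millions of US$
value_range_musd = tuple(int(ads_share * budget) for budget in budget_range_musd)
print(value_range_musd)  # (200, 250)
```

The two study estimates correspond to roughly 3% and 6% of the research population, so the 5% used for the monetary estimate sits within that range.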
The great importance of ADS to astronomers has been recognized by the United Nations, the General Assembly of which has commended ADS on its work and success, particularly noting its importance to astronomers in the developing world, in reports of the United Nations Committee on the Peaceful Uses of Outer Space. A 2002 report by a visiting committee to the Center for Astrophysics, meanwhile, said that the service had "revolutionized the use of the astronomical literature", and was "probably the most valuable single contribution to astronomy research that the CfA has made in its lifetime".
Sociological studies using ADS
Because it is used almost universally by astronomers, ADS can reveal much about how astronomical research is distributed around the world. Most users access the system from institutes of higher education, whose IP addresses can easily be used to determine the user's geographical location. Studies reveal that the highest per-capita users of ADS are astronomers based in France and the Netherlands, and while more developed countries (measured by GDP per capita) use the system more than less developed countries, the relationship between GDP per capita and ADS use is not linear. The range of ADS usage per capita far exceeds the range of GDP per capita, and basic research carried out in a country, as measured by ADS usage, has been found to be proportional to the square of the country's GDP divided by its population.
Statistics also imply that there are about three times as many astronomers in countries of European culture as in countries of Asian cultures, perhaps suggesting cultural differences in the importance attached to astronomical research.
The amount of basic research carried out in a country is found to be proportional to the number of astronomers in that country multiplied by its GDP per capita, with considerable scatter.
ADS has also been used to show that the fraction of single-author astronomy papers has decreased substantially since 1975 and that astronomical papers with more than 50 authors have become more common since 1990.
See also
* List of academic databases and search engines
* Bibcode
* INSPIRE-HEP
* NASA's Planetary Data System (PDS)
* PubMed
* Michael J. Kurtz