Research data archiving is the long-term storage of scholarly research data, including data from the natural sciences, social sciences, and life sciences. Academic journals have differing policies regarding how much of their data and methods researchers are required to store in a public archive, and what is actually archived varies widely between disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archiving of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become increasingly strained, as research in some areas depends on large datasets which cannot easily be replicated independently.
Data archiving is more important in some fields than others. In a few fields, all of the data necessary to replicate the work is already available in the journal article. In drug development, a great deal of data is generated and must be archived so researchers can verify that the reports the drug companies publish accurately reflect the data.
The requirement of data archiving is a recent development in the history of science. It was made possible by advances in information technology allowing large amounts of data to be stored and accessed from central locations. For example, the American Geophysical Union (AGU) adopted its first policy on data archiving in 1993, about three years after the beginning of the World Wide Web. This policy mandates that datasets cited in AGU papers must be archived by a recognised data center; it permits the creation of "data papers"; and it establishes AGU's role in maintaining data archives. However, it places no requirement on paper authors to archive their data.
Prior to organized data archiving, researchers wanting to evaluate or replicate a paper had to request data and methods information from the author. The academic community expects authors to share supplemental data, but this process was recognized as wasteful of time and energy and obtained mixed results. Information could become lost or corrupted over the years, and in some cases authors simply refused to provide it.
The need for data archiving and due diligence is greatly increased when the research deals with health issues or public policy formation.
Selected policies by journals
''Biotropica''
NB: ''Biotropica'' is one of only two journals that pay the fees for authors depositing data at Dryad.
''The American Naturalist''
''Journal of Heredity''
''Molecular Ecology''
''Nature''
''Science''
Royal Society
''Journal of Archaeological Science''
Policies by funding agencies
In the United States, the National Science Foundation (NSF) has tightened requirements on data archiving. Researchers seeking funding from NSF are now required to file a data management plan as a two-page supplement to the grant application.
The NSF Datanet initiative has resulted in funding of the Data Observation Network for Earth (DataONE) project, which will provide scientific data archiving for ecological and environmental data produced by scientists worldwide. DataONE's stated goal is to preserve and provide access to multi-scale, multi-discipline, and multi-national data. The community of users for DataONE includes scientists, ecosystem managers, policy makers, students, educators, and the public.
The German DFG requires that research data be archived in the researcher's own institution or an appropriate nationwide infrastructure for at least 10 years.
The British Digital Curation Centre maintains an overview of funders' data policies.
Data library

Research data is archived in data libraries or data archives. A data library, data archive, or data repository is a collection of numeric and/or geospatial data sets for secondary use in research. A data library is normally part of a larger institution (academic, corporate, scientific, medical, governmental, etc.) established for research data archiving and to serve the data users of that organisation. The data library tends to house local data collections and provides access to them through various means (CD-/DVD-ROMs or a central server for download). A data library may also maintain subscriptions to licensed data resources for its users to access the information. Whether a data library is also considered a data archive may depend on the extent of unique holdings in the collection, whether long-term preservation services are offered, and whether it serves a broader community (as national data archives do). Most public data libraries are listed in the Registry of Research Data Repositories.
Importance and services
In August 2001, the Association of Research Libraries (ARL) published a report presenting results from a survey of ARL member institutions involved in collecting and providing services for numeric data resources.
A data library service provides support at the institutional level for the use of numerical and other types of datasets in research. The support activities typically available include:
* Reference Assistance - locating numeric or geospatial datasets containing measurable variables on a particular topic or group of topics, in response to a user query.
* User Instruction - providing hands-on training to groups of users in locating data resources on particular topics, downloading data and reading it into spreadsheet, statistical, database, or GIS packages, and interpreting codebooks and other documentation.
* Technical Assistance - easing registration procedures, troubleshooting problems with the dataset (such as errors in the documentation), reformatting data into something a user can work with, and helping with statistical methodology.
* Collection Development & Management - acquiring, maintaining, and managing a collection of data files used for secondary analysis by the local user community; purchasing institutional data subscriptions; acting as a site representative to data providers and national data archives for the institution.
* Preservation and Data Sharing Services - acting on a strategy of preservation of datasets in the collection, such as media refreshment and file format migration; downloading and keeping records on updated versions from a central repository; and assisting users in preparing original data for secondary use by others, either for deposit in a central or institutional repository or for less formal ways of sharing data. This may also involve marking up the data into an appropriate XML standard, such as the Data Documentation Initiative, or adding other metadata to facilitate online discovery.
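As an illustration of the preservation activities listed above, the following Python sketch shows one way a data library might check that a collection survived media refreshment unchanged: it records a SHA-256 checksum for every file before the copy and compares it with the checksums computed afterwards. This is a minimal sketch under stated assumptions, not a description of any particular archive's practice; the function names `sha256_of`, `build_manifest`, and `changed_files` are hypothetical and chosen for this example. The comparison applies only to byte-for-byte copies, since file format migration changes a file's bytes by design.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that are missing from the copy or whose checksum differs."""
    return sorted(name for name, checksum in old.items() if new.get(name) != checksum)
```

In use, `build_manifest` would be run on the original collection and again on the refreshed copy; an empty result from `changed_files` suggests the copy is intact, while any listed file needs to be re-copied or investigated.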
Examples of data libraries
Natural sciences
The following list refers to scientific data archives.
* CISL Research Data Archive
* DataONE
* Dryad
* ESO/ST-ECF Science Archive Facility
* International Tree-Ring Data Bank
* Inter-university Consortium for Political and Social Research
* Knowledge Network for Biocomplexity
* National Archive of Computerized Data on Aging
* National Archive of Criminal Justice Data
* NCAR Research Data Archive: http://rda.ucar.edu
* National Climatic Data Center
* National Geophysical Data Center
* National Snow and Ice Data Center
* National Oceanographic Data Center
* Oak Ridge National Laboratory Distributed Active Archive Center
* Pangaea - Data Publisher for Earth & Environmental Science
* NASA SeaBASS - Data archive for ocean color data
* World Data Center
Social sciences
In the social sciences, data libraries are referred to as data archives. Data archives are professional institutions for the acquisition, preparation, preservation, and dissemination of social and behavioral data. Data archives in the social sciences evolved in the 1950s and have been perceived as an international movement:
By 1964 the International Social Science Council (ISSC) had sponsored a second conference on Social Science Data Archives and had a standing Committee on Social Science Data, both of which stimulated the data archives movement. By the beginning of the twenty-first century, most developed countries and some developing countries had organized formal and well-functioning national data archives. In addition, college and university campuses often have 'data libraries' that make data available to their faculty, staff, and students; most of these bear minimal archival responsibility, relying for that function on a national institution (Rockwell, 2001, p. 3227). [Rockwell, R. C. (2001). Data Archives: International. In: Smelser, N. J. & Baltes, P. B. (eds.) ''International Encyclopedia of the Social and Behavioral Sciences'' (vol. 5, pp. 3225-3230). Amsterdam: Elsevier]
* re3data.org is a global registry of research data repositories, indexing data archives from all disciplines: http://www.re3data.org
* CESSDA Members are data archives and other organisations that archive social science data and provide data for secondary use: https://www.cessda.eu/About/Consortium
* Consortium of European Social Science Data Archives: http://www.cessda.org/
* Finnish Social Science Data Archive (FSD): http://www.fsd.uta.fi/
* The Danish Data Archives: http://www.sa.dk/content/us/about_us ; specific page (only in Danish): https://web.archive.org/web/20150318230743/http://www.sa.dk/dda/default.htm
* Inter-university Consortium for Political and Social Research: http://www.icpsr.umich.edu/
* The Roper Center for Public Opinion Research: https://ropercenter.cornell.edu/
* The Social Science Data Archive: http://dataarchives.ss.ucla.edu/
* The Cornell Center for Social Sciences: https://socialsciences.cornell.edu/ciser-data-and-reproduction-archive
See also
* Data bank
* Data center
* Data curation
* Digital curation
* Digital preservation
* Open Data
References
Notes
* Registry of Research Data Repositories ''re3data.org''
* Statistical checklist required by ''Nature''
* Policies of ''Proceedings of the National Academy of Sciences (U.S.)''
* The US National Committee for CODATA
* The Role of Data and Program Code Archives in the Future of Economic Research
* Data sharing and replication – Gary King website
* The Case for Due Diligence When Empirical Research is Used in Policy Formation by McCullough and McKitrick
* Thoughts on Refereed Journal Publication by Chuck Doswell
* "How to encourage the right behaviour" An opinion piece published in ''Nature'', March, 200
* NASA Astrophysics Data System
* Panton Principles for Open Data in Science, at Citizendium
* Inter-university Consortium for Political and Social Research
Further reading
* Clubb, J., Austin, E., and Geda, C. "Sharing research data in the social sciences." In ''Sharing Research Data'', S. Fienberg, M. Martin, and M. Straf, Eds. National Academy Press, Washington, D.C., 1985, 39-88.
* Geraci, D., Humphrey, C., and Jacobs, J. ''Data Basics''. Canadian Library Association, Ottawa, ON, 2005.
* Heim, Kathleen M. "Social Scientific Information Needs for Numeric Data: The Evolution of the International Data Archive Infrastructure." ''Collection Management'' 9 (Spring 1987): 1-53.
* Martinez, Luis & Macdonald, Stuart. "Supporting local data users in the UK academic community." ''Ariadne'', issue 44, July 2005.
External links
* University of California Irvine Machine Learning Repository
Associations
* IASSIST (International Association for Social Science Information and Service Technology)
* DISC-UK (Data Information Specialists Committee - United Kingdom)
* APDU (Association of Public Data Users - USA)
* CAPDU (Canadian Association of Public Data Users)