HOME

TheInfoList



OR:

Data curation is the organization and integration of
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
collected from various sources. It involves annotation, publication and presentation of the data such that the value of the data is maintained over time, and the data remains available for reuse and preservation. Data curation includes "all the processes needed for principled and controlled data creation, maintenance, and
management Management (or managing) is the administration of an organization, whether it is a business, a nonprofit organization, or a government body. It is the art and science of managing resources of the business. Management includes the activitie ...
, together with the capacity to add value to data". In science, data curation may indicate the process of extraction of important information from scientific texts, such as research articles by experts, to be converted into an electronic format, such as an entry of a
biological database Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genom ...
. In the modern era of
big data Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
, the curation of data has become more prominent, particularly for
software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consist ...
processing high volume and complex data systems. The term is also used in historical occasions and the humanities, where increasing cultural and scholarly data from
digital humanities Digital humanities (DH) is an area of scholarly activity at the intersection of computing or Information technology, digital technologies and the disciplines of the humanities. It includes the systematic use of digital resources in the humanitie ...
projects requires the expertise and analytical practices of data curation. In broad terms, curation means a range of activities and processes done to create, manage, maintain, and validate a
component Circuit Component may refer to: •Are devices that perform functions when they are connected in a circuit.   In engineering, science, and technology Generic systems * System components, an entity with discrete structure, such as an assem ...
. Specifically, data curation is the attempt to determine what information is worth saving and for how long.


History and practice

The
user Ancient Egyptian roles * User (ancient Egyptian official), an ancient Egyptian nomarch (governor) of the Eighth Dynasty * Useramen, an ancient Egyptian vizier also called "User" Other uses * User (computing), a person (or software) using an ...
, rather than the database itself, typically initiates data curation and maintains
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
. According to the
University of Illinois The University of Illinois Urbana-Champaign (U of I, Illinois, University of Illinois, or UIUC) is a public land-grant research university in Illinois in the twin cities of Champaign and Urbana. It is the flagship institution of the Uni ...
' Graduate School of Library and Information Science, "Data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time." The data curation workflow is distinct from
data quality Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for tsintended uses in operations, decision making and ...
management,
data protection Information privacy is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, contextual information norms, and the legal and political issues surrounding them. It is also known as data p ...
, lifecycle management, and data movement. Census data has been available in tabulated punch card form since the early 20th century and has been electronic since the 1960s. The Inter-university Consortium for Political and Social Research (ICPSR) website marks 1962 as the date of their first Survey Data Archive. Deep background on data libraries appeared in a 1982 issue of the Illinois journal, ''Library Trends.'' For historical background on the data archive movement, see "Social Scientific Information Needs for Numeric Data: The Evolution of the International Data Archive Infrastructure." The exact curation process undertaken within any organisation depends on the volume of data, how much noise the data contains, and what the expected future use of the data means to its dissemination. The crises in space data led to the 1999 creation of the Open Archival Information System (OAIS) model, stewarded by the Consultative Committee for Space Data Systems (CCSDS), which was formed in 1982. The term ''data curation'' is sometimes used in the context of
biological databases Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genom ...
, where specific biological information is firstly obtained from a range of research articles and then stored within a specific category of database. For instance, information about anti-depressant drugs can be obtained from various sources and, after checking whether they are available as a database or not, they are saved under a drug's database's anti-depressive category. Enterprises are also utilizing data curation within their operational and strategic processes to ensure data quality and accuracy.


Projects and studies

The Dissemination Information Packages (DIPS) for Information Reuse (DIPIR) project is studying research data produced and used by quantitative social scientists, archaeologists, and zoologists. The intended audience is researchers who use secondary data and the digital curators, digital repository managers, data center staff, and others who collect, manage, and store digital information.Dissemination Information Packages for Information Reuse (DIPIR) project http://www.oclc.org/research/themes/user-studies/dipir.html The
Protein Data Bank The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cr ...
was established in 1971 at
Brookhaven National Laboratory Brookhaven National Laboratory (BNL) is a United States Department of Energy national laboratory located in Upton, Long Island, and was formally established in 1947 at the site of Camp Upton, a former U.S. Army base and Japanese internment c ...
, and has grown into a global project. A database for three-dimensional structural data of proteins and other large biological molecules, the PDB contains over 120,000 structures, all standardized, validated against experimental data, and annotated.
FlyBase FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, ''Drosophila melanogaster'', a wide range of d ...
, the primary repository of genetic and molecular data for the insect family ''
Drosophilidae The Drosophilidae are a diverse, cosmopolitan family of flies, which includes species called fruit flies, although they are more accurately referred to as vinegar or pomace flies. Another distantly related family of flies, Tephritidae, are true ...
'', dates back to 1992. FlyBase annotates the entire ''
Drosophila melanogaster ''Drosophila melanogaster'' is a species of fly (the taxonomic order Diptera) in the family Drosophilidae. The species is often referred to as the fruit fly or lesser fruit fly, or less commonly the " vinegar fly" or "pomace fly". Starting with ...
'' genome. The Linguistic Data Consortium is a data repository for linguistic data, dating back to 1992. The
Sloan Digital Sky Survey The Sloan Digital Sky Survey or SDSS is a major multi-spectral imaging and spectroscopic redshift survey using a dedicated 2.5-m wide-angle optical telescope at Apache Point Observatory in New Mexico, United States. The project began in 2000 ...
began surveying the night sky in 2000. Computer scientist Jim Gray, while working on the data architecture of the SDSS, championed the idea of data curation in the sciences. DataNet was a research program of the U.S. National Science Foundation Office of Cyberinfrastructure, funding data management projects in the sciences. DataONE (Data Observation Network for Earth) is one of the projects funded through DataNet, helping the environmental science community preserve and share data.


See also

*
Biocurator Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by th ...
* Data archaeology *
Data degradation Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot. Example Below are several digital images il ...
* Data format management * Data preservation * Data stewardship *
Data wrangling Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one " raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes ...
*
Digital curation Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets. Digital curation establishes, maintains and adds value to repositories of digital data for present and future use. This is often accomplished ...
the curation of published documents, rather than raw data *
Digital preservation In library and archival science, digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and application of preservation methods and ...
* Informationistan individual with extensive expertise in data curation


References


External links

* Curation of ecological and environmental data
DataONE
* Data management tools and services spanning multiple scientific disciplines
DataConservancy
{{data Archival science Data management Digital preservation Records management