Data are individual facts, statistics, or items of information, often numeric. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of ''data'') is a single value of a single variable.

Although the terms "data" and "information" are often used interchangeably, they have distinct meanings. In some popular publications, data are sometimes said to be transformed into information when they are viewed in context or in post-analysis. However, in academic treatments of the subject, data are simply units of information.

Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations). Data are measured, collected, reported, and analyzed, and used to create data visualizations such as graphs, tables or images.

Data as a general concept refers to the fact that some existing information or knowledge is ''represented'' or ''coded'' in some form suitable for better usage or processing. ''Raw data'' ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording. Data has been described as the new oil of the digital economy.
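
The cleaning of raw data and the staging of data processing described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Reading` type, the station name, the sample values, and the plausibility bounds are all hypothetical and chosen for the Arctic thermometer example; real cleaning procedures depend on the instrument and the study.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One raw thermometer reading (hypothetical record layout)."""
    station: str
    celsius: float


# Assumed plausibility bounds for an outdoor Arctic station (illustrative values).
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 25.0


def clean(raw):
    """Stage 1: split raw readings into (kept, rejected) by a plausibility range."""
    kept, rejected = [], []
    for reading in raw:
        if PLAUSIBLE_MIN_C <= reading.celsius <= PLAUSIBLE_MAX_C:
            kept.append(reading)
        else:
            rejected.append(reading)
    return kept, rejected


def mean_celsius(readings):
    """Stage 2: treats the processed output of stage 1 as its own raw input."""
    return sum(r.celsius for r in readings) / len(readings)


raw = [
    Reading("arctic-1", -31.5),
    Reading("arctic-1", -29.0),
    Reading("arctic-1", 38.2),   # tropical value at an Arctic site: likely an error
    Reading("arctic-1", -30.5),
]

processed, rejected = clean(raw)
average = mean_celsius(processed)
print(len(processed), "kept;", len(rejected), "rejected; mean =", round(average, 2))
```

Rejected readings are returned rather than silently dropped, so that a researcher can inspect them before deciding they are errors and not genuine outliers.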


Etymology and terminology

The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.

The Latin word ''data'' is the plural of ''datum'', "(thing) given," neuter past participle of ''dare'', "to give". In English the word ''data'' may be used as a plural noun in this sense, with some writers, usually those working in the natural sciences, life sciences, and social sciences, using ''datum'' in the singular and ''data'' for the plural, especially in the 20th century and in many cases also the 21st (for example, APA style as of the 7th edition still requires "data" to be plural). However, in everyday language and in much of the usage of software development and computer science, "data" is most commonly used in the singular as a mass noun (like "sand" or "rain"). The term ''big data'' takes the singular.


Meaning

Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in relation to the others, and each term has its own meaning. According to a common view, data are collected and analyzed; data only become information suitable for making decisions once they have been analyzed in some fashion. One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy.
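
As a rough illustration of the entropy measure just mentioned, the following Python sketch estimates the Shannon entropy of a stream from its observed symbol frequencies, using H = Σ p · log2(1/p) in bits per symbol. The function name `shannon_entropy` is an assumption of this example, and the estimate treats symbols as independent draws, which is a simplification for real data streams.

```python
import math
from collections import Counter


def shannon_entropy(stream):
    """Empirical Shannon entropy of a sequence, in bits per symbol.

    Probabilities are estimated from symbol counts in the stream itself.
    """
    total = len(stream)
    if total == 0:
        return 0.0
    counts = Counter(stream)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A fully predictable stream carries no information per symbol,
# while a stream that uses four symbols equally often carries two bits.
print(shannon_entropy("aaaa"))
print(shannon_entropy("abab"))
print(shannon_entropy("abcd"))
```

This matches the intuition in the text: the less expected each symbol is, the more informative the stream.
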
Knowledge is the understanding based on extensive experience dealing with information on a subject. For example, the height of Mount Everest is generally considered data. The height can be measured precisely with an altimeter and entered into a database. This data may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. An understanding based on experience climbing mountains that could advise persons on the way to reach Mount Everest's peak may be seen as "knowledge". The practical climbing of Mount Everest's peak based on this knowledge may be seen as "wisdom". In other words, wisdom refers to the practical application of a person's knowledge in those circumstances where good may result. Thus wisdom complements and completes the series "data", "information" and "knowledge" of increasingly abstract concepts.

Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract. In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". This view, however, has also been argued to reverse how data emerges from information, and information from knowledge. "Information" bears a diversity of meanings that ranges from everyday usage to technical use. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information; data are a series of symbols, while information occurs when the symbols are used to refer to something.

Before the development of computing devices and machines, people had to manually collect data and impose patterns on it. Since the development of computing devices and machines, these devices can also collect data. In the 2010s, computers were widely used in many fields to collect data and sort or process it, in disciplines ranging from marketing and analysis of social services usage by citizens to scientific research. These patterns in data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.

Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.

Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
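
The construction of familiar representations from a binary alphabet, described in this section, can be sketched in Python as follows. The helper names (`to_bits`, `text_to_bits`, `bits_to_text`) and the choices of an 8-bit width and UTF-8 encoding are assumptions made for this illustration; actual machines and formats use many different encodings.

```python
def to_bits(value, width=8):
    """Write a non-negative integer as a fixed-width string of '0' and '1'."""
    return format(value, "0{}b".format(width))


def text_to_bits(text):
    """Encode letters as bytes (UTF-8 assumed), then each byte as 8 binary symbols."""
    return [to_bits(byte) for byte in text.encode("utf-8")]


def bits_to_text(bit_groups):
    """Reverse the encoding: binary symbols back to bytes, bytes back to letters."""
    return bytes(int(group, 2) for group in bit_groups).decode("utf-8")


number_bits = to_bits(5)
letter_bits = text_to_bits("Hi")
print(number_bits)                # the number five as binary symbols
print(letter_bits)                # two letters, one 8-symbol group each
print(bits_to_text(letter_bits))  # round trip recovers the original text
```

The same sequence of binary symbols could stand for a number, a letter, or part of a program; which one it is depends on how the data is interpreted.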


Data documents

Whenever data needs to be registered, it exists in the form of a data document. Kinds of data documents include:
* data repository
* data study
* data set
* software
* data paper
* database
* data handbook
* data journal

Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index.


Data collection

Gathering data can be accomplished through a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that has already been collected by other sources, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation. The latter offers an articulate method of collecting, classifying, and analyzing data using up to five angles of analysis (and at least three) to maximize the research's objectivity and to permit as complete an understanding as possible of the phenomena under investigation: qualitative and quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. The data are thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information.


In other fields

Although data are also increasingly used in other fields, it has been suggested that the highly interpretive nature of those fields might be at odds with the ethos of data as "given". Peter Checkland introduced the term ''capta'' (from the Latin ''capere'', “to take”) to distinguish between an immense number of possible data and a sub-set of them, to which attention is oriented. Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using ''data'' may introduce assumptions that are counterproductive, for example that phenomena are discrete or are observer-independent. The term ''capta'', which emphasizes the act of observation as constitutive, is offered as an alternative to ''data'' for visual representations in the humanities.


See also

* Biological data
* Computer memory
* Data acquisition
* Data analysis
* Data bank
* Data cable
* Data curation
* Dark data
* Data domain
* Data element
* Data farming
* Data governance
* Data integrity
* Data maintenance
* Data management
* Data mining
* Data modeling
* Data point
* Data visualization
* Computer data processing
* Data preservation
* Data protection
* Data remanence
* Data science
* Data set
* Data structure
* Data warehouse
* Database
* Datasheet
* Environmental data rescue
* Fieldwork
* Information engineering
* Machine learning
* Open data
* Scientific data archiving
* Statistics
* Secondary data


External links


* Data is a singular noun (a detailed assessment)