Raw data

Raw data, also known as primary data, are data (e.g., numbers, instrument readings, figures, etc.) collected from a source. In the context of examinations, the raw data might be described as a raw score (after test scores). If a scientist sets up a computerized thermometer which records the temperature of a chemical mixture in a test tube every minute, the list of temperature readings for every minute, as printed out on a spreadsheet or viewed on a computer screen, is "raw data". Raw data have not been subjected to processing, "cleaning" by researchers to remove outliers, obvious instrument reading errors or data entry errors, or any analysis (e.g., determining central tendency aspects such as the average or median result). As well, raw data have not been subject to any other manipulation by a software program or a human researcher, analyst or technician.

Raw data is a relative term (see data), because even once raw data have been "cleaned" and processed by one team of researchers, another team may consider these processed data to be "raw data" for another stage of research. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. The term "raw data" can also refer to the binary data on electronic storage devices, such as hard disk drives (also referred to as "low-level data").
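The thermometer example can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the readings are invented, and the plausibility window used for "cleaning" is an arbitrary rule rather than a standard method. It only shows that the raw list is kept as collected, while cleaning and the calculation of central tendency happen in separate, later steps.

```python
from statistics import mean, median

# Hypothetical raw readings in degrees Celsius, one per minute, exactly as logged.
# The value 220.3 stands in for an obvious instrument or data entry error.
raw_readings = [21.4, 21.6, 21.9, 22.1, 220.3, 22.4, 22.6]

# Assumed plausibility window for this experiment (an illustrative choice).
PLAUSIBLE_RANGE = (0.0, 100.0)


def clean(readings, low, high):
    """Return only the readings that fall inside the plausibility window."""
    return [r for r in readings if low <= r <= high]


# "Cleaning": the raw list itself is left untouched.
cleaned = clean(raw_readings, *PLAUSIBLE_RANGE)

# "Analysis": central tendency is computed from the cleaned data.
summary = {"mean": mean(cleaned), "median": median(cleaned)}

print("raw:    ", raw_readings)
print("cleaned:", cleaned)
print("summary:", summary)
```

Another team could treat `cleaned` as its own raw data for a further stage of analysis, which is the sense in which "raw data" is a relative term.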


Generating data

Data can be generated in two ways. The first, 'captured data', is obtained through purposeful investigation or analysis. The second, 'exhaust data', is usually gathered by machines or terminals as a secondary function. For example, cash registers, smartphones, and speedometers serve a main function but may collect data as a secondary task. Exhaust data are usually too voluminous or of too little use to process, and so become 'transient', that is, they are thrown away.


Examples

In computing, raw data may have the following attributes: it may contain human, machine, or instrument errors; it may not be validated; it might be in different area (colloquial) formats; it may be uncoded or unformatted; or some entries might be "suspect" (e.g., outliers), requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be processed and stored in a normalized format, perhaps a Julian date, to make it easier for computers and humans to interpret during later processing.

Raw data (sometimes colloquially called "sources" data or "eggy" data, the latter a reference to the data being "uncooked", that is, "unprocessed", like a raw egg) are the data input to processing. A distinction is made between data and information, to the effect that information is the end product of data processing. Raw data that have undergone processing are sometimes referred to as "cooked" data in a colloquial sense. Although raw data have the potential to be transformed into "information", extraction, organization, analysis, and formatting for presentation are required before they can be transformed into usable information.

For example, a point-of-sale terminal (POS terminal, a computerized cash register) in a busy supermarket collects huge volumes of raw data each day about customers' purchases. However, this list of grocery items and their prices and the time and date of purchase does not yield much information until it is processed. Once processed and analyzed by a software program, or even by a researcher using pen and paper and a calculator, this raw data may indicate the particular items that each customer buys, when they buy them, and at what price; as well, an analyst or manager could calculate the average total sales per customer or the average expenditure per day of the week by hour. This processed and analyzed data provides information that the manager could then use to determine, for example, how many cashiers to hire and at what times. Such information could then become data for further processing, for example as part of a predictive marketing campaign. As a result of processing, raw data sometimes ends up in a database, which makes the data accessible for further processing and analysis in any number of different ways.
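The date example earlier in this section can be sketched in Python. This is an illustration under stated assumptions, not a definitive normalizer: the function names `parse_raw_date` and `julian_day_number` are hypothetical, the list of accepted formats is an arbitrary choice, month names are assumed to be in English, and the integer Julian Day Number is used as one possible reading of "Julian date". Entries that cannot be resolved without extra context, such as "31 Jan" or "today", are deliberately left unresolved as "suspect" entries requiring confirmation.

```python
import re
from datetime import date, datetime

# Assumed day-first formats; a real input sheet may need a different list.
FORMATS = ("%d %B %Y", "%d/%m/%Y", "%d/%m/%y")


def parse_raw_date(text):
    """Return a date for an unambiguous raw entry, or None if it is suspect."""
    # Drop ordinal suffixes such as the "st" in "31st".
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None  # e.g. "31 Jan" (no year) or "today" (depends on entry time)


def julian_day_number(d):
    """Convert a Gregorian calendar date to its integer Julian Day Number."""
    # date.toordinal() counts days with 0001-01-01 as day 1; the fixed offset
    # below aligns that count with the Julian Day Number scale.
    return d.toordinal() + 1721425


raw_entries = ["31st January 1999", "31/01/1999", "31/1/99", "31 Jan", "today"]

for entry in raw_entries:
    parsed = parse_raw_date(entry)
    normalized = julian_day_number(parsed) if parsed else "needs confirmation"
    print(f"{entry!r:>22} -> {normalized}")
```

Storing one integer per date makes elapsed-day calculations a simple subtraction, which is the practical benefit of normalizing before later processing.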

Tim Berners-Lee (inventor of the World Wide Web) argues that sharing raw data is important for society. Inspired by a post by Rufus Pollock of the Open Knowledge Foundation, his call to action is "Raw Data Now", meaning that everyone should demand that governments and businesses share the data they collect as raw data. He points out that "data drives a huge amount of what happens in our lives… because somebody takes the data and does something with it." To Berners-Lee, it is essentially from this sharing of raw data that advances in science will emerge. Advocates of open data argue that once citizens and civil society organizations have access to data from businesses and governments, citizens and NGOs will be able to do their own analysis of the data, which can empower people and civil society. For example, a government may claim that its policies are reducing the unemployment rate, but a poverty advocacy group may be able to have its staff econometricians do their own analysis of the raw data, which may lead this group to draw different conclusions about the data set.


See also

* Standard score


Further reading


* Give Us the Data Raw, and Give it to Us Now, the blog post from Rufus Pollock that inspired Tim Berners-Lee
* Tim Berners-Lee Gives the Web a New Definition