Data processing is the collection and manipulation of digital data to produce meaningful information. Data processing is a form of ''information processing'', which is the modification (processing) of information in any manner detectable by an observer.
Data processing is distinct from ''word processing'', which is the manipulation of text specifically rather than data generally.
Functions
Data processing may involve various processes, including:
*Validation – ensuring that supplied data is correct and relevant.
*Sorting – "arranging items in some sequence and/or in different sets."
*Summarization (statistical or automatic) – reducing detailed data to its main points.
*Aggregation – combining multiple pieces of data.
*Analysis – the "collection, organization, analysis, interpretation and presentation of data."
*Reporting – listing detail or summary data or computed information.
*Classification – separation of data into various categories.
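As a rough illustration of how these functions can chain together, the following minimal Python sketch applies them in sequence to a handful of made-up sales records. The record fields (`region`, `amount`), the 100.0 classification threshold, and all function names are hypothetical choices for illustration, not a standard scheme.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records; the field names "region" and "amount" are illustrative only.
RAW = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": -5.0},   # invalid: negative amount
    {"region": "north", "amount": 30.0},
    {"region": "", "amount": 50.0},        # invalid: missing region
    {"region": "south", "amount": 200.0},
]

def validate(records):
    """Validation: keep only records with a region and a non-negative numeric amount."""
    return [r for r in records
            if r.get("region")
            and isinstance(r.get("amount"), (int, float))
            and r["amount"] >= 0]

def classify(record, threshold=100.0):
    """Classification: assign a record to a pre-existing category (threshold is an assumption)."""
    return "large" if record["amount"] >= threshold else "small"

def aggregate(records):
    """Aggregation: combine amounts per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def summarize(records):
    """Summarization: reduce the detail to a few key figures."""
    amounts = [r["amount"] for r in records]
    return {"count": len(amounts), "total": sum(amounts), "mean": mean(amounts)}

def report(records):
    """Reporting: detail lines first, then one total line per region."""
    lines = [f"{r['region']:<6} {r['amount']:>8.2f} {classify(r)}" for r in records]
    lines += [f"TOTAL {region}: {total:.2f}"
              for region, total in sorted(aggregate(records).items())]
    return lines

# Sorting: order the validated records by ascending amount.
valid = sorted(validate(RAW), key=lambda r: r["amount"])
for line in report(valid):
    print(line)
print(summarize(valid))
```

Real systems differ mainly in scale and in where the rules come from (for example, validation rules defined by the business rather than hard-coded), but the same steps recur.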
History
The history of the United States Census Bureau illustrates the evolution of data processing from manual through electronic procedures.
Manual data processing
Although widespread use of the term ''data processing'' dates only from the 1950s, data processing functions have been performed manually for millennia. For example, bookkeeping involves functions such as posting transactions and producing reports like the balance sheet and the cash flow statement. Completely manual methods were augmented by the application of mechanical or electronic calculators. A person whose job was to perform calculations manually or using a calculator was called a "computer."
The 1890 United States census schedule was the first to gather data by individual rather than household. A number of questions could be answered by making a check in the appropriate box on the form. From 1850 to 1880 the Census Bureau employed "a system of tallying, which, by reason of the increasing number of combinations of classifications required, became increasingly complex. Only a limited number of combinations could be recorded in one tally, so it was necessary to handle the schedules 5 or 6 times, for as many independent tallies."
Using these manual processing methods, "it took over 7 years to publish the results of the 1880 census."
Automatic data processing
The term ''automatic data processing'' was applied to operations performed by means of unit record equipment, such as Herman Hollerith's application of punched card equipment for the 1890 United States census. "Using Hollerith's punchcard equipment, the Census Office was able to complete tabulating most of the 1890 census data in 2 to 3 years, compared with 7 to 8 years for the 1880 census. It is estimated that using Hollerith's system saved some $5 million in processing costs" (in 1890 dollars), even though there were twice as many questions as in 1880.
Computerized data processing
Computerized data processing, or electronic data processing, represents a later development, with a computer used instead of several independent pieces of equipment. The Census Bureau first made limited use of electronic computers for the 1950 United States census, using a UNIVAC I system delivered in 1952.
Other developments
The term ''data processing'' has mostly been subsumed by the more general term ''information technology'' (IT). The older term "data processing" is suggestive of older technologies. For example, in 1996 the ''Data Processing Management Association'' (DPMA) changed its name to the ''Association of Information Technology Professionals''. Nevertheless, the terms are approximately synonymous.
Applications
Commercial data processing
Commercial data processing involves a large volume of input data, relatively few computational operations, and a large volume of output. For example, an insurance company needs to keep records on tens or hundreds of thousands of policies, print and mail bills, and receive and post payments.
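The shape of such a workload (many records in, a simple calculation per record, many documents out) can be sketched in Python. The `Policy` record, its fields, the policy identifiers, and the rule of setting aside payments that match no policy are hypothetical; a production system would read from files or a database and print or mail the bills rather than return strings.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    # Hypothetical policy record; real systems hold many more fields.
    policy_id: str
    holder: str
    premium_due: float
    paid: float = 0.0

def post_payments(policies, payments):
    """Post each (policy_id, amount) payment to its policy; return payments matching no policy."""
    unmatched = []
    for policy_id, amount in payments:
        policy = policies.get(policy_id)
        if policy is None:
            unmatched.append((policy_id, amount))  # set aside for manual review
        else:
            policy.paid += amount
    return unmatched

def bills(policies):
    """Produce one bill line per policy that still owes money (one subtraction per record)."""
    return [
        f"{p.policy_id} {p.holder}: amount due {p.premium_due - p.paid:.2f}"
        for p in sorted(policies.values(), key=lambda p: p.policy_id)
        if p.premium_due - p.paid > 0
    ]

policies = {
    "P001": Policy("P001", "A. Smith", 300.0),
    "P002": Policy("P002", "B. Jones", 450.0),
    "P003": Policy("P003", "C. Lee", 125.0),
}
unmatched = post_payments(policies, [("P001", 300.0), ("P002", 200.0), ("P999", 50.0)])
for line in bills(policies):
    print(line)
print("unmatched:", unmatched)
```

The per-record arithmetic is trivial; the effort in commercial systems goes into handling volume reliably and producing correct output for every record.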
Data analysis
In science and engineering, the terms ''data processing'' and ''information systems'' are considered too broad, and the term ''data processing'' is typically used for the initial stage of the overall data handling, followed by data analysis in the second stage.
Data analysis uses specialized algorithms and statistical calculations that are less often observed in a typical general business environment. For data analysis, software suites like SPSS or SAS, or their free counterparts such as DAP, gretl, or PSPP, are often used. These tools are usually helpful for processing large data sets, as they can handle a large volume of statistical computation.
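To make the two-stage split concrete, here is a small Python sketch that uses only the standard-library `statistics` module in place of a dedicated suite such as SPSS or gretl. The raw readings and the rule of discarding non-numeric lines are assumptions made for illustration.

```python
import statistics

# Hypothetical raw instrument output: one reading per line, some lines unusable.
RAW_LINES = ["12.1", "11.8", "n/a", "12.4", "", "12.0", "err", "11.7"]

def process(lines):
    """First stage (data processing): parse the raw text and drop entries that are not numbers."""
    values = []
    for line in lines:
        try:
            values.append(float(line))
        except ValueError:
            continue  # skip missing or malformed readings
    return values

def analyze(values):
    """Second stage (data analysis): statistical calculations on the cleaned values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation; needs n >= 2
    }

clean = process(RAW_LINES)
print(analyze(clean))
```

In practice the processing stage (cleaning, conversion, validation) often takes more effort than the analysis itself.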
Systems
A data processing system is a combination of machines, people, and processes that, for a set of inputs, produces a defined set of outputs. The inputs and outputs are interpreted as data, facts, information, etc., depending on the interpreter's relation to the system.
A term commonly used synonymously with ''data or storage (codes) processing system'' is ''information system''. With regard particularly to electronic data processing, the corresponding concept is referred to as an ''electronic data processing system''.
Examples
Simple example
A very simple example of a data processing system is the process of maintaining a check register. Transactions (checks and deposits) are recorded as they occur, and the transactions are summarized to determine a current balance. Each month, the data recorded in the register is reconciled with a hopefully identical list of transactions processed by the bank.
A more sophisticated record-keeping system might further categorize the transactions, for example deposits by source or checks by type, such as charitable contributions. This information might be used to obtain figures such as the total of all contributions for the year.
The important thing about this example is that it is a ''system'' in which all transactions are recorded consistently and the same method of bank reconciliation is used each time.
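The check register can be modeled in a few lines of Python. The `Transaction` record, the sign convention (deposits positive, checks negative), and matching entries by exact equality during reconciliation are simplifying assumptions; a real reconciliation would also have to handle bank fees, interest, and amounts that differ because of entry errors.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes records hashable, so they can go into sets
class Transaction:
    # Hypothetical register entry: positive amounts are deposits, negative amounts are checks.
    number: str
    amount: float
    category: str

def balance(register, opening=0.0):
    """Summarize the recorded transactions into a current balance."""
    return opening + sum(t.amount for t in register)

def reconcile(register, bank_statement):
    """Compare the register with the bank's list; return entries missing from each side."""
    ours, theirs = set(register), set(bank_statement)
    missing_from_bank = sorted(ours - theirs, key=lambda t: t.number)
    missing_from_register = sorted(theirs - ours, key=lambda t: t.number)
    return missing_from_bank, missing_from_register

def totals_by_category(register):
    """Further classify transactions, e.g. the total of all charitable contributions."""
    totals = defaultdict(float)
    for t in register:
        totals[t.category] += t.amount
    return dict(totals)

register = [
    Transaction("D1", 1500.0, "salary"),
    Transaction("101", -200.0, "charity"),
    Transaction("102", -75.5, "groceries"),
    Transaction("103", -50.0, "charity"),
]
bank = register[:3]  # check 103 has not yet cleared the bank

print(balance(register, opening=100.0))
print(reconcile(register, bank))
print(totals_by_category(register))
```

Applying the same `reconcile` step every month is what turns a list of entries into a system in the sense described above.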
Real-world example
This is a flowchart of a data processing system combining manual and computerized processing to handle accounts receivable, billing, and general ledger.
See also
*Big data
*Computation
*Computer science
*Decision-making software
*Information Age
*Information and communications technology
*Information technology
*Scientific computing
Further reading
*Bourque, Linda B.; Clark, Virginia A. (1992). ''Processing Data: The Survey Example''. (Quantitative Applications in the Social Sciences, no. 07-085). SAGE Publications.
*Levy, Joseph (1967). ''Punched Card Data Processing''. McGraw-Hill Book Company.