Machine-generated data is information automatically generated by a computer process, application, or other mechanism without the active intervention of a human. While the term dates back over fifty years, there is currently some disagreement about its scope. Monash Research's Curt Monash defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices." Meanwhile, Daniel Abadi, a computer science professor at Yale, proposes a narrower definition: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action." Despite these differences, both definitions exclude data manually entered by a person. [Monash, Three Broad Categories of Data] Machine-generated data spans all industry sectors. Often, and increasingly, people are unaware that their actions are generating the data.
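The difference between the two definitions can be made concrete with a small sketch. It reduces each record to three hypothetical yes/no attributes (`entered_by_person`, `human_triggered`, `agent_decision`) that appear in neither author's wording, and it encodes only one possible reading of each definition. It illustrates the contrast and is not how Monash or Abadi formalize their terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical summary of how one record came to exist."""
    entered_by_person: bool  # a person typed or selected the value
    human_triggered: bool    # the recorded event was caused by a human action
    agent_decision: bool     # an independent computational agent chose to emit it


def machine_generated_monash(p: Provenance) -> bool:
    # One reading of Monash: produced entirely by machines, OR observing
    # humans rather than recording their choices. Manual entry is excluded.
    return not p.entered_by_person


def machine_generated_abadi(p: Provenance) -> bool:
    # One reading of Abadi: a decision of an independent computational agent,
    # OR a measurement of an event not caused by a human action.
    if p.entered_by_person:
        return False
    return p.agent_decision or not p.human_triggered


# Illustrative records; the attribute values are assumptions for this sketch.
EXAMPLES = {
    "temperature sensor reading": Provenance(False, False, False),
    "web server log of a user's page request": Provenance(False, True, False),
    "ad bid placed by a bidding algorithm": Provenance(False, True, True),
    "customer name typed into a form": Provenance(True, True, False),
}

if __name__ == "__main__":
    for name, p in EXAMPLES.items():
        print(f"{name}: Monash={machine_generated_monash(p)}, "
              f"Abadi={machine_generated_abadi(p)}")
```

Under this reading, the web server log entry separates the two definitions: it counts as machine-generated for Monash because it observes a person, but not for Abadi because a human action caused the event. Every record Abadi's test accepts is also accepted by Monash's, which is one way to see why his definition is the narrower one. Both reject the typed form field.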
Relevance
Machine-generated data has no single form; rather, its type, format, metadata, and frequency depend on the particular business purpose it serves. Machines often create it on a defined schedule or in response to a state change, action, transaction, or other event. Because the data records an event that has already happened, it is rarely updated or modified afterward. Partly because of this quality, U.S. courts consider machine-generated data highly reliable.
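The scheduling and immutability properties described above can be sketched in Python. The sketch assumes a hypothetical door sensor (`door-sensor-1`) whose open/closed state is sampled once per interval. It emits a record on a fixed reporting schedule and whenever the state changes, and it keeps records in an append-only log of frozen dataclasses so that existing records cannot be reassigned. The names `MachineEvent`, `AppendOnlyLog`, and `monitor` are illustrative, not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MachineEvent:
    """One machine-generated record; frozen, so its fields cannot be reassigned."""
    timestamp: datetime
    source: str    # hypothetical device identifier
    trigger: str   # "schedule" or "state_change" in this sketch
    value: bool    # sampled state, e.g. door open (True) or closed (False)


class AppendOnlyLog:
    """Records can be appended and read, but the API offers no way to edit them."""

    def __init__(self) -> None:
        self._events: list[MachineEvent] = []

    def append(self, event: MachineEvent) -> None:
        self._events.append(event)

    def events(self) -> tuple[MachineEvent, ...]:
        # Return an immutable snapshot so callers cannot rewrite history.
        return tuple(self._events)


def monitor(samples, start: datetime, interval: timedelta, report_every: int,
            source: str = "door-sensor-1") -> AppendOnlyLog:
    """Emit an event every `report_every` samples and on every state change."""
    log = AppendOnlyLog()
    previous = None
    for i, state in enumerate(samples):
        now = start + i * interval
        if previous is not None and state != previous:
            log.append(MachineEvent(now, source, "state_change", state))
        if i % report_every == 0:
            log.append(MachineEvent(now, source, "schedule", state))
        previous = state
    return log


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    log = monitor([False, False, True, True, False, False], t0,
                  timedelta(minutes=1), report_every=3)
    for event in log.events():
        print(event.timestamp.isoformat(), event.trigger, event.value)
```

Freezing the dataclass only blocks attribute reassignment within one Python process. A system that relies on records staying unmodified, for example to present them as evidence, would need stronger guarantees such as write-once storage, which this sketch omits.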
Machine-generated data is the lifeblood of the Internet of Things (IoT).
Growth
In 2009, Gartner projected that data would grow by 650% over the following five years. [ScienceLogic] Most of this growth is a byproduct of machine-generated data. IDC estimated that by 2020 there would be 26 times more connected things than people. Wikibon forecast that $514 billion would be spent on the Industrial Internet in 2020. [Wikibon]
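As a rough check on what the Gartner projection implies per year, the compound annual growth rate r can be worked out. This assumes "grow by 650%" means the final volume V_5 is the starting volume V_0 plus 650% of it, i.e. 7.5 times V_0; under the alternative reading of 6.5 times, the same calculation gives roughly 45% per year.

```latex
V_5 = (1 + 6.5)\,V_0 = 7.5\,V_0,
\qquad
(1 + r)^5 = 7.5
\;\Longrightarrow\;
r = 7.5^{1/5} - 1 \approx 0.496
```

That is, about 50% growth per year, compounded over five years.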
Processing
Given the fairly static yet voluminous nature of machine-generated data, data owners rely on highly scalable tools to process and analyze the resulting dataset. Almost all machine-generated data is unstructured at first and is then transformed into a common structure. Typically, these derived structures contain many data points or columns. Once the data is structured, the main challenge lies in analyzing it. Given high performance requirements and large data volumes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative, since a particular analysis typically needs to access only certain "columns" of the dataset.
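The two processing steps above, deriving a common structure from unstructured records and then reading only the columns an analysis needs, can be sketched for web server logs, one of the common sources of machine-generated data. The sketch assumes input in the Apache-style Common Log Format and uses a simplified regular expression that will not cover every real-world variant. Plain Python lists stand in for a columnar database, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Optional

# Simplified pattern for the Common Log Format:
# host ident authuser [timestamp] "METHOD path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one unstructured log line into a structured row, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # "-" means no response body was sent; record it as zero bytes.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout: field -> values."""
    columns: dict[str, list] = defaultdict(list)
    for row in rows:
        for field, value in row.items():
            columns[field].append(value)
    return dict(columns)


def bytes_served(columns: dict[str, list]) -> int:
    # The query touches only the "size" column, not whole rows.
    return sum(columns["size"])


if __name__ == "__main__":
    raw = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.1" 304 -',
        'not a log line',
    ]
    rows = [row for row in map(parse_line, raw) if row is not None]
    columns = to_columns(rows)
    print(columns["status"], bytes_served(columns))
```

A real columnar store also compresses each column and skips unread columns on disk; the in-memory lists here only show the access pattern, where an aggregate over one field never needs the other fields of each record.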
Examples
* Web server logs [Monash, Examples of Machine Generated Data]
* Call detail records
* Financial instrument trades
* Network event logs
* Security information and event management (SIEM) logs
* Telemetry collected by the government
Notes
Reference List
Bibliography