In connection-oriented communication, a data stream is the transmission of a sequence of digitally encoded signals to convey information. Typically, the transmitted symbols are grouped into a series of packets.
Data streaming has become ubiquitous. Anything transmitted over the Internet is transmitted as a data stream. Using a mobile phone to have a conversation transmits the sound as a data stream.
Formal definition
In a formal way, a data stream is any ordered pair (s, Δ) where:
# s is a sequence of tuples and
# Δ is a sequence of positive real time intervals.
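The formal definition above can be sketched in Python as a pair of equal-length sequences. This is a minimal illustration and not a canonical representation: the class name `DataStream`, the field names `s` and `delta`, and the reading of each interval as the time elapsed before the corresponding tuple arrives are assumptions made for this example, and only a finite prefix of a stream is modelled.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DataStream:
    """Finite prefix of a data stream: tuples plus positive time intervals."""
    s: Sequence[Tuple]       # the sequence of tuples
    delta: Sequence[float]   # the sequence of positive real time intervals

    def __post_init__(self):
        # Assumption of this sketch: one interval per tuple.
        if len(self.s) != len(self.delta):
            raise ValueError("s and delta must have the same length")
        if any(d <= 0 for d in self.delta):
            raise ValueError("every time interval must be positive")

    def arrival_times(self) -> List[float]:
        """Running sum of the intervals, i.e. when each tuple arrives."""
        return list(accumulate(self.delta))


stream = DataStream(s=[("a", 1), ("b", 2), ("c", 3)], delta=[0.5, 1.0, 0.25])
print(stream.arrival_times())  # [0.5, 1.5, 1.75]
```

Keeping the intervals separate from the tuples mirrors the ordered-pair form of the definition; absolute arrival times are derived only when needed.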
Content
A data stream contains different sets of data, depending on the chosen data format.
* Attributes – each attribute of the data stream represents a certain type of data, e.g. segment / data point ID, timestamp, geodata.
* The timestamp attribute helps to identify when an event occurred.
* The subject ID is an algorithmically encoded ID that has been extracted from a cookie.
* Raw data is information that comes straight from the data provider without being processed by an algorithm or a human.
* Processed data is data that has been prepared (modified, validated or cleaned in some way) to be used for future actions.
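The attributes listed above, and the step from raw to processed data, might look as follows in code. This is only a sketch under stated assumptions: the `Record` fields, the use of SHA-256 to encode the cookie into a subject ID, the ISO 8601 timestamp format, and the specific cleaning rules are all hypothetical choices, since no concrete format or encoding algorithm is specified here.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Record:
    """One raw record as delivered by a data provider (hypothetical layout)."""
    data_point_id: str
    timestamp: str            # assumed to be ISO 8601 text
    subject_id: str
    geodata: Optional[str]    # assumed to be a country code


def subject_id_from_cookie(cookie_value: str) -> str:
    """Hypothetical encoding: SHA-256 hex digest of the cookie value."""
    return hashlib.sha256(cookie_value.encode("utf-8")).hexdigest()


def process(raw: Record) -> Optional[dict]:
    """Validate and clean a raw record; return None if it is invalid."""
    try:
        ts = datetime.fromisoformat(raw.timestamp)
    except ValueError:
        return None  # validation: drop records with unparseable timestamps
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return {
        "data_point_id": raw.data_point_id.strip(),
        "timestamp": ts,
        "subject_id": raw.subject_id,
        "geodata": raw.geodata.strip().upper() if raw.geodata else None,
    }


raw = Record(" dp-1 ", "2024-01-15T10:30:00", subject_id_from_cookie("abc"), " pl ")
processed = process(raw)
```

Here `raw` stands for raw data and `processed` for its prepared counterpart; a record whose timestamp cannot be parsed is rejected rather than repaired.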
Usage
There are various areas where data streams are used:
* Fraud detection & scoring – raw data is used as source data for an anti-fraud algorithm (data analysis techniques for fraud detection). For example, timestamps, cookie occurrences or analysis of data points are used within the scoring system to detect fraud or to make sure that a message receiver is not a bot (so-called non-human traffic).
* Artificial intelligence – raw data is used as a training set and a test set when building AI and machine learning algorithms.
* Profiling and personalization – raw data is used to customize user profiles and divide them into segments, e.g., by gender or location (based on data points).
* Business intelligence – raw data is a source of information for BI systems, used for enriching user profiles with detailed information about them, e.g., purchase path or geodata. This information is used for business analysis and predictive research.
* Targeting – data processed by data scientists improves online campaigns and is used for reaching the target audience.
* CRM enrichment – raw data is integrated with a customer-relationship management system. CRM integration makes it possible to fill the gaps in users' profiles with demographic data, interests or buying intentions.
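To make the fraud-scoring use case above more concrete, the following sketch combines timestamps and cookie occurrences into a single bot-likelihood score. It is an illustrative heuristic only: the function name `bot_score`, the point weights, and the thresholds are invented for this example and do not describe any real anti-fraud system.

```python
from statistics import pstdev
from typing import Sequence


def bot_score(timestamps: Sequence[float], cookie_seen_count: int,
              min_events: int = 5) -> int:
    """Return a score from 0 to 100; higher means more bot-like.

    timestamps are event times in seconds; cookie_seen_count is how often
    a previously set cookie was sent back. All weights are assumptions.
    """
    score = 0
    if cookie_seen_count == 0:
        score += 40  # the client never returned a cookie
    if len(timestamps) >= min_events:
        ordered = sorted(timestamps)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        if sum(gaps) / len(gaps) < 1.0:
            score += 30  # more than one event per second on average
        if pstdev(gaps) < 0.05:
            score += 30  # almost perfectly regular timing
    return min(score, 100)


regular = bot_score([0.0, 0.5, 1.0, 1.5, 2.0], cookie_seen_count=0)
irregular = bot_score([0.0, 3.2, 10.5, 11.0, 40.0], cookie_seen_count=3)
print(regular, irregular)  # 100 0
```

A production scoring system would typically rely on many more signals and on thresholds tuned against labelled traffic; the point here is only how raw stream attributes can feed a score.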
Integration
Core integrations with data streams are:
* Data streams are integrated with systems such as a customer data platform (CDP), customer relationship management (CRM) system or data management platform (DMP) to enrich users' profiles with external data. It is possible to expand the knowledge about existing users by using external sources.
* Data streams are used to enrich business intelligence systems and make analysis more precise and conclusions more accurate.
* In the case of content management system (CMS) integration, a data stream is used to identify users and personalize their visit, even if it is their first one. Through data analysis, the actual content of the website is adapted to the user.
* Data streams are integrated with a demand-side platform (DSP) within the programmatic advertising ecosystem. Parties (e.g., advertisers) can exchange users' IDs and join them with existing profiles.
* Data streams are used to choose respective user segments (e.g., people interested in the automotive industry) and use them in an online campaign. Segments are enriched with more user characteristics from the data stream and then sent to the DSP.
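The profile enrichment described in the integrations above amounts to joining stream records to existing profiles on a shared user ID. The sketch below assumes profiles are dictionaries keyed by user ID and that each stream record carries a `user_id` field; both the layout and the rule that existing values are never overwritten are assumptions of this example, not features of any particular CDP, CRM or DMP.

```python
from typing import Dict, Iterable


def enrich_profiles(profiles: Dict[str, dict],
                    stream_records: Iterable[dict]) -> Dict[str, dict]:
    """Return copies of the profiles with gaps filled from stream records.

    Only missing keys are filled; existing profile values are kept.
    Records for unknown users are ignored.
    """
    enriched = {uid: dict(attrs) for uid, attrs in profiles.items()}
    for record in stream_records:
        uid = record.get("user_id")
        if uid not in enriched:
            continue
        for key, value in record.items():
            if key != "user_id":
                enriched[uid].setdefault(key, value)
    return enriched


profiles = {"u1": {"email_opt_in": True}, "u2": {"interest": "sports"}}
records = [
    {"user_id": "u1", "interest": "automotive"},
    {"user_id": "u2", "interest": "travel"},
    {"user_id": "u3", "interest": "music"},
]
enriched = enrich_profiles(profiles, records)
```

In this example `u1` gains an `interest` attribute, `u2` keeps its existing one, and the record for the unknown user `u3` is dropped.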
Data sources visible
In a data stream, the device used on the user side is visible through the user agent:
* mobile – when a user browses with a mobile browser (which has a narrow screen resolution) or uses a mobile app version;
* desktop – when a user uses a desktop browser or a desktop app version.
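A very rough way to derive the mobile or desktop distinction above from a user agent string is to look for the substring "Mobi", which mobile browsers commonly include. The function below is a simplified heuristic for illustration; the name `device_type` is hypothetical, and real user agent parsing has to cope with many more cases (tablets, apps, bots, spoofed strings).

```python
def device_type(user_agent: str) -> str:
    """Classify a user agent string as 'mobile' or 'desktop' (heuristic)."""
    return "mobile" if "mobi" in user_agent.lower() else "desktop"


iphone_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 "
    "Mobile/15E148 Safari/604.1"
)
windows_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(device_type(iphone_ua), device_type(windows_ua))  # mobile desktop
```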
The following information is shared by the device used:
* Actual URL to the visited website, where an event occurred
* User Agent
* Geolocation
* Internet Protocol (IP)
Formats
A data point is a tag that collects information about a certain action performed by a user on a website. Data points exist in two types, the values of which are used to create appropriate audiences. Those are:
* 'event' with information about occurrences of a specific event (e.g., a click on a link or the display of an ad)
* 'attribute' with numerical or alphanumerical values.
A segment is a logical statement built on specific data points using AND, OR or NOT operators.
Hybrid data is raw data from both the data point and segment data formats.
URLs are a set of information about a particular URL that has been visited.
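The idea of a segment as a logical statement over data points can be sketched with small combinator functions. The helper names (`has`, `all_of`, `any_of`, `negate`) and the data point names are hypothetical, and a user is represented here simply as the set of data points recorded for them; real platforms define their own segment syntax.

```python
from typing import Callable, Set

# A segment is modelled as a predicate over the set of data points of a user.
Segment = Callable[[Set[str]], bool]


def has(data_point: str) -> Segment:
    """Segment that matches users for whom the data point was recorded."""
    return lambda points: data_point in points


def all_of(*parts: Segment) -> Segment:
    """Logical AND of segments."""
    return lambda points: all(part(points) for part in parts)


def any_of(*parts: Segment) -> Segment:
    """Logical OR of segments."""
    return lambda points: any(part(points) for part in parts)


def negate(part: Segment) -> Segment:
    """Logical NOT of a segment."""
    return lambda points: not part(points)


# (viewed_car_page OR clicked_car_ad) AND NOT purchased_car
car_shoppers = all_of(
    any_of(has("viewed_car_page"), has("clicked_car_ad")),
    negate(has("purchased_car")),
)

print(car_shoppers({"viewed_car_page"}))                   # True
print(car_shoppers({"clicked_car_ad", "purchased_car"}))   # False
```

Because each combinator returns another predicate, segments can be nested to any depth, which matches the description of a segment as a statement composed with AND, OR and NOT.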
GDPR
Information gathered from websites is based on user behavior. Data providers deliver both personal and non-personal information. There are two types of user data available in a data stream:
* Personally identifiable information (PII) – information that, on its own or combined with other identification methods, allows a person to be identified. Examples of PII are: insurance ID, email address, phone number, IP address, geolocation, biometric data.
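* Non-personally identifiable information (non-PII) is information that cannot be used to identify a person or to track a location. A cookie or a device ID is commonly cited as an example of non-PII, although the GDPR names cookie identifiers and other online identifiers among the data that can identify a person when combined with other information.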
See also
* Streaming algorithm