Data Leakage Prevention

Data loss prevention (DLP) software detects potential data breaches and data exfiltration transmissions and prevents them by monitoring, detecting and blocking sensitive data while in use (endpoint actions), in motion (network traffic), and at rest (data storage). The terms "data loss" and "data leak" are related and are often used interchangeably (Asaf Shabtai, Yuval Elovici, Lior Rokach, A Survey of Data Leakage Detection and Prevention Solutions, Springer-Verlag New York Incorporated, 2012). Data loss incidents turn into data leak incidents in cases where media containing sensitive information are lost and subsequently acquired by an unauthorized party. However, a data leak is possible without losing the data on the originating side. Other terms associated with data leakage prevention are information leak detection and prevention (ILDP), information leak prevention (ILP), content monitoring and filtering (CMF), information protection and control (IPC) and extrusion prevention system (EPS), as opposed to intrusion prevention system.


Categories

The technological means employed for dealing with data leakage incidents can be divided into categories: standard security measures, advanced/intelligent security measures, access control and encryption, and designated DLP systems, although only the last category is currently thought of as DLP (Phua, C., Protecting organisations from personal data breaches, Computer Fraud and Security, 1:13-18, 2009). Common DLP methods for spotting malicious or otherwise unwanted activity and responding to it are automated detection and response. Most DLP systems rely on predefined rules to identify and categorize sensitive information, which in turn helps system administrators zero in on vulnerable spots. Those areas can then be given extra safeguards.
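
The idea of predefined rules driving automatic detection and response can be sketched as follows. This is a minimal illustration, not a description of any particular product: the rule names, patterns, categories and actions are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule label
    pattern: re.Pattern  # what to look for
    category: str        # category assigned to matching content
    action: str          # response to take: "log" or "block"


# Hypothetical rule set; a real deployment would tune these patterns.
RULES = [
    Rule("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "personal", "block"),
    Rule("confidential_marker", re.compile(r"\bconfidential\b", re.IGNORECASE), "internal", "log"),
]


def evaluate(text: str) -> list[tuple[str, str, str]]:
    """Return (rule name, category, action) for every rule that matches."""
    return [(r.name, r.category, r.action) for r in RULES if r.pattern.search(text)]


def decide(text: str) -> str:
    """Pick the strictest action among matching rules: block > log > allow."""
    actions = {action for _, _, action in evaluate(text)}
    if "block" in actions:
        return "block"
    if "log" in actions:
        return "log"
    return "allow"
```

Calling decide() on an outgoing message gives the response, while evaluate() gives the categories that an administrator could review to see where sensitive content tends to appear.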


Standard measures

Standard security measures, such as firewalls, intrusion detection systems (IDSs) and antivirus software, are commonly available products that guard computers against outsider and insider attacks. The use of a firewall, for example, prevents the access of outsiders to the internal network and an intrusion detection system detects intrusion attempts by outsiders. Inside attacks can be averted through antivirus scans that detect Trojan horses that send confidential information, and by the use of thin clients that operate in a client-server architecture with no personal or sensitive data stored on a client device.


Advanced measures

Advanced security measures employ machine learning and temporal reasoning algorithms to detect abnormal access to data (e.g., databases or information retrieval systems) or abnormal email exchange; honeypots for detecting authorized personnel with malicious intentions; activity-based verification (e.g., recognition of keystroke dynamics); and user activity monitoring for detecting abnormal data access.
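
As a rough illustration of detecting abnormal data access, the sketch below compares a user's access count for the current day with that user's own history. It is a simple statistical stand-in (a z-score test), not a machine learning or temporal reasoning algorithm, and the function name and threshold are assumptions made for the example.

```python
from statistics import mean, stdev


def is_abnormal(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's access count if it lies more than `threshold`
    standard deviations above the user's historical mean."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # perfectly flat baseline: any increase stands out
    return (today - mu) / sigma > threshold
```

A user who normally reads a few dozen records a day and suddenly reads several hundred would be flagged, while ordinary day-to-day variation would not.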


Designated DLP systems

Designated systems detect and prevent unauthorized attempts to copy or send sensitive data, intentionally or unintentionally, mainly by personnel who are authorized to access the sensitive information. To classify information as sensitive, these systems use mechanisms such as exact data matching, structured data fingerprinting, statistical methods, rule and regular expression matching, published lexicons, conceptual definitions, keywords and contextual information such as the source of the data (Ouellet, E., Magic Quadrant for Content-Aware Data Loss Prevention, Technical Report, RA4 06242010, Gartner RAS Core Research, 2012).
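
Exact data matching, one of the mechanisms listed above, can be sketched as a lookup of hashed tokens against an index of hashed known-sensitive values. The helper names and the normalization scheme are assumptions for illustration. Plain SHA-256 is used for brevity; low-entropy values such as identity numbers would need a keyed hash in practice.

```python
import hashlib
import re


def _normalize(value: str) -> str:
    """Lowercase and drop separators so '123-45-6789' and '123456789' compare equal."""
    return re.sub(r"[^0-9a-z]", "", value.lower())


def _digest(value: str) -> str:
    # Assumption: unkeyed SHA-256, chosen only to keep the sketch short.
    return hashlib.sha256(_normalize(value).encode("utf-8")).hexdigest()


def build_index(sensitive_values: list[str]) -> set[str]:
    """Store only hashes, so the index itself does not hold the raw values."""
    return {_digest(v) for v in sensitive_values}


def find_exact_matches(text: str, index: set[str]) -> list[str]:
    """Return the tokens in `text` whose normalized hash appears in the index."""
    tokens = re.findall(r"[\w.\-]+", text)
    return [t for t in tokens if _digest(t) in index]
```

Unlike a regular expression, which matches anything shaped like a sensitive value, this approach only fires on values that are actually present in the protected data set.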


Types


Network

Network (data in motion) technology is typically installed at network egress points near the perimeter. It analyzes network traffic to detect sensitive data that is being sent in violation of information security policies. Multiple security control points may report activity to be analyzed by a central management server. Next-generation firewalls (NGFWs) and intrusion detection systems (IDSs) are common examples of technology that can be leveraged to provide DLP capabilities on the network. Network DLP capabilities can usually be undermined by a sophisticated threat actor through the use of data masking techniques such as encryption or compression.
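
The effect of compression on content inspection can be shown with a small experiment. In this sketch a hypothetical content rule (a regular expression for a US-style identity number) is applied to the raw bytes of a payload, as a network inspection point might do, once before and once after the payload is compressed with zlib.

```python
import re
import zlib

# Hypothetical content rule applied to the bytes seen on the wire.
SSN_PATTERN = re.compile(rb"\d{3}-\d{2}-\d{4}")


def inspect_payload(payload: bytes) -> bool:
    """Return True if the rule matches the payload exactly as transmitted."""
    return SSN_PATTERN.search(payload) is not None


plaintext = b"Employee record: SSN 123-45-6789\n" * 20
compressed = zlib.compress(plaintext)

print(inspect_payload(plaintext))                    # rule sees the number
print(inspect_payload(compressed))                   # same data, compressed
print(inspect_payload(zlib.decompress(compressed)))  # recipient recovers it
```

The rule only matches where the sensitive value appears literally in the transmitted bytes, which is why an inspection point that does not decode the payload first can be bypassed.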


Endpoint

Endpoint (data in use) systems run on internal end-user workstations or servers. Like network-based systems, endpoint-based technology can address internal as well as external communications. It can therefore be used to control information flow between groups or types of users (e.g. "Chinese walls"). Endpoint systems can also control email and instant messaging communications before they reach the corporate archive, such that a blocked communication (i.e., one that was never sent, and therefore not subject to retention rules) will not be identified in a subsequent legal discovery situation. Endpoint systems have the advantage that they can monitor and control access to physical devices (such as mobile devices with data storage capabilities) and in some cases can access information before it is encrypted. Endpoint systems also have access to the information needed to provide contextual classification, for example the source or author generating content. Some endpoint-based systems provide application controls to block attempted transmissions of confidential information and provide immediate user feedback. However, they must be installed on every workstation in the network (typically via a DLP agent), and they cannot be used on mobile devices (e.g., cell phones and PDAs) or where they cannot be practically installed (for example on a workstation in an Internet café).
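
Controlling information flow between groups of users can be pictured as a lookup against a list of barriers. The sketch below is only an illustration of that idea: the group names, users and barrier list are hypothetical, and a real endpoint agent would take them from a directory service or central policy.

```python
# Hypothetical barriers: pairs of groups that must not exchange information.
BARRIERS = {
    frozenset({"research", "trading"}),
    frozenset({"advisory", "audit"}),
}

# Hypothetical user-to-group assignment.
USER_GROUP = {
    "alice": "research",
    "bob": "trading",
    "carol": "research",
}


def may_send(sender: str, recipient: str) -> bool:
    """Allow a message unless a barrier separates the two users' groups.
    Unknown users raise KeyError in this sketch rather than being allowed."""
    pair = frozenset({USER_GROUP[sender], USER_GROUP[recipient]})
    return pair not in BARRIERS
```

Because the pair is stored as an unordered set, the barrier applies in both directions without listing each direction separately.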


Cloud

The cloud now contains a lot of critical data as organizations move to cloud-native technologies to accelerate virtual team collaboration. Data stored in the cloud needs to be protected as well, since it is susceptible to cyberattacks, accidental leakage and insider threats. Cloud DLP monitors and audits the data, while providing access and usage control of data using policies. It establishes greater end-to-end visibility for all the data stored in the cloud.


Data identification

DLP includes techniques for identifying confidential or sensitive information. Sometimes confused with discovery, data identification is a process by which organizations use a DLP technology to determine what to look for. Data is classified as either structured or unstructured. Structured data resides in fixed fields within a file such as a spreadsheet, while unstructured data refers to free-form text or media in text documents, PDF files and video. An estimated 80% of all data is unstructured and 20% structured (Brian E. Burke, "Information Protection and Control survey: Data Loss Prevention and Encryption trends," IDC, May 2008).


Data loss protection (DLP)

Sometimes a data distributor inadvertently or deliberately gives sensitive data to one or more third parties, or uses it themselves in an authorized fashion. Some time later, some of the data is found in an unauthorized place (e.g., on the web or on a user's laptop). The distributor must then investigate the source of the loss.
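
One simple starting point for such an investigation is to compare the leaked records with what each recipient was given. The sketch below ranks recipients by the fraction of leaked records they had received. The function name and record identifiers are hypothetical, and a high overlap is only circumstantial: it narrows the search without proving who was the source.

```python
def rank_suspects(
    distributed: dict[str, set[str]], leaked: set[str]
) -> list[tuple[str, float]]:
    """Score each recipient by the share of leaked records they were given,
    highest first."""
    if not leaked:
        return []
    scores = [
        (name, len(records & leaked) / len(leaked))
        for name, records in distributed.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A recipient whose score is 1.0 held every leaked record, whereas a score of 0.0 means that recipient could not have supplied any of them from the data they were given.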


Data at rest

"Data at rest" specifically refers to information that is not moving, i.e. that exists in a database or a file share. This information is of great concern to businesses and government institutions simply because the longer data is left unused in storage, the more likely it might be retrieved by unauthorized individuals. Protecting such data involves methods such as access control, data encryption and data retention policies.
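
A data retention policy for a file share can be checked with a periodic scan. The sketch below lists files whose last modification is older than a retention window. The function name and the use of modification time as a proxy for "left unused" are assumptions made for the example.

```python
import time
from pathlib import Path
from typing import Optional


def files_past_retention(
    root: Path, max_age_days: float, now: Optional[float] = None
) -> list[Path]:
    """List files under `root` last modified before the retention cutoff.
    Assumption: modification time stands in for 'last used'."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The resulting list could feed a review, archival or deletion step, depending on what the retention policy requires.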


Data in use

"Data in use" refers to data that the user is currently interacting with. DLP systems that protect data in use may monitor and flag unauthorized activities. These activities include screen-capture, copy/paste, print and fax operations involving sensitive data. They may be intentional or unintentional attempts to transmit sensitive data over communication channels.


Data in motion

"Data in motion" is data that is traversing a network to an endpoint. Networks can be internal or external. DLP systems that protect data in motion monitor sensitive data traveling across a network through various communication channels.


See also

* Computer security
* List of backup software
* Metadata removal tool
* Endpoint detection and response
* Endpoint security

