A data breach, also known as data leakage, is "the unauthorized exposure, disclosure, or loss of
personal information".
Attackers have a variety of motives, from financial gain to political activism, political repression, and espionage. There are several technical root causes of data breaches, including accidental or intentional disclosure of information by insiders, loss or theft of unencrypted devices, hacking into a system by exploiting software vulnerabilities, and social engineering attacks such as phishing, where insiders are tricked into disclosing information. Although prevention efforts by the company holding the data can reduce the risk of data breach, they cannot bring it to zero.
The first reported breach was in 2002 and the number occurring each year has grown since then. A large number of data breaches are never detected. If a breach is made known to the company holding the data, post-breach efforts commonly include containing the breach, investigating its scope and cause, and notifying people whose records were compromised, as required by law in many jurisdictions. Law enforcement agencies may investigate breaches, although the hackers responsible are rarely caught.
Many criminals sell data obtained in breaches on the
dark web. Thus, people whose personal data was compromised are at elevated risk of
identity theft for years afterwards and a significant number will become victims of this crime.
Data breach notification laws in many jurisdictions, including all
states of the United States and
European Union member states, require the notification of people whose data has been breached. Lawsuits against the company that was breached are common, although few victims receive money from them. There is little empirical evidence of economic harm to firms from breaches except the direct cost, although there is some evidence suggesting a temporary decline in stock price.
Definition
A data breach is a violation of "organizational, regulatory, legislative or contractual" law or policy that causes "the unauthorized exposure, disclosure, or loss of
personal information". Legal and contractual definitions vary. Some researchers include other types of information, for example
intellectual property or classified information. However, companies mostly disclose breaches because it is required by law, and only personal information is covered by
data breach notification laws.
Prevalence
The first reported data breach occurred on 5 April 2002 when 250,000
social security numbers collected by the
State of California were stolen from a data center. Before the widespread adoption of
data breach notification laws around 2005, the prevalence of data breaches is difficult to determine. Even afterwards, statistics per year cannot be relied on because data breaches may be reported years after they occurred, or not reported at all. Nevertheless, the statistics show a continued increase in the number and severity of data breaches. In 2016, researcher
Sasha Romanosky estimated that data breaches (excluding
phishing) outnumbered other security breaches by a factor of four.
Perpetrators
According to a 2020 estimate, 55 percent of data breaches were caused by organized crime, 10 percent by system administrators, 10 percent by end users such as customers or employees, and 10 percent by states or state-affiliated actors. Opportunistic criminals may cause data breaches, often using malware or social engineering attacks, but they will typically move on if the security is above average. More organized criminals have more resources and are more focused in their targeting of particular data. Both groups sell the information they obtain for financial gain. Another source of data breaches is politically motivated hackers, for example Anonymous, who target particular objectives. State-sponsored hackers target either citizens of their country or foreign entities, for such purposes as political repression and espionage. Often they use undisclosed zero-day vulnerabilities for which the hackers are paid large sums of money. The Pegasus spyware, a no-click malware developed by the Israeli company NSO Group that can be installed on most cellphones and spies on the users' activity, has drawn attention both for its use against criminals such as drug kingpin El Chapo and for its use against political dissidents, facilitating the murder of Jamal Khashoggi.
Causes
Technical causes
Despite developers' goal of delivering a product that works entirely as intended, virtually all software and hardware contain bugs. If a bug creates a security risk, it is called a vulnerability. Patches are often released to fix identified vulnerabilities, but those that remain unknown (zero days) as well as those that have not been patched remain open to exploitation. Both software written by the target of the breach and third-party software used by them are vulnerable to attack. The software vendor is rarely legally liable for the cost of breaches, thus creating an incentive to make cheaper but less secure software.
Vulnerabilities vary in their ability to be
exploited by malicious actors. The most valuable allow the attacker to
inject and run their own code (called
malware), without the user being aware of it. Some malware is downloaded by users via clicking on a malicious link, but it is also possible for malicious
web applications to download malware just from visiting the website (
drive-by download).
Keyloggers, a type of malware that records a user's keystrokes, are often used in data breaches. The majority of data breaches could have been averted by storing all sensitive information in an encrypted format. That way, physical possession of the storage device or access to encrypted information is useless unless the attacker has the
encryption key.
Hashing is also a good solution for keeping
passwords safe from
brute-force attacks, but only if the algorithm is sufficiently secure.
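As an illustration of the hashing point above, the following is a minimal sketch of salted password hashing using PBKDF2-HMAC-SHA256 from the Python standard library. The function names, the 16-byte salt, and the iteration count are assumptions chosen for the example rather than values taken from the sources; memory-hard algorithms such as scrypt or Argon2 are possible alternatives.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration; a real deployment would tune this.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Only the salt and digest would be stored. Because each password gets its own salt and each guess costs many hash iterations, an attacker who obtains the stored values has to spend far more work per guess than with a single unsalted hash.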
Many data breaches occur on the hardware operated by a partner of the organization targeted—including the
2013 Target data breach and
2014 JPMorgan Chase data breach.
Outsourcing work to a third party leads to a risk of data breach if that company has lower security standards; in particular, small companies often lack the resources to take as many security precautions. As a result, outsourcing agreements often include security guarantees and provisions for what happens in the event of a data breach.
Human causes
Human causes of breach are often based on trust of another actor that turns out to be malicious.
Social engineering attacks rely on tricking an insider into doing something that compromises the system's security, such as revealing a password or clicking a link to download malware. Data breaches may also be deliberately caused by insiders. One type of social engineering, phishing, obtains a user's credentials by sending them a malicious message impersonating a legitimate entity, such as a bank, and getting the user to enter their credentials onto a malicious website controlled by the cybercriminal.
Two-factor authentication can prevent the malicious actor from using the credentials. Training employees to recognize social engineering is another common strategy.
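The sources do not name a specific second factor. One widely used option is a time-based one-time password (TOTP, specified in RFC 6238 on top of HOTP from RFC 4226), sketched below with the Python standard library. The function names and defaults (30-second step, 6 digits, HMAC-SHA1) are assumptions for the example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the number of elapsed time steps."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at) // step, digits)
```

Because the code depends on a secret shared only between the user's device and the server and changes every time step, a password captured by phishing is not sufficient on its own to log in, although codes relayed in real time can still be abused.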
Another source of breaches is accidental disclosure of information, for example publishing information that should be kept private. With the increase in remote work and bring your own device policies, large amounts of corporate data are stored on personal devices of employees. Through carelessness or disregard of company security policies, these devices can be lost or stolen. Technical solutions can prevent many causes of human error, such as encrypting all sensitive data, preventing employees from using insecure passwords, installing antivirus software to prevent malware, and implementing a robust patching system to ensure that all devices are kept up to date.
Breach lifecycle
Prevention
Although attention to security can reduce the risk of data breach, it cannot bring it to zero. Security is not the only priority of organizations, and an attempt to achieve perfect security would make the technology unusable. Many companies hire a
chief information security officer (CISO) to oversee the company's information security strategy. To obtain information about potential threats, security professionals will network with each other and share information with other organizations facing similar threats. Defense measures can include an updated incident response strategy, contracts with
digital forensics firms that could investigate a breach,
cyber insurance, and monitoring the
dark web for stolen credentials of employees. In 2024, the United States National Institute of Standards and Technology (NIST) issued a special publication, "Data Confidentiality: Identifying and Protecting Assets Against Data Breaches". The
NIST Cybersecurity Framework also contains information about data protection. Other organizations have released different standards for data protection.
The architecture of a company's systems plays a key role in deterring attackers. Daswani and Elbayadi recommend having only one means of authentication, avoiding redundant systems, and making the most secure setting default.
Defense in depth and
distributed privilege (requiring multiple authentications to execute an operation) also can make a system more difficult to hack. Giving employees and software the least amount of access necessary to fulfill their functions (
principle of least privilege) limits the likelihood and damage of breaches. Several data breaches were enabled by reliance on
security by obscurity; the victims had put access credentials in publicly accessible files. Nevertheless, prioritizing ease of use is also important because otherwise users might circumvent the security systems. Rigorous software testing, including penetration testing, can reduce software vulnerabilities, and must be performed prior to each release even if the company is using a
continuous integration/continuous deployment model where new versions are constantly being rolled out.
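The principle of least privilege mentioned above can be sketched as a deny-by-default permission check. The role names and permission strings here are hypothetical and serve only to show the shape of such a check.

```python
# Hypothetical roles, each granted only the permissions its function needs.
ROLE_PERMISSIONS = {
    "support_agent": frozenset({"ticket:read", "ticket:write"}),
    "billing_clerk": frozenset({"invoice:read"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions are refused."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

With this layout, a compromised support account cannot read invoices, which limits what a single stolen credential exposes.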
The principle of least persistence (avoiding the collection of data that is not necessary and destroying data that is no longer necessary) can mitigate the harm from breaches. The challenge is that destroying data can be more complex with modern database systems.
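A minimal sketch of both halves of least persistence follows: keeping only the fields that are needed, and dropping records past a retention period. The field names and the 365-day period are assumptions for the example; it works on in-memory records and does not address the harder problem of deleting data from backups or replicated databases.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)              # assumed retention period
NEEDED_FIELDS = ("order_id", "created_at")   # hypothetical fields the business needs


def minimize(record: dict) -> dict:
    """Keep only the needed fields, so unneeded data is never stored."""
    return {field: record[field] for field in NEEDED_FIELDS}


def purge_expired(records: list, now: datetime) -> list:
    """Return only the records that are still within the retention period."""
    return [r for r in records if now - r["created_at"] <= RETENTION]
```

Data that was never collected or has already been destroyed cannot be exposed in a later breach.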
Response
A large number of data breaches are never detected. Of those that are, most breaches are detected by third parties; others are detected by employees or automated systems. Responding to breaches is often the responsibility of a dedicated
computer security incident response team, often including technical experts,
public relations, and legal counsel. Many companies do not have sufficient expertise in-house, and subcontract some of these roles; often, these outside resources are provided by the cyber insurance policy. After a data breach becomes known to the company, the next steps typically include confirming it occurred, notifying the response team, and attempting to contain the damage.
To stop exfiltration of data, common strategies include shutting down affected servers, taking them offline,
patching the vulnerability, and
rebuilding. Once the exact way that the data was compromised is identified, there are typically only one or two technical vulnerabilities that need to be addressed in order to contain the breach and prevent it from recurring. A
penetration test can then verify that the fix is working as expected. If
malware is involved, the organization must investigate and close all infiltration and exfiltration vectors, as well as locate and remove all malware from its systems. If data was posted on the
dark web, companies may attempt to have it taken down. Containing the breach can compromise investigation, and some tactics (such as shutting down servers) can violate the company's contractual obligations.
Gathering data about the breach can facilitate later litigation or criminal prosecution, but only if the data is gathered according to legal standards and the
chain of custody is maintained. Database forensics can narrow down the records involved, limiting the scope of the incident. Extensive investigation may be undertaken, which can be even more expensive than
litigation. In the United States, breaches may be investigated by government agencies such as the
Office for Civil Rights, the
United States Department of Health and Human Services, and the
Federal Trade Commission (FTC). Law enforcement agencies may investigate breaches, although the hackers responsible are rarely caught.
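One ingredient commonly used to support a chain of custody is a cryptographic digest recorded when evidence is collected, so that later copies can be checked against it. The sketch below assumes SHA-256 and a simple in-memory log; the function names and log fields are hypothetical, and a digest alone does not satisfy any particular legal standard.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(evidence: bytes) -> str:
    """SHA-256 digest of the evidence, as a hexadecimal string."""
    return hashlib.sha256(evidence).hexdigest()


def custody_entry(evidence: bytes, handler: str, action: str) -> dict:
    """Record who handled the evidence, what they did, and its digest at that time."""
    return {
        "sha256": fingerprint(evidence),
        "handler": handler,
        "action": action,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unaltered(evidence: bytes, log: list) -> bool:
    """True if the evidence still matches the digest recorded at collection."""
    return bool(log) and fingerprint(evidence) == log[0]["sha256"]
```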
Notifications are typically sent out as required by law. Many companies offer free
credit monitoring to people affected by a data breach, although only around 5 percent of those eligible take advantage of the service. Issuing new credit cards to consumers, although expensive, is an effective strategy to reduce the risk of
credit card fraud. Companies try to restore trust in their business operations and take steps to prevent a breach from reoccurring.
Consequences
For consumers
After a data breach, criminals make money by selling data, such as usernames, passwords,
social media or
customer loyalty account information,
debit and
credit card numbers, and personal health information (see
medical data breach). Criminals often sell this data on the
dark web—parts of the internet where it is difficult to trace users and illicit activity is widespread—using platforms like
.onion or
I2P. Originating in the 2000s, the dark web, followed by untraceable
cryptocurrencies such as
Bitcoin in the 2010s, made it possible for criminals to sell data obtained in breaches with minimal risk of getting caught, facilitating an increase in hacking. One popular darknet marketplace,
Silk Road, was shut down in 2013 and its operators arrested, but several other marketplaces emerged in its place. Telegram is also a popular forum for illegal sales of data.
This information may be used for a variety of purposes, such as
spamming, obtaining products with a victim's loyalty or payment information,
identity theft,
prescription drug fraud, or
insurance fraud. The threat of a data breach, or of revealing information obtained in one, can be used for
extortion.
Consumers may suffer various forms of tangible or intangible harm from the theft of their personal data, or not notice any harm. A significant portion of those affected by a data breach become victims of
identity theft. A person's identifying information often circulates on the dark web for years, causing an increased risk of identity theft regardless of remediation efforts. Even if a customer does not end up footing the bill for
credit card fraud or identity theft, they have to spend time resolving the situation. Intangible harms include
doxxing (publicly revealing someone's personal information), for example medication usage or personal photos.
For organizations
There is little empirical evidence of economic harm from breaches except the direct cost, although there is some evidence suggesting a temporary decline in stock price. Other impacts on the company can include lost business, reduced employee productivity due to systems being offline or personnel redirected to working on the breach, resignation or firing of senior executives, reputational damage, and increased future costs of auditing or security. Consumer losses from a breach are usually a negative externality for the business. Some experts have argued that the evidence suggests the direct costs and reputational damage from data breaches are not large enough to sufficiently incentivize their prevention.
Estimating the cost of data breaches is difficult, both because not all breaches are reported and also because calculating the impact of breaches in financial terms is not straightforward. There are multiple ways of calculating the cost to businesses, especially when it comes to personnel time dedicated to dealing with the breach. Author Kevvie Fowler estimates that more than half the direct cost incurred by companies is in the form of litigation expenses and services provided to affected individuals, with the remaining cost split between notification and detection, including forensics and investigation. He argues that these costs are reduced if the organization has invested in security prior to the breach or has previous experience with breaches. The more
data records involved, the more expensive a breach typically will be. In 2016, researcher
Sasha Romanosky estimated that while the mean breach cost the targeted firm around $5 million, this figure was inflated by a few highly expensive breaches, and the typical data breach was much less costly, around $200,000. Romanosky estimated the total annual cost to corporations in the United States to be around $10 billion.
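The gap between a mean and a typical value can be illustrated with a small calculation. The five cost figures below are invented for the example (they are not Romanosky's data) and were chosen only so that one very large breach dominates the average.

```python
from statistics import mean, median

# Hypothetical breach costs in US dollars; the last value is a single large outlier.
HYPOTHETICAL_COSTS = [100_000, 150_000, 200_000, 250_000, 24_300_000]


def summarize(costs: list) -> tuple:
    """Return (mean, median) of the given costs."""
    return mean(costs), median(costs)


mean_cost, median_cost = summarize(HYPOTHETICAL_COSTS)
```

For these made-up numbers the mean is $5,000,000 while the median is $200,000: one expensive breach raises the average far above what most firms in the sample experienced.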
Laws
Notification
The law regarding data breaches is often found in
legislation to protect privacy more generally, and is dominated by provisions mandating notification when breaches occur. Laws differ greatly in how breaches are defined, what type of information is protected, the deadline for notification, and who has
standing to sue if the law is violated. Notification laws increase
transparency and provide a reputational incentive for companies to reduce breaches. The cost of notifying the breach can be high if many people were affected and is incurred regardless of the company's responsibility, so it can function like a
strict liability fine.
Thomas on Data Breach listed 62
United Nations member states that are covered by data breach notification laws. Some other countries require breach notification in more general
data protection laws. Shortly after the first reported data breach in April 2002, California passed
a law requiring notification when an individual's personal information was breached. In the United States, notification laws proliferated after the February 2005
ChoicePoint data breach, widely publicized in part because of the large number of people affected (more than 140,000) and also because of outrage that the company initially informed only affected people in California. In 2018, the
European Union's General Data Protection Regulation (GDPR) took effect. The GDPR requires notification within 72 hours, with very high fines possible for large companies not in compliance. This regulation also stimulated the tightening of data privacy laws elsewhere. The only United States federal law requiring notification for data breaches is limited to medical data regulated under HIPAA, but all 50 states (since Alabama passed a law in 2018) have their own general data breach notification laws.
Security safeguards
Measures to protect data from a breach are typically absent from the law or vague. Filling this gap are the standards required by cyber insurance, which is held by most large companies and functions as de facto regulation. Of the laws that do exist, there are two main approaches: one that prescribes specific standards to follow, and the
reasonableness approach. The former is rarely used due to a lack of flexibility and reluctance of legislators to arbitrate technical issues; with the latter approach, the law is vague but specific standards can emerge from
case law. Companies often prefer the standards approach for providing greater
legal certainty, but they might check all the boxes without providing a secure product. An additional flaw is that the laws are poorly enforced, with penalties often much less than the cost of a breach, and many companies do not follow them.
Litigation
Many
class-action lawsuits,
derivative suits, and other litigation have been brought after data breaches. They are often
settled regardless of the merits of the case due to the high cost of litigation. Even if a settlement is paid, few affected consumers receive any money as it usually is only cents to a few dollars per victim. Legal scholars
Daniel J. Solove and
Woodrow Hartzog argue that "Litigation has increased the costs of data breaches but has accomplished little else." Plaintiffs often struggle to prove that they suffered harm from a data breach. The contribution of a company's actions to a data breach varies, and likewise the liability for the damage resulting from data breaches is a contested matter. It is disputed what standard should be applied, whether it is strict liability,
negligence, or something else.
See also
* Full disclosure (computer security)
* Medical data breach
* Surveillance capitalism