
''t''-closeness is a further refinement of ''l''-diversity group-based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade-off that results in some loss of effectiveness of data management or data mining algorithms in order to gain some privacy. The ''t''-closeness model extends the ''l''-diversity model by treating the values of an attribute distinctly, taking into account the distribution of data values for that attribute.


Formal definition

Given the existence of data breaches where sensitive attributes may be inferred based upon the distribution of values for ''l''-diverse data, the ''t''-closeness method was created to further ''l''-diversity by additionally maintaining the distribution of sensitive fields. The original paper by Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian defines ''t''-closeness as:

The ''t''-closeness principle: An equivalence class is said to have ''t''-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold ''t''. A table is said to have ''t''-closeness if all equivalence classes have ''t''-closeness.

Charu Aggarwal and Philip S. Yu further state in their book on privacy-preserving data mining that, with this definition, the threshold ''t'' gives an upper bound on the difference between the distribution of the sensitive attribute values within an anonymized group and the global distribution of values. They also state that for numeric attributes, ''t''-closeness anonymization is more effective than many other privacy-preserving data mining methods.
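The original paper measures the distance between distributions with the Earth Mover's Distance (EMD); for a categorical attribute in which all values are treated as equally distant from one another, the EMD reduces to half the L1 distance between the two distributions, ½ Σ |pᵢ − qᵢ|. The following Python sketch illustrates the check the definition implies under that equal-distance assumption. Names such as satisfies_t_closeness are illustrative, not taken from the paper, and this is a minimal sketch rather than a reference implementation.

 from collections import Counter
 
 def distribution(values, domain):
     # Relative frequency of each domain value among the given records.
     counts = Counter(values)
     total = len(values)
     return [counts[v] / total for v in domain]
 
 def emd_equal_distance(p, q):
     # EMD for a categorical attribute under an equal ground distance:
     # this reduces to half the L1 distance between the distributions.
     return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
 
 def satisfies_t_closeness(classes, table_values, domain, t):
     # A table has t-closeness if every equivalence class's sensitive-value
     # distribution is within distance t of the whole-table distribution.
     global_dist = distribution(table_values, domain)
     return all(
         emd_equal_distance(distribution(cls, domain), global_dist) <= t
         for cls in classes
     )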


Data breaches and ''l''-diversity

In real data sets, attribute values may be skewed or semantically similar, and accounting for value distributions may make it difficult to create feasible ''l''-diverse representations. The ''l''-diversity technique is useful in that it may hinder an attacker from leveraging the global distribution of an attribute's data values to infer information about sensitive data values. Not every value exhibits equal sensitivity; for example, a rare positive indicator for a disease may provide more information than a common negative indicator. Because of examples like this, ''l''-diversity may be difficult and unnecessary to achieve when protecting against attribute disclosure. Sensitive information leaks may also occur because, while the ''l''-diversity requirement ensures "diversity" of sensitive values in each group, it does not recognize that values may be semantically close: for example, an attacker could deduce that a stomach disease applies to an individual if a sample containing the individual listed only three different stomach diseases. A distribution-aware check can flag such a group, as the sketch below shows.
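To make this concrete, the hypothetical snippet below (reusing the helpers sketched above) builds an equivalence class that is 3-diverse yet contains only stomach diseases, and measures its distance from the table-wide distribution; with the assumed data the distance is about 0.67, so the class fails ''t''-closeness for any ''t'' below that value. Note that the equal-distance metric flags this class only because its distribution is skewed toward those values; the original paper also defines hierarchical ground distances that account for semantic closeness directly.

 # Hypothetical sensitive values for a whole table and for one equivalence
 # class that is 3-diverse yet contains only stomach diseases.
 domain = ["gastritis", "gastric ulcer", "stomach cancer",
           "flu", "bronchitis", "none"]
 table = ["gastritis", "gastric ulcer", "stomach cancer",
          "flu", "flu", "bronchitis", "none", "none", "none"]
 stomach_only = ["gastritis", "gastric ulcer", "stomach cancer"]
 
 d = emd_equal_distance(distribution(stomach_only, domain),
                        distribution(table, domain))
 print(round(d, 3))  # 0.667: the class fails t-closeness for any t < 0.667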


See also

* ''k''-anonymity
* Differential privacy


References

{{Reflist}}

[[Category:Anonymity]]
[[Category:Privacy]]