K-anonymity

	K-anonymity ''k''-anonymity is a property possessed by certain anonymized data. The concept of ''k''-anonymity was first introduced by Latanya Sweeney and Pierangela Samarati in a paper published in 1998 as an attempt to solve the problem: "Given person-specific field-structured data, produce a release of the data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful." A release of data is said to have the ''k''-anonymity property if the information for each person contained in the release cannot be distinguished from at least k - 1 individuals whose information also appear in the release. Unfortunately, the guarantees provided by k-anonymity are aspirational, not mathematical. Methods for ''k''-anonymization To use k-anonymity to process a dataset so that it can be released with privacy protection, a data scientist must first examine the dataset and decide if each attribute (column) is an ''identif ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Latanya Arvette Sweeney Latanya Arvette Sweeney is an American computer scientist. She is the Daniel Paul Professor of the Practice of Government and Technology at the Harvard Kennedy School and in the Harvard Faculty of Arts and Sciences at Harvard University. She is the founder and director of the Public Interest Tech Lab, founded in 2021 with a $3 million grant from the Ford Foundation as well as the Data Privacy Lab. She is the current Faculty Dean in Currier House at Harvard. Sweeney is the former Chief Technologist of the Federal Trade Commission and Editor-in-Chief of ''Technology Science''. Her best known academic work is on the theory of ''k''-anonymity, and she is credited with the observation that "87% of the U.S. population is uniquely identified by date of birth, gender, postal code". Education Sweeney graduated from Dana Hall Schools in Wellesley, Massachusetts, receiving her high school diploma in 1977. She delivered the valedictory at the graduation ceremony. She completed her u ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	L-diversity ''l''-diversity, also written as ''ℓ''-diversity, is a form of group based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade off that results in some loss of effectiveness of data management or mining algorithms in order to gain some privacy. The ''l''-diversity model is an extension of the ''k''-anonymity model which reduces the granularity of data representation using techniques including generalization and suppression such that any given record maps onto at least ''k-1'' other records in the data. The ''l''-diversity model handles some of the weaknesses in the ''k''-anonymity model where protected identities to the level of ''k''-individuals is not equivalent to protecting the corresponding sensitive values that were generalized or suppressed, especially when the sensitive values within a group exhibit homogeneity. The ''l''-diversity model adds the promotion of intra-group diversity for s ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Data Anonymization Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous. Overview Data anonymization has been defined as a "process by which personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party." Data anonymization may enable the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure, and in certain environments in a manner that enables evaluation and analytics post-anonymization. In the context of medical data, anonymized data refers to data from which the patient cannot be identified by the recipient of the information. The name, address, and full postcode must be removed ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Differential Privacy Differential privacy (DP) is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is that if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual, and therefore provides privacy. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible ev ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	T-closeness ''t''-closeness is a further refinement of ''l''-diversity group based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade off that results in some loss of effectiveness of data management or data mining algorithms in order to gain some privacy. The ''t''-closeness model extends the ''l''-diversity model by treating the values of an attribute distinctly by taking into account the distribution of data values for that attribute. Formal definition Given the existence of data breaches where sensitive attributes may be inferred based upon the distribution of values for ''l''-diverse data, the ''t''-closeness method was created to further ''l''-diversity by additionally maintaining the distribution of sensitive fields. The original paper by Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian defines ''t''-closeness as: Charu Aggarwal and Philip S. Yu further state in their book on privacy-p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Unicity (computer Science) Unicity (\varepsilon_p) is a risk metric for measuring the re-identifiability of high-dimensional anonymous data. First introduced in 2013, unicity is measured by the number of points ''p'' needed to uniquely identify an individual in a data set. The fewer points needed, the more unique the traces are and the easier they would be to re-identify using outside information. In a high-dimensional, human behavioural data set, such as mobile phone meta-data, for each person, there exists potentially thousands of different records. In the case of mobile phone meta-data, credit card transaction histories and many other types of personal data, this information includes the time and location of an individual. In research, unicity is widely used to illustrate the re-identifiability of anonymous data sets. In 2013 researchers from the MIT Media Lab showed that only 4 points needed to uniquely identify 95% of individual trajectories in a de-identified data set of 1.5 million mobility traject ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	NP-hard In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard problem is the subset sum problem. A more precise specification is: a problem ''H'' is NP-hard when every problem ''L'' in NP can be reduced in polynomial time to ''H''; that is, assuming a solution for ''H'' takes 1 unit time, ''H''s solution can be used to solve ''L'' in polynomial time. As a consequence, finding a polynomial time algorithm to solve any NP-hard problem would give polynomial time algorithms for all the problems in NP. As it is suspected that P≠NP, it is unlikely that such an algorithm exists. It is suspected that there are no polynomial-time algorithms for NP-hard problems, but that has not been proven. Moreover, the class P, in which all problems can be solved in polynomial time, is contained in the NP class. Def ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Differential Privacy Differential privacy (DP) is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is that if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual, and therefore provides privacy. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible ev ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Quasi-identifier Quasi-identifiers are pieces of information that are not of themselves unique identifiers, but are sufficiently well correlated with an entity that they can be combined with other quasi-identifiers to create a unique identifier. Quasi-identifiers can thus, when combined, become personally identifying information. This process is called re-identification. As an example, Latanya Sweeney has shown that even though neither gender, birth dates nor postal codes uniquely identify an individual, the combination of all three is sufficient to identify 87% of individuals in the United States. The term was introduced by Tore Dalenius in 1986. Since then, quasi-identifiers have been the basis of several attacks on released data. For instance, Sweeney linked health records to publicly available information to locate the then-governor of Massachusetts' hospital records using uniquely identifying quasi-identifiers, and Sweeney, Abu and Winn used public voter records to re-identify participants in ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Buddhism Buddhism ( , ), also known as Buddha Dharma and Dharmavinaya (), is an Indian religion or philosophical tradition based on teachings attributed to the Buddha. It originated in northern India as a -movement in the 5th century BCE, and gradually spread throughout much of Asia via the Silk Road. It is the world's fourth-largest religion, with over 520 million followers (Buddhists) who comprise seven percent of the global population. The Buddha taught the Middle Way, a path of spiritual development that avoids both extreme asceticism and hedonism. It aims at liberation from clinging and craving to things which are impermanent (), incapable of satisfying ('), and without a lasting essence (), ending the cycle of death and rebirth (). A summary of this path is expressed in the Noble Eightfold Path, a training of the mind with observance of Buddhist ethics and meditation. Other widely observed practices include: monasticism; "taking refuge" in the Buddha, the , and the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Cardiovascular Disease Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. CVD includes coronary artery diseases (CAD) such as angina and myocardial infarction (commonly known as a heart attack). Other CVDs include stroke, heart failure, hypertensive heart disease, rheumatic heart disease, cardiomyopathy, abnormal heart rhythms, congenital heart disease, valvular heart disease, carditis, aortic aneurysms, peripheral artery disease, thromboembolic disease, and venous thrombosis. The underlying mechanisms vary depending on the disease. It is estimated that dietary risk factors are associated with 53% of CVD deaths. Coronary artery disease, stroke, and peripheral artery disease involve atherosclerosis. This may be caused by high blood pressure, smoking, diabetes mellitus, lack of exercise, obesity, high blood cholesterol, poor diet, excessive alcohol consumption, and poor sleep, among other things. High blood pressure is estimated to account ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Parsi Parsis () or Parsees are an ethnoreligious group of the Indian subcontinent adhering to Zoroastrianism. They are descended from Persians who migrated to Medieval India during and after the Arab conquest of Iran (part of the early Muslim conquests) in order to preserve their Zoroastrian identity. The Parsi people comprise the older of the Indian subcontinent's two Zoroastrian communities vis-à-vis the Iranis, whose ancestors migrated to British-ruled India from Qajar-era Iran. According to a 16th-century Parsi epic, '' Qissa-i Sanjan'', Zoroastrian Persians continued to migrate to the Indian subcontinent from Greater Iran in between the 8th and 10th centuries, and ultimately settled in present-day Gujarat after being granted refuge by a local Hindu king. Prior to the 7th-century fall of the Sassanid Empire to the Rashidun Caliphate, the Iranian mainland (historically known as 'Persia') had a Zoroastrian majority, and Zoroastrianism had served as the Iranian state religio ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]