The exponential mechanism is a technique for designing differentially private algorithms. It was developed by
Frank McSherry
and Kunal Talwar in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies. Most of the initial research in the field of differential privacy revolved around real-valued functions which have relatively low
sensitivity
to change in the data of a single individual, and whose usefulness is not hampered by small additive perturbations. A natural question is what happens when one wants to preserve more general sets of properties. The exponential mechanism helps to extend the notion of differential privacy to address these issues. Moreover, it describes a class of mechanisms that includes all possible differentially private mechanisms.


The mechanism


Algorithm

In very generic terms, a privacy mechanism maps a set of n inputs from a domain \mathcal{D} to a range \mathcal{R}. The map may be randomized, in which case each element of the domain \mathcal{D}^n corresponds to a probability distribution over the range \mathcal{R}. The privacy mechanism makes no assumption about the nature of \mathcal{D} and \mathcal{R} apart from a base
measure
\mu on \mathcal{R}. Let us define a function q:\mathcal{D}^n\times\mathcal{R}\rightarrow\mathbb{R}. Intuitively, this function assigns a score to each pair (d,r), where d\in\mathcal{D}^n and r\in\mathcal{R}: the higher the score, the more appealing the pair. Given the input d\in\mathcal{D}^n, the mechanism's objective is to return an r\in\mathcal{R} such that q(d,r) is approximately maximized. To achieve this, set up the mechanism \mathcal{E}_q^\varepsilon(d) as follows:
Definition: For any function q:(\mathcal{D}^n\times\mathcal{R})\rightarrow\mathbb{R} and a base measure \mu over \mathcal{R}, define:

:\mathcal{E}_q^\varepsilon(d):= choose r with probability proportional to e^{\varepsilon q(d,r)}\times\mu(r), where d\in\mathcal{D}^n, r\in\mathcal{R}.

This definition implies that the probability of returning a particular r increases exponentially with the value of q(d,r). Ignoring the base measure \mu, the value r which maximizes q(d,r) has the highest probability. Moreover, this mechanism is differentially private; a proof of this claim follows. One technicality that should be kept in mind is that for \mathcal{E}_q^\varepsilon(d) to be properly defined, the normalizing integral \int_r e^{\varepsilon q(d,r)}\times\mu(r)\,dr should be finite.

Theorem (differential privacy): \mathcal{E}_q^\varepsilon(d) gives (2\varepsilon\Delta q)-differential privacy, where \Delta q is the largest change in q that can be caused by a change in the data of a single individual.

Proof: The probability density of \mathcal{E}_q^\varepsilon(d) at r equals

:\frac{e^{\varepsilon q(d,r)}\mu(r)}{\int e^{\varepsilon q(d,r')}\mu(r')\,dr'}.

Now, if a single change in d changes q by at most \Delta q, then the numerator changes by at most a factor of e^{\varepsilon\Delta q} and the denominator by at least a factor of e^{-\varepsilon\Delta q}. Thus, the ratio of the new probability density (i.e. with the new d) to the earlier one is at most \exp(2\varepsilon\Delta q).
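For a finite range with a discrete base measure, the sampling step in the definition can be sketched as follows. This is an illustrative sketch, not notation from the source: the name exponential_mechanism, its argument names, and the uniform default base measure are assumptions.

```python
import math
import random

def exponential_mechanism(scores, epsilon, base_measure=None, rng=random):
    """Return an index r drawn with probability proportional to
    exp(epsilon * scores[r]) * base_measure[r], as in the definition above.
    By the theorem, this gives (2 * epsilon * dq)-differential privacy,
    where dq bounds the change in any score from one individual's data.
    """
    if base_measure is None:
        base_measure = [1.0] * len(scores)  # uniform base measure (assumption)
    # Shift by the maximum score before exponentiating: this multiplies
    # every weight by the same constant, leaving the distribution unchanged,
    # while avoiding floating-point overflow for large epsilon * score.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top)) * mu
               for s, mu in zip(scores, base_measure)]
    total = sum(weights)
    u = rng.random() * total
    acc = 0.0
    for r, w in enumerate(weights):
        acc += w
        if u <= acc:
            return r
    return len(scores) - 1  # guard against floating-point round-off
```

With a large epsilon the draw is almost always the highest-scoring candidate; with epsilon near zero the draw is nearly uniform over the range.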


Accuracy

We would ideally want the random draws of r from the mechanism \mathcal{E}_q^\varepsilon(d) to nearly maximize q(d,r). If we write OPT=\max_r q(d,r), then we can show that the probability of the mechanism deviating far from OPT is low, as long as there is sufficient mass (in terms of \mu) of values r with q close to the optimum.

Lemma: Let S_t=\{r : q(d,r)>OPT-t\} and \bar{S}_{2t}=\{r : q(d,r)\leq OPT-2t\}. Then p(\bar{S}_{2t}) is at most \exp(-\varepsilon t)/\mu(S_t), where the probability is taken over \mathcal{E}_q^\varepsilon(d).

Proof: The probability p(\bar{S}_{2t}) is at most p(\bar{S}_{2t})/p(S_t), as the denominator is at most one. Since both probabilities have the same normalizing term,

:\frac{p(\bar{S}_{2t})}{p(S_t)} = \frac{\int_{\bar{S}_{2t}} e^{\varepsilon q(d,r)}\mu(r)\,dr}{\int_{S_t} e^{\varepsilon q(d,r)}\mu(r)\,dr} \leq \exp(-\varepsilon t)\,\frac{\mu(\bar{S}_{2t})}{\mu(S_t)}.

The value of \mu(\bar{S}_{2t}) is at most one, and so this bound implies the lemma statement.

Theorem (Accuracy): For those values of t\geq \ln\left(\frac{OPT}{t\,\mu(S_t)}\right)/\varepsilon, we have E[q(d,\mathcal{E}_q^\varepsilon(d))]\geq OPT-3t.

Proof: It follows from the previous lemma that the probability of the score being at least OPT-2t is 1-\exp(-\varepsilon t)/\mu(S_t). By hypothesis, t\geq \ln\left(\frac{OPT}{t\,\mu(S_t)}\right)/\varepsilon, so \exp(-\varepsilon t)/\mu(S_t)\leq t/OPT and this probability is at least 1-t/OPT. Multiplying by OPT-2t yields E[q(d,\mathcal{E}_q^\varepsilon(d))]\geq (1-t/OPT)(OPT-2t)\geq OPT-3t, the desired bound. We can assume \mu(A)\leq 1 for every A\subseteq\mathcal{R} in all the computations, because we can always normalize by \mu(\mathcal{R}).
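The concentration near OPT can be observed empirically on a small finite range. The simulation below is illustrative only (uniform base measure, hypothetical scores), not part of the proof: with epsilon = 10 and OPT = 1.0, the drawn scores cluster near the optimum.

```python
import math
import random

def sample_score(scores, epsilon, rng):
    """Draw r with probability proportional to exp(epsilon * scores[r])
    (uniform base measure) and return the achieved score q(d, r)."""
    top = max(scores)  # shift for numerical stability
    weights = [math.exp(epsilon * (s - top)) for s in scores]
    r = rng.choices(range(len(scores)), weights=weights)[0]
    return scores[r]

rng = random.Random(1)
scores = [0.0, 0.5, 0.9, 1.0]  # OPT = 1.0
draws = [sample_score(scores, epsilon=10.0, rng=rng) for _ in range(2000)]
mean_score = sum(draws) / len(draws)
# The empirical mean sits close to OPT, as the accuracy theorem predicts.
```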


Example application

Before we get into the details of the example, let us define some terms which we will be using extensively throughout our discussion.

Definition (global sensitivity): The global sensitivity of a query Q is its maximum difference when evaluated on two neighbouring datasets D_1,D_2\in\mathcal{D}^n:

:GS_Q = \max_{D_1,D_2 \text{ neighbouring}} |Q(D_1)-Q(D_2)|.

Definition: A predicate query Q_\varphi for any predicate \varphi is defined to be

:Q_\varphi(D)=\frac{|\{d_i\in D : \varphi(d_i)\}|}{|D|},

i.e. the fraction of entries of D satisfying \varphi. Note that GS_{Q_\varphi}\leq 1/n for any predicate \varphi, since changing a single entry changes the count of satisfying entries by at most one.
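The two definitions can be made concrete with a short sketch. The names predicate_query and neighbour_sensitivity are illustrative, and "neighbouring" is taken here to mean replacing a single row.

```python
def predicate_query(dataset, predicate):
    """Q_phi(D): the fraction of entries in D satisfying the predicate phi."""
    return sum(1 for row in dataset if predicate(row)) / len(dataset)

def neighbour_sensitivity(dataset, predicate, alternatives):
    """Brute-force max |Q_phi(D) - Q_phi(D')| over all neighbours D'
    obtained by replacing one row of `dataset` with a row from
    `alternatives` (an illustrative stand-in for the full row universe)."""
    base = predicate_query(dataset, predicate)
    worst = 0.0
    for i in range(len(dataset)):
        for alt in alternatives:
            neighbour = list(dataset)
            neighbour[i] = alt  # change the data of one individual
            worst = max(worst, abs(predicate_query(neighbour, predicate) - base))
    return worst
```

On a dataset of n rows the brute-forced value never exceeds 1/n, matching the bound GS_{Q_\varphi}\leq 1/n.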


Release mechanism

The following is due to
Avrim Blum
,
Katrina Ligett
and
Aaron Roth
. Definition (Usefulness): A
mechanism
\mathcal{A} is (\alpha,\delta)-useful for queries in a class H if, with probability 1-\delta, for every h\in H and every dataset D, the output \widehat{D}=\mathcal{A}(D) satisfies |Q_h(\widehat{D})-Q_h(D)|\leq \alpha. Informally, it means that with high probability the query Q_h will behave in a similar way on the original dataset D and on the synthetic dataset \widehat{D}.
Consider a common problem in data mining. Assume there is a database D with n entries. Each entry consists of a k-tuple of the form (x_1,x_2,\dots,x_k), where x_i\in\{0,1\}. Now, a user wants to learn a linear halfspace of the form \pi_1 x_1 + \pi_2 x_2+\cdots+\pi_{k-1}x_{k-1}\geq x_k. In essence, the user wants to find values of \pi_1,\pi_2,\dots,\pi_{k-1} such that the maximum number of tuples in the database satisfy the inequality. The algorithm we describe below can generate a synthetic database \widehat{D} which will allow the user to learn (approximately) the same linear halfspace while querying only this synthetic database. The motivation for such an algorithm is that the new database is generated in a differentially private manner, and thus assures privacy to the individual records of the database D.

In this section we show that it is possible to release a dataset which is useful for concepts from a polynomial VC-dimension class and at the same time adhere to \varepsilon-differential privacy, as long as the size of the original dataset is at least polynomial in the VC-dimension of the concept class. To state it formally:

Theorem: For any class of functions H and any dataset D\subset (\{0,1\}^k)^n such that

:|D| \geq O\left(\frac{k\cdot\operatorname{VCDIM}(H)\log(1/\alpha)}{\varepsilon\alpha^3}+\frac{\log(1/\delta)}{\varepsilon\alpha}\right),

we can output an (\alpha,\delta)-useful dataset \widehat{D} that preserves \varepsilon-differential privacy. As mentioned earlier, the algorithm need not be efficient. One interesting fact is that the algorithm we are going to develop generates a synthetic dataset whose size is independent of the original dataset; in fact, it only depends on the VC-dimension of the concept class and the parameter \alpha: the algorithm outputs a dataset of size \tilde{O}(\operatorname{VCDIM}(H)/\alpha^2). We borrow the
uniform convergence theorem
from
combinatorics
and state a corollary of it which aligns with our need.

Lemma: Given any dataset D, there exists a dataset \widehat{D} of size m = O(\operatorname{VCDIM}(H)\log(1/\alpha))/\alpha^2 such that \max_{h\in H}|Q_h(D)-Q_h(\widehat{D})|\leq \alpha/2.

Proof: We know from the uniform convergence theorem that

:\Pr\left[\,|Q_h(D)-Q_h(\widehat{D})| \geq \frac{\alpha}{2}\text{ for some }h\in H\right] \leq 2\left(\frac{em}{\operatorname{VCDIM}(H)}\right)^{\operatorname{VCDIM}(H)} \cdot e^{-\alpha^2 m/8},

where the probability is over a dataset \widehat{D} of m entries sampled from D. Thus, if the right-hand side is less than one, then we know for sure that a suitable dataset \widehat{D} exists. To bound the right-hand side below one we need m\geq\lambda(\operatorname{VCDIM}(H)\log(m/\operatorname{VCDIM}(H))/\alpha^2), where \lambda is some positive constant. Since we stated earlier that we will output a dataset of size \tilde{O}(\operatorname{VCDIM}(H)/\alpha^2), using this bound on m we get m\geq\lambda(\operatorname{VCDIM}(H)\log(1/\alpha)/\alpha^2). Hence the lemma.

Now we invoke the exponential mechanism.

Definition: For any function q:((\{0,1\}^k)^n \times(\{0,1\}^k)^m) \rightarrow \mathbb{R} and input dataset D, the exponential mechanism outputs each dataset \widehat{D} with probability proportional to e^{q(D,\widehat{D})\varepsilon n/2}. From the exponential mechanism we know this preserves (\varepsilon n\,GS_q)-differential privacy.

Let us get back to the proof of the theorem. We define

:q(D,\widehat{D})=-\max_{h\in H}|Q_h(D)-Q_h(\widehat{D})|.

Since each Q_h is a predicate query, GS_q\leq 1/n, so the mechanism preserves \varepsilon-differential privacy. To show that the mechanism satisfies the (\alpha,\delta)-usefulness, we should show that it outputs some dataset \widehat{D} with q(D,\widehat{D})\geq -\alpha with probability 1-\delta. There are at most 2^{km} output datasets, and the probability that q(D,\widehat{D})\leq -\alpha is at most proportional to e^{-\varepsilon\alpha n/2}. Thus, by a union bound, the probability of outputting any such dataset \widehat{D} is at most proportional to 2^{km}e^{-\varepsilon\alpha n/2}. Again, we know that there exists some dataset \widehat{D}\in(\{0,1\}^k)^m for which q(D,\widehat{D})\geq -\alpha/2.
Therefore, such a dataset is output with probability at least proportional to e^{-\varepsilon\alpha n/4}. Let A be the event that the exponential mechanism outputs some dataset \widehat{D} such that q(D,\widehat{D})\geq-\alpha/2, and B the event that it outputs some dataset \widehat{D} such that q(D,\widehat{D})\leq-\alpha.

:\therefore \frac{\Pr[A]}{\Pr[B]}\geq \frac{e^{-\varepsilon\alpha n/4}}{2^{km}e^{-\varepsilon\alpha n/2}}=\frac{e^{\varepsilon\alpha n/4}}{2^{km}}.

Now setting this quantity to be at least 1/\delta\geq(1-\delta)/\delta, we find that it suffices to have

:n\geq\frac{4}{\varepsilon\alpha}\left(km+\ln\frac{1}{\delta}\right)= O\left(\frac{k\cdot\operatorname{VCDIM}(H)\log(1/\alpha)}{\varepsilon\alpha^3}+\frac{\ln(1/\delta)}{\varepsilon\alpha}\right).

And hence we prove the theorem.
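The release mechanism just analysed is not computationally efficient, but for very small parameters it can be run exactly by enumerating all 2^{km} candidate synthetic datasets. Below is an illustrative sketch for k = 2 and m = 2, with H taken to be the two single-coordinate predicates (a toy choice for readability, not the halfspace class from the text); release and its argument names are hypothetical.

```python
import itertools
import math
import random

def release(D, epsilon, m, predicates, rng=random):
    """Exponential-mechanism release of a synthetic dataset: the score is
    q(D, Dhat) = -max_h |Q_h(D) - Q_h(Dhat)|, and Dhat is drawn with
    probability proportional to exp(epsilon * n * q / 2), as defined above."""
    n = len(D)
    def Q(dataset, h):  # predicate query: fraction of rows satisfying h
        return sum(1 for row in dataset if h(row)) / len(dataset)
    rows = list(itertools.product([0, 1], repeat=2))      # all rows in {0,1}^2
    candidates = list(itertools.product(rows, repeat=m))  # all of ({0,1}^2)^m
    scores = [-max(abs(Q(D, h) - Q(c, h)) for h in predicates)
              for c in candidates]
    top = max(scores)  # shift scores for numerical stability
    weights = [math.exp(epsilon * n * (s - top) / 2) for s in scores]
    return list(rng.choices(candidates, weights=weights)[0])
```

With a large privacy budget the returned synthetic dataset matches the true predicate-query answers almost exactly, illustrating the usefulness guarantee on a toy scale.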


Applications in other domains

In the above example of the usage of the exponential mechanism, one can output a synthetic dataset in a differentially private manner and use that dataset to answer queries with good accuracy. Other private mechanisms, such as posterior sampling, which return parameters rather than datasets, can be made equivalent to the exponential mechanism. Apart from the setting of privacy, the exponential mechanism has also been studied in the context of
auction theory
and
classification algorithms
.[Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, Adam Smith. "What Can We Learn Privately?" Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, 2008. arXiv:0803.0924] In the case of auctions the exponential mechanism helps to achieve a ''truthful'' auction setting.



External links


The Algorithmic Foundations of Differential Privacy
by Cynthia Dwork and Aaron Roth, 2014.