The Inception Score (IS) is a metric used to assess the quality of images created by a generative image model such as a generative adversarial network (GAN). The score is calculated from the output of a separate, pretrained Inception v3 image classification model applied to a sample of images (typically around 30,000) produced by the generative model. The Inception Score is maximized when the following conditions are true:
# The entropy of the distribution of labels predicted by the Inception v3 model for each generated image is minimized. In other words, the classification model confidently predicts a single label for each image. Intuitively, this corresponds to the desideratum that generated images be "sharp" or "distinct".
# The predictions of the classification model are evenly distributed across all possible labels. This corresponds to the desideratum that the output of the generative model be "diverse".

The Inception Score has been somewhat superseded by the related Fréchet inception distance (FID). While the Inception Score evaluates only the distribution of generated images, the FID compares the distribution of generated images with the distribution of a set of real images ("ground truth").

Definition

Let there be two spaces, the space of images \Omega_X and the space of labels \Omega_Y. The space of labels is finite. Let p_{gen} be a probability distribution over \Omega_X that we wish to judge. Let a discriminator be a function of type

p_{dis}: \Omega_X \to M(\Omega_Y)

where M(\Omega_Y) is the set of all probability distributions on \Omega_Y. For any image x and any label y, let p_{dis}(y|x) be the probability that image x has label y, according to the discriminator. The discriminator is usually implemented as an Inception v3 network trained on ImageNet. The Inception Score of p_{gen} relative to p_{dis} is

IS(p_{gen}, p_{dis}) := \exp\left( \mathbb{E}_{x \sim p_{gen}}\left[ D_{KL}\left( p_{dis}(\cdot | x) \,\middle\|\, \int p_{dis}(\cdot | x') \, p_{gen}(x') \, dx' \right) \right] \right)

where D_{KL} denotes the Kullback–Leibler divergence and the integral is the marginal distribution of labels over generated images.
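In practice the expectation and the integral are replaced by averages over a finite sample of generated images. The following is a minimal sketch in Python (standard library only), assuming the discriminator outputs p_{dis}(\cdot|x_i) for M generated images are already available as a list of M probability vectors over the N labels; the function name `inception_score` is an illustrative choice, not an established API.

```python
import math
from typing import Sequence


def inception_score(probs: Sequence[Sequence[float]]) -> float:
    """Estimate IS(p_gen, p_dis) from a finite sample of generated images.

    probs[i][y] is p_dis(y | x_i), the discriminator's probability that the
    i-th generated image has label y. Expectations over x ~ p_gen become
    sample averages.
    """
    m = len(probs)
    n_labels = len(probs[0])
    # Sample estimate of the marginal: integral of p_dis(.|x') p_gen(x') dx'.
    marginal = [sum(p[y] for p in probs) / m for y in range(n_labels)]

    mean_kl = 0.0
    for p in probs:
        # KL(p_dis(.|x_i) || marginal). Terms with p_y = 0 contribute 0, and
        # p_y > 0 implies marginal[y] >= p_y / m > 0, so the log is defined.
        kl = sum(
            p_y * math.log(p_y / marginal[y])
            for y, p_y in enumerate(p)
            if p_y > 0.0
        )
        mean_kl += kl / m
    return math.exp(mean_kl)


# Usage: two images, each confidently assigned a different one of two labels.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))  # approximately 2.0
```

Here the marginal is estimated from the same sample that the KL divergences are averaged over, which keeps the sketch short.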

Equivalent rewrites include

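\ln IS(p_{gen}, p_{dis}) = \mathbb{E}_{x \sim p_{gen}}\left[ D_{KL}\left( p_{dis}(\cdot | x) \,\middle\|\, \mathbb{E}_{x' \sim p_{gen}}[p_{dis}(\cdot | x')] \right) \right]

\ln IS(p_{gen}, p_{dis}) = H\left[ \mathbb{E}_{x \sim p_{gen}}[p_{dis}(\cdot | x)] \right] - \mathbb{E}_{x \sim p_{gen}}\left[ H[p_{dis}(\cdot | x)] \right]

where H denotes Shannon entropy. The second form shows that \ln IS is nonnegative (equivalently, IS \geq 1): entropy is concave, so by Jensen's inequality the entropy of the averaged prediction is at least the average entropy of the individual predictions.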
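The entropy form follows from the KL form by expanding the divergence and using linearity of expectation. In the derivation below, \bar p(y) := \mathbb{E}_{x \sim p_{gen}}[p_{dis}(y|x)] is shorthand introduced only here, and sums run over the finite label space \Omega_Y.

```latex
\begin{aligned}
\ln IS(p_{gen}, p_{dis})
  &= \mathbb{E}_{x \sim p_{gen}}\Big[ \sum_{y \in \Omega_Y} p_{dis}(y|x) \ln p_{dis}(y|x) \Big]
     - \mathbb{E}_{x \sim p_{gen}}\Big[ \sum_{y \in \Omega_Y} p_{dis}(y|x) \ln \bar p(y) \Big] \\
  &= -\mathbb{E}_{x \sim p_{gen}}\big[ H[p_{dis}(\cdot|x)] \big]
     - \sum_{y \in \Omega_Y} \bar p(y) \ln \bar p(y) \\
  &= H[\bar p] - \mathbb{E}_{x \sim p_{gen}}\big[ H[p_{dis}(\cdot|x)] \big] \;\geq\; 0
\end{aligned}
```

The second line moves the expectation inside the sum over labels, which turns \mathbb{E}_x[p_{dis}(y|x)] into \bar p(y). The final inequality is Jensen's inequality for the concave function H: H[\mathbb{E}_x[p_{dis}(\cdot|x)]] \geq \mathbb{E}_x[H[p_{dis}(\cdot|x)]].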

Pseudocode:
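The procedure can be sketched in Python as follows: estimate the marginal label distribution from one batch of generated images, average the KL divergence over a second, independent batch, and exponentiate. This is a minimal sketch under stated assumptions, not a reference implementation: `sample_image` and `classify` are hypothetical callables standing in for the generator (drawing x \sim p_{gen}) and the discriminator (returning p_{dis}(\cdot|x) as a list of N probabilities; in practice a pretrained Inception v3 network), and the batch sizes are arbitrary defaults.

```python
import math
import random
from typing import Callable, List, TypeVar

ImageT = TypeVar("ImageT")


def estimate_inception_score(
    sample_image: Callable[[], ImageT],          # hypothetical: draws x ~ p_gen
    classify: Callable[[ImageT], List[float]],   # hypothetical: returns p_dis(.|x)
    n_marginal: int = 5000,
    n_score: int = 5000,
) -> float:
    """Monte Carlo estimate of IS(p_gen, p_dis) in three steps."""
    # Step 1: sample images and average p_dis(.|x_i) to get p_hat, an
    # empirical estimate of the integral of p_dis(.|x) p_gen(x) dx.
    first_batch = [classify(sample_image()) for _ in range(n_marginal)]
    n_labels = len(first_batch[0])
    p_hat = [sum(p[y] for p in first_batch) / n_marginal for y in range(n_labels)]

    # Step 2: on fresh samples, accumulate KL(p_dis(.|x_i) || p_hat).
    total_kl = 0.0
    for _ in range(n_score):
        p = classify(sample_image())
        for y, p_y in enumerate(p):
            if p_y == 0.0:
                continue  # 0 * log(0 / q) is taken as 0
            if p_hat[y] == 0.0:
                return math.inf  # label unseen in step 1: the KL term is infinite
            total_kl += p_y * math.log(p_y / p_hat[y])

    # Step 3: average the KL divergences and exponentiate.
    return math.exp(total_kl / n_score)


if __name__ == "__main__":
    # Toy stand-ins: the "generator" picks one of three images uniformly and the
    # "discriminator" labels image k with certainty as class k.
    rng = random.Random(0)
    one_hot = [[1.0 if y == k else 0.0 for y in range(3)] for k in range(3)]
    score = estimate_inception_score(lambda: rng.randrange(3), lambda k: one_hot[k])
    print(score)  # close to the maximum value N = 3
```

Using a separate batch for p_hat mirrors the two sampling steps of the procedure; the price is that a label never seen in the first batch makes the estimate infinite, which the sketch reports explicitly rather than hiding with a smoothing constant.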

Interpretation

A higher Inception Score is interpreted as "better", as it means that p_{gen} is a "sharp and distinct" collection of pictures.

\ln IS(p_{gen}, p_{dis}) \in [0, \ln N], where N is the total number of possible labels.

\ln IS(p_{gen}, p_{dis}) = 0 if and only if, for almost all x \sim p_{gen},

p_{dis}(\cdot | x) = \int p_{dis}(\cdot | x') \, p_{gen}(x') \, dx'

That means p_{gen} is completely "indistinct": for any image x sampled from p_{gen}, the discriminator returns exactly the same label predictions p_{dis}(\cdot | x).
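Both bounds can be read off the entropy form of \ln IS from the Definition section. Writing \bar p(y) := \mathbb{E}_{x \sim p_{gen}}[p_{dis}(y|x)] (shorthand introduced only for this argument):

```latex
\begin{aligned}
0 \;\leq\; \ln IS(p_{gen}, p_{dis})
  &= H[\bar p] - \mathbb{E}_{x \sim p_{gen}}\big[ H[p_{dis}(\cdot|x)] \big] \\
  &\leq H[\bar p] && \text{because each } H[p_{dis}(\cdot|x)] \geq 0 \\
  &\leq \ln N     && \text{because no distribution on } N \text{ labels has entropy above } \ln N
\end{aligned}
```

The lower bound is Jensen's inequality for the concave entropy H. The first upper-bound step is tight exactly when H[p_{dis}(\cdot|x)] = 0 for almost all x, and the second exactly when \bar p is uniform; these are the two conditions for the maximum stated next.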

The highest Inception Score, N, is achieved if and only if both of the following conditions hold:
* For almost all x \sim p_{gen}, the distribution p_{dis}(y|x) is concentrated on one label, that is, H_y[p_{dis}(y|x)] = 0. In other words, the discriminator classifies every image sampled from p_{gen} with complete certainty.
* For every label y, the proportion of generated images labelled as y is exactly \mathbb{E}_{x \sim p_{gen}}[p_{dis}(y|x)] = \frac{1}{N}. That is, the generated images are equally distributed over all labels.
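A small numerical check of the two conditions, as a Python sketch that treats a finite list of discriminator outputs as the whole of p_{gen} (an assumption made only for illustration). It uses the entropy form, so the "diversity" term H[\bar p] and the "sharpness" term \mathbb{E}_x[H[p_{dis}(\cdot|x)]] appear separately; the helper names are illustrative. Confidence alone or diversity alone gives IS = 1, while both together give IS = N.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats, with 0 * log 0 taken as 0."""
    return -sum(p_y * math.log(p_y) for p_y in p if p_y > 0.0)


def inception_score(probs: List[List[float]]) -> float:
    """IS via the entropy form: exp(H[marginal] - mean of H[p_dis(.|x)])."""
    m, n = len(probs), len(probs[0])
    marginal = [sum(p[y] for p in probs) / m for y in range(n)]
    mean_conditional_entropy = sum(entropy(p) for p in probs) / m
    return math.exp(entropy(marginal) - mean_conditional_entropy)


N = 4
one_hot = [[1.0 if y == k else 0.0 for y in range(N)] for k in range(N)]

sharp_and_diverse = one_hot               # one confident label per image, all labels used
sharp_not_diverse = [one_hot[0]] * N      # confident, but always the same label
diverse_not_sharp = [[1.0 / N] * N] * N   # every image gets the uniform prediction

for name, probs in [
    ("sharp and diverse", sharp_and_diverse),
    ("sharp, not diverse", sharp_not_diverse),
    ("diverse, not sharp", diverse_not_sharp),
]:
    print(f"{name}: IS = {inception_score(probs):.4f}")
# prints:
# sharp and diverse: IS = 4.0000
# sharp, not diverse: IS = 1.0000
# diverse, not sharp: IS = 1.0000
```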
