Maximally informative dimensions is a dimensionality reduction technique used in the statistical analysis of neural responses. Specifically, it is a way of projecting a stimulus onto a low-dimensional subspace so that as much information as possible about the stimulus is preserved in the neural response. It is motivated by the fact that natural stimuli are typically confined by their statistics to a lower-dimensional space than that spanned by white noise, but correctly identifying this subspace using traditional techniques is complicated by the correlations that exist within natural images. Within this subspace, stimulus-response functions may be either linear or nonlinear. The idea was originally developed by Tatyana Sharpee, Nicole C. Rust, and William Bialek in 2003.

Mathematical formulation

Neural stimulus-response functions are typically given as the probability of a neuron generating an action potential, or spike, in response to a stimulus $\mathbf{s}$. The goal of maximally informative dimensions is to find a small relevant subspace of the much larger stimulus space that accurately captures the salient features of $\mathbf{s}$. Let $D$ denote the dimensionality of the entire stimulus space and $K$ denote the dimensionality of the relevant subspace, such that $K \ll D$. We let $\{\mathbf{v}^K\}$ denote the basis of the relevant subspace, and $\mathbf{s}^K$ the projection of $\mathbf{s}$ onto $\{\mathbf{v}^K\}$. Using Bayes' theorem we can write out the probability of a spike given a stimulus:

$$P(\text{spike} \mid \mathbf{s}^K) = P(\text{spike})\, f(\mathbf{s}^K)$$

where

$$f(\mathbf{s}^K) = \frac{P(\mathbf{s}^K \mid \text{spike})}{P(\mathbf{s}^K)}$$

is some nonlinear function of the projected stimulus. In order to choose the optimal $\{\mathbf{v}^K\}$, we compare the prior stimulus distribution $P(\mathbf{s})$ with the spike-triggered stimulus distribution $P(\mathbf{s} \mid \text{spike})$ using the Shannon information. The average information (averaged across all presented stimuli) per spike is given by

$$I_{\text{spike}} = \sum_{\mathbf{s}} P(\mathbf{s} \mid \text{spike}) \log_2 \left[ P(\mathbf{s} \mid \text{spike}) / P(\mathbf{s}) \right]$$

(N. Brenner, S. P. Strong, R. Koberle, W. Bialek, and R. R. de Ruyter van Steveninck. "Synergy in a neural code." Neural Comp., 12:1531-1552, 2000.)

Now consider a $K = 1$ dimensional subspace defined by a single direction $\mathbf{v}$. The average information conveyed by a single spike about the projection $x = \mathbf{s} \cdot \mathbf{v}$ is

$$I(\mathbf{v}) = \int dx\, P_{\mathbf{v}}(x \mid \text{spike}) \log_2 \left[ P_{\mathbf{v}}(x \mid \text{spike}) / P_{\mathbf{v}}(x) \right],$$

where the probability distributions are approximated by a measured data set via $P_{\mathbf{v}}(x \mid \text{spike}) = \langle \delta(x - \mathbf{s} \cdot \mathbf{v}) \mid \text{spike} \rangle_{\mathbf{s}}$ and $P_{\mathbf{v}}(x) = \langle \delta(x - \mathbf{s} \cdot \mathbf{v}) \rangle_{\mathbf{s}}$, i.e., each presented stimulus is represented by a scaled Dirac delta function and the probability distributions are created by averaging over all spike-eliciting stimuli, in the former case, or the entire presented stimulus set, in the latter case. For a given dataset, the average information is a function only of the direction $\mathbf{v}$. Under this formulation, the relevant subspace of dimension $K = 1$ would be defined by the direction $\mathbf{v}$ that maximizes the average information $I(\mathbf{v})$.

This procedure can readily be extended to a relevant subspace of dimension $K > 1$ by defining

$$P_{\mathbf{v}^K}(\mathbf{x} \mid \text{spike}) = \left\langle \prod_{i=1}^K \delta(x_i - \mathbf{s} \cdot \mathbf{v}_i) \,\Big|\, \text{spike} \right\rangle_{\mathbf{s}}$$

and

$$P_{\mathbf{v}^K}(\mathbf{x}) = \left\langle \prod_{i=1}^K \delta(x_i - \mathbf{s} \cdot \mathbf{v}_i) \right\rangle_{\mathbf{s}}$$

and maximizing $I(\{\mathbf{v}^K\})$.
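The one-dimensional case can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the delta-function averages are replaced by histograms over equal-population bins, the neuron is simulated with an assumed sigmoidal nonlinearity acting on a two-dimensional Gaussian stimulus, and the maximization is a plain grid search over angles rather than any particular optimizer. The helper `info_per_spike` and all parameter values are hypothetical.

```python
import numpy as np


def info_per_spike(stimuli, spikes, v, n_bins=20):
    """Binned estimate of I(v) in bits per spike (hypothetical helper).

    stimuli: (N, D) array, one presented stimulus per row
    spikes:  (N,) array of spike counts elicited by each stimulus
    v:       (D,) candidate direction
    """
    x = stimuli @ v  # projections x = s . v
    # Equal-population bin edges: an assumed choice that keeps every bin occupied.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    p_x, _ = np.histogram(x, bins=edges)                        # all stimuli
    p_x_spike, _ = np.histogram(x, bins=edges, weights=spikes)  # spike-weighted
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()
    occupied = p_x_spike > 0  # bins without spikes contribute zero
    ratio = p_x_spike[occupied] / p_x[occupied]
    return float(np.sum(p_x_spike[occupied] * np.log2(ratio)))


# Simulated experiment (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_stimuli = 50_000
stimuli = rng.standard_normal((n_stimuli, 2))

true_deg = 60.0
v_true = np.array([np.cos(np.deg2rad(true_deg)), np.sin(np.deg2rad(true_deg))])

# Assumed sigmoidal spike probability as a function of the relevant projection.
p_spike = 1.0 / (1.0 + np.exp(-4.0 * (stimuli @ v_true - 1.0)))
spikes = (rng.random(n_stimuli) < p_spike).astype(float)

# Grid search over directions; I(v) = I(-v), so angles in [0, 180) suffice.
angles_deg = np.arange(0, 180, 5)
infos = [
    info_per_spike(
        stimuli, spikes,
        np.array([np.cos(np.deg2rad(a)), np.sin(np.deg2rad(a))]),
    )
    for a in angles_deg
]
best_deg = float(angles_deg[int(np.argmax(infos))])

print(f"true direction: {true_deg:.0f} deg, most informative: {best_deg:.0f} deg")
```

A grid search is only workable here because the stimulus has two dimensions. For realistic stimulus dimensionality the search over directions needs a proper optimization scheme, and the binned estimate carries a positive bias that grows with the number of bins relative to the number of spikes.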

Importance

Maximally informative dimensions makes no assumption that the stimulus set is Gaussian, which matters because naturalistic stimuli tend to have non-Gaussian statistics. In this respect the technique is more robust than other dimensionality reduction techniques such as spike-triggered covariance analysis.
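The role of the Gaussian assumption can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a result from the literature: it uses the spike-triggered average (a simpler spike-triggered moment method than the covariance analysis named above), a two-dimensional stimulus whose components are uncorrelated with unit variance but non-Gaussian (one Laplace, one uniform), and a hard-threshold neuron. The helper `info_per_spike` and every parameter value are hypothetical.

```python
import numpy as np


def info_per_spike(stimuli, spikes, v, n_bins=100):
    """Binned estimate of the information per spike about s . v, in bits."""
    x = stimuli @ v
    # Equal-population bins (an assumed choice) cope with the heavy tails.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    p_x, _ = np.histogram(x, bins=edges)
    p_x_spike, _ = np.histogram(x, bins=edges, weights=spikes)
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()
    occupied = p_x_spike > 0
    ratio = p_x_spike[occupied] / p_x[occupied]
    return float(np.sum(p_x_spike[occupied] * np.log2(ratio)))


rng = np.random.default_rng(1)
n_stimuli = 200_000

# Uncorrelated, unit-variance, non-Gaussian stimulus components.
a = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n_stimuli)          # heavy-tailed
b = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_stimuli)      # bounded
stimuli = np.column_stack([a, b])

# Assumed neuron: spikes whenever the projection onto v_true exceeds a threshold.
v_true = np.array([1.0, 1.0]) / np.sqrt(2.0)
spikes = (stimuli @ v_true > 2.5 / np.sqrt(2.0)).astype(float)

# Spike-triggered average, normalized to a unit direction.
sta = spikes @ stimuli / spikes.sum() - stimuli.mean(axis=0)
v_sta = sta / np.linalg.norm(sta)
angle_error_deg = float(np.degrees(np.arccos(min(1.0, abs(v_sta @ v_true)))))

info_true = info_per_spike(stimuli, spikes, v_true)
info_sta = info_per_spike(stimuli, spikes, v_sta)

print(f"angle between STA and true direction: {angle_error_deg:.1f} deg")
print(f"I(v_true) = {info_true:.2f} bits, I(v_sta) = {info_sta:.2f} bits")
```

In this toy setting the spike-triggered average leans toward the heavy-tailed component, whereas the information criterion ranks the true direction above it. With a Gaussian stimulus of the same covariance, the spike-triggered average would be expected to align with the true direction.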
