A latent variable model is a

statistical model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repre ...

that relates a set of observable variables (also called ''manifest variables'' or ''indicators'') to a set of

latent variable In statistics, latent variables (from Latin: present participle of ) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such '' latent va ...

s. Latent variable models are applied across a wide range of fields such as biology, computer science, and social science. Common use cases for latent variable models include applications in

psychometrics Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally covers specialized fields within psychology and education devoted to testing, measurement, assessment, and rela ...

(e.g., summarizing responses to a set of survey questions with a

factor analysis Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observe ...

model positing a smaller number of psychological attributes, such as the trait

extraversion Extraversion and introversion are a central trait dimension in human personality theory. The terms were introduced into psychology by Carl Jung, though both the popular understanding and current psychological usage are not the same as Jung's ...

, that are presumed to cause the survey question responses), and

natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...

(e.g., a

topic model In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden ...

summarizing a corpus of texts with a number of "topics"). It is assumed that the responses on the indicators or manifest variables are the result of an individual's position on the latent variable(s), and that the manifest variables have nothing in common after controlling for the latent variable ( local independence). Different types of the latent variable models can be grouped according to whether the manifest and latent variables are categorical or continuous: The

Rasch model The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, ...

represents the simplest form of item response theory. Mixture models are central to latent profile analysis. In

and

latent trait analysis In psychometrics, item response theory (IRT, also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring ...

the latent variables are treated as continuous

normally distributed In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real number, real-valued random variable. The general form of its probability density function is f(x ...

variables, and in latent profile analysis and latent class analysis as from a

multinomial distribution In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a ''k''-sided die rolled ''n'' times. For ''n'' statistical independence, indepen ...

. The manifest variables in factor analysis and latent profile analysis are continuous and in most cases, their conditional distribution given the latent variables is assumed to be normal. In latent trait analysis and latent class analysis, the manifest variables are discrete. These variables could be dichotomous, ordinal or nominal variables. Their conditional distributions are assumed to be binomial or multinomial.

See also

Notes

References

Further reading