Observable quantity

In statistics, latent variables (from Latin: present participle of ''lateo'', “lie hidden”) are variables that cannot be observed directly and can only be inferred, through a mathematical model, from other variables that are directly observed or measured. Such ''latent variable models'' are used in many disciplines, including political science, demography, engineering, medicine, ecology, physics, machine learning/artificial intelligence, bioinformatics, chemometrics, natural language processing, management and the social sciences.

Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. In this situation, the term ''hidden variables'' is commonly used (reflecting the fact that the variables are meaningful, but not observable). Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms ''hypothetical variables'' or ''hypothetical constructs'' may be used in these situations.

The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world.


Examples


Psychology

Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model.
* The "Big Five personality traits" have been inferred using factor analysis.
* extraversion
* spatial ability
* wisdom (“Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.”)
* Spearman's g, or the general intelligence factor in psychometrics
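
The statement that uncorrelated variables cannot yield a latent construct can be made concrete with the common factor model. The notation below is standard but is not taken from the text above, so the symbols (x for the observed variables, f for the factors, Λ for the loadings, ε for the unique errors, Ψ for their covariance) should be read as assumptions introduced for illustration.

```latex
x = \Lambda f + \varepsilon, \qquad
\operatorname{Cov}(f) = I, \quad
\operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}, \quad
\operatorname{Cov}(f, \varepsilon) = 0
\;\Longrightarrow\;
\operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi .
```

With a single factor and loadings λ_i, the covariance between two different observed variables is Cov(x_i, x_j) = λ_i λ_j. If every such covariance is zero, at most one loading can be non-zero, so no factor is shared by two or more variables and all variance is absorbed by Ψ.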


Economics

Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. By linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. For example, observable variables used to infer quality of life include wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging.
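
As a rough illustration of inferring a latent quantity from observable indicators, the following sketch simulates one unobserved score and four noisy indicators, then uses the first principal component of the standardized indicators as a proxy. The data, the loadings, and the choice of principal component analysis are all assumptions made for the example. It is not a description of how quality of life is measured in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 500 people, one unobserved "quality of life" score each.
n_people = 500
latent = rng.normal(size=n_people)

# Four made-up indicators, each a noisy linear function of the latent score.
loadings = np.array([0.9, 0.8, 0.7, 0.6])
noise = rng.normal(scale=0.5, size=(n_people, loadings.size))
observed = latent[:, None] * loadings + noise

# Standardize, then take the first principal component as a proxy score.
z = (observed - observed.mean(axis=0)) / observed.std(axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)
score = z @ vt[0]

# The sign of a principal component is arbitrary, so compare magnitudes.
agreement = abs(np.corrcoef(score, latent)[0, 1])
print(f"|correlation| between proxy score and true latent: {agreement:.2f}")
```

In real data the true latent score is unavailable, so the comparison in the last step is only possible in a simulation like this one.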


Medicine

Latent-variable methodology is used in many branches of medicine. A class of problems that naturally lends itself to latent-variable approaches is longitudinal studies in which the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth (see box).
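
One very simple way to picture a latent time scale is a per-subject time shift added to the observed age. The sketch below assumes a linear trajectory and simulated data. The model, the parameter values, and the variable names are hypothetical, and real disease progression models are usually nonlinear. With a linear trajectory, the shift is equivalent to a subject-specific intercept, which keeps the estimation short.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical cohort: 50 subjects, 5 visits each, ages between 60 and 80.
n_subjects, n_visits = 50, 5
age = rng.uniform(60, 80, size=(n_subjects, n_visits))

# Assumed model: trait = a + b * (age + shift_i) + noise, where shift_i is
# a latent per-subject offset between observed age and "disease time".
a, b = -100.0, 2.0
true_shift = rng.normal(0.0, 5.0, size=n_subjects)
trait = a + b * (age + true_shift[:, None]) + rng.normal(0.0, 1.0, size=age.shape)

# Step 1: estimate the common slope from within-subject variation only,
# since the latent shift is constant within a subject.
age_c = age - age.mean(axis=1, keepdims=True)
trait_c = trait - trait.mean(axis=1, keepdims=True)
b_hat = (age_c * trait_c).sum() / (age_c ** 2).sum()

# Step 2: each subject's intercept equals a + b * shift_i (plus noise).
intercept = trait.mean(axis=1) - b_hat * age.mean(axis=1)

# Step 3: convert intercepts to time shifts, fixing the mean shift at zero
# because an overall offset cannot be separated from the constant a.
shift_hat = (intercept - intercept.mean()) / b_hat

shift_corr = np.corrcoef(shift_hat, true_shift)[0, 1]
print("estimated slope:", round(float(b_hat), 2))
print("correlation with true shifts:", round(float(shift_corr), 3))
```

The estimated shifts place each subject on a common time axis, which is the role the unobserved time scale plays in the paragraph above.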


Inferring latent variables

There exists a range of model classes and methods that make use of latent variables and allow inference in their presence. Models include:
* linear mixed-effects models and nonlinear mixed-effects models
* Hidden Markov models
* Factor analysis
* Item response theory
Analysis and inference methods include:
* Principal component analysis
* Instrumented principal component analysis (Kelly, Bryan T.; Pruitt, Seth; Su, Yinan, Instrumented Principal Component Analysis, December 17, 2020. Available at SSRN: https://ssrn.com/abstract=2983919 or http://dx.doi.org/10.2139/ssrn.2983919)
* Partial least squares regression
* Latent semantic analysis and probabilistic latent semantic analysis
* EM algorithms
* Metropolis–Hastings algorithm
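
To give a feel for one of the listed methods, here is a minimal sketch of an EM algorithm for a two-component Gaussian mixture in one dimension, where the component that generated each point is the latent variable. The synthetic data, the starting values, and the fixed number of iterations are assumptions chosen for the example. A practical implementation would add a convergence check and guard against degenerate components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: the component label of each point is the latent variable.
labels = rng.random(1000) < 0.4
x = np.where(labels, rng.normal(5.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))

# Initial guesses for mixing weights, means, and standard deviations.
weights = np.array([0.5, 0.5])
means = np.array([x.min(), x.max()])
sds = np.array([x.std(), x.std()])

def normal_pdf(values, mean, sd):
    """Density of a normal distribution, broadcast over components."""
    return np.exp(-0.5 * ((values - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior probability that each point came from each component.
    dens = weights * normal_pdf(x[:, None], means, sds)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the soft assignments.
    totals = resp.sum(axis=0)
    weights = totals / x.size
    means = (resp * x[:, None]).sum(axis=0) / totals
    sds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / totals)

print("weights:", weights.round(2), "means:", means.round(2), "sds:", sds.round(2))
```

The matrix `resp` holds the inferred distribution over the latent label for every point, so the latent variable is never observed but is still estimated.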


Bayesian algorithms and methods

Bayesian statistics is often used for inferring latent variables.
* Latent Dirichlet allocation
* The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories.
* The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.
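
The Chinese restaurant process mentioned above can be sketched as a short sampler: each object joins an existing category with probability proportional to its current size, or opens a new category with probability proportional to a concentration parameter. The function name `sample_crp` and the parameter name `alpha` are choices made for this example and are not taken from any particular library.

```python
import numpy as np

def sample_crp(n_objects, alpha, rng):
    """Draw one partition of n_objects from a Chinese restaurant process."""
    assignments = []   # table (category) index chosen by each object, in order
    counts = []        # number of objects currently at each table
    for i in range(n_objects):
        # Existing table k has probability counts[k] / (i + alpha);
        # a new table has probability alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        table = int(rng.choice(len(probs), p=probs))
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = np.random.default_rng(2)
assignments, counts = sample_crp(n_objects=20, alpha=1.0, rng=rng)
print("table of each object:", assignments)
print("table sizes:", counts)
```

Because the number of tables is not fixed in advance, a draw like this can serve as a prior over how many latent categories exist as well as over which objects share one.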


See also

* Confounding
* Dependent and independent variables
* Errors-in-variables models
* Evidence lower bound
* Factor analysis
* Intervening variable
* Latent variable model
* Item response theory
* Partial least squares path modeling
* Partial least squares regression
* Proxy (statistics)
* Rasch model
* Structural equation modeling

