Latent Class Analysis

In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved, or latent.

Latent class analysis (LCA) is a subset of structural equation modeling, used to find groups or subtypes of cases in multivariate categorical data. These subtypes are called "latent classes" (Lazarsfeld, P.F. and Henry, N.W. (1968), Latent Structure Analysis. Boston: Houghton Mifflin).

A researcher might choose to use LCA in a situation such as the following. Imagine that symptoms a-d have been measured in a range of patients with diseases X, Y, and Z, and that disease X is associated with the presence of symptoms a, b, and c, disease Y with symptoms b, c, and d, and disease Z with symptoms a, c, and d. The LCA will attempt to detect the presence of latent classes (the disease entities), which create patterns of association in the symptoms. As in factor analysis, the LCA can also be used to classify cases according to their maximum likelihood class membership.

The criterion for solving the LCA is to find latent classes within which there is no longer any association of one symptom with another. Because the class (the disease, or set of diseases, a patient has) is what causes the symptoms to be associated, the symptoms are "conditionally independent": conditional on class membership, they are no longer related.
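The disease example can be made concrete with a small numerical sketch. The numbers are assumptions chosen for illustration and are not part of the original example: the three diseases are taken to be equally likely, and a symptom is present with probability 0.9 when the disease is associated with it and 0.1 otherwise. Under those assumptions, the sketch shows that symptoms a and b are associated when class membership is ignored, and that a case can be assigned to its most probable class.

```python
from itertools import product

# Hypothetical parameters (not from the source): equal class priors, and a
# symptom is present with probability 0.9 if the disease is associated with
# it and 0.1 otherwise.
SYMPTOMS = ("a", "b", "c", "d")
CLASS_PRIOR = {"X": 1 / 3, "Y": 1 / 3, "Z": 1 / 3}
ASSOCIATED = {"X": {"a", "b", "c"}, "Y": {"b", "c", "d"}, "Z": {"a", "c", "d"}}
P_PRESENT = {
    cls: {s: (0.9 if s in ASSOCIATED[cls] else 0.1) for s in SYMPTOMS}
    for cls in CLASS_PRIOR
}


def pattern_given_class(pattern, cls):
    """P(pattern | class): a product over symptoms (conditional independence)."""
    prob = 1.0
    for symptom, present in zip(SYMPTOMS, pattern):
        p = P_PRESENT[cls][symptom]
        prob *= p if present else 1.0 - p
    return prob


def pattern_probability(pattern):
    """Marginal P(pattern): a mixture over the latent classes."""
    return sum(CLASS_PRIOR[c] * pattern_given_class(pattern, c) for c in CLASS_PRIOR)


def posterior(pattern):
    """P(class | pattern) by Bayes' rule."""
    total = pattern_probability(pattern)
    return {
        c: CLASS_PRIOR[c] * pattern_given_class(pattern, c) / total
        for c in CLASS_PRIOR
    }


def marginal(indices):
    """P(all symptoms at the given indices are present), ignoring class."""
    return sum(
        pattern_probability(pat)
        for pat in product((0, 1), repeat=len(SYMPTOMS))
        if all(pat[i] for i in indices)
    )


# Ignoring class membership, a and b are associated: P(a, b) != P(a) * P(b).
p_a, p_b, p_ab = marginal([0]), marginal([1]), marginal([0, 1])
print(f"P(a)P(b) = {p_a * p_b:.4f}, P(a,b) = {p_ab:.4f}")

# A case showing a, b, and c but not d is assigned to its most probable class.
post = posterior((1, 1, 1, 0))
modal_class = max(post, key=post.get)
print(modal_class, {c: round(p, 3) for c, p in post.items()})
```

Within each class the pattern probability is a plain product by construction, so the association between a and b in the pooled data comes entirely from mixing the three classes. With equal priors, the most probable class is also the class under which the observed pattern is most likely.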


Model

Within each latent class, the observed variables are statistically independent. This is an important aspect of the model. Usually the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes the variables are independent (local independence). We then say that the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987).

In one form, the latent class model is written as

    p_{i_1, i_2, \ldots, i_N} \approx \sum_t^T p_t \, \prod_n^N p^n_{i_n, t},

where T is the number of latent classes and p_t are the so-called recruitment or unconditional probabilities, which sum to one. p^n_{i_n, t} are the marginal or conditional probabilities.

For a two-way latent class model, the form is

    p_{ij} \approx \sum_t^T p_t \, p_{it} \, p_{jt}.

This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization.

The probability model used in LCA is closely related to the naive Bayes classifier. The main difference is that in LCA the class membership of an individual is a latent variable, whereas in naive Bayes classifiers the class membership is an observed label.
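As an illustration of the model formula, the following sketch evaluates the probability of every cell of an N-way table as a sum over classes of the class probability times a product of within-class category probabilities. The function name `cell_probability`, the nested-list layout, and all parameter values are assumptions made for this example.

```python
from itertools import product


def cell_probability(class_probs, cond_probs, cell):
    """Model probability of one cell (i_1, ..., i_N) of the N-way table.

    class_probs[t]      -> class (recruitment) probability of class t
    cond_probs[n][t][i] -> probability of category i on variable n
                           within class t
    """
    total = 0.0
    for t, p_t in enumerate(class_probs):
        term = p_t
        for n, category in enumerate(cell):
            term *= cond_probs[n][t][category]
        total += term
    return total


# Hypothetical two-class model (T = 2) with N = 3 observed variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.8, 0.2], [0.3, 0.7]],            # variable 1: two categories
    [[0.5, 0.5], [0.1, 0.9]],            # variable 2: two categories
    [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]],  # variable 3: three categories
]

# Enumerate every cell of the 2 x 2 x 3 table.
sizes = [len(variable[0]) for variable in cond_probs]
table = {
    cell: cell_probability(class_probs, cond_probs, cell)
    for cell in product(*(range(k) for k in sizes))
}

print(table[(0, 0, 0)])     # probability of the first cell
print(sum(table.values()))  # close to 1, up to floating-point rounding
```

Because the class probabilities sum to one and each within-class distribution sums to one, the cell probabilities form a proper distribution over the table. With only two variables, the same function computes the two-way form.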


Related methods

There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data, and assumes that such data arise from a mixture of distributions: imagine a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, essentially allocating members to classes on that dimension. An example would be assigning cases to social classes on a dimension of ability or merit.

As a practical instance, the variables could be multiple choice items of a political questionnaire. The data in this case consist of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen.
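The questionnaire example can be sketched in a few lines. Everything below is hypothetical: the six respondents, the three items with answers "A", "B", and "C", the two group labels, and the conditional probabilities. The sketch tabulates the answers into an N-way contingency table and then uses the conditional probabilities to give the chance of one answer pattern within each group.

```python
from collections import Counter
from math import prod

# Hypothetical answers of six respondents to N = 3 multiple choice items.
responses = [
    ("A", "B", "A"),
    ("A", "B", "A"),
    ("B", "C", "A"),
    ("A", "A", "C"),
    ("B", "C", "A"),
    ("A", "B", "A"),
]

# N-way contingency table: one cell per answer pattern, holding its count.
contingency = Counter(responses)

# Hypothetical conditional probabilities P(answer | group), one dict per item.
cond = {
    "group_1": [
        {"A": 0.7, "B": 0.2, "C": 0.1},
        {"A": 0.2, "B": 0.6, "C": 0.2},
        {"A": 0.7, "B": 0.1, "C": 0.2},
    ],
    "group_2": [
        {"A": 0.2, "B": 0.7, "C": 0.1},
        {"A": 0.1, "B": 0.2, "C": 0.7},
        {"A": 0.6, "B": 0.2, "C": 0.2},
    ],
}


def pattern_given_group(pattern, group):
    """Chance of an answer pattern within a group (items independent given group)."""
    return prod(cond[group][n][answer] for n, answer in enumerate(pattern))


pattern = ("A", "B", "A")
print(contingency[pattern])  # number of respondents with this pattern
for group in cond:
    print(group, pattern_given_group(pattern, group))
```

In an actual analysis the group memberships and the conditional probabilities would be estimated from the contingency table rather than supplied by hand, as they are in this sketch.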


Application

LCA may be used in many fields, such as collaborative filtering, behavior genetics, and the evaluation of diagnostic tests.


External links

* Statistical Innovations, Home Page, 2016. Website with latent class software (Latent GOLD 5.1), free demonstrations, tutorials, user guides, and publications for download. Also included: online courses, FAQs, and other related software.
* The Methodology Center, Latent Class Analysis, a research center at Penn State, free software, FAQ
* John Uebersax, Latent Class Analysis, 2006. A web-site with bibliography, software, links and FAQ for latent class analysis