HOME

TheInfoList



OR:

In
statistics Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
, a latent class model (LCM) relates a set of observed (usually discrete)
multivariate Multivariate may refer to: In mathematics * Multivariable calculus * Multivariate function * Multivariate polynomial In computing * Multivariate cryptography * Multivariate division algorithm * Multivariate interpolation * Multivariate optical c ...
variables to a set of latent variables. It is a type of latent variable model. It is called a latent class model because the latent variable is discrete. A class is characterized by a pattern of conditional probabilities that indicate the chance that variables take on certain values. Latent class analysis (LCA) is a subset of
structural equation modeling Structural equation modeling (SEM) is a label for a diverse set of methods used by scientists in both experimental and observational research across the sciences, business, and other fields. It is used most in the social and behavioral scienc ...
, used to find groups or subtypes of cases in multivariate
categorical data In statistics, a categorical variable (also called qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or ...
. These subtypes are called "latent classes".Lazarsfeld, P.F. and Henry, N.W. (1968) ''Latent structure analysis''. Boston: Houghton Mifflin Formann, A. K. (1984). ''Latent Class Analyse: Einführung in die Theorie und Anwendung atent class analysis: Introduction to theory and application'. Weinheim: Beltz. Confronted with a situation as follows, a researcher might choose to use LCA to understand the data: Imagine that symptoms a-d have been measured in a range of patients with diseases X, Y, and Z, and that disease X is associated with the presence of symptoms a, b, and c, disease Y with symptoms b, c, d, and disease Z with symptoms a, c and d. The LCA will attempt to detect the presence of latent classes (the disease entities), creating patterns of association in the symptoms. As in
factor analysis Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed ...
, the LCA can also be used to classify case according to their
maximum likelihood In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stat ...
class membership. Because the criterion for solving the LCA is to achieve latent classes within which there is no longer any association of one symptom with another (because the class is the disease which causes their association), and the set of diseases a patient has (or class a case is a member of) causes the symptom association, the symptoms will be "conditionally independent", i.e., conditional on class membership, they are no longer related.


Model

Within each latent class, the observed variables are statistically independent. This is an important aspect. Usually the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes variables are independent ( local independence). We then say that the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the latent class model is written as : p_ \approx \sum_t^T p_t \, \prod_n^N p^n_, where T is the number of latent classes and p_t are the so-called recruitment or unconditional probabilities that should sum to one. p^n_ are the marginal or conditional probabilities. For a two-way latent class model, the form is : p_ \approx \sum_t^T p_t \, p_ \, p_. This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization.


Related methods

There are a number of methods with distinct names and uses that share a common relationship.
Cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...
is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data, and assumes that such data arise from a mixture of distributions: imagine a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension: essentially allocating members to classes on that dimension: an example would be assigning cases to social classes on a dimension of ability or merit. As a practical instance, the variables could be
multiple choice Multiple choice (MC), objective response or MCQ (for multiple choice question) is a form of an objective assessment in which respondents are asked to select only correct answers from the choices offered as a list. The multiple choice format is mo ...
items of a political questionnaire. The data in this case consists of a N-way
contingency table In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables. They are heavily used in survey research, business ...
with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance certain answers are chosen.


Application

LCA may be used in many fields, such as: collaborative filtering, ''
Behavior Genetics Behavioural genetics, also referred to as behaviour genetics, is a field of scientific research that uses genetic methods to investigate the nature and origins of individual differences in behaviour. While the name "behavioural genetics" ...
'' an
Evaluation of diagnostic tests


References

* * * *


External links

* Statistical Innovations
Home Page
2016. Website with latent class software (Latent GOLD 5.1), free demonstrations, tutorials, user guides, and publications for download. Also included: online courses, FAQs, and other related software. * The Methodology Center
Latent Class Analysis
a research center at Penn State, free software, FAQ * John Uebersax
Latent Class Analysis
2006. A web-site with bibliography, software, links and FAQ for latent class analysis {{DEFAULTSORT:Latent Class Model Classification algorithms Latent variable models Market research Market segmentation