Latent and observable variables
In statistics, latent variables (from Latin: present participle of lateo, “lie hidden”) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured.[1] Such latent variable models are used in many disciplines, including political science, demography, engineering, medicine, ecology, physics, machine learning/artificial intelligence, bioinformatics, chemometrics, natural language processing, management, psychology and the social sciences.
Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. In this situation, the term hidden variables is commonly used (reflecting the fact that the variables are meaningful, but not observable). Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms hypothetical variables or hypothetical constructs may be used in these situations.
The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world.
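As a minimal illustration of collapsing several observed variables into one score for an underlying concept, the sketch below uses the first principal component as a stand-in for the latent variable. Principal component analysis is one of the methods listed under Inferring latent variables. The data, the four indicators and the helper names are hypothetical. The leading eigenvector is found by plain power iteration rather than a library routine.

```python
import math
import random

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the sample covariance matrix (by power
    iteration) and the score of each centered row along it."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

# Hypothetical data: 500 cases, four indicators that each equal one
# hidden quantity plus independent noise.
random.seed(0)
hidden = [random.gauss(0, 1) for _ in range(500)]
rows = [[z + random.gauss(0, 0.3) for _ in range(4)] for z in hidden]

loadings, scores = first_principal_component(rows)
print(round(abs(pearson(scores, hidden)), 3))
```

In this sketch, four correlated measurements reduce to a single score that closely tracks the simulated hidden quantity. The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful.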
Examples
Psychology
Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model.[3]
- The "Big Five personality traits" have been inferred using factor analysis.
- extraversion[4]
- spatial ability[4]
- wisdom: “Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.”[5]
- Spearman's g, or the general intelligence factor in psychometrics[6]
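A one-factor model on standardized variables makes the link between shared variance and the common factor model concrete. In that model, each off-diagonal correlation is the product of two loadings, r_ij = l_i l_j. The sketch below builds the implied correlation matrix. It then recovers a loading from three correlations with the classical triad identity l_i^2 = r_ij r_ik / r_jk. When the indicators are uncorrelated, there is nothing to recover. The helper names and loadings are hypothetical.

```python
import math

def one_factor_implied_corr(loadings):
    """Correlation matrix implied by a standardized one-factor model:
    r_ij = l_i * l_j off the diagonal, 1 on the diagonal
    (unique variance of indicator i is 1 - l_i**2)."""
    p = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(p)]
            for i in range(p)]

def triad_loading(r, i, j, k):
    """Recover |l_i| from three correlations: l_i**2 = r_ij * r_ik / r_jk."""
    if r[j][k] == 0:
        raise ValueError("indicators j and k share no variance; no common factor")
    return math.sqrt(r[i][j] * r[i][k] / r[j][k])

r = one_factor_implied_corr([0.8, 0.7, 0.6])   # hypothetical loadings
print(round(triad_loading(r, 0, 1, 2), 6))      # 0.8

uncorrelated = one_factor_implied_corr([0.0, 0.0, 0.0])
try:
    triad_loading(uncorrelated, 0, 1, 2)
except ValueError as err:
    print(err)
```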
Economics
Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. By linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. For example, quality of life cannot be measured directly, so it is inferred from observable variables such as wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging.
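A linear-Gaussian one-factor model is one way to make "inferring the latent value from observable measurements" precise. In this model, each standardized indicator is x_j = l_j z + e_j, with z ~ N(0, 1) and independent noise e_j ~ N(0, psi_j). Suppose the loadings and noise variances are known; here they are simply assumed, not estimated. Then the posterior of z is Gaussian. Its precision is 1 + sum_j l_j^2 / psi_j, and its mean is sum_j l_j x_j / psi_j divided by that precision. The sketch is illustrative only: the function name and indicator values are hypothetical and do not form a published quality-of-life index.

```python
def posterior_latent(x, loadings, noise_var):
    """Posterior mean and variance of z given x_j = l_j*z + e_j,
    with z ~ N(0, 1) and independent e_j ~ N(0, psi_j)."""
    precision = 1.0 + sum(l * l / psi for l, psi in zip(loadings, noise_var))
    mean = sum(l * xj / psi for l, xj, psi in zip(loadings, x, noise_var)) / precision
    return mean, 1.0 / precision

# Hypothetical standardized scores of one person on three indicators
# (e.g. wealth, health, social belonging), with assumed unit loadings and noise.
mean, var = posterior_latent([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(mean, var)  # 0.75 0.25
```

The estimate is shrunk toward the prior mean of 0. More indicators, or less noisy ones, pull it closer to what the observations suggest and reduce the posterior variance.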
Medicine
Latent-variable methodology is used in many branches of medicine. Longitudinal studies in which the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied form a class of problems that lend themselves naturally to latent-variable approaches. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth.
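Here is a minimal sketch of the latent time-scale idea, under the simplest possible transformation. Each participant's latent (disease or growth) time is taken to be the observed time minus an unknown individual shift tau, and every participant follows the same known template curve. The logistic template, the grid of candidate shifts and the function names are hypothetical. Fixing the template and estimating only a shift is a deliberate simplification.

```python
import math

def template(s):
    """Hypothetical progression curve on the latent time scale (logistic)."""
    return 1.0 / (1.0 + math.exp(-s))

def estimate_shift(times, values, candidates):
    """Pick the shift tau minimizing sum of (y - template(t - tau))**2."""
    def sse(tau):
        return sum((y - template(t - tau)) ** 2 for t, y in zip(times, values))
    return min(candidates, key=sse)

times = [float(t) for t in range(11)]          # observed time, e.g. years since baseline
true_shift = 2.5                                # hidden: this participant is "late"
values = [template(t - true_shift) for t in times]  # noise-free for clarity
candidates = [k / 10 for k in range(-50, 51)]   # grid of shifts from -5.0 to 5.0
tau = estimate_shift(times, values, candidates)
latent_times = [t - tau for t in times]         # observations mapped to latent time
print(tau)  # 2.5
```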
Inferring latent variables
A range of model classes and methods make use of latent variables and allow inference in their presence. Models include:
- Linear mixed-effects models and nonlinear mixed-effects models
- Hidden Markov models
- Factor analysis
- Item response theory
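Hidden Markov models show how inference sums over latent variables. The likelihood of an observed sequence marginalizes over every possible hidden-state path. The forward algorithm below computes that sum in time linear in the sequence length. A brute-force enumeration over all paths is included only as a check. The two-state parameters are hypothetical.

```python
from itertools import product

def forward_likelihood(init, trans, emit, obs):
    """P(obs) for an HMM, summing over hidden paths with the forward recursion.
    alpha[s] = P(observations so far, current hidden state = s)."""
    states = range(len(init))
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

def brute_force_likelihood(init, trans, emit, obs):
    """Same quantity by explicitly enumerating every hidden-state path."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

# Hypothetical two-state model with two observable symbols (0 and 1).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]
print(forward_likelihood(init, trans, emit, obs))
print(brute_force_likelihood(init, trans, emit, obs))
```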
Analysis and inference methods include:
- Principal component analysis
- Instrumented principal component analysis[7]
- Partial least squares regression
- Latent semantic analysis and probabilistic latent semantic analysis
- EM algorithms
- Metropolis–Hastings algorithm
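The EM algorithm can be sketched for the textbook latent-variable case of a two-component Gaussian mixture. There, the unobserved variable is which component generated each point. The E-step computes each point's posterior probability of belonging to component 1. The M-step re-estimates the means, variances and mixing weight from those soft assignments. The starting values, the fixed iteration count and the simulated data are assumptions for illustration. A practical implementation would also monitor the log-likelihood for convergence.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, mu=(-1.0, 1.0), var=(1.0, 1.0), weight=0.5, iters=100):
    """EM for a 1-D mixture of two Gaussians. The latent variable is the
    component label of each point; `weight` is P(label = 1)."""
    (mu0, mu1), (v0, v1), w = mu, var, weight
    for _ in range(iters):
        # E-step: responsibility = posterior P(label = 1 | x) under current parameters.
        resp = []
        for x in xs:
            p0 = (1 - w) * normal_pdf(x, mu0, v0)
            p1 = w * normal_pdf(x, mu1, v1)
            resp.append(p1 / (p0 + p1))
        # M-step: maximum-likelihood updates weighted by the responsibilities.
        n1 = sum(resp)
        n0 = len(xs) - n1
        mu0 = sum((1 - r) * x for r, x in zip(resp, xs)) / n0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        v0 = sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, xs)) / n0
        v1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1
        w = n1 / len(xs)
    return (mu0, mu1), (v0, v1), w

# Simulated (hypothetical) data: 300 points near -2 and 300 near 3.
random.seed(1)
xs = [random.gauss(-2, 0.5) for _ in range(300)] + [random.gauss(3, 0.5) for _ in range(300)]
means, variances, weight = em_two_gaussians(xs)
print([round(m, 2) for m in means], round(weight, 2))
```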
Bayesian algorithms and methods
Bayesian statistics is often used for inferring latent variables.
- Latent Dirichlet allocation
- The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories.
- The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.
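Both priors can be sketched as generative samplers. In the Chinese restaurant process with concentration alpha, the next object joins an existing category k with probability proportional to its current size n_k. It starts a new category with probability proportional to alpha. In the Indian buffet process, object i takes each previously used feature k with probability m_k / i, where m_k counts the earlier objects that have it. It then adds a Poisson(alpha / i) number of new features. The function names and parameter values are hypothetical. Poisson counts are drawn with Knuth's simple multiplication method rather than a library sampler.

```python
import math
import random

def sample_crp(n, alpha, rng):
    """Chinese restaurant process: category labels for n objects."""
    labels, sizes = [], []
    for _ in range(n):
        # Existing category k has weight sizes[k]; a new one has weight alpha.
        k = rng.choices(range(len(sizes) + 1), weights=sizes + [alpha])[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels

def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_ibp(n, alpha, rng):
    """Indian buffet process: rows[i] is the set of features object i has,
    i.e. the nonzero columns of row i of a binary feature matrix."""
    counts, rows = [], []  # counts[k] = objects so far that have feature k
    for i in range(1, n + 1):
        row = {k for k, m in enumerate(counts) if rng.random() < m / i}
        for _ in range(sample_poisson(alpha / i, rng)):
            row.add(len(counts))
            counts.append(0)
        for k in row:
            counts[k] += 1
        rows.append(row)
    return rows

rng = random.Random(0)
print(sample_crp(10, 1.0, rng))
print(sample_ibp(5, 2.0, rng))
```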
See also
- Confounding
- Dependent and independent variables
- Errors-in-variables models
- Evidence lower bound
- Factor analysis
- Intervening variable
- Latent variable model
- Item response theory
- Partial least squares path modeling
- Partial least squares regression
- Proxy (statistics)
- Rasch model
- Structural equation modeling
References
- Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.
- Raket, L.L.; Sommer, S.; Markussen, B. (2014). "A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data". Pattern Recognition Letters. 38: 1–7. doi:10.1016/j.patrec.2013.10.018.
- Tabachnick, B.G.; Fidell, L.S. (2001). Using Multivariate Statistics. Boston: Allyn and Bacon. ISBN 978-0-321-05677-1.
- Borsboom, D.; Mellenbergh, G.J.; van Heerden, J. (2003). "The Theoretical Status of Latent Variables" (PDF). Psychological Review. 110 (2): 203–219. CiteSeerX 10.1.1.134.9704. doi:10.1037/0033-295X.110.2.203. PMID 12747522. Archived from the original (PDF) on 2013-01-20. Retrieved 2008-04-08.
- Greene, J.A.; Brown, S.C. (2009). "The Wisdom Development Scale: Further Validity Investigations". International Journal of Aging and Human Development. 68 (4): 289–320 (at p. 291). doi:10.2190/AG.68.4.b. PMID 19711618.
- Spearman, C. (1904). "'General Intelligence,' Objectively Determined and Measured". The American Journal of Psychology. 15 (2): 201–292. doi:10.2307/1412107. JSTOR 1412107.
- Kelly, B.T.; Pruitt, S.; Su, Y. (2020). "Instrumented Principal Component Analysis". SSRN working paper, December 17, 2020. https://ssrn.com/abstract=2983919. doi:10.2139/ssrn.2983919.
Further reading
- Kmenta, Jan (1986). "Latent Variables". Elements of Econometrics (Second ed.). New York: Macmillan. pp. 581–587. ISBN 978-0-02-365070-3.