Latent class analysis (LCA)

Stata’s gsem command now supports latent class analysis (LCA).

Latent class models use categorical latent variables. Categorical means that the variable identifies groups, and latent means that those groups are unobserved. Categorical latent variables can be used, for instance,

  1. in marketing or management to represent consumers with different buying preferences;
  2. in health to represent patients in different risk groups; and
  3. in education or psychology to represent students with different patterns of behavior.

The buying preferences, risk groups, and behavior patterns are unobserved. These unobserved categories are the latent classes, and LCA is used to identify and understand them.

If we have observed variables that are indicators of unobserved groups of consumers, we could fit a latent class model and then

  1. estimate the proportion of consumers belonging to each class;
  2. estimate the probability of a positive response to each observed variable within each consumer group;
  3. evaluate the goodness of fit; and
  4. predict the probability of belonging to each consumer group for individuals with a specific pattern of observed responses.
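
The four steps above could be sketched with gsem and its postestimation commands as follows. This is a minimal sketch, not a definitive workflow: it assumes a dataset is already in memory with four binary indicators, the names y1 through y4 and the stub cpost are hypothetical, and the choice of two classes is made purely for illustration.

```stata
* Hypothetical binary indicators y1-y4; two classes is an assumption
gsem (y1 y2 y3 y4 <- ), logit lclass(C 2)

* 1. Estimated proportion of observations in each latent class
estat lcprob

* 2. Probability of a positive response to each indicator within each class
estat lcmean

* 3. Goodness-of-fit statistics for the fitted model
estat lcgof

* 4. Posterior probability of membership in each class, per observation
*    (cpost is a hypothetical stub; one variable is created per class)
predict cpost*, classposteriorpr
```

In practice, the number of classes is rarely known in advance, so one would typically refit the model with a different number in lclass() and compare the fit before interpreting the classes.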

Stata’s LCA features also allow you to fit latent profile models (with continuous observed outcomes), path models with latent categorical variables, and finite mixture models (FMMs). For FMMs, however, see item 6 below.
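
A latent profile model differs from the latent class model mainly in that the observed indicators are continuous. A minimal sketch, assuming three hypothetical continuous variables x1 through x3 and, again only for illustration, two classes:

```stata
* Hypothetical continuous indicators x1-x3; with no family specified,
* gsem treats the outcomes as Gaussian, and "<- _cons" fits an
* intercept (mean) for each indicator that varies across classes
gsem (x1 x2 x3 <- _cons), lclass(C 2)

* Estimated mean of each indicator within each class
estat lcmean
```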