Structural Equation Model - a flexible tool for Predictive and Prescriptive analysis

Virendra Pal

Published Sep 7, 2020

SEM (Structural Equation Model) is a powerful and flexible multivariate analysis tool that has been used extensively in psychometrics, behavioural science, business and marketing research. Its application is not restricted to a single data source: it can be extended to web logs, social media data, transactional data, economic data and other alternative data sources.

SEM is a combination of multilevel regression, ANOVA and factor analysis. It allows the user to study and validate complex relationships among many independent and dependent variables or features. Because it tests and evaluates proposed multivariate causal relationships, it is also called “Causal Modeling” or a “Confirmatory Technique”.

SEM requires knowledge of key terms such as exogenous, endogenous, latent and indicator variables. Independent variables that are not influenced by other variables in the model are called exogenous, while those influenced by other variables are called endogenous. An attribute that is directly observed and measured is called an indicator variable. Features that are not directly measured, on the other hand, are called latent variables. Apart from these terms, one also needs a statistical grounding in covariance, correlation, partial correlation, main effects, interaction effects, moderation, mediation, the Sobel test, and direct and indirect effects, as these are the building blocks of the technique.
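To make the mediation terms concrete, here is a minimal sketch of the Sobel test for a single mediator (X -> M -> Y). It assumes the path coefficients and their standard errors have already been estimated by two regressions; the function name `sobel_test` and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Sobel test for the indirect effect a*b on the path X -> M -> Y.

    a, se_a: coefficient and standard error of X in the regression of M on X.
    b, se_b: coefficient and standard error of M in the regression of Y on X and M.
    Returns (indirect_effect, z_statistic, two_sided_p_value).
    """
    indirect = a * b
    # First-order (Sobel) standard error of the product a*b.
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return indirect, z, p_value

# Hypothetical estimates, for illustration only.
a, se_a = 0.50, 0.10      # X -> M
b, se_b = 0.40, 0.08      # M -> Y, controlling for X
c_prime = 0.15            # direct effect X -> Y, controlling for M

indirect, z, p_value = sobel_test(a, se_a, b, se_b)
total = c_prime + indirect  # total effect = direct + indirect (linear model)

print(f"indirect={indirect:.3f}, total={total:.3f}, z={z:.3f}, p={p_value:.5f}")
```

In a linear model the total effect decomposes into the direct effect plus the indirect effect, which is why `total` is simply their sum here.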

Like any other test or statistical modelling technique, SEM requires the specification of a hypothesized model, which is then validated against the data. SEM can also be used to compare multiple hypothesized models at the same time.

Let’s take an example to understand how SEM can be applied in the lending industry to determine the factors affecting customer defaults and to evaluate associations between variables. Referring to the diagram above:

Using internal data such as credit history, transaction details and other financial details, an underlying latent factor can be derived for the Payment Capability of the customer.
Data from social media, demographics, psychometrics and other sources can help predict Intent and environmental factors. Mediation effects should also be taken into consideration, as environmental factors can affect customer default both directly and indirectly.
The “Intent” and “Ability to Pay” factors will play a pivotal role in default predictions.
The above model can also help in understanding the various customer segments present in the data.
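The lending example above could be written down as a hypothesized model in lavaan-style syntax, where `=~` defines a latent variable by its indicators and `~` defines a regression path. This is only a sketch: every variable name below is hypothetical, and routing the indirect effect of environmental factors through Intent is an assumption made for illustration, not something the diagram is known to specify. The small parser is likewise illustrative; it just separates the measurement part from the structural part and labels the variables.

```python
# Hypothetical model specification; all variable names are illustrative.
MODEL_SPEC = """
# measurement model: latent =~ indicators
ability_to_pay =~ credit_history + avg_balance + income
intent =~ social_score + psychometric_score
# structural model: outcome ~ predictors
intent ~ environment
default ~ ability_to_pay + intent + environment
"""

def parse_spec(spec):
    """Split a lavaan-style spec into measurement and structural parts."""
    measurement, structural = {}, {}
    for raw in spec.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "=~" in line:
            lhs, rhs = line.split("=~")
            target = measurement
        else:
            lhs, rhs = line.split("~")
            target = structural
        target[lhs.strip()] = [term.strip() for term in rhs.split("+")]
    return measurement, structural

measurement, structural = parse_spec(MODEL_SPEC)

latent = set(measurement)                       # not directly measured
indicators = {v for rhs in measurement.values() for v in rhs}
endogenous = set(structural)                    # influenced by other variables
predictors = {v for rhs in structural.values() for v in rhs}
exogenous = predictors - endogenous             # predictors nothing points to

print("latent:", sorted(latent))
print("endogenous:", sorted(endogenous))
print("exogenous:", sorted(exogenous))
```

A specification string of this shape is what SEM packages that accept lavaan-style syntax would take as input, together with a data set containing the indicator columns.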

SEM can report the amount of variance explained in the dependent variables (DVs), both indicator and latent. It can also report the reliability of each measured variable. SEM allows mediation and moderation to be examined, including indirect effects; hence, in certain scenarios, it can also be used for variable selection. One of the interesting advantages of SEM is its support for hierarchical modelling with fixed and random effects.
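As a rough illustration of what reliability means here, the sketch below assumes a single latent factor with standardized loadings and uncorrelated measurement errors. Under those assumptions the reliability of an indicator is its squared loading (the share of its variance explained by the factor), and a composite reliability for the whole factor can be computed from the loadings. The indicator names and loading values are hypothetical.

```python
# Hypothetical standardized loadings of three indicators on one latent factor.
loadings = {"credit_history": 0.8, "avg_balance": 0.7, "income": 0.6}

# Indicator reliability: share of each indicator's variance explained by the factor.
indicator_reliability = {name: lam ** 2 for name, lam in loadings.items()}

# With standardized indicators, the remaining variance is measurement error.
error_variance = {name: 1.0 - r for name, r in indicator_reliability.items()}

# Composite reliability: (sum of loadings)^2 over itself plus total error variance.
sum_loadings = sum(loadings.values())
composite_reliability = sum_loadings ** 2 / (
    sum_loadings ** 2 + sum(error_variance.values())
)

for name, r in indicator_reliability.items():
    print(f"{name}: reliability={r:.2f}")
print(f"composite reliability={composite_reliability:.3f}")
```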

When working with SEM, a few steps need to be followed: hypothesizing the constructs, verifying assumptions, estimating parameters, and evaluating and tuning the model.

The initial steps include establishing theoretically plausible models, i.e. using prior knowledge of the positive or negative direct effects among variables.
SEM shares the assumptions of other multivariate models, such as multivariate normality and the absence of severe multicollinearity; however, some estimation methods do not require normality. Like other models, it is also sensitive to missing values and sample size.
SEM uses various estimation methods such as maximum likelihood (ML), generalized least squares, weighted least squares and partial least squares.
For model evaluation and modification, one needs to know model fit indices such as χ², CFI, RMSEA, TLI, GFI, NFI, SRMR, AIC and BIC. The residuals of the covariances should also be small and centred about zero. The Lagrange Multiplier test (modification indices) can indicate which fixed parameters would improve the fit if freed; for non-normal data, robust versions of the test statistics are preferable.
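Several of the fit indices listed above can be computed directly from the chi-square statistics of the hypothesized model and of a baseline (independence) model. The sketch below uses the standard textbook formulas for RMSEA, CFI and TLI; the function name `fit_indices` and the input values are hypothetical, and individual software packages may differ in details such as using N rather than N - 1 in the RMSEA.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Compute RMSEA, CFI and TLI.

    chi2_m, df_m: chi-square and degrees of freedom of the hypothesized model.
    chi2_b, df_b: chi-square and degrees of freedom of the baseline model.
    n: sample size.
    """
    # Non-centrality estimates, floored at zero.
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, 0.0)

    # RMSEA: misfit per degree of freedom, adjusted for sample size.
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))

    # CFI: improvement over the baseline model, bounded to [0, 1].
    denom = max(d_m, d_b)
    cfi = 1.0 - d_m / denom if denom > 0 else 1.0

    # TLI: like CFI but penalizes model complexity; it is not bounded.
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    tli = (ratio_b - ratio_m) / (ratio_b - 1.0)

    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}

# Hypothetical chi-square values, for illustration only.
indices = fit_indices(chi2_m=85.0, df_m=40, chi2_b=900.0, df_b=55, n=500)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Lower RMSEA and higher CFI and TLI indicate better fit; any cut-off values should be treated as guidelines rather than strict rules.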

As with other statistical methods, unmet assumptions, overfitting and non-convergence can also occur in SEM. With conceptual understanding and practice, however, one should be able to balance the various error metrics and fit indices to arrive at the best-fitting model. Researchers have developed many variants of SEM, including partial least squares SEM (PLS-SEM), hierarchical SEM and Bayesian SEM, which are worth exploring for specific use cases.
