Propensity Models: Concept, Development & Maintenance

1.0 Propensity towards an action

The Oxford Advanced American Dictionary defines the noun “propensity” as a “tendency to a particular kind of behavior”. Consider, for example, the common behavior of dogs toward strangers, which is often aggressive in nature: many dogs tend to bark at strangers (ASPCA, 2023). This tendency can be phrased as “the propensity of dogs to bark at strangers”.

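Propensity indicates how likely a behavioral event is to occur under given conditions. It is measured by a propensity score. In the causal-inference literature, the propensity score is the conditional probability of receiving the treatment rather than the control, given the observed covariates (Rosenbaum and Rubin, 1983). In the business setting discussed here, the same idea is applied to the probability that a customer takes a particular action, given that customer's observed characteristics.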
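
The definition above can be written compactly. The notation here is chosen for illustration and is an assumption, not taken from this article: Z is the binary indicator of treatment (or, in a business setting, of the action of interest) and X is the vector of observed covariates.

```latex
e(x) = \Pr(Z = 1 \mid X = x)
```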

2.0 Propensity Modeling

Before going into the details of propensity models, it is worth understanding the difference between predictive analytics and machine learning. Wakefield of SAS UK notes that “A common misconception is that predictive analytics and machine learning are the same thing.” Although the two overlap, several factors distinguish them (Wakefield, 2023).

Propensity models, sometimes referred to as propensity score models, are statistical models used in predictive modeling to estimate the likelihood of an event occurring. They are frequently used in the social sciences, banking, marketing, and healthcare to estimate the probability that a person will take a specific action or display a specific behavior (Wakefield, 2023).

3.0 Steps of Model Development

First, state the goal of the propensity model (the use case). This might be to find how likely customers are to select a specific product, travel overseas, spend at specific supermarkets, revolve a credit card balance, purchase an electronic item, stay at a specific hotel, become loyal customers of a specific supermarket, airline, or pharmacy, or take any other action that has a business impact.

3.1 Data Extraction / Data Collection

Collect or extract pertinent data on the predictor variables (commonly called features) and the outcome variable (also called the target or response variable). The variables should accurately reflect the objective of the model. It is critical to understand how each predictor variable affects the response variable.

3.2 Data Preparation / Transformation

3.2.1 Handling Outliers & Missing Values

Outliers and missing values must be handled before modeling. The most common approach to imputing missing data uses descriptive statistics. For numerical data, fill in missing values with the mean or median of the observed values for that feature. For categorical variables, replace missing values with the variable's mode, that is, its most frequent value. Alternatively, predictive techniques such as time series models, linear regression, and k-nearest neighbors (KNN) can be used for imputation.
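
The descriptive-statistics approach can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names, the use of None to mark a missing value, and the sample data are assumptions made for the example.

```python
from statistics import median, mode


def impute_numeric(values, strategy="median"):
    """Fill None entries with the median (default) or mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "median":
        fill = median(observed)
    else:
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Fill None entries with the most frequent observed category (the mode)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample data: None marks a missing value.
monthly_spend = impute_numeric([120.0, None, 80.0, 100.0])
segment = impute_categorical(["gold", "silver", None, "gold"])
```

The median is usually the safer default when the feature contains outliers, because the mean is pulled toward extreme values.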

3.2.2 Data Normalization

When numerical features have high variance or are expressed in large values, they can dominate model fitting, and the resulting model may generalize poorly. Normalizing the data before training puts all features on a comparable scale and can help mitigate this. One of the most widely used techniques is min-max normalization: for each feature, the minimum value is mapped to 0, the maximum value is mapped to 1, and every other value is mapped to a decimal between 0 and 1.
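
As a minimal sketch of min-max normalization for a single feature (the function name and the handling of a constant feature are assumptions made for this example):

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalize([10, 20, 30])
```

In practice the minimum and maximum would normally be computed on the training data only and then reused to scale the test data and any new data, so that no information from the test set influences training.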

3.2.3 Data Transformation

Significant categorical variables need to be transformed into a format the model can read. For example, the attribute gender can be split into two binary variables: (1) being male (0 or 1) and (2) being female (0 or 1).
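
This kind of encoding is often called one-hot encoding. A minimal sketch, in which the function name and the "is_" column prefix are assumptions made for the example:

```python
def one_hot(values, categories):
    """Return one dict of 0/1 indicator columns for each input value."""
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


rows = one_hot(["male", "female", "female"], categories=["male", "female"])
```

When the model includes an intercept, as logistic regression usually does, one of the indicator columns is typically dropped, because the full set of indicators always sums to 1 and is therefore perfectly collinear with the intercept.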

Transforming several attributes into ratios is another form of transformation. It merges a few features of the data set into one feature.

For example:

a)  Current month’s spend / last six months’ spend.

b)  Current month’s credit card balance / last three months’ average credit card balance.
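
The two ratios above could be computed as follows. This is a sketch under stated assumptions: the field names, the sample figures, and the choice to return a default of 0.0 when the denominator is zero are all hypothetical.

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Divide, returning `default` when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


# Hypothetical customer record.
customer = {
    "spend_current_month": 450.0,
    "spend_last_6_months": 1800.0,
    "balance_current_month": 1200.0,
    "balances_last_3_months": [1000.0, 1500.0, 500.0],
}

# a) Current month's spend / last six months' spend.
spend_ratio = safe_ratio(
    customer["spend_current_month"], customer["spend_last_6_months"]
)

# b) Current month's balance / average balance over the last three months.
avg_balance = sum(customer["balances_last_3_months"]) / len(
    customer["balances_last_3_months"]
)
balance_ratio = safe_ratio(customer["balance_current_month"], avg_balance)
```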

3.3 Variable Selection

Predictor variables should be selected based on how likely they are to affect the outcome. These might include historical behavior, demographic data, and other relevant characteristics. There are several methods for selecting the most appropriate predictor variables. For example, evaluating the correlation coefficient and the Gini coefficient of each predictor against the response variable indicates the strength of the relationship, and variables can be selected accordingly.

During this process, variables that exhibit multicollinearity must be excluded, which helps avoid unstable estimates and overfitting. When selecting binary variables, it is also necessary to exclude any that, alone or in combination, perfectly match the target variable, since this too causes overfitting. For example, when modeling the propensity of a customer to shop at a supermarket or to buy a car from a specific dealer, the data may include binary flags for the advertising channel through which each customer was acquired. Taken together, these flags could be equal to the target variable (always 1 for customers who took the action) and would therefore be highly correlated with it. Such variables must be excluded from the predictor variables.
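
A simple correlation-based screen illustrates both checks. This is a sketch, not a complete selection procedure: it uses only the Pearson correlation coefficient (not the Gini coefficient), and the function names, the thresholds of 0.99 and 0.9, and the sample data are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no linear relationship
    return cov / sqrt(vx * vy)


def screen_features(features, target, leak_threshold=0.99, collinear_threshold=0.9):
    """Return (kept, dropped): kept feature names and a reason for each dropped one."""
    dropped, strength = {}, {}
    for name, column in features.items():
        r = abs(pearson(column, target))
        if r >= leak_threshold:
            dropped[name] = "near-perfect match with target"
        else:
            strength[name] = r
    kept = []
    # Visit the strongest predictors first, so the weaker member of a
    # collinear pair is the one that gets dropped.
    for name in sorted(strength, key=strength.get, reverse=True):
        if any(
            abs(pearson(features[name], features[k])) >= collinear_threshold
            for k in kept
        ):
            dropped[name] = "collinear with a stronger feature"
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical data: six customers, target = 1 if the customer took the action.
target = [1, 0, 1, 0, 1, 0]
features = {
    "acquired_via_campaign": [1, 0, 1, 0, 1, 0],       # reproduces the target
    "monthly_spend": [90, 40, 80, 50, 70, 30],
    "annual_spend": [1080, 500, 960, 600, 840, 360],   # nearly 12 x monthly_spend
    "age": [30, 40, 50, 35, 45, 40],
}
kept, dropped = screen_features(features, target)
```

Pairwise correlation only detects collinearity between two variables at a time. A combination of several binary flags that jointly reproduces the target, as in the advertising channel example, needs to be checked as a combination or identified from business knowledge of how the data was generated.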

3.4 Model Development / Model Fitting

During propensity model development, several candidate models should be built. GLMs, random forests, decision trees, XGBoost, and SVMs are widely used ML algorithms. However, GLMs, and logistic regression in particular, are the most common choice for propensity models. It is important to understand the underlying assumptions of the chosen model, particularly when using logistic regression.
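
To make the logistic regression case concrete, the following sketch fits a model by plain batch gradient descent on the log-loss, using only the standard library. It is illustrative rather than production code: in practice an established statistics or ML library would be used. The function names, learning rate, number of epochs, and toy data are assumptions.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_propensity(x, w, b):
    """Propensity score: estimated probability of the event for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: one normalized feature; 1 = the customer took the action.
X = [[0.1], [0.2], [0.3], [0.6], [0.8], [0.9]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)
low = predict_propensity([0.1], w, b)
high = predict_propensity([0.9], w, b)
```

The output of predict_propensity is a value between 0 and 1, which is what makes logistic regression a natural fit for propensity scoring.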

3.5 Model Selection

To develop models and select the best performing one, split the data by random selection, for example using 70% to train the model and holding out 30% to test it. (The proportions of the training and test sets vary with the use case.) The test set must not contain any record from the training set; the two sets must be disjoint. Overlap between them inflates the measured performance and can mislead model selection.
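
A random 70/30 split with disjoint sets can be sketched as follows. The function name, the fixed seed (used only to make the split reproducible), and the sample customer IDs are assumptions made for the example.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into disjoint train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


customer_ids = list(range(100))  # hypothetical customer identifiers
train_ids, test_ids = train_test_split(customer_ids)
```

Because every record is assigned to exactly one of the two lists, no record used for training can appear in the test set, provided the input itself contains no duplicate records.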

Model selection should be based on the performance measures of the candidate models. Measuring performance is essential to establish how effectively a model performs its intended task and how well it generalizes to new data. There are numerous measures and methods for this. The confusion matrix, the ROC curve and the area under it (AUC-ROC), mean squared error (MSE), mean absolute error (MAE), and the Gini coefficient are commonly used.

Selecting performance measures that align with the objectives of the propensity model is crucial. Moreover, when evaluating a model's overall performance, it is important to consider factors such as feature engineering, data quality, and model interpretability. If no model performs well enough, revisit the model features. Further discussion with the relevant business owner may provide additional insights.
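
Three of these measures can be computed directly from the true outcomes and the predicted scores. The sketch below is a minimal illustration: the function names, the 0.5 classification threshold, and the sample data are assumptions. AUC is computed as the share of (positive, negative) pairs in which the positive case receives the higher score, and the Gini coefficient uses the standard relationship Gini = 2 x AUC - 1.

```python
def confusion_matrix(y_true, scores, threshold=0.5):
    """Count true/false positives and negatives at the given score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


def auc_roc(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient derived from AUC."""
    return 2 * auc_roc(y_true, scores) - 1


# Hypothetical test-set outcomes and predicted propensity scores.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
cm = confusion_matrix(y_true, scores)
auc = auc_roc(y_true, scores)
g = gini(y_true, scores)
```

The pairwise AUC calculation is quadratic in the number of records, which is fine for illustration but too slow for large data sets, where a rank-based calculation or a library routine is preferable.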

3.6 Model Testing / Interpretations

After selecting the best fitting model, prepare the new customer base with all the selected features so that the model can be applied to it. Before predicting the propensity (the target variable), it is beneficial to re-train and re-evaluate the model on the new data, following the procedure in step 3.5. This evaluation gives some insight into the robustness of the selected model. If the model does not deliver similar performance, revisit the model features. The difference may be due to market volatility affecting the features of the model.

If the model delivers performance similar to that seen at model selection, it is ready to predict the target variable (the propensity of the event occurring or not occurring). The relevant stakeholders can then launch a sales campaign or business activity and test the outcome of the model. The results of the campaign or activity are valuable for evaluating the model’s performance, and their analysis is critical to the business stakeholders because it shows how the model performs in a real-life scenario.
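
One simple way to analyze campaign results against the model, offered here as an illustrative assumption rather than the method used in this article, is to rank customers by predicted propensity, split them into equal-sized bands, and compare the observed response rate in each band. The function name, the number of bands, and the sample data are hypothetical.

```python
def response_rate_by_band(scores, responded, n_bands=2):
    """Rank by score (highest first), split into bands, return response rate per band."""
    if n_bands < 1 or n_bands > len(scores):
        raise ValueError("n_bands must be between 1 and the number of customers")
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    band_size = len(ranked) // n_bands
    rates = []
    for i in range(n_bands):
        start = i * band_size
        # The last band absorbs any remainder.
        end = (i + 1) * band_size if i < n_bands - 1 else len(ranked)
        band = ranked[start:end]
        rates.append(sum(r for _, r in band) / len(band))
    return rates


# Hypothetical campaign: predicted propensity and observed response (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
responded = [1, 1, 0, 1, 0, 1, 0, 0]
rates = response_rate_by_band(scores, responded)
```

If the model ranks customers well, the response rate should fall from the highest-scoring band to the lowest. Similar rates across bands would suggest that the scores carry little information in the live setting.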

3.7 Monitoring the Model Performance & Update

Analyzing model performance before and after the model is used for research, a business activity, or a campaign is essential, and the model should be re-evaluated from time to time. This indicates whether the model remains robust and valid and continues to produce accurate predictions.

If the model does not deliver the same performance, revisit the model features. The decline may be due to market volatility affecting the features of the model.
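
A periodic check of this kind can be as simple as comparing a performance measure for each review period against the value recorded at model selection. In the sketch below, the choice of AUC as the measure, the tolerance of 0.05, the function name, and all the figures are assumptions made for illustration.

```python
def flag_performance_drift(baseline_auc, period_aucs, tolerance=0.05):
    """Return the periods whose AUC fell more than `tolerance` below the baseline."""
    return [
        period
        for period, auc in period_aucs.items()
        if baseline_auc - auc > tolerance
    ]


# Made-up figures: AUC at model selection and in three later review periods.
baseline_auc = 0.82
period_aucs = {"period_1": 0.81, "period_2": 0.79, "period_3": 0.74}
flagged = flag_performance_drift(baseline_auc, period_aucs)
```

A flagged period is a prompt to revisit the model features, as described above, rather than proof that the model has failed.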


Bibliography

1. Definition of propensity noun from the Oxford Advanced American Dictionary [Online], 2023, Available from: https://www.oxfordlearnersdictionaries.com/definition/american_english/propensity#:~:text=%2Fpr%C9%99%CB%88p%C9%9Bns%C9%99t%CC%AEi%2F,showed%20a%20propensity%20for%20violence. [Accessed 15 August 2023].

2. Aggression [Online], 2023, Available from:  https://www.aspca.org/pet-care/dog-care/common-dog-behavior-issues/aggression#:~:text=Family%20Members%2C%20Strangers%20or%20Other%20Animals&text=It's%20common%20for%20dogs%20to,unfamiliar%20dogs%20is%20also%20widespread [Accessed 17 December 2023].

3. Suffering from foreign travel withdrawal symptoms? You can now catch a flight that never takes off [Online], 2020, Available from: https://economictimes.indiatimes.com/magazines/panache/suffering-from-foreign-travel-withdrawal-symptoms-you-can-now-catch-a-flight-that-never-takes-off/articleshow/76780684.cms?from=mdr [Accessed 19 December 2023].

4. Top four foreign travel safety tips [Online], 2022, Available from: https://magazine.northeast.aaa.com/daily/money/travel-insurance/top-4-foreign-travel-safety-tips/ [Accessed 21 December 2023].

5. Hyper markets [Online], 2023, Available from: https://alibinali.com/business-streams/hypermarkets/ [Accessed 19 December 2023].

6. Fintech: Making it easy to compare and purchase financial products [Online], 2023, Available from: https://stocknews.my/posts/51489/fintech-making-it-easy-to-compare-and-purchase-financial-products [Accessed 19 December 2023].

7. Propensity Model: Using Data to Predict Customer Behavior [Online], 2023, Available from: https://jelvix.com/blog/propensity-model [Accessed 20 December 2023].

8. Wakefield, K., 2019. Predictive Modeling Analytics and Machine Learning. Available from: https://www.sas.com/en_gb/insights/articles/analytics/a-guide-to-predictive-analytics-and-machine-learning.html#:~:text=Predictive%20analytics%20and%20machine%20learning%20go%20hand%2Din%2Dhand%2C,the%20field%20of%20machine%20learning [Accessed 18 December 2023].

9. Skewness [Online], 2023, Available from: https://dataanalyze.wordpress.com/skewness/ [Accessed 19 December 2023].

10. What Is Machine Learning? Definition, Types, Applications, and Trends [Online], 2023, Available from: https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-ml/ [Accessed 04 January 2024].

11. Machine Learning at a Glance: Fast Facts for Top Marketers [Online], 2023, Available from: https://www.criteo.com/blog/machine-learning-fast-facts/ [Accessed 04 January 2024].

More articles by Chandima Dabare
