End of module 3: Programming for Artificial Intelligence - Predicting Credit Card Payment Defaults

I recently presented my project on predicting credit card payment defaults using Random Forest and Neural Network algorithms, and I'm proud to announce that I passed with an 83% score! Moreover, I received positive feedback from my tutor regarding my presentation skills, which was both pleasing and encouraging.

I planned to write this article to share the content of my presentation, but first, I'd like to provide some further updates.

Over the past few weeks, I've been wrapping up my learning materials, reflecting on the content, and writing my student journal for my end-of-year assessment. Moreover, outside of my formal studies, I've been experimenting with AI on my own, exploring how to apply it to real-world problems.

Considering I'm not a developer and only started learning Python less than a year ago, I'm proud to say that through various available resources, countless hours spent staring at the code, applying what I've learned, and a lot of trial and error, I've successfully created my first AI assistant! It's currently running offline on my local PC, but I'm exploring the best platforms to host and share it.

My drive is to create free AI tools that anyone can use to solve real-world problems. To support this and future projects, I've set up a "Buy Me a Coffee" account. Any contributions will go towards hosting and maintaining these tools, keeping them free for everyone as I continue to learn and improve my skills in building and deploying AI solutions.

If you have any ideas, projects, or challenges you want to discuss and potentially solve by exploring AI solutions, feel free to contact me via LinkedIn. I'd also love to connect and personally thank you for funding my projects.

Now, let's go back to the main content of this article.

In my presentation, I introduced the dataset from the UCI Machine Learning Repository, which contains records for 30,000 credit card clients in Taiwan. It includes features such as credit limit, gender, education, marital status, age, and payment history. My goal was to develop a model to predict credit card payment defaults, where '1' indicated a default and '0' indicated no default.

[Image: screenshot of the dataset's raw data]

I chose two machine learning algorithms for this task: Random Forest and Neural Networks.

Random Forest is an ensemble learning method that combines multiple decision trees to improve predictive accuracy and reduce overfitting. It relies on two techniques: bagging and feature randomness. Bagging involves creating multiple subsets of the original dataset through bootstrapping and then aggregating the predictions of the trees trained on them. To add diversity among the trees, feature randomness randomly selects a subset of features at each split of a decision tree.
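As a sketch of how bagging and feature randomness combine, here is a hand-rolled ensemble using scikit-learn's DecisionTreeClassifier as the base learner; the synthetic dataset and the tree count of 25 are illustrative assumptions, not the project's settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the credit card features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

trees = []
for _ in range(25):
    # Bagging: each tree trains on a bootstrap sample of the data.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(
        max_features="sqrt",  # feature randomness: random subset per split
        random_state=0,
    )
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Aggregation: majority vote across the individual trees.
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = (ensemble_pred == y).mean()
```

In practice scikit-learn's RandomForestClassifier does all of this internally; the loop above just makes the two mechanisms visible.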

Neural Networks are inspired by the human brain and consist of layers of connected nodes, or neurons, that process information. They are particularly effective at finding complex patterns in data. Training relies on backpropagation and gradient descent. Backpropagation involves a forward pass, where input data is passed through the network layer by layer, and a backward pass, where the error is propagated back through the network to measure how each weight contributed to it. Gradient descent then adjusts those weights to reduce the error.
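The forward/backward idea can be illustrated with a toy one-neuron network trained by gradient descent in pure NumPy; the data, learning rate, and iteration count below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic binary classification data with a known linear rule.
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)

w = np.zeros(3)  # weights to be learned
b = 0.0          # bias term
lr = 0.5         # learning rate for gradient descent

for _ in range(200):
    # Forward pass: push inputs through the (single) layer.
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation -> probabilities
    # Backward pass: gradient of the cross-entropy error w.r.t. w and b.
    grad = p - y
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

# Final predictions after training.
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = ((p > 0.5) == y).mean()
```

A real network repeats this forward/backward cycle across many layers, but the mechanics per weight are the same.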


[Image: graphical representation and comparison of the two algorithms]

To prepare the data, I designed a robust data preparation pipeline. I started by cleaning the dataset, handling missing values and duplicates. I then engineered new features such as 'Total_Bill_Amount' and 'Total_Payment_Amount' to capture overall financial behaviour and reveal patterns that aren't visible in individual features. Finally, I split the data into training and testing sets to evaluate the model's performance on unseen data, and applied feature scaling with StandardScaler so that all features contributed equally to the model.
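A minimal sketch of such a pipeline, assuming the BILL_AMT*/PAY_AMT* column names from the UCI dataset and a tiny synthetic frame (with one deliberate duplicate row) in place of the real data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny synthetic frame standing in for the UCI credit card data;
# the last two rows are intentional duplicates.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000, 90000],
    "BILL_AMT1": [3913, 2682, 29239, 29239],
    "BILL_AMT2": [3102, 1725, 14027, 14027],
    "PAY_AMT1":  [0, 0, 1518, 1518],
    "PAY_AMT2":  [689, 1000, 1500, 1500],
    "default":   [1, 1, 0, 0],
})

# Cleaning: drop exact duplicates, fill any missing values.
df = df.drop_duplicates().fillna(0)

# Feature engineering: aggregates capture overall financial behaviour.
bill_cols = [c for c in df.columns if c.startswith("BILL_AMT")]
pay_cols = [c for c in df.columns if c.startswith("PAY_AMT")]
df["Total_Bill_Amount"] = df[bill_cols].sum(axis=1)
df["Total_Payment_Amount"] = df[pay_cols].sum(axis=1)

# Train/test split so the model is evaluated on unseen data.
X = df.drop(columns="default")
y = df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=82
)

# Fit the scaler on the training split only, then apply to both.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into training.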

I implemented the Random Forest model using scikit-learn's RandomForestClassifier. I optimised the model by setting the number of estimators to 100, the maximum depth to 10 levels, and the minimum samples to split an internal node to 5. I also chose the square root of the total features for each split and set the random state to 82 (as it's my birth year) to control and reproduce the randomness in the machine learning process.
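In scikit-learn, that configuration looks like the following; the synthetic dataset stands in for the prepared credit card features, but the hyperparameters are the ones described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the scaled training features.
X, y = make_classification(n_samples=1000, n_features=15, random_state=82)

model = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_depth=10,          # cap tree depth to limit overfitting
    min_samples_split=5,   # minimum samples to split an internal node
    max_features="sqrt",   # random sqrt(n_features) subset per split
    random_state=82,       # fixed seed for reproducible results
)
model.fit(X, y)
```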

The Random Forest model achieved an accuracy of 88.05% and an AUC of 0.8015, indicating its effectiveness in distinguishing between defaulting and non-defaulting clients. The feature importance diagram revealed that payment history (PAY_0) and credit limit (LIMIT_BAL) were the most significant factors in predicting defaults. The confusion matrix showed a balanced performance between predicting defaults and non-defaults.
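These diagnostics can be read straight off a fitted model. A sketch on synthetic data (so the numbers it produces are illustrative, not the results reported above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic stand-in for the prepared features and labels.
X, y = make_classification(n_samples=1000, n_features=15, random_state=82)
rf = RandomForestClassifier(n_estimators=100, random_state=82).fit(X, y)

importances = rf.feature_importances_    # one weight per feature, sums to 1
cm = confusion_matrix(y, rf.predict(X))  # rows: actual, columns: predicted
auc = roc_auc_score(y, rf.predict_proba(X)[:, 1])
```

Sorting `importances` and plotting them against the feature names gives the feature importance diagram; the 2x2 confusion matrix shows correct and incorrect predictions for each class.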


[Image: feature importance diagram and confusion matrix]

Although it wasn't part of the assignment, I also developed two additional Neural Network models using scikit-learn and Keras to compare their results with the Random Forest model. The Neural Network model (scikit-learn) achieved the highest accuracy at 89.21%, while the Random Forest and Neural Network (Keras) models followed closely with accuracies of 88.05% and 88.63%, respectively. All models had AUC scores above 0.78, indicating good performance.
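For the scikit-learn variant, a minimal sketch with MLPClassifier; the layer sizes, data, and resulting scores here are illustrative assumptions, not those of the assignment:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the scaled credit card features.
X, y = make_classification(n_samples=1000, n_features=15, random_state=82)

# Two hidden layers; sizes chosen for illustration only.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=82)
mlp.fit(X, y)

# AUC summarises ranking quality across all classification thresholds.
auc = roc_auc_score(y, mlp.predict_proba(X)[:, 1])
```

The Keras version follows the same shape: stack Dense layers, compile with a binary cross-entropy loss, fit, then score the predicted probabilities.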


[Image: numerical comparison of the 3 models]


[Image: ROC curves comparing the 3 models]

My conclusion is that both Random Forest and Neural Network models have their strengths and are effective for predicting credit card defaults. Random Forest models are easier to train and understand, making them ideal if we want to know how the model makes predictions. They also provide feature importance, which is useful for understanding key factors.

Neural Networks, on the other hand, can potentially achieve higher accuracy but require more computational resources. Their reasoning process is more complex and is often described as a 'black box,' making it harder to understand how they arrive at their predictions without additional interpretability techniques.

The choice between them should therefore depend on our specific requirements and constraints.

I hope you enjoyed this article/presentation recap and remember that you can now support my AI projects here!

Stay tuned for more updates as I continue to explore and share my AI implementations in the coming weeks!
