The Process of Building an Explainable Fraud Detection System Using ML and Streamlit

Following my previous project, 'Building a Fraud Detection Model with XGBoost', I wanted to go further and build an interactive dashboard in Streamlit. This project uses a synthetic Kaggle dataset that mimics fraudulent refund abuse. To simulate tackling this issue, I built an end-to-end machine learning dashboard for detecting fraudulent users.

The project covered data preprocessing, feature engineering, model training, explainability through SHAP, and deployment with Streamlit. Below, I reflect critically on the process, especially the dataset's limitations, and highlight the project's genuine strengths.

Shortcomings and Challenges

1. Synthetic Nature of the Dataset

The dataset was artificially generated rather than drawn from real-world customer behaviour, which introduced several limitations:

Predictability: The data patterns were simpler and cleaner than what real-world data would present.

Perfect Accuracy: The model achieved an accuracy of 1.0 (100%), which is unrealistic in production environments, where user behaviour is noisy and constantly evolving.

Lack of Outliers: Real fraud often involves rare, outlier behaviour, which synthetic datasets struggle to replicate.
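
One common reason for a perfect score on synthetic data is that a single feature separates the two classes on its own. The Python sketch below uses pandas to flag any numeric feature whose fraud and non-fraud value ranges do not overlap. The column names (`Return_Count`, `Order_Value`, `Fraud_Label`) and the helper `perfectly_separating_features` are hypothetical illustrations, not the project's actual schema or code.

```python
import pandas as pd


def perfectly_separating_features(df: pd.DataFrame, target: str) -> list[str]:
    """Return numeric features whose fraud and non-fraud value ranges do not overlap.

    If every fraud row lies strictly above (or strictly below) every non-fraud
    row on a feature, one threshold on that feature classifies the data perfectly.
    """
    numeric = df.drop(columns=[target]).select_dtypes("number")
    is_fraud = df[target] == 1
    separating = []
    for col in numeric.columns:
        fraud_vals = numeric.loc[is_fraud, col]
        legit_vals = numeric.loc[~is_fraud, col]
        if fraud_vals.min() > legit_vals.max() or fraud_vals.max() < legit_vals.min():
            separating.append(col)
    return separating


# Toy data with hypothetical columns: Return_Count alone separates the classes.
toy = pd.DataFrame({
    "Return_Count": [0, 1, 2, 8, 9, 12],
    "Order_Value": [50, 300, 120, 80, 250, 60],
    "Fraud_Label": [0, 0, 0, 1, 1, 1],
})
print(perfectly_separating_features(toy, "Fraud_Label"))  # ['Return_Count']
```

If the list is non-empty, a single cut-off on that feature is enough to reach 100% accuracy, which points to the data rather than the model.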

2. Data Quality Issues

Before modelling, significant inconsistencies were identified:

Return Dates Earlier than Order Dates: These records are logically invalid and had to be removed.

Loss of Balance: Removing the invalid entries slightly skewed the original dataset's class balance, although not dramatically.
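
A minimal sketch of this cleaning step in pandas, assuming hypothetical columns `Order_Date`, `Return_Date` and a binary `Fraud_Label`: it drops impossible date pairs and compares the class shares before and after, which is one way to measure a shift in balance like the one described above.

```python
import pandas as pd


def drop_invalid_returns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows whose return date falls before the order date."""
    order = pd.to_datetime(df["Order_Date"])
    ret = pd.to_datetime(df["Return_Date"])
    # Rows without a return (NaT) are kept; only impossible date pairs are dropped.
    invalid = ret.notna() & (ret < order)
    return df.loc[~invalid].reset_index(drop=True)


def class_balance(df: pd.DataFrame, target: str = "Fraud_Label") -> pd.Series:
    """Share of each class, used to compare the data before and after cleaning."""
    return df[target].value_counts(normalize=True).sort_index()


raw = pd.DataFrame({
    "Order_Date": ["2024-01-01", "2024-01-05", "2024-01-10", "2024-01-12"],
    "Return_Date": ["2024-01-04", "2024-01-02", None, "2024-01-20"],
    "Fraud_Label": [0, 1, 0, 1],
})
clean = drop_invalid_returns(raw)
print(len(raw), "->", len(clean))  # 4 -> 3
print(class_balance(raw))    # 50/50 before cleaning
print(class_balance(clean))  # roughly 67/33 after dropping the invalid fraud row
```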

3. Simplified Feature Relationships

Although feature engineering was carried out, the relationships between the features and fraud were simpler and more linear than would be expected in reality:

  • In production, fraud indicators often interact in subtle, non-linear ways.
  • Additional features, such as behaviour across multiple orders, would be needed to capture fraud patterns fully.
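
Multi-order behaviour could be captured by aggregating order-level rows into per-user features. The sketch below assumes a hypothetical order table with `User_ID`, `Order_ID`, `Returned` (0/1) and `Order_Value` columns. It illustrates the idea with pandas named aggregation and is not a feature set used in this project.

```python
import pandas as pd


def per_user_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Aggregate order-level rows into per-user, multi-order behaviour features."""
    return (
        orders.groupby("User_ID")
        .agg(
            Order_Count=("Order_ID", "count"),
            Return_Count=("Returned", "sum"),
            Return_Rate=("Returned", "mean"),
            Avg_Order_Value=("Order_Value", "mean"),
        )
        .reset_index()
    )


# Hypothetical order history: u1 returns two of three orders, u2 returns none.
orders = pd.DataFrame({
    "User_ID": ["u1", "u1", "u1", "u2", "u2"],
    "Order_ID": [1, 2, 3, 4, 5],
    "Returned": [1, 1, 0, 0, 0],
    "Order_Value": [100.0, 50.0, 30.0, 20.0, 40.0],
})
print(per_user_features(orders))
```

These per-user columns can then be joined back onto each order, so the model sees both the single transaction and the user's history.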

4. Streamlit Deployment Constraints

While Streamlit enabled rapid deployment, it also posed some challenges:

  • Initial difficulty handling SHAP visualisations due to the changes in
  • Environment-specific module errors (e.g., a missing 'shap' module or 'pickle' loading issues).
  • Limited options for scaling to millions of records without a backend database.
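
Missing-module errors can be caught early with a startup check that uses only the standard library. This is a minimal sketch; the helper name `missing_modules` and the dependency list are assumptions for illustration. On Streamlit Community Cloud, the usual fix is to list the package in `requirements.txt`, and pinning versions there also helps, since a pickled model generally needs compatible library versions to load.

```python
import importlib.util


def missing_modules(required: list[str]) -> list[str]:
    """Return the top-level modules in `required` that cannot be found here."""
    # find_spec returns None for an absent top-level module without importing it.
    return [name for name in required if importlib.util.find_spec(name) is None]


# Hypothetical dependency list for a dashboard like this one.
DASHBOARD_DEPENDENCIES = ["pandas", "xgboost", "shap", "streamlit"]

missing = missing_modules(DASHBOARD_DEPENDENCIES)
if missing:
    print(f"Missing modules: {', '.join(missing)}. Add them to requirements.txt.")
else:
    print("All dashboard dependencies are available.")
```

Inside the app itself, the same message could be shown with `st.error` followed by `st.stop()`, so users see a clear explanation instead of a traceback.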


Strengths and Achievements

Despite these limitations, the project has several important strengths worth celebrating:

1. End-to-End Pipeline

Built a full system from raw data ingestion ➔ cleaning ➔ feature generation ➔ model prediction ➔ visualisation.

Automated the fraud detection pipeline.

2. Feature Engineering from Domain Intuition

Created meaningful features, such as Days_to_Return_Corrected, Suspicious_Score, and High_Returner_Flag, based on business logic.
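
The exact definitions of these features are not given here, so the following pandas sketch uses hypothetical formulas and thresholds purely to illustrate how such business-logic features can be derived: the day gap between order and return, a flag for frequent returners, and a simple additive score. The input column `User_Return_Count` is also an assumption.

```python
import pandas as pd

# Hypothetical cut-offs for illustration only; not the project's actual values.
HIGH_RETURNER_THRESHOLD = 5
FAST_RETURN_DAYS = 2


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add illustrative versions of the three business-logic features."""
    out = df.copy()
    gap = (pd.to_datetime(out["Return_Date"]) - pd.to_datetime(out["Order_Date"])).dt.days
    # A negative gap is impossible, so it is treated as missing rather than kept.
    out["Days_to_Return_Corrected"] = gap.where(gap >= 0)
    # Flag users whose total number of returns reaches the cut-off.
    out["High_Returner_Flag"] = (out["User_Return_Count"] >= HIGH_RETURNER_THRESHOLD).astype(int)
    # Missing gaps compare as False, so they never count as a fast return.
    fast_return = (out["Days_to_Return_Corrected"] <= FAST_RETURN_DAYS).astype(int)
    # Simple additive score: each red flag contributes one point.
    out["Suspicious_Score"] = fast_return + out["High_Returner_Flag"]
    return out


sample = pd.DataFrame({
    "Order_Date": ["2024-03-01", "2024-03-01", "2024-03-01"],
    "Return_Date": ["2024-03-02", "2024-03-20", "2024-02-25"],
    "User_Return_Count": [6, 1, 7],
})
print(engineer_features(sample)[["Days_to_Return_Corrected", "High_Returner_Flag", "Suspicious_Score"]])
```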

3. Model Explainability Integrated

  • Implemented SHAP values to interpret how each feature impacted fraud predictions.
  • Presented the SHAP values as a bar chart so that business stakeholders can easily understand them.
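
A SHAP bar chart typically ranks features by their mean absolute SHAP value across users, while a per-user view sorts that user's own contributions. The sketch below computes both from a SHAP value matrix with NumPy and pandas. It uses a small made-up matrix so it runs without the `shap` package; in a real app the matrix would typically come from something like `shap.TreeExplainer(model).shap_values(X)`. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def global_importance(shap_values: np.ndarray, feature_names: list[str]) -> pd.Series:
    """Mean absolute SHAP value per feature, largest first."""
    mean_abs = np.abs(shap_values).mean(axis=0)
    return pd.Series(mean_abs, index=feature_names).sort_values(ascending=False)


def explain_user(shap_values: np.ndarray, feature_names: list[str],
                 row: int, top_k: int = 3) -> pd.Series:
    """Top-k signed contributions for one user, ranked by absolute size."""
    contrib = pd.Series(shap_values[row], index=feature_names)
    order = contrib.abs().sort_values(ascending=False).index[:top_k]
    return contrib.reindex(order)


# Made-up SHAP matrix: 3 users x 3 features (positive values push towards fraud).
features = ["Days_to_Return_Corrected", "Suspicious_Score", "High_Returner_Flag"]
sv = np.array([
    [-0.2, 1.5, 0.4],
    [0.1, -0.9, -0.3],
    [-0.6, 1.2, 0.8],
])
print(global_importance(sv, features))
print(explain_user(sv, features, row=2, top_k=2))
```

Either resulting Series could then be passed to a chart such as Streamlit's `st.bar_chart`.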

4. Professional Dashboard Experience

Designed a clean, tabbed Streamlit app that allows easy exploration of KPIs, fraud scores, and detailed per-user explanations.

5. Problem Solving and Adaptability

  • Diagnosed and fixed data quality issues effectively.
  • Handled environment errors during deployment.
  • Adjusted modelling expectations appropriately when faced with artificially perfect accuracy.

Going forward, applying similar methods to real, messy data would introduce new challenges, such as handling concept drift, building feedback loops, tuning decision thresholds, and minimising false positives. Embracing these complexities will be key to building robust, production-ready fraud detection systems.
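
Threshold tuning against false positives can be sketched as a search over candidate cut-offs: keep only thresholds whose false-positive rate on legitimate users stays within a chosen limit, then pick the one that catches the most fraud. This is an illustrative NumPy sketch with made-up scores, not part of the project; the helper `pick_threshold` is hypothetical and assumes both classes are present in `labels`.

```python
import numpy as np


def pick_threshold(scores: np.ndarray, labels: np.ndarray,
                   max_fpr: float) -> tuple[float, float]:
    """Return (threshold, recall) maximising recall subject to FPR <= max_fpr.

    A user is flagged when score >= threshold; labels use 1 = fraud, 0 = legitimate.
    """
    # Fallback: flag nobody (infinite threshold), giving FPR 0 and recall 0.
    best_threshold, best_recall = float("inf"), 0.0
    for t in np.unique(scores):  # ascending, so FPR and recall only fall as t grows
        flagged = scores >= t
        fpr = flagged[labels == 0].mean()
        recall = flagged[labels == 1].mean()
        if fpr <= max_fpr and recall > best_recall:
            best_threshold, best_recall = float(t), float(recall)
    return best_threshold, best_recall


scores = np.array([0.05, 0.20, 0.35, 0.60, 0.70, 0.80, 0.90, 0.95])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])
print(pick_threshold(scores, labels, max_fpr=0.25))  # (0.6, 1.0)
print(pick_threshold(scores, labels, max_fpr=0.0))   # (0.8, 0.75)
```

Tightening `max_fpr` trades missed fraud for fewer wrongly flagged customers, which is exactly the balance a production system has to tune.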


More articles by Nameera Nilofer K.

  • SQL on MacOS

    This is a brief guide I wanted to write as someone who was trying to find tutorials to host a SQL Server on Mac…

  • Insights from Islamic-Inspired Data and Modern Voices

    When we talk about our relationships in 2025, especially in Islamic or spiritually conscious communities, a question…

  • Building a Fraud Detection Model with XGBoost

    Fraud detection is one of the most pressing challenges in the financial industry and one of the most go-to problems for…

  • A Beginner’s Journey Exploring Football Data Analysis

    Football is more than just a game; it is a beautiful game and a data-rich field. As someone diving into the world of…
