The Art of Testing: Ensuring Reliability in Data Science & Machine Learning

Testing plays a crucial role in any software project. It is the key to ensuring the reliability, accuracy, and performance of our solutions. Data science and machine learning projects are no different in that regard. Nevertheless, in practice I often see data science projects that struggle to set up decent test automation.

Some possible reasons I have observed:

  • Lack of Proper Knowledge: Data scientists, while skilled in various aspects of data analysis and modeling, may not always possess a comprehensive understanding of software testing methodologies, frameworks, or techniques.
  • Difficult-to-Test Tools: Popular tools like Jupyter Notebooks or point-and-click services from Azure, AWS, or GCP were not designed with testing in mind.
  • Difficulty in Defining Testable Units: In traditional software development, code units are often well-defined, and tests can be written to validate specific functions or modules. In data science and machine learning, the units may not be as clearly defined if the initial code was derived from Jupyter Notebooks, for example.
  • Focus on Exploratory Analysis: Data science projects often involve exploratory analysis, where the primary goal is to gain insights and uncover patterns in the data. This exploratory code is not a good fit for automated testing processes.
  • Emphasis on Validation through Metrics: Data science projects typically involve evaluating model performance using metrics such as accuracy or RMSE. The focus is often on validating the model's output against these metrics rather than implementing extensive unit or integration tests.

There is already a wealth of knowledge and resources on testing principles and best practices in software development, and much of it applies to data projects as well. Rather than reiterating this existing knowledge, I will focus on examples of how we approach testing data science and machine learning solutions at BettercallPaul.

Fast, Narrow & Sociable

I am a fan of fast, narrow and sociable tests: Tests should run quickly, allowing for rapid feedback. Instead of broad, all-encompassing tests, narrow tests verify one specific behavior or function. At the same time, tests should be sociable: wherever possible, they avoid mocks so that components interact realistically with each other during the test.

Crucial Code: Unit Tests using PyTest

Any code that is crucial for the project and will be run regularly should be tested. For this kind of code we prefer normal Python modules over Jupyter notebooks. The PyTest library works great for us. We normally test all relevant code: feature creation, model training, validation, and serving. As we aim for fast and reliable tests, these normally run on small, synthetic data sets.
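
To make this concrete, here is a rough sketch of what such a unit test can look like. The feature function `add_rolling_mean` and its expected values are made up for this illustration and are not taken from a real project:

```python
import pandas as pd


def add_rolling_mean(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Hypothetical feature: rolling mean of the 'value' column."""
    out = df.copy()
    out["value_rolling_mean"] = out["value"].rolling(window, min_periods=1).mean()
    return out


def test_add_rolling_mean_on_synthetic_data():
    # A tiny, synthetic input keeps the test fast and deterministic.
    df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]})

    result = add_rolling_mean(df, window=2)

    assert "value_rolling_mean" in result.columns
    # The first row has no predecessor, so its rolling mean equals the value itself.
    assert result["value_rolling_mean"].tolist() == [1.0, 1.5, 2.5, 3.5]
```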

Notebooks: Ensure they run without errors

Normally we do not "test" our notebooks. However, we check that they still execute without errors using nbmake. Otherwise, they would typically stop working after a few months because the code or the libraries have evolved in the meantime. To keep these tests fast, we normally have a feature switch at the top of the notebook (e.g. `ACTIVATE_FAST_TEST=True`). When it is activated (the default), the data is massively filtered to allow for fast execution.
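
As a sketch, the first cell of such a notebook might look like the following; the data path and the filtering logic are placeholders, only the switch variable follows the pattern described above:

```python
# First notebook cell: feature switch for fast automated runs.
ACTIVATE_FAST_TEST = True  # default: keep the automated check fast

import pandas as pd

# Hypothetical data source; the path is only an illustration.
df = pd.read_parquet("data/raw/transactions.parquet")

if ACTIVATE_FAST_TEST:
    # Massively filter the data so the whole notebook runs in seconds.
    df = df.head(1_000)
```

The notebooks can then be executed as part of the test suite, e.g. with `pytest --nbmake notebooks/`.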

Automate Testing using CI/CD pipelines

An automated CI/CD pipeline (e.g. using GitLab CI/CD or GitHub Actions) ensures that all tests run whenever someone changes the code. It is important to fix a failing pipeline as soon as possible. Otherwise, the problems will only get bigger with each commit.
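
As an illustrative sketch, a minimal GitHub Actions workflow could look like this; the file name, Python version, and paths are assumptions and will differ from project to project:

```yaml
# .github/workflows/tests.yml -- illustrative sketch, not a drop-in config
name: tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/                # fast, narrow unit tests
      - run: pytest --nbmake notebooks/   # notebooks still execute without errors
```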

Measure Test Coverage

Fixating on test coverage as a KPI can result in really bad tests. But when used properly, it can give good hints about parts of the code that might still need some testing. We like to use pytest-cov.
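
For reference, a typical invocation might look like this; the package name `my_package` is a placeholder:

```bash
pytest --cov=my_package --cov-report=term-missing tests/
```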

Fight the nemesis: Data Leakage

A simple test case can save you from this enemy. The concrete implementation depends on the individual case. For a forecasting model, for example, one can randomly modify the raw data starting at a point in time and check whether features before this point in time remain unaffected by the modification.
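
One possible sketch of such a test, assuming a hypothetical `build_features` function that turns a raw, time-indexed DataFrame into purely backward-looking features:

```python
import numpy as np
import pandas as pd
import pandas.testing as pdt


def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Placeholder pipeline: lag and rolling-window features of 'value'."""
    feats = pd.DataFrame(index=raw.index)
    feats["lag_1"] = raw["value"].shift(1)
    feats["rolling_mean_7"] = raw["value"].rolling(7, min_periods=1).mean()
    return feats


def test_features_do_not_leak_future_data():
    rng = np.random.default_rng(42)
    idx = pd.date_range("2024-01-01", periods=60, freq="D")
    raw = pd.DataFrame({"value": rng.normal(size=60)}, index=idx)

    cutoff = pd.Timestamp("2024-02-01")
    features_before = build_features(raw).loc[raw.index < cutoff]

    # Randomly perturb all raw data at or after the cutoff ...
    modified = raw.copy()
    after_cutoff = modified.index >= cutoff
    modified.loc[after_cutoff, "value"] += rng.normal(size=after_cutoff.sum())

    # ... then the features strictly before the cutoff must remain unchanged.
    features_after = build_features(modified).loc[modified.index < cutoff]
    pdt.assert_frame_equal(features_before, features_after)
```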

Separate Testing Code and Testing Data

Tools like Pandera or Great Expectations can be used inside pipelines to check your data. However, I strongly advise separating the pipelines that load data into feature stores or train models from the pipelines that test your code. Loading and processing large amounts of data inevitably takes a lot of time and should be kept separate from the fast tests of your code.
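
For illustration, a minimal check with Pandera might look like this; the column names and value ranges are invented for the example:

```python
import pandas as pd
import pandera as pa

# Declare expectations about the data, not about the code.
schema = pa.DataFrameSchema(
    {
        "customer_id": pa.Column(int, pa.Check.ge(0)),
        "amount": pa.Column(float, pa.Check.in_range(0, 10_000)),
        "country": pa.Column(str, pa.Check.isin(["DE", "AT", "CH"])),
    }
)

df = pd.DataFrame(
    {"customer_id": [1, 2], "amount": [12.5, 99.0], "country": ["DE", "CH"]}
)

# Raises a SchemaError if the data violates the declared expectations.
schema.validate(df)
```

Such checks belong in the data pipelines; the fast code tests described above stay independent of real data.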

Embrace proper testing, but start simple!

This article provided a glimpse into some good testing practices in data science projects. It's important to note that there are other aspects, such as test-driven development or non-functional tests for runtime/model performance, privacy, and biases, which were not covered here. However, starting with basic automation and embracing proper testing practices is a crucial step towards building reliable and robust data science projects. As the field continues to evolve, it is my hope to see more and more data science projects incorporating automated tests, ensuring the delivery of high-quality and trustworthy solutions in the future.
