Floating point Issue faced practically !!!

Jithendrian Sundaravaradan

Published Sep 22, 2023

I wanted to brush up on a few machine learning concepts, and one of the basics I was keen to practice in depth is linear regression. A typical project for this is the Bike Sharing demand project, which has 730 rows of data. I won't go into the details of the project here; please refer to Kaggle for further information.

I wanted to split these 730 rows with train_size as 0.7 and test_size as 0.3, i.e. a 70:30 ratio.

Expected train size = 730 × 0.7 = 511 and expected test size = 730 × 0.3 = 219. To my surprise, the train size came out as 510 and the test size as 219, which adds up to 729 instead of 730.

df_train, df_test = train_test_split(bike, train_size = 0.7, test_size=0.3, random_state = 100)

If I give only the train size or only the test size, the library calculates that one from the fraction and gets the other by subtracting it from the total number of rows.

So first I gave only test_size = 0.3:

df_train, df_test = train_test_split(bike, test_size = 0.3, random_state = 100)

It calculated the test size as 219 and the train size as 511. That's great, this is what I expected. It first calculated the test sample size as 219 and then subtracted 219 from 730 to get the train sample size, hence 511.

Then I tried with only train_size = 0.7 to check:

df_train, df_test = train_test_split(bike, train_size = 0.7, random_state = 100)

To my surprise, it calculated the train size as 510 and the test size as 220. In this case it first calculated the train sample size as 510 and then subtracted 510 from 730, hence 220 as the test size.

From a machine learning perspective, training on 511 samples or 510 samples should not make a difference. But as a software engineer, I wanted to dig deeper into why this discrepancy occurs and where I am losing that one row when I pass train_size = 0.7.

if test_size_type == "f":
    n_test = ceil(test_size * n_samples)
elif test_size_type == "i":
    n_test = float(test_size)

if train_size_type == "f":
    n_train = floor(train_size * n_samples)
elif train_size_type == "i":
    n_train = float(train_size)

if train_size is None:
    n_train = n_samples - n_test
elif test_size is None:
    n_test = n_samples - n_train

This is the code in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/model_selection/_split.py
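
To see how those branches produce the three results above, the size calculation can be mimicked in plain Python. This is only a sketch based on the excerpt: `split_sizes` is a hypothetical helper, not a scikit-learn function, it handles fractional sizes only, and it skips the library's validation and defaults.

```python
from math import ceil, floor


def split_sizes(n_samples, train_size=None, test_size=None):
    """Mimic the quoted excerpt for fractional sizes (hypothetical helper).

    At least one of train_size / test_size must be given.
    """
    n_train = n_test = None
    if test_size is not None:
        n_test = ceil(test_size * n_samples)     # test fraction is rounded up
    if train_size is not None:
        n_train = floor(train_size * n_samples)  # train fraction is rounded down
    if train_size is None:
        n_train = n_samples - n_test             # remainder goes to train
    elif test_size is None:
        n_test = n_samples - n_train             # remainder goes to test
    return n_train, n_test


print(split_sizes(730, train_size=0.7, test_size=0.3))  # (510, 219)
print(split_sizes(730, test_size=0.3))                  # (511, 219)
print(split_sizes(730, train_size=0.7))                 # (510, 220)
```

The row goes missing only on the path that floors train_size * n_samples; when only test_size is given, the train size is a plain integer subtraction.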

When I debugged, I saw the library flooring the value train_size * n_samples. The floor of 511.0 should be 511, but I got 510. Digging further, I found the issue below.

In Python, multiplying 0.7 * 730 does not give exactly 511.0. It gives 510.99999999999994, and flooring that, as the library does, yields 510!
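
The effect is easy to reproduce with the standard library alone. The sketch below uses `decimal` and `fractions` only to inspect the values; it is not how the library computes the split.

```python
import math
from decimal import Decimal
from fractions import Fraction

product = 0.7 * 730
print(product)              # 510.99999999999994
print(math.floor(product))  # 510

# Decimal(float) exposes the exact binary value stored for the literal 0.7.
stored_below = Decimal(0.7) < Decimal("0.7")
print(stored_below)         # True: the stored value is slightly below 0.7

# Exact rational arithmetic gives the result expected on paper.
exact = Fraction(7, 10) * 730
print(exact)                # 511
```

Because the stored 0.7 is slightly too small, the product lands just under 511 and the floor drops it to 510.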

This is a floating-point rounding error: 0.7 cannot be represented exactly in binary floating point. To verify it, I coded the same multiplication in several languages: Java, C, Go and Python. (Of course, different languages and numeric types handle floating point in different ways.)

See the code for each language below.

Java Code

class Calculate {
    public static void main(String[] args) {
        float x = 0.7f;   // float is 32-bit single precision
        int y = 730;
        float z = x * y;  // y is promoted to float before multiplying
        System.out.println(z);
    }
}

C Code

#include <stdio.h>

int main(){

	int x = 730;
	float y = 0.7;
	printf("%f\n", x*y);
	return 0;
}

Golang Code

package main

func main() {
	var a int
	var b float64
	a = 730
	b = 0.7
	result := float64(a) * b
	println(result)
}

Python code

x = 730
y = 0.7
print(x * y)

The result from all the languages,

This clarified my doubts: Python produces 510.99999999999994 instead of 511, and the library floors this value, hence we get 510.
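
Note that the Java and C snippets use the 32-bit `float` type, while Go's float64 and Python's float are 64-bit. The sketch below emulates single precision in Python with `struct`, assuming IEEE 754 single precision for `float`. It does not reproduce the original runs, and `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


x32 = to_float32(0.7)              # what a 32-bit float stores for 0.7
product32 = to_float32(x32 * 730)  # product rounded to single precision
product64 = 0.7 * 730              # double precision

print(product32)  # 511.0
print(product64)  # 510.99999999999994
```

In single precision the rounding happens to land on exactly 511.0, so the error is hidden rather than absent; in double precision the product falls just below 511.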

How to resolve this in Python?

One way is to use the decimal module and format the result to 5 significant digits:

from decimal import Decimal

x = 730
y = 0.7
print(format(Decimal.from_float(x * y), '.5'))

output

jithrock@tech:~/fp_issues$ python3 Calculate.py 
511.00
jithrock@tech:~/fp_issues$
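
Two other options are sketched below as suggestions, not as what the library does internally: build the Decimal from the string "0.7" so the fraction is exact from the start, or round the float product to the nearest integer instead of flooring it. Either integer can then be passed to train_test_split as an absolute row count, which the `"i"` branch in the excerpt above handles without any fractional multiplication.

```python
from decimal import Decimal

n_samples = 730

# Option 1: a Decimal built from a string holds 0.7 exactly.
n_train_decimal = int(Decimal("0.7") * n_samples)

# Option 2: round the float product to the nearest integer instead of flooring.
n_train_rounded = round(0.7 * n_samples)

print(n_train_decimal, n_train_rounded)  # 511 511

# Either value could be passed as an integer train_size, e.g.
# train_test_split(bike, train_size=n_train_rounded, random_state=100)
```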

If you have come across these kinds of issues, please share! Please also suggest better ways to solve this. Python is used predominantly in the scientific and data science world, and these areas are full of number crunching, so how does Python handle these issues?

In the next article I will explore how floating-point issues are handled in various languages and share my observations.
