Understanding the square function for Variance & Deviation in Data Science

Data Science is all about data, and data comes in endless variety. So how can we standardise a format or pattern to make sense of that variety?

The simple answer is to set a benchmark value in the data, against which we can measure how much every other value differs.

Normally we choose the mean of the data as this benchmark, and a simple computation is the sum, or the average, of the deviations of all points from it. However, when we measure deviation around a measure of centrality, the deviations on either side of the mean cancel each other out. For the spread, we need both the negative and the positive deviations to contribute, since both sides tell us how far the data extends.

The main reason the deviations cancel is their opposite signs (negative versus positive). If we take the absolute value of each deviation instead, nothing cancels, and we can compute the full length of the data's spread.
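A minimal sketch of this cancellation, using a small made-up data set: the signed deviations from the mean sum to zero, while the absolute deviations keep the full spread.

```python
# Signed deviations from the mean cancel; absolute deviations do not.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)  # 40 / 8 = 5.0

signed_dev = sum(x - mean for x in data)    # positives and negatives cancel
abs_dev = sum(abs(x - mean) for x in data)  # keeps the full length of the spread

print(signed_dev)  # 0.0
print(abs_dev)     # 12.0
```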

But instead we use the square function!


So the very important question: why do we use the square function?

Is the square function really better than the absolute function, and what does that mean?

In the two plots below, we can observe that the square function is smooth because it is differentiable everywhere, while the absolute function is not (it has a sharp corner at zero). But why do we even care about differentiability?


[Plots of the square function and the absolute function]

The Reason is…

Calculus, and differentiation in particular, plays a very significant role in Data Science and in applications like Machine Learning and Deep Learning, where values are optimised by following gradients. That works for the square function, which is differentiable everywhere, but not for the absolute function, whose derivative is undefined at zero.
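A small sketch of why the derivative matters, under the assumption that we optimise by plain gradient descent (the data and learning rate here are arbitrary illustrations): the squared-deviation loss has the well-defined derivative d/dc Σ(x − c)² = −2 Σ(x − c), so gradient descent on it converges to the mean.

```python
# Gradient descent on the squared-deviation loss recovers the mean,
# because the loss is differentiable everywhere.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

c = 0.0    # initial guess for the centre
lr = 0.05  # learning rate (an arbitrary choice for this sketch)
for _ in range(200):
    grad = -2 * sum(x - c for x in data)  # derivative of sum((x - c)**2) w.r.t. c
    c -= lr * grad

print(round(c, 4))  # converges to the mean of the data, 5.0
```

The absolute-deviation loss has no derivative at its minimum, which is exactly where a gradient-based optimiser needs one.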

And the second most important reason is that, when measuring deviation or spread, we want to suppress the low deviations and magnify the high ones, and the square function does exactly that. For example, a minor deviation between 0 and 1, say 0.9, shrinks to an even smaller 0.81 when squared, while a higher deviation like 7 is magnified to 49, which ultimately shows the contribution of outliers to the spread.
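The suppress-and-magnify effect from the paragraph above can be sketched directly (the deviation values are the ones used as examples in the text):

```python
# Squaring shrinks deviations below 1 and magnifies deviations above 1,
# so a single outlier dominates the total squared spread.
deviations = [0.9, 1.0, 7.0]
squared = [d ** 2 for d in deviations]  # approximately [0.81, 1.0, 49.0]

outlier_share = squared[-1] / sum(squared)
print(outlier_share)  # the one outlier carries roughly 96% of the spread
```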

So that is the significance of the square in variance, with respect to Machine Learning and Deep Learning.


More articles by Sachin Khode
