Understanding the square function for Variance & Deviation in Data Science

Data Science is all about data, and data comes in endless variety. So how can we standardise a format or pattern to make sense of that variety?

The simple answer is to set a benchmark value in the data, against which we can measure how much every other value differs.

Normally we choose the mean of the data as this benchmark, and a simple computation is the sum, or the average, of the deviations of all points from it. However, when we measure deviation around a measure of centrality, the deviations on either side of the mean cancel each other out. For the spread, we need both the negative and the positive deviations to contribute, since both sides tell us how far the data extends.

The main reason the deviations cancel is their opposite signs (negative versus positive). If we take the absolute value of each deviation instead, nothing cancels, and we can compute the full length of the data's spread.
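A minimal sketch of this cancellation, using a small made-up data set: the signed deviations from the mean sum to zero, while the absolute deviations keep the full spread.

```python
# Signed deviations from the mean cancel; absolute deviations do not.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)  # 40 / 8 = 5.0

signed_dev = sum(x - mean for x in data)    # positives and negatives cancel
abs_dev = sum(abs(x - mean) for x in data)  # keeps the full length of the spread

print(signed_dev)  # 0.0
print(abs_dev)     # 12.0
```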

But instead we use the square function!


So the very important question: why do we use the square function?

Is the square function really better than the absolute function, and what does that mean?

In the two plots below, we can observe that the square function is smooth because it is differentiable everywhere, while the absolute function is not (it has a sharp corner at zero). But why do we even care about differentiability?


[Plots of the square function and the absolute function]

The Reason is…

Calculus, and differentiation in particular, plays a very significant role in Data Science and in applications like Machine Learning and Deep Learning, where values are optimised by following gradients. That works for the square function, which is differentiable everywhere, but not for the absolute function, whose derivative is undefined at zero.
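A small sketch of why the derivative matters, under the assumption that we optimise by plain gradient descent (the data and learning rate here are arbitrary illustrations): the squared-deviation loss has the well-defined derivative d/dc Σ(x − c)² = −2 Σ(x − c), so gradient descent on it converges to the mean.

```python
# Gradient descent on the squared-deviation loss recovers the mean,
# because the loss is differentiable everywhere.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

c = 0.0    # initial guess for the centre
lr = 0.05  # learning rate (an arbitrary choice for this sketch)
for _ in range(200):
    grad = -2 * sum(x - c for x in data)  # derivative of sum((x - c)**2) w.r.t. c
    c -= lr * grad

print(round(c, 4))  # converges to the mean of the data, 5.0
```

The absolute-deviation loss has no derivative at its minimum, which is exactly where a gradient-based optimiser needs one.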

And the second most important reason is that, when measuring deviation or spread, we want to suppress the low deviations and magnify the high ones, and the square function does exactly that. For example, a minor deviation between 0 and 1, say 0.9, shrinks to an even smaller 0.81 when squared, while a higher deviation like 7 is magnified to 49, which ultimately shows the contribution of outliers to the spread.
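The suppress-and-magnify effect from the paragraph above can be sketched directly (the deviation values are the ones used as examples in the text):

```python
# Squaring shrinks deviations below 1 and magnifies deviations above 1,
# so a single outlier dominates the total squared spread.
deviations = [0.9, 1.0, 7.0]
squared = [d ** 2 for d in deviations]  # approximately [0.81, 1.0, 49.0]

outlier_share = squared[-1] / sum(squared)
print(outlier_share)  # the one outlier carries roughly 96% of the spread
```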

So that is the significance of the square in variance, with respect to Machine Learning and Deep Learning.


More articles by Sachin Khode
