Statistics for Data Science

Vandit Patel

Published Jan 30, 2021


What is Statistics?

Statistics is the practice of collecting, summarizing, and interpreting data in order to draw conclusions and make predictions about a population.

Branches of Statistics:

There are two branches of Statistics.

1) DESCRIPTIVE STATISTICS

Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can represent either an entire population or a sample of it. Descriptive statistics are broken down into measures of central tendency and measures of variability.

2) INFERENTIAL STATISTICS

While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.

When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.
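The idea of learning about a population from a sample can be illustrated with a small simulation. This is only a sketch: the population below is made-up data (simulated heights), and the names `population` and `sample` are hypothetical choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated heights in cm (made-up data).
population = [random.gauss(170, 10) for _ in range(100_000)]

# In practice we usually observe only a sample, not the whole population.
sample = random.sample(population, 500)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# The sample mean serves as an estimate of the (normally unknown) population mean.
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two printed values should be close to each other, which is what makes a sample useful for drawing conclusions about the larger population.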

Commonly Used Measures

1) Measures of Central Tendency

2) Measures of Variability

Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location.

Mean : The mean is the sum of all the observations in the data divided by the total number of observations. It is also known as the average. The mean is thus a number around which the entire data set is spread.
Median : The median is the point that divides the sorted data into two equal halves: half of the data is less than or equal to the median, and the other half is greater than or equal to it. It is calculated by first arranging the data in ascending or descending order and then taking the middle value, or the average of the two middle values when the number of observations is even.
Mode : The mode is the value with the highest frequency in the data set; in other words, it is the value that appears the greatest number of times. A data set can have one mode or more than one.
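The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on a small made-up data set; the variable names are chosen only for illustration.

```python
import statistics

# Small made-up data set, already sorted for readability.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of observations / number of observations
median = statistics.median(data)    # average of the two middle values (3 and 5)
modes = statistics.multimode(data)  # list of all most-frequent values

print(mean, median, modes)

# A data set can have more than one mode.
bimodal = [1, 1, 2, 2, 3]
print(statistics.multimode(bimodal))
```

`multimode` is used instead of `mode` because it returns every value that ties for the highest frequency, which matches the point that a data set can have more than one mode.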

Measures of Variability

Measures of variability, also called measures of dispersion, describe the spread of the data around a central value (one of the measures of central tendency).

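1) Absolute Deviation from Mean :- The Absolute Deviation from Mean, also called Mean Absolute Deviation (MAD), describes the variation in the data set in the sense that it gives the average absolute distance of the data points from the mean. It is calculated as

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

where $x_i$ are the observations, $\bar{x}$ is their mean, and $n$ is the number of observations.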
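As a sketch of the Mean Absolute Deviation described above, the helper below averages the absolute distances of the data points from their mean. The function name `mean_absolute_deviation` and the sample data are hypothetical, chosen only for this example.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean of the values."""
    center = statistics.fmean(values)
    return sum(abs(x - center) for x in values) / len(values)

# Made-up data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mad = mean_absolute_deviation(data)
print(mad)
```

For this data the absolute deviations are 3, 1, 1, 1, 0, 0, 2 and 4, which sum to 12, so the result is 12 / 8 = 1.5.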

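2) Variance :- Variance measures how far the data points are spread out from the mean; it is the average of the squared deviations from the mean. A high variance indicates that the data points are spread widely, and a small variance indicates that the data points are closer to the mean of the data set. For a population it is calculated as

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $x_i$ are the observations, $\bar{x}$ is their mean, and $n$ is the number of observations. When the data are a sample, the sum is divided by $n - 1$ instead of $n$.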
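A short sketch of variance in Python follows. The text does not say whether the population or the sample form is intended, so both are shown here as an assumption; `pvariance` divides by n and `variance` divides by n - 1. The data are made up.

```python
import statistics

# Made-up data with mean 5; squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(data)  # 32 / 8
sample_variance = statistics.variance(data)       # 32 / 7

print(population_variance, sample_variance)

# A more widely spread data set with the same mean has a larger variance.
wide = [-10, 0, 0, 5, 5, 10, 10, 20]
print(statistics.pvariance(wide))
```

The sample form is always slightly larger than the population form on the same data, because it divides the same sum by a smaller number.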

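3) Standard Deviation :- The square root of the Variance is called the Standard Deviation. It is expressed in the same units as the data. It is calculated as

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $x_i$ are the observations, $\bar{x}$ is their mean, and $n$ is the number of observations.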
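The relationship between standard deviation and variance can be checked directly. This is a minimal sketch on made-up data, using the population forms `pstdev` and `pvariance` from the standard library as an assumption about which form is meant.

```python
import math
import statistics

# Made-up data with mean 5 and population variance 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_std = statistics.pstdev(data)
sqrt_of_variance = math.sqrt(statistics.pvariance(data))

# Both routes give the same number: the square root of the variance.
print(population_std, sqrt_of_variance)
```

For sample data, `statistics.stdev` plays the same role relative to `statistics.variance`.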

4) Range :- Range is the difference between the maximum value and the minimum value in the data set. It is given as

$$\text{Range} = x_{\max} - x_{\min}$$
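The range needs only the built-in `max` and `min`. A small sketch on made-up data (the name `data_range` is a hypothetical choice that avoids shadowing Python's built-in `range`):

```python
# Made-up data; the order of the values does not matter for the range.
data = [7, 2, 9, 4, 5, 4, 5, 4]

data_range = max(data) - min(data)
print(data_range)
```

Because it depends only on the two extreme values, the range is very sensitive to outliers.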

5) Quartiles :- Quartiles are the points that divide the sorted data set into four equal parts. Q1, Q2 and Q3 are the first, second and third quartiles of the data set; Q2 is the same as the median.
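Quartiles can be sketched with `statistics.quantiles` from the standard library (Python 3.8+). Note that several conventions exist for interpolating quartiles, so other tools may return slightly different Q1 and Q3 values for the same data; the default method of the standard library is assumed here. The data are made up.

```python
import statistics

# Made-up data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("Q1:", q1)
print("Q2:", q2, "(median:", statistics.median(data), ")")
print("Q3:", q3)
```

The second quartile printed here coincides with the median of the same data.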

6) Skewness :- Skewness measures the asymmetry of a probability distribution. It can be positive, negative, zero (for a symmetric distribution), or undefined.

Positive Skew :- This is the case when the tail on the right side of the curve is longer than the tail on the left side. For these distributions, the mean is typically greater than the mode.
Negative Skew :- This is the case when the tail on the left side of the curve is longer than the tail on the right side. For these distributions, the mean is typically smaller than the mode.
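The sign of the skewness can be checked numerically. The sketch below assumes one common definition, the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5); the article does not state which formula it has in mind. The function name `skewness` and the data are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    center = statistics.fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    if m2 == 0:
        # All values identical: the ratio is undefined.
        raise ValueError("skewness is undefined when the variance is zero")
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # long tail on the right
left_skewed = [-x for x in right_skewed]    # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
print(skewness(symmetric))     # zero

# For the right-skewed data, the mean (3.5) is greater than the mode (3).
print(statistics.fmean(right_skewed), statistics.mode(right_skewed))
```

The zero-variance branch corresponds to the "undefined" case mentioned above, since the formula would divide by zero.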

7) Kurtosis :- Kurtosis describes whether the data are light-tailed (few outliers) or heavy-tailed (more outliers) compared to a Normal distribution. There are three kinds of Kurtosis:

Mesokurtic :- This is the case when the excess kurtosis is zero (a kurtosis of 3), as for the Normal distribution.
Leptokurtic :- This is when the tails of the distribution are heavy (outliers present) and the kurtosis is higher than that of the Normal distribution.
Platykurtic :- This is when the tails of the distribution are light (few or no outliers) and the kurtosis is lower than that of the Normal distribution.
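The three cases can be told apart by the sign of the excess kurtosis. The sketch below assumes the moment-based definition (fourth central moment divided by the squared second central moment, minus 3, so that the Normal distribution scores zero). The function names, the tolerance, and the data are hypothetical choices for illustration.

```python
import statistics

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a Normal distribution)."""
    n = len(values)
    center = statistics.fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m4 = sum((x - center) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

def kurtosis_kind(values, tolerance=1e-9):
    """Label data by the sign of its excess kurtosis (tolerance is an arbitrary choice)."""
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"

light_tailed = [1, 2, 3, 4, 5]                   # evenly spread, no outliers
heavy_tailed = [-10, -1, 0, 0, 0, 0, 0, 1, 10]   # mostly central, two extreme values

print(excess_kurtosis(light_tailed), kurtosis_kind(light_tailed))
print(excess_kurtosis(heavy_tailed), kurtosis_kind(heavy_tailed))
```

On small samples these estimates are noisy, so a value near zero should not be read as strong evidence that the data follow a Normal distribution.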
