Day 4 - Statistics in Machine Learning

What are Z-Scores?

A Z-Score measures exactly how many standard deviations above or below the mean a data point is.

Here's the formula for calculating a Z-Score:

Z-Score = (data point - mean) / standard deviation

Here's the same formula written with symbols, where x is the data point, μ is the mean, and σ is the standard deviation:

$$z = \frac{x - \mu}{\sigma}$$

Here are some important facts about Z-Scores:

  • A positive z-score says the data point is above average.
  • A negative z-score says the data point is below average.
  • A z-score close to 0 says the data point is close to average.
  • A data point is often considered unusual if its z-score is above 3 or below -3 (a common rule of thumb rather than a strict cutoff).
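A minimal Python sketch of the Z-Score formula and the ±3 rule of thumb. It assumes the values form an entire population (hence `statistics.pstdev`; swap in `statistics.stdev` if you treat them as a sample). The helper names `z_scores` and `unusual_points` and the data values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Return (x - mean) / standard deviation for every value in data."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    return [(x - mu) / sigma for x in data]

def unusual_points(data, threshold=3.0):
    """Return values whose z-score is above threshold or below -threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, standard deviation 2
print(z_scores(data))        # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(unusual_points(data))  # [] (no value is more than 3 SDs from the mean)
```

Here 9 is 2 standard deviations above the mean and 2 is 1.5 below it, so neither crosses the ±3 threshold.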

What is a Normal Distribution?

Early statisticians noticed the same shape coming up over and over again in different distributions, so they named it the normal distribution.

[Figure: bell-shaped normal distribution curve]

Normal Distributions have the following features:

  • Symmetric bell shape.
  • Mean and median are equal.
  • Both are located at the center of the distribution.
  • ≈68% of the data falls within 1 standard deviation of the mean.
  • ≈95% of the data falls within 2 standard deviations of the mean.
  • ≈99.7% of the data falls within 3 standard deviations of the mean.
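The 68-95-99.7 percentages can be checked numerically with the standard library's `statistics.NormalDist`, which provides the normal cumulative distribution function (`cdf`). This is a sketch: `share_within` is a hypothetical helper, and its `mu`/`sigma` defaults describe the standard normal distribution.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    # Area under the curve between mu - k*sigma and mu + k*sigma
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

This prints 68.3%, 95.4% and 99.7%. Changing `mu` and `sigma` does not change the result, which is why the rule holds for every normal distribution.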

What is a Scatterplot?

A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the dataset is plotted as a point whose (x, y) coordinates are its values for the two variables.

[Figure: example scatterplot of two numerical variables]

What are outliers in scatter plots?

Scatter plots often have a pattern. We call a data point an outlier if it doesn't fit the pattern.

[Figure: scatter plot of students on a backpacking trip, with two points labeled Brad and Sharon]

Consider the scatter plot above, which shows data for students on a backpacking trip. (Each point represents a student.)

Notice how two of the points don't fit the pattern very well.

  • These points have been labeled Brad and Sharon, which are the names of the students they represent.
  • Sharon could be considered an outlier because she is carrying a much heavier backpack than the pattern predicts.
  • Brad could be considered an outlier because he is carrying a much lighter backpack than the pattern predicts.

Key idea: There is no special rule that tells us whether or not a point is an outlier in a scatter plot. When doing more advanced statistics, it may become helpful to invent a precise definition of "outlier", but we don't need that yet.
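As one possible precise definition (an assumption of this sketch, not something the article prescribes), you can fit a least-squares line and flag points whose vertical distance from it, the residual, is unusually large. The data, the helper name `residual_outliers`, and the cutoff of 2 population standard deviations of the residuals are all hypothetical; indices 4 and 7 play the roles of a heavy-pack "Sharon" and a light-pack "Brad".

```python
from statistics import mean, pstdev

def residual_outliers(xs, ys, threshold=2.0):
    """Indices of points far from the least-squares line y = intercept + slope * x.

    "Far" means a residual larger than `threshold` population standard
    deviations of all residuals; this cutoff is an illustrative choice.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]

# Hypothetical data: student body weight (x) and backpack weight (y)
body = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
pack = [20, 22, 24, 26, 50, 30, 32, 12, 36, 38]
print(residual_outliers(body, pack))  # [4, 7]
```

With only ten points the verdict is sensitive to these choices: using the sample standard deviation (`statistics.stdev`) instead of `pstdev` would no longer flag index 7, which illustrates why the article avoids committing to one rule.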

What are clusters in scatter plots?

Sometimes the data points in a scatter plot form distinct groups. These groups are called clusters.

[Figure: scatter plot of calories and sodium for 16 brands of hot dogs in 1986]

Consider the scatter plot above, which shows nutritional information for 16 brands of hot dogs in 1986. (Each point represents a brand.)

  • The points form two clusters, one on the left and another on the right.
  • The left cluster is of brands that tend to be low in calories and low in sodium.
  • The right cluster is of brands that tend to be high in calories and high in sodium.
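The clusters above are identified by eye. One common way to find clusters algorithmically is k-means; the sketch below is a small k-means with k = 2, run on made-up (calories, sodium) pairs rather than the actual 1986 data. The helper name `two_means`, its initialization (first point plus the point farthest from it), and the data are assumptions for illustration.

```python
from math import dist

def two_means(points, max_iterations=100):
    """Split points into two clusters with Lloyd's k-means algorithm (k = 2)."""
    # Start from the first point and the point farthest away from it
    centroids = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid
        new_labels = [0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical (calories, sodium in mg) pairs for eight brands
hot_dogs = [(110, 300), (120, 320), (105, 290), (115, 310),
            (180, 500), (190, 520), (175, 480), (185, 510)]
labels, centroids = two_means(hot_dogs)
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [(112.5, 305.0), (182.5, 502.5)]
```

Because k-means uses straight-line distance, the variable with the larger numeric range (sodium here) dominates; converting each column to z-scores first is a common way to put both variables on an equal footing.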

Describing scatterplots (form, direction, strength, outliers)

When we look at a scatterplot, we should be able to describe the association we see between the variables.

A quick description of the association in a scatterplot should always include the Form, Direction, and Strength of the association, along with the presence of any Outliers.

  • Form: Is the association linear or nonlinear?
  • Direction: Is the association positive or negative?
  • Strength: Does the association appear to be strong, moderately strong, or weak?
  • Outliers: Do there appear to be any data points that are unusually far away from the general pattern?

It's also important to include the context of the two variables in the description of these features.

What is correlation?

We often see patterns or relationships in scatterplots.

When the y variable tends to increase as the x variable increases, we say there is a positive correlation between the variables.

[Figure: scatterplot showing a positive correlation]

When the y variable tends to decrease as the x variable increases, we say there is a negative correlation between the variables.

[Figure: scatterplot showing a negative correlation]

When there is no clear relationship between the two variables, we say there is no correlation between the two variables.

[Figure: scatterplot showing no correlation]
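The article describes correlation visually. A common way to put a number on the direction and strength of a linear association (not used in the article itself) is Pearson's correlation coefficient r, which ranges from -1 (perfect negative) to 1 (perfect positive), with values near 0 meaning no linear correlation. The sketch below computes r by hand; `pearson_r`, `describe`, and the 0.3 cutoff for "no clear correlation" are hypothetical choices, not a standard.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, cutoff=0.3):
    """Rough verbal label for r; the cutoff is an illustrative assumption."""
    if abs(r) < cutoff:
        return "no clear correlation"
    return "positive correlation" if r > 0 else "negative correlation"

x = [1, 2, 3, 4]
print(describe(pearson_r(x, [2, 4, 6, 8])))  # positive correlation (r = 1.0)
print(describe(pearson_r(x, [8, 6, 4, 2])))  # negative correlation (r = -1.0)
print(describe(pearson_r(x, [1, 3, 3, 1])))  # no clear correlation (r = 0.0)
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient. Note that r only measures linear association, so a strong nonlinear pattern can still have r near 0.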


More articles by Mrityunjay Pathak
