Day 4 - Statistics in Machine Learning
What are Z-Scores?
A Z-Score measures exactly how many standard deviations above or below the mean a data point is.
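Here's the formula for calculating a Z-Score:
z = (data point - mean) / (standard deviation)
Here's the same formula written with symbols:
z = (x - μ) / σ
Here, x is the data point, μ is the mean, and σ is the standard deviation.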
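Here are some important facts about Z-Scores:
A positive Z-Score means the data point is above the mean.
A negative Z-Score means the data point is below the mean.
A Z-Score of 0 means the data point is exactly equal to the mean.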
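As a minimal sketch, a Z-Score can be computed by subtracting the mean from the data point and dividing by the standard deviation. The function name `z_score` and the sample values are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption; a sample standard deviation would give slightly different numbers.

```python
from statistics import mean, pstdev


def z_score(x, data):
    """Return how many standard deviations x lies above (+) or below (-) the mean of data."""
    mu = mean(data)
    sigma = pstdev(data)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("Z-Score is undefined when every value is the same")
    return (x - mu) / sigma


# Hypothetical dataset with mean 5 and standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, scores))  # 2.0  (two standard deviations above the mean)
print(z_score(2, scores))  # -1.5 (one and a half standard deviations below the mean)
print(z_score(5, scores))  # 0.0  (exactly at the mean)
```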
What is a Normal Distribution?
Early statisticians noticed the same shape coming up over and over again in different distributions, so they named it the normal distribution.
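Normal Distributions have the following features:
A symmetric bell shape.
The mean and median are equal, and both sit at the center of the distribution.
About 68% of the data falls within 1 standard deviation of the mean.
About 95% of the data falls within 2 standard deviations of the mean.
About 99.7% of the data falls within 3 standard deviations of the mean.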
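One well-known feature of the normal distribution is that roughly 68%, 95%, and 99.7% of the data falls within 1, 2, and 3 standard deviations of the mean. The sketch below checks those shares numerically using `NormalDist` from Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Return the fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Because the shares depend only on the number of standard deviations, the same percentages hold for any mean and standard deviation.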
What is a Scatterplot?
A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the dataset gets plotted as a point whose (x, y) coordinates relate to its values for the two variables.
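To make the idea of "one point per member" concrete, here is a small sketch that pairs two variables into (x, y) coordinates. The variable names and values are hypothetical; a plotting library would then draw each pair as one dot.

```python
# Hypothetical dataset: one entry per student in each list.
hours_studied = [1, 2, 3, 4, 5]      # x variable
test_scores = [52, 60, 63, 71, 80]   # y variable

# Each student becomes one (x, y) point on the scatterplot.
points = list(zip(hours_studied, test_scores))

for x, y in points:
    print(f"point at x={x}, y={y}")
```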
What are outliers in scatter plots?
Scatter plots often have a pattern. We call a data point an outlier if it doesn't fit the pattern.
Consider the scatter plot above, which shows data for students on a backpacking trip. (Each point represents a student.)
Notice how two of the points don't fit the pattern very well.
Key idea: There is no special rule that tells us whether or not a point is an outlier in a scatter plot. When doing more advanced statistics, it may become helpful to invent a precise definition of "outlier", but we don't need that yet.
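For readers curious what such a precise definition might look like, here is one hypothetical rule, sketched under stated assumptions: fit a straight line to the points, then flag any point whose vertical distance from that line is more than 2 standard deviations of all such distances. The least-squares line, the threshold of 2, the function name `flag_outliers`, and the data are all illustrative choices, not a standard taken from this article.

```python
from statistics import mean, pstdev


def flag_outliers(xs, ys, threshold=2.0):
    """Flag points that sit unusually far (vertically) from a least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares line y = slope * x + intercept (assumes the xs are not all equal).
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Residual: how far each point lies above (+) or below (-) the line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [(x, y) for x, y, r in zip(xs, ys, residuals)
            if abs(r) > threshold * spread]


# Hypothetical data: y is 2 * x everywhere except at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 4, 6, 8, 25, 12, 14, 16, 18]

print(flag_outliers(xs, ys))  # [(5, 25)]
```

A different threshold or a different kind of pattern (for example, a curve instead of a line) would flag different points, which is why no single rule applies to every scatter plot.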
What are clusters in Scatter Plots?
Sometimes the data points in a scatter plot form distinct groups. These groups are called clusters.
Consider the scatter plot above, which shows nutritional information for 16 brands of hot dogs in 1986. (Each point represents a brand.)
Describing scatterplots (form, direction, strength, outliers)
When we look at a scatterplot, we should be able to describe the association we see between the variables.
A quick description of the association in a scatterplot should always include the form, direction, and strength of the association, along with any outliers.
It's also important to include the context of the two variables in the description of these features.
What is correlation?
We often see patterns or relationships in scatterplots.
When the y variable tends to increase as the x variable increases, we say there is a positive correlation between the variables.
When the y variable tends to decrease as the x variable increases, we say there is a negative correlation between the variables.
When there is no clear relationship between the two variables, we say there is no correlation between the two variables.
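The three cases above can be illustrated with a number. The sketch below computes Pearson's correlation coefficient r, which is one common way to measure correlation but is an assumption here, since the text does not name a specific measure. It captures only straight-line relationships: values near +1 indicate a positive correlation, values near -1 a negative correlation, and values near 0 no linear correlation. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired values xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    # How x and y vary together, and how much each varies on its own.
    co_variation = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    x_variation = sqrt(sum((x - x_bar) ** 2 for x in xs))
    y_variation = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return co_variation / (x_variation * y_variation)


# Hypothetical data for the three cases.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 7, 9]      # y tends to increase as x increases
falling = [9, 7, 6, 4, 1]     # y tends to decrease as x increases
scattered = [5, 1, 6, 2, 5]   # no clear relationship

print(f"rising:    r = {pearson_r(x, rising):.2f}")     # close to +1
print(f"falling:   r = {pearson_r(x, falling):.2f}")    # close to -1
print(f"scattered: r = {pearson_r(x, scattered):.2f}")  # close to 0
```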