The Math behind Machine Learning & Data Science
https://www.youtube.com/watch?v=ciM6wigZK0w
Hyperplane
Hyperplanes are decision boundaries that help classify data points: points falling on either side of the hyperplane are attributed to different classes. In simple terms, a hyperplane is what allows your machine learning model to correctly differentiate, separate, and classify different groups of data.
“In n-dimensional space, a hyperplane is a subspace of dimension n-1. Meaning, if the data lives in a 2-dimensional space, the hyperplane can be a straight line dividing the data space into halves.”
The dimension of the hyperplane depends on the number of features. For a simple linear model based on a single feature, the decision boundary is a straight line. With more than one feature, the boundary is called a hyperplane, since the data points now live in 3D or higher-dimensional space. As the number of features increases, so does the number of dimensions of the model, which is hard to picture for, say, 3, 4, 8, 24, or 100 dimensions. There are also techniques to reduce dimensionality, such as PCA, Backward/Forward Feature Elimination, and High Correlation / Low Variance Filters, since too many features are computationally expensive to model and make classifications on.
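The idea of a hyperplane as a decision boundary can be sketched in code. A minimal 2D example, with hypothetical weights w and bias b chosen for illustration: the hyperplane is the line w·x + b = 0, and a point's class is decided by the sign of w·x + b.

```python
# A hyperplane in 2D is a line w·x + b = 0; points are classified
# by which side of the line they fall on, i.e. the sign of w·x + b.
def classify(point, w, b):
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return "class A" if score >= 0 else "class B"

w, b = [1.0, -1.0], 0.0            # hyperplane: x - y = 0 (the line y = x)
print(classify([2.0, 1.0], w, b))  # point below y = x -> class A
print(classify([1.0, 3.0], w, b))  # point above y = x -> class B
```

The same code works unchanged in higher dimensions: with three features, w and the point each get three components, and the line becomes a plane.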
Euclidean Distance (relating to the geometry of plane figures):
It is just a distance measure between a pair of samples p and q in n-dimensional feature space: d(p, q) = √((p₁ − q₁)² + (p₂ − q₂)² + … + (pₙ − qₙ)²)
For example, picture it as a “straight, connecting” line in a 2D feature space:
The Euclidean distance is often the “default” distance used in, e.g., K-nearest neighbors (classification) or K-means (clustering) to find the “k closest points” to a particular sample point. Another prominent example is hierarchical/agglomerative clustering (complete and single linkage), where you want to find the distance between clusters.
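A minimal sketch of that formula and its “closest point” use, in plain Python with toy points:

```python
import math

def euclidean(p, q):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def nearest(sample, points):
    # the "1 closest point" step at the heart of k-NN and k-means
    return min(points, key=lambda p: euclidean(sample, p))

print(euclidean((0, 0), (3, 4)))          # 5.0
print(nearest((1, 1), [(0, 0), (5, 5)]))  # (0, 0)
```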
Hilbert Space
It extends the methods of vector algebra and calculus from the two-dimensional Euclidean plane and three-dimensional space to spaces with any finite or infinite number of dimensions. A Hilbert space is an abstract vector space possessing the structure of an inner product that allows length and angle to be measured. Furthermore, Hilbert spaces are complete: there are enough limits in the space to allow the techniques of calculus to be used.
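The key structure mentioned above, an inner product that lets length and angle be measured, can be sketched for ordinary real-number tuples (a finite-dimensional example of a Hilbert space):

```python
import math

def inner(u, v):
    # inner product <u, v> = sum of componentwise products
    return sum(a * b for a, b in zip(u, v))

def length(u):
    # length is recovered from the inner product: ||u|| = sqrt(<u, u>)
    return math.sqrt(inner(u, u))

def angle(u, v):
    # angle is recovered too: cos(theta) = <u, v> / (||u|| ||v||)
    return math.acos(inner(u, v) / (length(u) * length(v)))

u, v = (1, 0), (0, 1)
print(length(u))                  # 1.0
print(math.degrees(angle(u, v)))  # 90.0 (the axes are perpendicular)
```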
Vector Space
A vector space (also called a linear space) is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars. Scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called vector axioms.
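The two vector-space operations, and one sample axiom, can be sketched for real-number tuples:

```python
# Vector addition and scalar multiplication for real-number tuples.
def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

u, v = (1, 2), (3, 4)
print(add(u, v))    # (4, 6)
print(scale(2, u))  # (2, 4)
# One of the vector axioms: scalar multiplication
# distributes over vector addition, c(u + v) = cu + cv
print(scale(2, add(u, v)) == add(scale(2, u), scale(2, v)))  # True
```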
Matrices
A matrix (plural: matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns.
Matrices are commonly written in box brackets or parentheses.
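A small sketch of the rows-and-columns layout, representing a matrix as a list of rows:

```python
# A 2x3 matrix as a list of rows; each row has one entry per column.
m = [
    [1, 2, 3],
    [4, 5, 6],
]
rows, cols = len(m), len(m[0])
print(rows, cols)  # 2 3
print(m[1][2])     # entry in row 2, column 3 -> 6
```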
Statistics
Statistics is a subfield of mathematics. It refers to a collection of methods for working with data and using data to answer questions.
Descriptive Statistics
Descriptive statistics refer to methods for summarizing raw observations into information that we can understand and share. Commonly, we think of descriptive statistics as the calculation of statistical values on samples of data in order to summarize properties of the sample of data, such as the common expected value (e.g. the mean or median) and the spread of the data (e.g. the variance or standard deviation).
Descriptive statistics may also cover graphical methods that can be used to visualize samples of data. Charts and graphics can provide a useful qualitative understanding of both the shape or distribution of observations as well as how variables may relate to each other.
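The summary values above (expected value and spread) are available directly from Python's standard library; a sketch on a toy sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(data))       # expected value (mean): 5
print(statistics.median(data))     # expected value (median): 4.5
print(statistics.pvariance(data))  # spread (population variance): 4
print(statistics.pstdev(data))     # spread (population std dev): 2.0
```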
Inferential Statistics (Statistical inference)
Inferential statistics is a fancy name for methods that aid in quantifying properties of the domain or population from a smaller set of obtained observations called a sample. We think of inferential statistics as the estimation of quantities from the population distribution, such as the expected value or the amount of spread.
More sophisticated statistical inference tools can be used to quantify the likelihood of observing data samples given an assumption. These are often referred to as tools for statistical hypothesis testing, where the base assumption of a test is called the null hypothesis. There are many examples of inferential statistical methods given the range of hypotheses we may assume and the constraints we may impose on the data in order to increase the power or likelihood that the finding of the test is correct.
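A minimal sketch of the estimation idea, on hypothetical sample data: estimate the population mean from the sample mean, and use the standard error to give an approximate 95% confidence interval (using the normal-approximation factor 1.96).

```python
import statistics

# A sample of observations drawn from a larger population (toy data).
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
n = len(sample)
mean = statistics.mean(sample)             # point estimate of population mean
sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
# Approximate 95% confidence interval for the population mean
low, high = mean - 1.96 * sem, mean + 1.96 * sem
print(round(mean, 2))
print(round(low, 2), round(high, 2))
```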
Statistical learning theory
Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics.
Functional Analysis
Functional analysis is a branch of mathematical analysis that studies the transformations of functions and their algebraic and topological properties.
Statistical classification
In statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient (sex, blood pressure, presence or absence of certain symptoms, etc.). Classification is an example of pattern recognition.
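A minimal sketch of classification from a training set with known labels, using a nearest-centroid rule on hypothetical toy data (the feature values and labels are made up for illustration):

```python
import math

# Training observations whose category membership is known (toy data).
train = {
    "spam":     [(5.0, 1.0), (6.0, 0.5)],
    "non-spam": [(1.0, 4.0), (0.5, 5.0)],
}

def centroid(points):
    # componentwise mean of a set of points
    return tuple(sum(c) / len(points) for c in zip(*points))

def classify(x):
    # Assign the new observation to the class whose centroid is closest.
    cents = {label: centroid(pts) for label, pts in train.items()}
    return min(cents, key=lambda label: math.dist(x, cents[label]))

print(classify((5.5, 1.0)))  # "spam"
print(classify((1.0, 4.5)))  # "non-spam"
```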
Clustering
Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields.
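A sketch of grouping observations into clusters by similarity, here the assignment step of k-means with two hypothetical cluster centers:

```python
import math

# One assignment step of k-means: each observation joins the cluster
# whose center is nearest (centers here are chosen by hand).
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(1, 1), (0, 2), (9, 9), (11, 10)]

clusters = {0: [], 1: []}
for p in points:
    idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
    clusters[idx].append(p)

print(clusters[0])  # [(1, 1), (0, 2)]
print(clusters[1])  # [(9, 9), (11, 10)]
```

A full k-means would then recompute each center as the mean of its cluster and repeat until assignments stop changing.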
Knowledge Discovery
Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criteria are that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.
Data mining
Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with the overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Deep learning is an emerging field that can be applied across all areas of science and technology. Complete all the courses from deeplearning.ai.
Thanks, Muhammad, for sharing this insightful read 💡