Math for a beginner in Data Science
(You can read the Russian version of this post on my Telegram channel here)
This summer I got interested in Data Science, particularly in Machine Learning (ML). The field has a fairly high barrier to entry, and the requirements for junior specialists rise every year. To become a pro in ML, you need solid knowledge of Math, especially calculus, linear algebra, probability, and statistics.
In this post, I have collected the Math resources that helped me build a mathematical foundation. All of them are in English, but some have Russian translations. (Note: I am not a professional Data Scientist, just a beginner who wants to share some resources that might be helpful for other beginners like me.)
1. Calculus
Single Variable Calculus and Multivariable Calculus by MIT OCW are courses with video lectures, readings, assignments, and quizzes, taken by MIT freshmen back in 2006 and 2007. They are suitable for those who are learning calculus from scratch. I really liked that the teachers try to convey the meaning of calculus and where it is applied, rather than just listing formulas on the blackboard. I did not watch the courses in full because I had already covered these topics in college, but I still found something new for myself.
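To see what "the meaning of calculus" can look like in code, here is a minimal sketch of my own (not material from the MIT courses). The derivative is the slope between two points as they get infinitely close, so a small step h gives a numerical approximation of it. The function names and the step size h = 1e-6 are arbitrary illustrative choices.

```python
def derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central difference: the slope between x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x * x


# The exact derivative of x**2 is 2*x, so at x = 3 the result should be close to 6.
print(round(derivative(square, 3.0), 4))  # 6.0
```

A central difference is used here instead of the one-sided (f(x + h) - f(x)) / h because its error shrinks faster as h gets smaller.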
2. Linear Algebra
The first resource is a YouTube playlist in which the author uses beautiful visualizations to show the geometric meaning of linear algebra operations. It is useful even for those already familiar with the topic, since the videos make you look at linear algebra from a different angle. The videos have subtitles, including Russian translations.
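As a small taste of this geometric view, here is a toy example of my own (not code from the playlist). A 2x2 matrix can be read as a transformation of the plane, and its columns show where the basis vectors (1, 0) and (0, 1) end up. I use plain Python tuples so nothing needs to be installed. The names `apply` and `rotate_90` are just illustrative.

```python
def apply(matrix, vector):
    """Apply a 2x2 matrix, given as a tuple of rows, to a 2D vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)


# Rotation of the plane by 90 degrees counterclockwise.
rotate_90 = ((0, -1),
             (1, 0))

# The columns of the matrix are where the basis vectors land.
print(apply(rotate_90, (1, 0)))  # (0, 1)
print(apply(rotate_90, (0, 1)))  # (-1, 0)

# Applying the rotation twice turns a vector by 180 degrees.
v = (2, 3)
print(apply(rotate_90, apply(rotate_90, v)))  # (-2, -3)
```

In practice you would use a library such as NumPy for this, but writing the multiplication by hand makes it clear that a matrix simply mixes the coordinates of a vector.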
The second resource is again from MIT OCW. The lecturer is Gilbert Strang, the author of 'Introduction to Linear Algebra' (the main textbook of this class). He is a very cool lecturer. I advise you to first watch his lecture, then read the relevant chapters in his textbook, and solve the problems at the end of each chapter.
3. Probability
Introduction to Probability (MIT OCW version | edX version: archive and up-to-date page)
This probability course has been taught at MIT for 50 years. The first version contains the lectures and recitations as John Tsitsiklis taught them at MIT in 2010. The second version is the edX adaptation: topics are split into several short videos, with worked problems and quizzes. For edX, I included two links: the archived version, where you can go through everything for free, and the up-to-date one, where the course starts in January 2022 and will be taught by the course authors themselves.
'Should I take the MIT OCW or the edX version?' The content is the same, so I suggest trying both and choosing the format that is most convenient for you.
4. Statistics
MIT OCW Statistics for Applications (you have probably guessed by now that I am a fan of MIT, huh?)
A good follow-up to the probability course. It covers regression, PCA, linear models, and more, all of which you will encounter in ML.
All video lectures are accompanied by the course textbook, and you can also find problem sets for each class.
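For a first feel for the regression topics, here is a minimal sketch of simple linear regression by ordinary least squares. This is my own illustration, not material from the course. The best-fit slope is the covariance of x and y divided by the variance of x, and the fitted line passes through the point of means. The function name `fit_line` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1, so the fit should recover it.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(fit_line(xs, ys))  # (2.0, 1.0)
```

On Python 3.10 and newer, the standard library function `statistics.linear_regression(xs, ys)` computes the same slope and intercept, which is handy for checking a hand-written version like this one.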
I hope this review was helpful. If you have anything to add, write it in the comments; maybe I can put together a post with your recommendations.
thx a lot 🖖