Why I wrote 50 lines of code when import sklearn takes three.

I’ve been building Linear Regression from scratch to build real intuition for how these algorithms work. Here’s the realization that hit me: the learning rate is not absolute; it is relative to the scale of your data.

Because I was using raw salary data ($100k+), the gradients were massive. Multiplying a massive gradient by 0.01 still produced a huge step size, causing the algorithm to overshoot the minimum entirely.

This is exactly why libraries like Scikit-Learn emphasize preprocessing pipelines. Without normalizing or standardizing your features, Gradient Descent is fighting the scale of your own data.

Abstraction is great for productivity, but implementation is essential for intuition.

#MachineLearning #Algorithms #DataScience #ArtificialIntelligence #Python #ScikitLearn
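A minimal numeric sketch of the failure mode described above (the salary figures and target values here are made up for illustration, not the author's actual dataset): with a feature left at raw ~$100k scale, a learning rate of 0.01 diverges within a few steps, while the same learning rate converges once the feature is standardized.

```python
import numpy as np

# Made-up data: raw salaries (~1e5 scale) as the feature, arbitrary target.
X = np.array([60_000., 80_000., 100_000., 120_000., 140_000.])
y = np.array([1.2, 1.5, 1.9, 2.3, 2.6])

def step(w, b, X, y, lr):
    """One gradient-descent step on MSE for the model y_hat = w*X + b."""
    err = (w * X + b) - y
    return w - lr * 2 * np.mean(err * X), b - lr * 2 * np.mean(err)

# Raw feature: the gradient w.r.t. w carries a factor of X (~1e5),
# so even lr=0.01 takes giant steps and the iterates blow up.
w_raw, b_raw = 0.0, 0.0
for _ in range(5):
    w_raw, b_raw = step(w_raw, b_raw, X, y, lr=0.01)
# |w_raw| is now astronomically large: overshooting, not learning.

# Standardize the feature and the very same learning rate converges.
Xs = (X - X.mean()) / X.std()
w_std, b_std = 0.0, 0.0
for _ in range(500):
    w_std, b_std = step(w_std, b_std, Xs, y, lr=0.01)
# w_std and b_std settle near the least-squares solution on the scaled data.
```

The stable learning-rate range depends on the curvature of the loss, which grows with the square of the feature scale, so no single fixed learning rate works across raw and standardized versions of the same data.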
Interesting! I’d appreciate it if you could tell me how you started learning these algorithms from scratch. There are so many resources that I want to learn but get really confused about where to start.
On top of that, you’ll know what to tweak when your model isn’t performing well. Once you’ve learned how an algorithm works behind the scenes, that understanding will guide you from there on.
Interesting 🤔 I mean, if you don’t normalise it, the step would be big, because step = -learning_rate × gradient. But shouldn’t that balance out, given that the path itself would also be non-normalized? 🤔
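A rough sketch of why the two effects don't cancel, under the assumption of a squared-error loss (the data here is synthetic, purely for illustration): scaling a feature by c shrinks the optimal weight only by 1/c, but multiplies the curvature of the loss by c², so the largest stable learning rate shrinks by 1/c².

```python
import numpy as np

# Synthetic data on a "nice" scale.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

# Original scale: least-squares weight and the MSE curvature in w.
w1 = np.mean(x * y) / np.mean(x**2)
curv1 = 2 * np.mean(x**2)        # d^2(MSE)/dw^2

# Same data with the feature inflated 1000x (like raw salaries).
xk = 1000.0 * x
wk = np.mean(xk * y) / np.mean(xk**2)
curvk = 2 * np.mean(xk**2)

print(w1 / wk)        # ~1000: the target weight only shrank linearly
print(curvk / curv1)  # ~1e6: the curvature grew quadratically
# Plain gradient descent is stable only for lr < 2/curvature, so a fixed
# lr that worked on the original scale overshoots after the feature grows.
```

So the bigger steps and the longer path scale at different rates, which is why the imbalance shows up as divergence rather than just a rescaled trajectory.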
Abstractions are not always helpful, especially when it comes to understanding low-level details!
To get a better understanding of regression, learn about Gaussian processes and the Cholesky decomposition.
interesting
Do the circles represent confidence intervals/credible regions in any sense? I often come across similar-looking contour plots generated from MCMC output, and those represent credible regions.