Jeorge Silva’s Post

10 models, 1 loop, and a lot of learning 🚀

One of the most fascinating takeaways from my Data Science journey so far is that there's no such thing as a "silver bullet." An algorithm that shines in one scenario might fail miserably in another.

Today, I decided to automate my benchmarking process. Instead of manually testing algorithms one by one, I built a Python workflow that pre-processes the data and evaluates 10 different models at once using Cross-Validation (a sketch of the loop follows below).

💡 Key learnings from this experiment:

The power of Pipelines: It keeps the code clean and ensures pre-processing steps (like KNNImputer or MinMaxScaler) are locked to the model, preventing data leakage.

Interpretation matters: Seeing a negative score for Lasso while Random Forest hit 0.92+ gave me immediate insight into the nature of my dataset (likely highly non-linear).

Efficiency: Automating repetitive tasks frees up time for the actual analysis and tuning.

Seeing that final list of scores print out brings a huge sense of satisfaction! On to the next steps. 📈

Question for the network: Do you usually test a wide range of models in the initial phase, or do you skip straight to the heavy hitters (like XGBoost/LightGBM)? 👇

#DataScience #MachineLearning #Python #ScikitLearn #Coding #LearningJourney
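A minimal sketch of the kind of loop described above, not the author's exact code: it assumes a feature matrix X and target y are already loaded, and the three models shown stand in for the full list of 10.

from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

models = {
    "Lasso": Lasso(),
    "RandomForest": RandomForestRegressor(random_state=42),
    "KNN": KNeighborsRegressor(),
}

for name, model in models.items():
    # The Pipeline ties imputation and scaling to the model, so each CV fold
    # fits the pre-processing on its own training split only -- no data leakage.
    pipe = Pipeline([
        ("imputer", KNNImputer()),
        ("scaler", MinMaxScaler()),
        ("model", model),
    ])
    # Default scoring for regressors is R^2, which can go negative for a
    # poorly fitting model -- consistent with the Lasso result in the post.
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")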


The pipeline is solid; however, I'd suggest removing Ridge and Lasso, as they are regressors being used in a classification setting (they likely tried to regress on the class labels 0/1).
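If the task really is classification, a hedged sketch of the suggested swap (standard scikit-learn names; the surrounding setup is assumed):

from sklearn.linear_model import RidgeClassifier, LogisticRegression

classifiers = {
    # RidgeClassifier is the L2-regularised classification counterpart of Ridge.
    "Ridge (clf)": RidgeClassifier(),
    # L1-penalised logistic regression plays the role of Lasso for classification;
    # the 'liblinear' solver supports the L1 penalty.
    "Lasso (clf)": LogisticRegression(penalty="l1", solver="liblinear"),
}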

You could do this with AutoGluon in "one line of code" :-)
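For reference, that AutoGluon one-liner would look roughly like this (the label column name "target" and the train/test DataFrames are assumptions):

from autogluon.tabular import TabularPredictor

# fit() trains and compares a zoo of models automatically on a pandas DataFrame
predictor = TabularPredictor(label="target").fit(train_data)
# leaderboard() prints per-model scores, much like the manual loop above
print(predictor.leaderboard(test_data))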
