DUGGIRALA JNANA SATYA PRASAD’s Post

Built an end-to-end ML project this week: a Customer Churn Predictor.

Here's the mistake that cost me 487 minutes ⏳

I used GridSearchCV with RandomForest on 440,000 rows. 2 values × 2 values × 1 value = just 4 combinations. But with cv=3, that's 12 full model fits on a large dataset (13 counting the final refit of the best model). Result? Still running after 8 hours.

The fix? Switch to RandomizedSearchCV with n_iter=10: randomly sampled combinations instead of an exhaustive sweep. Finished in under 5 minutes. (Caveat: if every parameter is a plain list and the grid has only 4 combinations, scikit-learn caps n_iter at 4 and warns, so random search pays off when the search space is larger than n_iter.)

The second bug: my XGBoost model was giving 50% accuracy, basically random guessing. Root cause: I forgot scale_pos_weight on an imbalanced dataset (250k vs 190k class split). One parameter fix → accuracy jumped to 85%+.

Lessons I'm taking forward:
→ Don't start with GridSearchCV on large datasets. RandomizedSearchCV first.
→ Always check class balance before touching any model.
→ Accuracy is a lying metric on imbalanced data. Use ROC-AUC and F1.

Stack: Python · scikit-learn · XGBoost · pandas

Building toward a full deployment with FastAPI + Streamlit. More updates coming.

#MachineLearning #Python #XGBoost #DataScience #MLEngineer #BuildInPublic
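To make the fit-count arithmetic concrete, here is a minimal stdlib-only sketch. The grid is hypothetical because the post doesn't name the 2 × 2 × 1 parameters, and `count_fits` / `sample_grid` are illustrative helpers, not scikit-learn APIs. `sample_grid` mimics how scikit-learn's `ParameterSampler` (which `RandomizedSearchCV` uses) behaves when every parameter is given as a list: it samples without replacement and caps `n_iter` at the grid size.

```python
import itertools
import random


def count_fits(param_grid, cv, refit=True):
    """Fits done by an exhaustive grid search: combinations x folds,
    plus one final refit on the full data (GridSearchCV's refit=True default)."""
    n_combos = 1
    for values in param_grid.values():
        n_combos *= len(values)
    return n_combos * cv + (1 if refit else 0)


def sample_grid(param_grid, n_iter, seed=0):
    """Random search over a discrete grid, sampling without replacement.

    If n_iter exceeds the grid size, only grid-size combinations are drawn,
    similar to scikit-learn's ParameterSampler with list-valued parameters."""
    keys = list(param_grid)
    combos = [dict(zip(keys, values))
              for values in itertools.product(*param_grid.values())]
    rng = random.Random(seed)
    return rng.sample(combos, min(n_iter, len(combos)))


# Hypothetical 2 x 2 x 1 grid standing in for the one in the post.
grid = {
    "n_estimators": [100, 200],
    "max_depth": [10, 20],
    "min_samples_split": [2],
}

print(count_fits(grid, cv=3, refit=False))  # 12 cross-validation fits
print(count_fits(grid, cv=3))               # 13 including the final refit
print(len(sample_grid(grid, n_iter=10)))    # 4: capped at the grid size
```

Random search only saves fits when `n_iter` is smaller than the full space, for example when parameters are drawn from wide integer ranges or continuous distributions rather than short lists.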
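A quick way to check class balance and derive a starting `scale_pos_weight`: XGBoost's parameter documentation suggests `sum(negative instances) / sum(positive instances)` as a typical value. This sketch assumes churners are the positive class (label 1) and the 190k side of the split; the post doesn't say which side is which, and the helper names are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Share of each class in the label list."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def scale_pos_weight(labels, positive=1):
    """Starting value for XGBoost's scale_pos_weight: negatives / positives."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = sum(n for cls, n in counts.items() if cls != positive)
    return n_neg / n_pos


# Assumption: 190k churners (label 1) and 250k non-churners (label 0).
labels = [1] * 190_000 + [0] * 250_000

balance = class_balance(labels)
print({cls: round(share, 3) for cls, share in balance.items()})  # {1: 0.432, 0: 0.568}
# Accuracy of always predicting the majority class:
print(round(max(balance.values()), 3))  # 0.568
print(round(scale_pos_weight(labels), 3))  # 1.316

# The value would then be passed to the model, e.g.
# xgboost.XGBClassifier(scale_pos_weight=scale_pos_weight(labels))
```

On this split, always predicting the majority class already scores about 57% accuracy, so a model at 50% is doing worse than that trivial baseline.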
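To see why accuracy misleads on imbalanced data, here is a stdlib sketch of accuracy, F1, and ROC-AUC (computed as the share of positive/negative pairs ranked correctly, with ties counting half), applied to a hypothetical 90/10 toy set and a model that always predicts "no churn". In practice scikit-learn's `accuracy_score`, `f1_score`, and `roc_auc_score` cover this; the hand-rolled versions just make the mechanics visible.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 90 non-churners (0), 10 churners (1).
y_true = [0] * 90 + [1] * 10
# A useless model: always predicts "no churn" with the same score.
y_pred = [0] * 100
scores = [0.0] * 100

print(accuracy(y_true, y_pred))  # 0.9
print(f1(y_true, y_pred))        # 0.0
print(roc_auc(y_true, scores))   # 0.5
```

The 90% accuracy looks respectable, while an F1 of 0 and a ROC-AUC of 0.5 show the model never identifies a single churner.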
