Excited to share the final evolution of my Top IMDb Movies project: from a data analysis deep dive to a deployed machine learning application.

After the initial exploratory analysis, I built a predictive model to answer a more nuanced question: "What attributes truly drive a movie's rating?" Building and deploying this model as a live Streamlit app was challenging and incredibly insightful. My biggest takeaways weren't just about code, but about the practical realities of data science:

🔹 The Model's Story: Predicting a subjective outcome like a movie rating is inherently complex. The final XGBoost model achieved an R-squared of 0.25 (25%), which is a respectable result for a social science problem. More importantly, the error metrics are low (a MAPE of ~2%), so individual predictions land close to the actual ratings. That combination also reflects how tightly the ratings in this dataset cluster: the model explains only a quarter of the variance, but there is little variance to begin with. This taught me that the context of a problem is just as important as the final score.

🔹 The Value of Debugging: I identified and corrected two subtle but critical forms of data leakage in my preprocessing pipeline. This was the most valuable lesson of the project, and it reinforced the importance of a methodologically sound process.

🔹 Feature Engineering is the Real MVP: The most significant performance gains came from thoughtful feature engineering and selection, not from simply using a complex algorithm. Discovering that a simpler model with better features could outperform a complex one was a key insight.

This project has been a journey from a static CSV file to a functional, interactive application. I would be thrilled for you to try it out and share any feedback.

🚀 Live App Link: https://lnkd.in/gzCY7TJq
📖 Full Project & Code on GitHub: https://lnkd.in/gBKXtVtr

#DataScience #MachineLearning #DataAnalysis #Python #Streamlit #PortfolioProject #XGBoost #ScikitLearn #FeatureEngineering

Great Work Soumyadeep Saha 💯
