🤖 Machine Learning Project 2 📚 Book Recommendation! 

✅ Step 1: Data Loading & Inspection
First, I loaded the three separate datasets: Books.csv, Users.csv, and Ratings.csv. I immediately checked their shapes, looked for missing data, and checked for duplicates.

✅ Step 2: Model 1 - Popularity-Based Recommender
My first goal was to create a general "Top 50" list, perfect for new users. I merged the ratings and books tables, then grouped them to find the total number of ratings and the average rating for each book. To keep the averages meaningful, I filtered for books with 200 or more ratings. Finally, I sorted this filtered list by average rating to get my Top 50.

✅ Step 3: Model 2 - Collaborative Filtering Recommender
This is where the personalization comes in. My process was:
A) Filtering: To build a robust model, I kept only users who had rated 150+ books and only books that had 50+ ratings. This is critical for reducing noise.
B) Creating the Pivot Table: I pivoted the filtered data into a user-item matrix, with book titles as the index, user IDs as the columns, and the ratings as the values.
C) Filling Null Values: This matrix was very sparse, so I filled all NaN values (where a user hadn't rated a book) with 0.
D) Calculating Similarity: I used scikit-learn's cosine similarity on this matrix to compute a similarity score between every pair of books based on who rated them.
E) Building the recommend Function: Finally, I built a function that takes a book title, finds its row in the similarity matrix, and returns the top 5 most similar books.

This project was a great exercise in:
🔹 Following a clear data pipeline (load, clean, filter, model).
🔹 Building different models for different use cases (new users vs. known users).
🔹 Matrix manipulation as the core of collaborative filtering.
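A minimal sketch of both models, using a tiny made-up ratings table in place of the real Books/Users/Ratings data (the column names and the rating-count thresholds here are illustrative assumptions, not the project's actual values):

```python
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data standing in for the merged Books + Ratings tables
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3, 4],
    "title":   ["A", "B", "A", "C", "A", "B", "C", "B"],
    "rating":  [5, 3, 4, 2, 5, 4, 1, 4],
})

# Model 1: popularity -- count and average per book, keep books with a
# minimum number of ratings (the post uses 200; 2 here for the toy data)
stats = ratings.groupby("title")["rating"].agg(num="count", avg="mean")
top = stats[stats["num"] >= 2].sort_values("avg", ascending=False)

# Model 2: item-item collaborative filtering on a user-item pivot table,
# NaNs filled with 0, then cosine similarity between book vectors
pivot = ratings.pivot_table(index="title", columns="user_id",
                            values="rating").fillna(0)
sim = cosine_similarity(pivot)   # book x book similarity matrix

def recommend(title, n=2):
    idx = pivot.index.get_loc(title)
    # most similar books first, skipping the book itself
    order = np.argsort(sim[idx])[::-1][1:n + 1]
    return [pivot.index[i] for i in order]

print(top.index[0])      # best-rated popular book
print(recommend("A"))    # books most similar to "A"
```

The same pipeline scales to the real data: only the thresholds (200 ratings for popularity, 150+/50+ for filtering) and the top-N change.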
Github Repo: https://lnkd.in/dMZc-QQJ #DataScience #MachineLearning #Python #RecommendationSystem #CollaborativeFiltering #Pandas #ScikitLearn #DataAnalysis #Projects
Building a Recommendation System with Machine Learning
🎉 Day 82: Building an ML Automation Engine! 🎉
Today is a huge milestone: I'm building an end-to-end machine learning project that automates the entire model selection process. This project connects all the dots from my learning journey and already feels like a game-changer!

🚀 Project Under Development: The Auto-Model Comparator 🧠
The goal is a robust pipeline that can ingest data, compare multiple algorithms, and automatically select the best model.

Data Integration Mastery: The project starts by connecting directly to a SQL database from VS Code, so the data pipeline is production-ready and can handle dynamic data sources.

Ensemble vs. Single Model Showdown: The core engine automatically trains and compares an extensive suite of classifiers and regressors, including:
Single models: KNN, Naive Bayes, Logistic Regression, SVM.
Ensemble models: AdaBoost and Gradient Boosting.

Metric-Driven Selection: The comparison is rigorous! I'm evaluating performance on the R² score (for regression tasks) and accuracy (for classification), giving a clear, quantifiable winner.

Final Output: The pipeline's last step automatically serializes the best-performing model into a .pkl file, ready for immediate deployment.

What I Found Interesting
Seeing all the math and algorithm theory, from the simplicity of KNN to the complexity of Gradient Boosting, orchestrated into a single automated script is incredibly rewarding. This project is the culmination of everything I've learned!

Upcoming Learning
My focus is now 100% on bringing this Auto-Model Comparator to life, ensuring the code is clean, efficient, and ready to handle various datasets.

Let's Connect: If you've built an auto-ML project, what was the biggest challenge you faced when creating the model comparison loop?

#DataScience #MachineLearning #AutoML #MLOps #SQL #Python #LearningJourney #Day82
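A sketch of what such a comparison loop can look like, assuming scikit-learn and a built-in demo dataset in place of the SQL source (the model list matches the post; everything else, including the dataset and file name, is illustrative):

```python
import pickle
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# Demo data standing in for the SQL-sourced dataset
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Candidates: single models plus ensembles, compared on test accuracy
models = {
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "logreg": LogisticRegression(max_iter=5000),
    "svm": SVC(),
    "adaboost": AdaBoostClassifier(),
    "grad_boost": GradientBoostingClassifier(),
}

scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
best_name = max(scores, key=scores.get)

# Serialize the winner for deployment
with open("best_model.pkl", "wb") as f:
    pickle.dump(models[best_name], f)

print(best_name, round(scores[best_name], 3))
```

For regression tasks the same loop works with regressor classes and `r2_score` in place of accuracy.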
In the past few months, the work of data scientists and analysts has changed a lot. But the rate of change is unevenly distributed; it depends on how you run an analysis. We often had to choose between:

A) Quick, manual analysis (in Excel/Tableau/etc.)
Pro: quick follow-up with stakeholders
Con: hard to extend or reuse

B) Structured, automated analysis (in Python, with version control, documentation, and parameters)
Pro: reproducible, easier to scale, safer to build on
Con: fixed setup cost before delivering value

But with today's AI tools, option B can be as fast as A to perform, or faster. The setup work that used to take a few hours often takes minutes, and you keep the benefits of rigor and reuse.

Have you seen the same shift? And are you also finding it a lot of fun?

#DataScience #Analytics #AILeadership
🔢 NumPy Practice – Building Strong Data Analytics Foundations 🚀
Today I focused on improving my Python skills by practicing NumPy, one of the core libraries for Data Analytics and Machine Learning. NumPy makes numerical operations extremely fast, efficient, and clean.

🔍 What I practiced:
✅ Creating arrays using `array()`, `arange()`, `linspace()`
✅ Indexing & slicing (1D, 2D, 3D)
✅ Mathematical & statistical operations
✅ Broadcasting
✅ Reshaping arrays using `reshape()`
✅ Horizontal & vertical stacking
✅ Boolean filtering
✅ Random module (`rand`, `randn`, `randint`)
✅ Vectorization

🔥 Additional Advanced Practice:
📌 Matrix multiplication (`dot`, `matmul`)
📌 Conditional selection using `np.where()`
📌 Sorting arrays using `np.sort()`
📌 Getting unique values with `np.unique()`
📌 Loading files with `np.genfromtxt()`
📌 Checking memory usage of list vs array
📌 Speed testing using `%timeit`

🧠 Why NumPy is a must-learn?
* Faster numerical operations
* Clean & simplified code
* Backbone for Pandas, Scikit-Learn, Matplotlib
* Essential for ML, AI, Data Preprocessing

🔗 GitHub Repository
Here is the code I practiced today 👇
👉 GitHub: https://lnkd.in/gs-jEcH9
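A few of the practiced operations in one runnable snippet (the arrays here are arbitrary examples, not the repo's actual exercises):

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)   # array creation + reshape -> 2x3
b = np.linspace(0, 1, 3)            # 3 evenly spaced values in [0, 1]

# Broadcasting: the (3,) vector b stretches across both rows of (2, 3) a
c = a + b

# Boolean filtering and vectorized math -- no Python loops needed
evens = a[a % 2 == 0]
doubled = a * 2

# Vertical stacking and aggregation
stacked = np.vstack([a, doubled])
print(stacked.shape, evens.tolist(), a.mean())
```

Each line replaces what would otherwise be an explicit loop, which is where NumPy's speed advantage over plain lists comes from.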
Step 11, continued, towards Data Science and ML model creation
###############
Before and after: another t-test example, i.e. the paired t-test.

Problem: A fitness trainer measures the weight of 8 people before and after a 4-week training program. Their weights in kg:
Before = [80, 85, 78, 90, 95, 88, 76, 82]
After = [78, 84, 76, 88, 94, 87, 74, 81]
Can we conclude at the 5% significance level that the training program had a significant effect on their weight?

########
Solution: this is a paired t-test, solved with Python.

# Import Python packages
import numpy as np
from scipy import stats

# This is a paired-sample t-test
# Given data:
# sample size = 8
# alpha = 0.05
before_weight = [80, 85, 78, 90, 95, 88, 76, 82]
after_weight = [78, 84, 76, 88, 94, 87, 74, 81]

t_statistic, p_value = stats.ttest_rel(before_weight, after_weight)
print("The value of t_statistic -> ", t_statistic)
print("The value of p_value -> ", p_value)

# Conclude with the hypothesis test
if p_value < 0.05:
    print("We reject the null hypothesis: the training program had a significant effect on their weight")
else:
    print("We fail to reject the null hypothesis: the training program had no significant effect on their weight")

========
The value of t_statistic ->  7.937253933193772
The value of p_value ->  9.584590571929183e-05
We reject the null hypothesis: the training program had a significant effect on their weight
🚀 Top 40 NumPy Functions Every Data Pro Should Know
Brought to you by programmingvalley.com

If you're learning Data Science or Machine Learning, NumPy is your best friend. Here's a quick cheat list to make you unstoppable 👇

Array Creation
→ np.array() – Create arrays
→ np.zeros(), np.ones() – Empty or filled arrays
→ np.eye() – Identity matrix
→ np.arange() / np.linspace() – Evenly spaced numbers
→ np.random.randint() / np.random.random() – Random values

Array Manipulation
→ reshape() – Change shape
→ transpose() – Swap axes
→ concatenate() – Merge arrays
→ flatten() – Make 1D
→ unique() – Get distinct values

Search & Indexing
→ argmax() / argmin() – Index of max/min
→ where() – Conditional filter
→ nonzero() – Locate non-zero elements

Math Operations
→ sin(), cos(), tan() – Trig
→ floor(), ceil(), round() – Rounding
→ exp(), log(), sqrt() – Math essentials
→ sum(), mean(), std() – Statistics

Matrix Magic
→ dot() / matmul() / @ – Matrix multiplication
→ linalg.norm() – Vector or matrix norm
→ sort() / argsort() – Sorting & indexing

💡 Why it matters: NumPy powers pandas, TensorFlow, scikit-learn, and PyTorch. Master these and everything else becomes easier.

🎓 Free Courses to Level Up:
Python for Data Science, AI & Development → https://lnkd.in/d5iyumu4
Data Analysis with Python → https://lnkd.in/dc2p2j_W
IBM Data Science Professional Certificate → https://lnkd.in/dhtTe9i9
Machine Learning Specialization by Andrew Ng → imp.i384100.net/7aqNGY

Save this post 🔖 Share it to help someone master NumPy faster.
#NumPy #Python #DataScience #MachineLearning #AI #ProgrammingValley #LearnPython #100DaysOfCode
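A quick demo of a few of the search, sort, and matrix functions from the list above (the input array is an arbitrary example):

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# where(): conditional filter -> indices of elements above the mean
above = np.where(x > x.mean())[0]

# argsort(): the indices that would sort the array (stable order)
order = np.argsort(x)

# unique(): distinct values, returned sorted
distinct = np.unique(x)

# matmul / @: matrix multiplication (identity leaves the matrix unchanged)
m = np.eye(2) @ np.array([[1, 2], [3, 4]])

print(above.tolist(), x[order][:3].tolist(), distinct.tolist())
```

Note that `where` and `argsort` return indices, not values; index back into the array (`x[order]`) to get the sorted values themselves.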
🧠 Doing More with Less

One of the most powerful shifts in mindset is realizing that progress doesn't always come from doing more; sometimes it comes from doing the same things more efficiently. Whether it's writing queries, designing dashboards, or analyzing data, the right tools and approach can help you unlock new levels of clarity, speed, and impact.

When I started learning Python for data visualization, I remember how much effort it took to create multiple charts on the same axis using Matplotlib. It was a lot of manual setup and tweaking: juggling subplots, looping through dimensions, or even using .twinx() to overlay axes. It worked, but it felt clunky.

Then I learnt Seaborn. With simple parameters like 'hue', 'col', and 'row', or functions like .catplot() and .relplot(), I could generate the same visualizations: cleaner, faster, and with far less code. The insights didn't change, but the process became smoother and more intuitive.

It's a powerful reminder that learning isn't always about acquiring new skills; sometimes it's about finding smarter ways to use the ones you already have. Better tools don't just save time, they reshape how we think.

In what areas have you found this "Doing More with Less" principle to be true? Perhaps in data cleaning or modeling?

#Python #DataVisualization #Seaborn #Matplotlib #DataAnalytics
🧩 Data Wrangling made simple with pandas!

Whether you're a beginner or a data pro, mastering tidy data principles is key to making your datasets clean, consistent, and analysis-ready. This cheat sheet covers everything you need to organize, reshape, and manipulate data efficiently using pandas, from creating DataFrames to merging, reshaping, filtering, and summarizing.

🔥 Highlights include:
- Creating & reshaping DataFrames
- Handling missing values the right way
- Merging, joining, and filtering data
- GroupBy, apply(), and summarization
- Regex tricks for advanced data selection
- Method chaining for clean, readable code

If you work with data, this is your quick reference guide to pandas power moves. Because clean data = better insights. 🚀

📘 Save this cheat sheet for your next data project!

#DataScience #Python #Pandas #DataWrangling #MachineLearning #DataCleaning #Analytics #BigData #TidyData #DataAnalysis #DataEngineer #AI #ML
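Method chaining, missing-value handling, filtering, and summarization can all be seen in one pipeline. A short sketch with made-up data (the column names are illustrative assumptions):

```python
import pandas as pd
import numpy as np

raw = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA", "LA"],
    "sales": [10, np.nan, 7, 5, np.nan],
})

# One readable chain: impute missing values with the column mean,
# filter, then group and summarize -- no intermediate variables
summary = (
    raw
    .assign(sales=lambda d: d["sales"].fillna(d["sales"].mean()))
    .query("sales > 0")
    .groupby("city", as_index=False)["sales"]
    .sum()
)
print(summary)
```

Each step returns a new DataFrame, so the chain reads top to bottom like a recipe and is easy to extend with another `.assign()` or `.merge()` stage.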
Insurance Price Prediction — Part 1 (ML Project Series) 🚀

Welcome to Part 1 of our Machine Learning Project Series: Insurance Price Prediction! In this video, we kick off an end-to-end ML project that applies data science to real-world problems. We'll start with understanding the dataset, exploring variables, and performing essential data checks to prepare for modeling.

🧠 What You'll Learn
- Problem Statement & Objective
- Importing Libraries
- Loading and Understanding the Dataset
- Variable Exploration (age, bmi, charges, etc.)
- Basic Checks: shape, info, datatypes, columns
- Unique Values & Value Counts
- Statistical Summary
- Missing Values & Duplicates Detection

🧰 Tools & Libraries Used
Python | Pandas | NumPy | Matplotlib | Seaborn

Watch here: https://lnkd.in/gGbcB_kN

📺 Next Video (Part 2): Data Cleaning & Preprocessing (Coming Soon!)

🎯 Why Watch?
If you're starting your Machine Learning journey or want to understand how real-world ML projects are structured, this is the perfect place to begin!

#MachineLearning #InsurancePricePrediction #DataScience #MLProject #PythonForDataScience #LearnMachineLearning #AI #DataAnalysis #Kaggle #MLSeries #DataScienceCommunity #LearnML #MachineLearningProjects
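The basic checks listed above map directly to a handful of pandas calls. A sketch on a tiny hypothetical insurance table (the rows are invented; only the column names age, bmi, charges come from the post):

```python
import pandas as pd
import numpy as np

# Hypothetical rows standing in for the real insurance dataset
df = pd.DataFrame({
    "age": [19, 33, 28, 33, np.nan],
    "bmi": [27.9, 22.7, 33.0, 22.7, 28.9],
    "charges": [16884.92, 21984.47, 4449.46, 21984.47, 3866.86],
})

print(df.shape)                  # rows x columns
print(df.dtypes)                 # datatype per column
print(df["age"].nunique())       # count of unique values
print(df.describe())             # statistical summary
missing = df.isna().sum()        # missing values per column
dupes = df.duplicated().sum()    # count of duplicate rows
print(missing, dupes)
```

Running these before any modeling surfaces problems (a missing age, a fully duplicated row) that Part 2's cleaning step then has to handle.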
🏡 House Price Prediction — My End-to-End Machine Learning Project 🚀

I'm thrilled to share my latest project, House Price Prediction, where I built a Linear Regression model to estimate property prices based on various housing features. Here's how I approached it step-by-step 👇

🔹 1. Data Exploration & Analysis
Explored the dataset using Pandas, Matplotlib, and Seaborn to understand how factors like area, bedrooms, bathrooms, and location affect house prices. Identified key trends and correlations between features.

🔹 2. Data Cleaning & Preprocessing
Handled missing values, removed outliers, and encoded categorical data properly. Scaled numerical columns to ensure balanced model learning.

🔹 3. Feature Engineering
Selected only the most relevant and impactful features for training to improve model accuracy.

🔹 4. Model Building — Linear Regression
Implemented Linear Regression using Scikit-learn. Evaluated model performance with R² Score and Mean Squared Error (MSE) to ensure reliability in predictions.

🔹 5. Deployment with Streamlit
Built an interactive web app using Streamlit where users can input property details and get instant price predictions, making data science practical and user-friendly!

🧠 Tools & Libraries: Python | Pandas | NumPy | Scikit-learn | Matplotlib | Seaborn | Streamlit | Joblib

Repo link: https://lnkd.in/d7GrvBYQ

#Ai_Engineer #ML_Engineer #Data_Science #Machine_Learning
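The core of step 4 can be sketched in a few lines. This uses synthetic housing data in place of the project's real dataset (the features and coefficients below are invented for illustration), but the fit/predict/evaluate pattern with R² and MSE is the same:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic housing data: price driven by area and bedrooms plus noise
rng = np.random.default_rng(0)
area = rng.uniform(50, 200, 200)
beds = rng.integers(1, 5, 200)
price = 1000 * area + 5000 * beds + rng.normal(0, 2000, 200)

X = np.column_stack([area, beds])
X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

# The two metrics used in the post: R² score and mean squared error
r2 = r2_score(y_te, pred)
mse = mean_squared_error(y_te, pred)
print(round(r2, 3), round(mse))
```

For deployment, the fitted `model` would be saved with Joblib and loaded inside the Streamlit app to serve predictions from user input.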