Predicting Bike Rental Demand with XGBoost and Python

Just finished my first-ever data science project through SRM Insider's AI/ML domain, and it was a lot more interesting than I expected.

The task was predicting hourly bike rental demand in Washington, D.C. using 17,000+ hours of real weather and calendar data.

What surprised me most was that creating the right features mattered way more than picking a fancy model. Giving the model a memory of what happened an hour ago moved the needle more than any tuning did.

Final result: average error on unseen data down from 132 (baseline) to 44, about 3x better.

Biggest takeaway: EDA and understanding your data before building anything matter more than the model itself.

Built with Python, XGBoost, Pandas and a lot of plot staring 😄

GitHub: https://lnkd.in/gP9MNnis

#DataScience #MachineLearning #Python #XGBoost #SRMInsider
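A "memory of what happened an hour ago" is usually built as a lag feature. Below is a minimal pandas sketch, not the author's code. It assumes an hourly frame sorted by time with a count column named `cnt`, as in the public UCI Bike Sharing hourly data (both the column name and the dataset are assumptions). It also computes the mean absolute error of a naive "same as last hour" predictor for comparison.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, target: str, lags=(1, 24)) -> pd.DataFrame:
    """Return a copy of df with lagged copies of `target` as extra columns.

    Assumes df is sorted by time with exactly one row per hour (no gaps),
    so shift(1) is "one hour ago" and shift(24) is "same hour yesterday".
    """
    out = df.copy()
    lag_cols = []
    for lag in lags:
        col = f"{target}_lag_{lag}"
        # shift(lag) moves values down, so row t holds the value from row t - lag
        out[col] = out[target].shift(lag)
        lag_cols.append(col)
    # The first max(lags) rows have no history yet; drop them before training.
    return out.dropna(subset=lag_cols).reset_index(drop=True)


def mae(y_true, y_pred) -> float:
    """Mean absolute error (positional, ignores any index alignment)."""
    t = pd.Series(y_true, dtype=float).to_numpy()
    p = pd.Series(y_pred, dtype=float).to_numpy()
    return float(abs(t - p).mean())


if __name__ == "__main__":
    hourly = pd.DataFrame({"cnt": [10, 20, 35, 30, 50]})  # toy counts, not real data
    feats = add_lag_features(hourly, "cnt", lags=(1,))
    # Naive baseline: predict "same as last hour"
    print(mae(feats["cnt"], feats["cnt_lag_1"]))  # → 12.5
```

With lag features, split train and test by time (earlier hours for training, later hours for testing) rather than randomly. A random split can leak neighboring hours into the test set and flatter the score.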


More Relevant Posts

Djalila BENSALEM
2w
🐍 Data Science tip: automate variable type detection before choosing your preprocessing strategy.

One of the most overlooked steps in data preparation is correctly identifying the nature of each variable, because imputation and transformation strategies depend on variable type.

Instead of guessing, you can systematically classify variables using simple Python logic:

categorical = df.select_dtypes(include=['object', 'category']).columns
numerical = df.select_dtypes(include='number').columns
ordinal = [col for col in numerical if df[col].nunique() < 10]

💡 Then adapt your preprocessing strategy accordingly:
Categorical → mode / encoding
Numerical → mean or median
Ordinal / discrete → careful handling (depends on context)

🔍 Key idea: Before choosing how to impute or transform data, you must first understand what type of variable you're working with.

Good data science starts with structure, not models.

#Python #DataScience #MachineLearning #DataEngineering #Pandas
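The tip can be turned into a reusable function. Below is a minimal pandas sketch. The threshold of 10 unique values comes from the post. The function names and the strategy choices are illustrative assumptions, not a rule: mode for categorical and low-cardinality numeric columns, median for the remaining numeric ones.

```python
import pandas as pd


def classify_columns(df: pd.DataFrame, max_discrete: int = 10) -> dict:
    """Split columns into categorical, discrete (low-cardinality numeric) and continuous."""
    # 'string' also catches pandas' dedicated string dtype, not just object columns
    categorical = list(df.select_dtypes(include=["object", "category", "string"]).columns)
    # 'number' covers every int/float width (int32, float32, ...), not only int64/float64
    numerical = list(df.select_dtypes(include="number").columns)
    # Low-cardinality numeric columns may be ordinal codes or counts; flag them separately
    discrete = [c for c in numerical if df[c].nunique() < max_discrete]
    continuous = [c for c in numerical if c not in discrete]
    return {"categorical": categorical, "discrete": discrete, "continuous": continuous}


def impute_by_type(df: pd.DataFrame, max_discrete: int = 10) -> pd.DataFrame:
    """Fill missing values with a strategy chosen per column type (returns a copy)."""
    out = df.copy()
    groups = classify_columns(out, max_discrete)
    for col in groups["categorical"] + groups["discrete"]:
        mode = out[col].mode(dropna=True)
        if not mode.empty:  # an all-missing column has no mode; leave it alone
            out[col] = out[col].fillna(mode.iloc[0])
    for col in groups["continuous"]:
        # Median is less sensitive to outliers than the mean
        out[col] = out[col].fillna(out[col].median())
    return out
```

Classification runs before any filling, and `nunique()` ignores missing values, so the gaps themselves do not change which bucket a column lands in.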
Chandu sri
2w
Completed my Pandas assignment today, and honestly, it was a great learning session! 📊

Worked with two real-world datasets, the Iris Flower dataset and the Titanic dataset, and applied a range of data analysis operations using Python & Pandas. Here's what I explored:

🌸 Iris Dataset
• Displayed the first 10 rows, shape, data types & summary statistics (mean, std, min, max)
• Filtered rows where petal_length > 4.5 and species = "Iris-virginica"
• Grouped by species to compute average sepal_length, max petal_width & standard deviation of sepal_width
• Created a new column "petal_ratio" = petal_length / petal_width and found the average per species

🚢 Titanic Dataset
• Selected specific columns: Name, Sex, Age, Fare, Survived
• Filtered female passengers with Fare > 30
• Grouped by Pclass and computed survival rate, average fare & average age
→ 1st class: ~63% survival | 2nd class: ~47% | 3rd class: ~24%
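The Titanic steps map onto pandas boolean filtering and named aggregation. Below is a minimal sketch with hypothetical helper names. It uses only the Titanic column names (Pclass, Survived, Fare, Age, Sex). The test data is a tiny made-up frame, so it will not reproduce the survival rates quoted above.

```python
import pandas as pd


def class_summary(titanic: pd.DataFrame) -> pd.DataFrame:
    """Survival rate, average fare and average age per passenger class."""
    return titanic.groupby("Pclass").agg(
        survival_rate=("Survived", "mean"),  # Survived is 0/1, so its mean is a rate
        avg_fare=("Fare", "mean"),
        avg_age=("Age", "mean"),  # mean() skips missing ages by default
    )


def female_high_fare(titanic: pd.DataFrame, min_fare: float = 30) -> pd.DataFrame:
    """Female passengers who paid more than min_fare; each condition is parenthesized."""
    return titanic[(titanic["Sex"] == "female") & (titanic["Fare"] > min_fare)]
```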
Sidra tul Muntaha
5d
📊 Exploring Data with the Iris Dataset

Recently, I worked on a simple yet insightful data visualization task using the famous Iris dataset. This exercise helped me strengthen my understanding of data analysis fundamentals.

🔹 Loaded and explored the dataset using pandas
🔹 Analyzed structure with shape, columns, and summary statistics
🔹 Created visualizations using matplotlib & seaborn:
✔️ Scatter plot to study relationships between features
✔️ Histogram to understand distribution
✔️ Box plot to identify outliers

This task enhanced my skills in data exploration and visualization, which are essential for any data science workflow.

#DataScience #Python #DataVisualization #Pandas #Seaborn #Matplotlib #MachineLearning #LearningJourney DevelopersHub Corporation©
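All three plots can be drawn with plain matplotlib. The post also used seaborn, but it is not required for these. Below is a minimal sketch. The `explore_plots` name is hypothetical, and the test assumes Iris-style column names (`sepal_length`, `petal_length`, `species`). The Agg backend is used only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def explore_plots(df: pd.DataFrame, x: str, y: str, group: str):
    """Scatter (relationship), histogram (distribution), box plot (outliers)."""
    fig, (ax_sc, ax_hist, ax_box) = plt.subplots(1, 3, figsize=(15, 4))

    # Scatter: one color per group, to see whether the groups separate
    for name, part in df.groupby(group):
        ax_sc.scatter(part[x], part[y], label=str(name), s=15)
    ax_sc.set(xlabel=x, ylabel=y, title=f"{y} vs {x}")
    ax_sc.legend()

    # Histogram: shape of a single feature's distribution
    ax_hist.hist(df[x].dropna(), bins=20)
    ax_hist.set(xlabel=x, ylabel="count", title=f"Distribution of {x}")

    # Box plot: points beyond the whiskers (1.5 x IQR by default) are drawn as outliers
    labels = [str(name) for name, _ in df.groupby(group)]
    data = [part[y].dropna().to_numpy() for _, part in df.groupby(group)]
    ax_box.boxplot(data)
    ax_box.set_xticks(range(1, len(labels) + 1), labels)
    ax_box.set(ylabel=y, title=f"{y} by {group}")

    fig.tight_layout()
    return fig
```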
Gaurav Rawat
2w
t-SNE: Visualizing What We Can't See

Imagine 784 dimensions compressed to 2, and the clusters you see reveal a lot about the structure of the data. t-SNE makes the invisible visible.

Day 27 of 60 → t-SNE, the most beautiful data visualization tool in ML.

PCA finds linear components. t-SNE finds NON-LINEAR structure by preserving local neighborhoods.

The idea:
1. Measure which points are close in high-dimensional space
2. Lay them out in 2D preserving those closeness relationships
3. Similar points cluster together, dissimilar ones spread apart

What good t-SNE output looks like:
→ Tight clusters = the data likely has natural groupings
→ Fuzzy boundaries = gradual transitions between groups
→ Outlier points far from clusters = possible anomalies

CRITICAL caveats:
1. Distances between clusters are NOT meaningful, and neither are cluster sizes; only local neighborhoods are preserved
2. Results depend on the "perplexity" parameter (try 5, 30, 50) and on the random seed
3. Never interpret the x/y axes; they're arbitrary

t-SNE is for EXPLORATION, not prediction. But for making the invisible visible? Nothing compares.

#tSNE #DataVisualization #MachineLearning #Python #60DaysOfML
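Caveat 2 is easy to act on: embed the same data at several perplexities and only trust structure that survives all of them. Below is a minimal sketch with scikit-learn's `TSNE`. The `tsne_sweep` helper is hypothetical, and the random Gaussian blobs are stand-in data, since the post names no dataset. Perplexity must be smaller than the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_sweep(X: np.ndarray, perplexities=(5, 30, 50), seed: int = 0) -> dict:
    """Embed X in 2-D once per perplexity; returns {perplexity: (n_samples, 2) array}."""
    embeddings = {}
    for p in perplexities:
        if p >= len(X):
            raise ValueError(f"perplexity {p} must be smaller than n_samples={len(X)}")
        tsne = TSNE(n_components=2, perplexity=p, random_state=seed)
        embeddings[p] = tsne.fit_transform(X)
    return embeddings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three well-separated Gaussian blobs in 50 dimensions, 40 points each
    centers = rng.normal(scale=10, size=(3, 50))
    X = np.vstack([c + rng.normal(size=(40, 50)) for c in centers])
    for p, emb in tsne_sweep(X).items():
        # Axes are arbitrary: compare which points group together, not coordinates
        print(p, emb.shape)
```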
Rishikesh Kumar
3w
I worked on predicting house prices using a dataset with 78 features, including structural, area, and categorical attributes.

The project involved:
Cleaning and preprocessing the data 🧹
Feature engineering and encoding categorical variables 🔧
Training multiple models: Linear Regression, Ridge, Lasso, Gradient Boosting, XGBoost, LightGBM, Random Forest

✅ Results:
Best model: Linear Regression (RMSE: 0.12)
Feature engineering and encoding significantly improved predictions 📊

Graphs and code are available in my GitHub repository: [https://lnkd.in/g88wm43R]

Excited to apply these skills to real-world data science problems!

#DataScience #MachineLearning #Python #HousingPrices #FeatureEngineering #PredictiveModeling
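An RMSE of 0.12 on raw house prices would be implausibly small. It is presumably measured on log-transformed prices, as in the Kaggle House Prices competition, though the post does not say so and this is an assumption. Below is a minimal plain-Python sketch of that metric. On the log scale, an RMSE near 0.12 corresponds roughly to typical errors of 12 to 13%.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_log(y_true, y_pred) -> float:
    """RMSE on log(price): errors become relative, so cheap and expensive houses weigh equally."""
    return rmse([math.log(t) for t in y_true], [math.log(p) for p in y_pred])


if __name__ == "__main__":
    # Every prediction 10% too high gives the same log error, log(1.1) ≈ 0.095
    print(rmse_log([100_000, 300_000], [110_000, 330_000]))
```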
Mukesh Boolani
2w
I finally understand why data scientists say they spend 80% of their time on data. 📊

This week, instead of just reading about the ML lifecycle, I actually did its second step: Data Collection. 🎯

I built my own dataset called "TMDB Top Rated Movies" using TMDB's public API. 🎬

It was interesting to see how many sources data can come from: some datasets are already available in formats like CSV and JSON, while others are retrieved from SQL databases. I also learned that data can be collected through APIs or even web scraping, depending on the use case.

Nothing fancy. Just:
🐍 Python
📡 A bunch of API calls
🔄 Figuring out how to loop through pages without breaking everything

In the end, I pulled together 10,000+ movie records: clean, structured, and ready for actual analysis or ML. 📁✅

This part felt more like real engineering than anything I have done in a notebook. 🛠️

Small step. But it's real. 🚀

Dataset link: https://lnkd.in/dG7EcE5q

#MachineLearning #DataScience #Python #LearningByDoing
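Looping through pages without breaking everything can be wrapped in a small helper. Below is a minimal sketch. It assumes a paged JSON response with `results` and `total_pages` fields, which is how TMDB's list endpoints such as top-rated movies respond. The HTTP call is left to a caller-supplied `fetch_page` function, a hypothetical name, so the loop logic can be tested offline. The `max_pages` cap and `delay_s` pause are illustrative safety knobs.

```python
import time
from typing import Callable


def fetch_all_pages(
    fetch_page: Callable[[int], dict],
    max_pages: int = 500,
    delay_s: float = 0.0,
) -> list:
    """Collect `results` from page 1 up to total_pages (or max_pages, whichever is smaller)."""
    records = []
    page, total_pages = 1, 1
    while page <= min(total_pages, max_pages):
        data = fetch_page(page)  # e.g. a wrapper around requests.get(...).json()
        # Learn the real page count from the response; if it's missing, stop after this page
        total_pages = data.get("total_pages", page)
        records.extend(data.get("results", []))
        page += 1
        if delay_s:
            time.sleep(delay_s)  # be polite to rate limits
    return records
```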
Gehad AlKady
4w
🚀 Day 3 – #Daily_DataScience_Code

Taking the next step in our data science journey 👩‍💻

Today, we move beyond CSV files and explore how to read Excel files with multiple sheets 📊

💻 What we did today:
- Loaded an Excel file directly from the web 🌐
- Read all sheets at once using pandas
- Retrieved available sheet names
- Accessed a specific sheet using its name (not index)
- Displayed the first rows using head()

🎯 Key Insight: When working with Excel files, using sheet names makes your code more readable and more robust (it keeps working if the sheets are reordered), especially when dealing with multiple datasets.

Let’s keep building step by step 🚀

#DataScience #MachineLearning #Python #AI #DataHandling #LearnByDoing #DataScienceWithDrGehad #DailyDataScienceCode
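Below is a minimal pandas sketch of the same steps, assuming an `.xlsx` file. Reading it requires the openpyxl engine to be installed. Passing `sheet_name=None` to `pd.read_excel` returns every sheet as a dict of DataFrames keyed by sheet name. The `load_sheets` helper is hypothetical.

```python
import pandas as pd


def load_sheets(source, sheet: str):
    """Read every sheet, list the names, and pick one sheet by name (not by index).

    `source` can be a local path or a URL to an .xlsx file.
    """
    all_sheets = pd.read_excel(source, sheet_name=None)  # {sheet name: DataFrame}
    names = list(all_sheets)  # sheet names, in workbook order
    if sheet not in all_sheets:
        raise KeyError(f"sheet {sheet!r} not found; available: {names}")
    return names, all_sheets[sheet]
```

Typical use is `names, df = load_sheets(url, "SomeSheet")` followed by `df.head()`. Here `url` and `"SomeSheet"` are placeholders.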
Hazrat Bilal
1w
📊 Recently explored the 𝘆𝗱𝗮𝘁𝗮-𝗽𝗿𝗼𝗳𝗶𝗹𝗶𝗻𝗴 library (built on pandas) for Exploratory Data Analysis (EDA), and it’s a game changer!

It provides a complete summary of the dataset with powerful visualizations, helping to quickly understand:
1️⃣ Dataset overview (structure, types)
2️⃣ Missing values detection
3️⃣ Distribution analysis
4️⃣ Correlation insights
5️⃣ Automatic visual reports

💡 One key takeaway: Before starting any data project, it’s highly valuable to review your dataset at least once using a ydata-profiling report. It saves time, highlights hidden patterns, and improves decision-making.

🚀 Turning raw data into insights becomes much more efficient!

#DataScience #EDA #Python #DataAnalysis #MachineLearning #LearningJourney
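ydata-profiling builds its full report in one call: `ProfileReport(df, title="...")` from the `ydata_profiling` package, saved with `.to_file("report.html")`. For a quick look without installing it, the first two items in the list (structure and missing values) can be approximated with plain pandas. The `quick_overview` helper below is a hypothetical sketch and is not part of ydata-profiling.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count/percent, number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(),  # missing values are not counted as a distinct value
        }
    ).sort_values("missing", ascending=False)  # worst columns first
```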

Topfolio

75 followers
1w
Data Science tech stack 2020:
- pandas
- sklearn
- matplotlib

Data Science tech stack 2026:
- pandas (legacy support)
- polars (the cool kid)
- sklearn
- xgboost
- lightgbm
- shap
- langchain
- llamaindex
- pydantic-ai
- weave
- mlflow
- dvc
- optuna
- great expectations
- prefect
- fastapi
- streamlit
- gradio

You don't need all of them. You need the 3-4 that solve YOUR problem.

Tag someone still trying to learn every tool.

Overwhelmed? Our roadmaps tell you which 3-4 tools to learn for each role, and in what order: https://lnkd.in/ga9TFJh5

#DataScience #Python #TechStack #MachineLearning #DataEngineering #MLOps #DataHumor #Memes
Shaurab Kumar Jha
1w
Report this post
Day 12: Pandas DataFrames Deep Dive 🚢

Today I worked with the Titanic dataset and explored how real-world data looks and behaves.

Here’s what I did:
✔ Created DataFrames from lists and dicts, and loaded one from CSV
✔ Explored data using shape, info, describe
✔ Handled missing values (NaN) using fillna & dropna
✔ Applied filtering using conditions (AND/OR)
✔ Performed sorting, ranking, and correlation analysis
✔ Created new features using apply()

One key learning:
👉 Real data is messy; handling missing values and filtering correctly is the real skill.

This is what actual data analysis looks like.

GitHub 👇 https://lnkd.in/gmTDWP_x

#Day12 #90DaysOfRevision #Pandas #Python #DataAnalysis #MachineLearning
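Two of these steps often trip people up in pandas. First, combining conditions needs `&` / `|` with each condition in parentheses, because plain `and` / `or` on a Series raise an "ambiguous truth value" error. Second, `fillna` and `dropna` are best chosen per column. Below is a minimal sketch using Titanic-style column names. The helper names are hypothetical, and the choices to fill Age with the median and drop rows missing Embarked are illustrative assumptions, not the author's code.

```python
import pandas as pd


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Fill Age with its median, drop rows missing Embarked (illustrative choices)."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    return out.dropna(subset=["Embarked"])


def kids_or_first_class(df: pd.DataFrame) -> pd.DataFrame:
    """OR filter: passengers under 12, or anyone in first class."""
    return df[(df["Age"] < 12) | (df["Pclass"] == 1)]


def third_class_men(df: pd.DataFrame) -> pd.DataFrame:
    """AND filter: male passengers in third class."""
    return df[(df["Pclass"] == 3) & (df["Sex"] == "male")]
```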

