Just finished my first ever data science project through SRM Insider's AI/ML domain and it was a lot more interesting than I expected. The task was predicting hourly bike rental demand in Washington D.C. using 17,000+ hours of real weather and calendar data. What surprised me most was that creating the right features mattered way more than picking a fancy model. Giving the model a memory of what happened an hour ago moved the needle more than any tuning did. Final result: 3x better than the baseline, with average error down from 132 to 44 on unseen data. Biggest takeaway: EDA and understanding your data before building anything matters more than the model itself. Built with Python, XGBoost, Pandas and a lot of plot staring 😄 GitHub: https://lnkd.in/gP9MNnis #DataScience #MachineLearning #Python #XGBoost #SRMInsider
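The "memory of what happened an hour ago" idea is usually implemented as a lag feature. Below is a minimal sketch of that technique, not the author's actual code: the column name `cnt`, the 24-hour lag, and the helper `add_lag_features` are assumptions for illustration, and the row-based shift assumes the hourly index has no gaps.

```python
import pandas as pd


def add_lag_features(df, target="cnt", lags=(1, 24)):
    """Return a copy of df with lagged copies of the target column.

    Hypothetical helper: 'cnt' and the lag choices are illustrative only.
    """
    out = df.sort_index().copy()
    for lag in lags:
        # shift(lag) moves values down, so each row sees the value from
        # `lag` rows (here: hours) earlier
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    # The earliest rows have no history to look back on, so drop them
    lag_columns = [f"{target}_lag_{lag}" for lag in lags]
    return out.dropna(subset=lag_columns)


# Made-up hourly series, just to show the shape of the result
hours = pd.date_range("2011-01-01", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"cnt": range(48)}, index=hours)
features = add_lag_features(demo)
print(features.head())
```

When splitting train and test for a model like this, a time-ordered split keeps future demand from leaking into the lag columns.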
Predicting Bike Rental Demand with XGBoost and Python
More Relevant Posts
-
🐍 Data Science tip: automate variable type detection before choosing your preprocessing strategy. One of the most overlooked steps in data preparation is correctly identifying the nature of each variable, because imputation and transformation strategies depend on variable type. Instead of guessing, you can systematically classify variables using simple Python logic:
categorical = df.select_dtypes(include=['object', 'category']).columns
numerical = df.select_dtypes(include=['int64', 'float64']).columns
ordinal = [col for col in numerical if df[col].nunique() < 10]
💡 Then adapt your preprocessing strategy accordingly: Categorical → mode / encoding Numerical → mean or median Ordinal / discrete → careful handling (depends on context) 🔍 Key idea: Before choosing how to impute or transform data, you must first understand what type of variable you're working with. Good data science starts with structure, not models. #Python #DataScience #MachineLearning #DataEngineering #Pandas
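The classification logic in this tip can be wrapped into a runnable sketch like the one below. The function name `classify_columns`, the threshold parameter, and the demo data are assumptions. Two deliberate differences from the post: `include="number"` also catches integer and float widths other than `int64`/`float64`, and low-cardinality numeric columns are labelled "discrete" because a small number of distinct values does not by itself make a variable ordinal.

```python
import pandas as pd


def classify_columns(df, max_discrete_levels=10):
    """Split column names into categorical, discrete and continuous groups.

    Hypothetical helper; the cardinality threshold is a heuristic.
    """
    categorical = df.select_dtypes(include=["object", "category"]).columns.tolist()
    numerical = df.select_dtypes(include="number").columns.tolist()
    # Numeric columns with few distinct values often behave like categories
    discrete = [c for c in numerical if df[c].nunique() < max_discrete_levels]
    continuous = [c for c in numerical if c not in discrete]
    return {"categorical": categorical, "discrete": discrete, "continuous": continuous}


# Made-up data for illustration
demo = pd.DataFrame({
    "city": pd.Series(["A", "B", "A", "C"], dtype="category"),
    "rooms": [1, 2, 2, 3],
    "price": [100.5, 200.0, 150.25, 300.0],
})
groups = classify_columns(demo, max_discrete_levels=4)
print(groups)
```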
-
Completed my Pandas assignment today — and honestly, it was a great learning session! 📊 Worked with two real-world datasets — the Iris Flower dataset and the Titanic dataset — and applied a range of data analysis operations using Python & Pandas. Here's what I explored: 🌸 Iris Dataset • Displayed the first 10 rows, shape, data types & summary statistics (mean, std, min, max) • Filtered rows where petal_length > 4.5 and species = "Iris-virginica" • Grouped by species to compute average sepal_length, max petal_width & std deviation of sepal_width • Created a new column "petal_ratio" = petal_length / petal_width and found the average per species 🚢 Titanic Dataset • Selected specific columns: Name, Sex, Age, Fare, Survived • Filtered female passengers with Fare > 30 • Grouped by Pclass and computed: survival rate, average fare & average age → 1st class: ~63% survival | 2nd class: ~47% | 3rd class: ~24%
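The Iris operations listed in this post map onto a few pandas calls. The sketch below uses a tiny made-up frame, not the real dataset, and assumes the column names `species`, `sepal_length`, `sepal_width`, `petal_length` and `petal_width`.

```python
import pandas as pd

# Tiny made-up sample; the real Iris dataset has 150 rows
iris = pd.DataFrame({
    "species": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
    "sepal_length": [5.0, 5.2, 6.4, 6.6],
    "sepal_width": [3.4, 3.6, 2.8, 3.0],
    "petal_length": [1.4, 1.6, 5.0, 6.0],
    "petal_width": [0.2, 0.4, 2.0, 2.0],
})

# Combine two conditions with &; each one needs its own parentheses
mask = (iris["petal_length"] > 4.5) & (iris["species"] == "Iris-virginica")
filtered = iris[mask]

# Named aggregation: one output column per (input column, function) pair
summary = iris.groupby("species").agg(
    avg_sepal_length=("sepal_length", "mean"),
    max_petal_width=("petal_width", "max"),
    std_sepal_width=("sepal_width", "std"),
)

# Derived column, then its per-species average
iris["petal_ratio"] = iris["petal_length"] / iris["petal_width"]
avg_ratio = iris.groupby("species")["petal_ratio"].mean()
print(summary)
print(avg_ratio)
```

The Titanic steps follow the same pattern, for example grouping by `Pclass` and taking the mean of `Survived` to get a survival rate.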
-
📊 Exploring Data with the Iris Dataset
Recently, I worked on a simple yet insightful data visualization task using the famous Iris dataset. This exercise helped me strengthen my understanding of data analysis fundamentals. 🔹 Loaded and explored the dataset using pandas 🔹 Analyzed structure with shape, columns, and summary statistics 🔹 Created visualizations using matplotlib & seaborn: ✔️ Scatter plot to study relationships ✔️ Histogram to understand distribution ✔️ Box plot to identify outliers This task enhanced my skills in data exploration and visualization, which are essential for any data science workflow. #DataScience #Python #DataVisualization #Pandas #Seaborn #Matplotlib #MachineLearning #LearningJourney DevelopersHub Corporation©
-
t-SNE: Visualizing What We Can't See Imagine 784 dimensions compressed to 2 — and the clusters you see tell you everything about the structure of the data. t-SNE makes the invisible visible. Day 27 of 60 → t-SNE — the most beautiful data visualization tool in ML. PCA finds linear components. t-SNE finds NON-LINEAR structure — preserving local neighborhoods. The idea: 1. Measure which points are close in high-dimensional space 2. Lay them out in 2D preserving those closeness relationships 3. Similar points cluster together, dissimilar ones spread apart What good t-SNE output looks like: → Tight clusters = data has natural groupings → Fuzzy boundaries = gradual transitions between groups → Outlier points far from clusters = anomalies CRITICAL caveats: 1. Distances between clusters are NOT meaningful (only within-cluster distances) 2. Results depend on "perplexity" parameter (try 5, 30, 50) 3. Never interpret the x/y axis — they're arbitrary t-SNE is for EXPLORATION, not prediction. But for making the invisible visible? Nothing compares. #tSNE #DataVisualization #MachineLearning #Python #60DaysOfML
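The perplexity caveat is easy to try out with scikit-learn's `TSNE`. The sketch below runs it on made-up clustered data (three synthetic groups in 50 dimensions, an assumption chosen only to keep the example small) at the three perplexity values the post suggests.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Three synthetic clusters in 50 dimensions (made-up data, 40 points each)
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([center + rng.normal(size=(40, 50)) for center in centers])

embeddings = {}
for perplexity in (5, 30, 50):
    # perplexity must be smaller than the number of samples (120 here)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)

for perplexity, points in embeddings.items():
    print(perplexity, points.shape)
```

Plotting each embedding side by side shows how much the layout changes with perplexity, which is one reason the axes and between-cluster gaps should not be over-interpreted.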
-
I worked on predicting house prices using a dataset with 78 features, including structural, area, and categorical attributes. The project involved: Cleaning and preprocessing the data 🧹 Feature engineering and encoding categorical variables 🔧 Training multiple models: Linear Regression, Ridge, Lasso, Gradient Boosting, XGBoost, LightGBM, Random Forest ✅ Results: Best model: Linear Regression with RMSE: 0.12 Feature engineering and encoding significantly improved predictions 📊 Graphs and code are available in my GitHub repository: [https://lnkd.in/g88wm43R] Excited to apply these skills to real-world data science problems! #DataScience #MachineLearning #Python #HousingPrices #FeatureEngineering #PredictiveModeling
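One common way to compare several regressors on equal footing is cross-validated RMSE. This is a sketch on synthetic data, not the author's pipeline: the data, the alpha values, and the choice of three linear models are assumptions, and the post does not say how its RMSE of 0.12 was computed.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic regression problem standing in for the housing data
X = rng.normal(size=(200, 5))
true_weights = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_weights + rng.normal(scale=0.1, size=200)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
}

rmse = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is reported as a negative number
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()

best = min(rmse, key=rmse.get)
print(rmse, best)
```

Tree ensembles such as XGBoost or LightGBM can be added to the `models` dictionary in the same way, since they expose the scikit-learn estimator interface.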
-
I finally understand why data scientists say they spend 80% of their time on data. 📊 This week, instead of just reading about the ML lifecycle, I actually did the second step: Data Collection. 🎯 I built my own dataset called "TMDB Top Rated Movies" using their public API. 🎬 It was interesting to see how data can come from different sources: some datasets are already available in formats like CSV and JSON, while others are retrieved from SQL databases. I also learned that data can be collected through APIs or even web scraping, depending on the use case. Nothing fancy. Just: 🐍 Python 📡 A bunch of API calls 🔄 Figuring out how to loop through pages without breaking everything In the end, I pulled together 10,000+ movie records: clean, structured, and ready for actual analysis or ML. 📁✅ This part felt more like real engineering than anything I have done in a notebook. 🛠️ Small step. But it's real. 🚀 dataset link: https://lnkd.in/dG7EcE5q #MachineLearning #DataScience #Python #LearningByDoing
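Looping through pages "without breaking everything" mostly comes down to clear stop conditions. The sketch below keeps the HTTP call out of the loop: `fetch_page` is a hypothetical callable supplied by the caller, and the `results` and `total_pages` response keys are assumptions about the payload shape, not a statement of any specific API's contract.

```python
from typing import Callable, Dict, Iterator


def fetch_all_pages(fetch_page: Callable[[int], Dict], max_pages: int = 1000) -> Iterator[Dict]:
    """Yield records page by page until the source runs out.

    fetch_page is a hypothetical callable: page number in, parsed JSON out.
    """
    page = 1
    while page <= max_pages:  # hard cap so a misbehaving API cannot loop forever
        payload = fetch_page(page)
        results = payload.get("results", [])
        if not results:
            break  # an empty page means there is nothing left
        yield from results
        total = payload.get("total_pages")
        if total is not None and page >= total:
            break  # the API reported this was the last page
        page += 1


def fake_fetch(page: int) -> Dict:
    """Stand-in for a real HTTP call, so the sketch runs offline."""
    data = {1: ["a", "b"], 2: ["c"]}
    return {"results": [{"title": t} for t in data.get(page, [])], "total_pages": 2}


movies = list(fetch_all_pages(fake_fetch))
print(movies)
```

In a real collector, `fetch_page` would wrap the HTTP request, and retries or rate-limit pauses would live there rather than in the paging loop.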
-
🚀 Day 3 – #Daily_DataScience_Code Taking the next step in our data science journey 👩‍💻 Today, we move beyond CSV files and explore how to read Excel files with multiple sheets 📊 💻 What we did today: - Loaded an Excel file directly from the web 🌐 - Read all sheets at once using pandas - Retrieved available sheet names - Accessed a specific sheet using its name (not index) - Displayed the first rows using head() 🎯 Key Insight: When working with Excel files, using sheet names makes your code more robust and readable, especially when dealing with multiple datasets. Let’s keep building step by step 🚀 #DataScience #MachineLearning #Python #AI #DataHandling #LearnByDoing #DataScienceWithDrGehad #DailyDataScienceCode
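The steps in this post can be sketched with `pandas.read_excel`, where `sheet_name=None` returns every sheet as a dictionary keyed by sheet name. To stay runnable offline, the example builds a small two-sheet workbook in memory instead of downloading one. The sheet names `train` and `test` and the helper `load_workbook_sheets` are made up, and the `openpyxl` package is assumed to be installed.

```python
import io

import pandas as pd


def load_workbook_sheets(source):
    """Read every sheet of an Excel workbook into a dict keyed by sheet name."""
    # sheet_name=None asks pandas for all sheets at once;
    # source may be a local path, a URL, or a file-like object
    return pd.read_excel(source, sheet_name=None)


# Build a tiny two-sheet workbook in memory (stand-in for a file on the web)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="train", index=False)
    pd.DataFrame({"x": [3]}).to_excel(writer, sheet_name="test", index=False)
buffer.seek(0)

sheets = load_workbook_sheets(buffer)
sheet_names = list(sheets)   # available sheet names, in workbook order
train = sheets["train"]      # access by name rather than by position
print(sheet_names)
print(train.head())
```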
-
📊 Recently explored ydata-profiling, a Python library that works with pandas for Exploratory Data Analysis (EDA), and it’s a game changer! It provides a complete summary of the dataset with powerful visualizations, helping to quickly understand: 1️⃣ Dataset overview (structure, types) 2️⃣ Missing values detection 3️⃣ Distribution analysis 4️⃣ Correlation insights 5️⃣ Automatic visual reports 💡 One key takeaway: Before starting any data project, it’s highly valuable to review your dataset at least once using a ydata-profiling report. It saves time, highlights hidden patterns, and improves decision-making. 🚀 Turning raw data into insights becomes much more efficient! #DataScience #EDA #Python #DataAnalysis #MachineLearning #LearningJourney
-
Data Science tech stack 2020: - pandas - sklearn - matplotlib Data Science tech stack 2026: - pandas (legacy support) - polars (the cool kid) - sklearn - xgboost - lightgbm - shap - langchain - llamaindex - pydantic-ai - weave - mlflow - dvc - optuna - great expectations - prefect - fastapi - streamlit - gradio You don't need all of them. You need the 3-4 that solve YOUR problem. Tag someone still trying to learn every tool. Overwhelmed? Our roadmaps tell you which 3-4 tools to pick per role, and the order in which to learn them: https://lnkd.in/ga9TFJh5 #DataScience #Python #TechStack #MachineLearning #DataEngineering #MLOps #DataHumor #Memes
-
Day 12 — Pandas DataFrames Deep Dive 🚢 Today I worked with the Titanic dataset and explored how real-world data looks and behaves. Here’s what I did: ✔ Created DataFrames from scratch (list, dict, CSV) ✔ Explored data using shape, info, describe ✔ Handled missing values (NaN) using fillna & dropna ✔ Applied filtering using conditions (AND/OR) ✔ Performed sorting, ranking, and correlation analysis ✔ Created new features using apply() One key learning: 👉 Real data is messy — handling missing values and filtering correctly is the real skill. This is what actual data analysis looks like. GitHub 👇 https://lnkd.in/gmTDWP_x #Day12 #90DaysOfRevision #Pandas #Python #DataAnalysis #MachineLearning
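The missing-value and filtering steps in this post look roughly like the sketch below. It uses a four-row made-up frame instead of the real Titanic file. The column names `Age`, `Sex`, `Fare` and `Survived` are assumed, and median imputation is just one reasonable choice, not necessarily what the author used.

```python
import numpy as np
import pandas as pd

# Made-up sample with Titanic-style columns
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 4.0],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.3, np.nan, 16.7],
    "Survived": [0, 1, 1, 1],
})

# fillna: replace missing ages with the median of the observed ages
df["Age"] = df["Age"].fillna(df["Age"].median())

# dropna: discard rows where Fare is still missing
df = df.dropna(subset=["Fare"]).copy()

# AND / OR conditions: & and | with parentheses around each comparison
selected = df[((df["Sex"] == "female") & (df["Fare"] > 30)) | (df["Age"] < 12)]

# apply(): derive a new feature from an existing column
df["age_group"] = df["Age"].apply(lambda age: "child" if age < 12 else "adult")

# Sorting by fare, highest first
ranked = df.sort_values("Fare", ascending=False)
print(selected)
print(ranked)
```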