Mistakes are part of the process

Day 7 – #100DaysOfCode

⏰ Time Spent: 2 hours

⚒️ What I Did:
* Yesterday I learned one way to read scatter plots; today I practiced it.
* Modified my function to make it reusable
* Plotted relationships between complaints and aggregated features

I observed only these two trends:
* log(x) vs y → logarithmic trend [ y = a · log(x) + b ]
* log(x) vs log(y) → power law [ y = k · xᵃ ]

But then I realized something important… I was plotting a sum on the x-axis. Sums naturally grow with group size, which created misleading patterns. So I switched to the mean, but the trends disappeared. That suggests there is no real relationship, though I'll experiment with a few other transformations before concluding that.

---

🚪 Links:
* Repo: https://lnkd.in/g7zsMygp

---

🧠 Learning: A bad feature choice can create fake patterns.

📌 Closing: I should work on these things when I'm not tired (mornings / after a nap).

#DataScience #DataAnalytics #Python #CodingJourney
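The sum-vs-mean trap described above can be sketched with a toy example (the data and column names here are invented for illustration, not the author's dataset): a sum scales with group size, so larger groups look "bigger" on the x-axis even when the per-item values are identical, while the mean removes that artifact.

```python
import pandas as pd

# Two regions with identical per-complaint values, but different group sizes.
df = pd.DataFrame({"region": ["A"] * 2 + ["B"] * 10,
                   "complaints": [5] * 12})

# Aggregating by sum: B looks 5x "bigger" purely because it has more rows.
by_sum = df.groupby("region")["complaints"].sum()

# Aggregating by mean: the size artifact disappears; A and B are identical.
by_mean = df.groupby("region")["complaints"].mean()
```

Plotting `by_sum` against any size-correlated feature would produce exactly the kind of fake trend the post warns about.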
Learning from Mistakes in Data Science with Python
🚀 From Confusion → Clarity: My Approach to “Sort the People” (LeetCode)

Today I solved the Sort the People problem, and instead of jumping straight into sorting tricks, I focused on building clarity step by step 👇

🔍 My Thought Process:
1. First, I paired each name with its corresponding height using a list (like a mini mapping).
2. Then, I sorted this list based on height.
3. Since the problem required descending order, I simply reversed the sorted list.
4. Finally, I extracted only the names in the correct order.

💡 Key Learning:
Sometimes the simplest approach is the best one. Instead of overcomplicating with advanced data structures, breaking the problem into smaller transformations made it super manageable.

🧠 What this improved for me:
* Understanding how to use lambda for sorting
* Confidence in handling paired data (name + value problems)
* Thinking in steps rather than jumping to optimization

⚡ Code Strategy in One Line: Pair → Sort → Reverse → Extract

Consistency > Speed. One problem at a time. 💪
📈 If you're also grinding DSA, keep going — progress compounds!

#DSA #LeetCode #CodingJourney #Python #ProblemSolving #Consistency #TechGrowth #100DaysOfCode #WomenInTech #FutureEngineer
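The Pair → Sort → Reverse → Extract strategy above fits in a few lines. This is a sketch (the function and variable names are illustrative, not the author's submitted solution), using a lambda as the sort key as the post mentions:

```python
def sort_people(names, heights):
    # Pair: attach each height to its name.
    paired = list(zip(heights, names))
    # Sort: ascending by height, with a lambda as the key.
    paired.sort(key=lambda pair: pair[0])
    # Reverse: the problem wants tallest first.
    paired.reverse()
    # Extract: keep only the names, now in the correct order.
    return [name for _, name in paired]
```

For example, `sort_people(["Mary", "John", "Emma"], [180, 165, 170])` returns `["Mary", "Emma", "John"]`.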
I had a funny moment while coding today. 😁

Everything was going great. Data was ready. Then I typed the usual code:

X = df.drop("price", axis=1)
y = df["price"]

And my brain just stopped: "Wait... why do we always use X and y? Who made this a rule?" 🤨😭

I looked it up, and it is actually just borrowed from math notation 📐:
* Capital X = a matrix: a big group of data (many columns).
* Small y = a vector: just one thing (one column).

Matrices get capital letters. Vectors get small ones. 🤯

Do I have to use them? No. I could use normal names like features and price. But am I going to do that? Nope! Tomorrow I will use X and y again. It is just a habit now! 🌚

It is funny how in ML, the biggest questions come from the smallest things. 😅

Be honest: do you use normal names, or do you also just use X and y? 👇

#MachineLearning #Python #DataScience #CodingLife #SimpleCode
One habit I’ve started building when working with data:

Before writing any logic, I always run:

df.head()
df.info()
df.describe()

It sounds obvious. But early on, I skipped this step. I would immediately start writing transformations, and later realize things like:
* columns were strings instead of numbers
* values had unexpected formats
* missing data existed where I didn’t expect it

Now I try to slow down and understand the data first. It saves a surprising amount of time later.

💡 Data engineering lesson I’m learning: understanding the data is often more important than writing the code.

#DataEngineering #Python #Pandas
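A tiny example of the kind of surprise this habit catches (the CSV here is made up for illustration): a column that looks numeric but loads as strings, which `df.info()` exposes immediately.

```python
import pandas as pd
from io import StringIO

# Toy CSV where 'price' secretly contains text, so pandas loads it as strings.
raw = StringIO("id,price\n1,100\n2,free\n3,300")
df = pd.read_csv(raw)

df.head()      # peek at the first rows
df.info()      # dtypes and non-null counts: 'price' shows up as object, not a number
df.describe()  # summary stats (only truly numeric columns appear by default)
```

Any arithmetic transformation written before noticing this would have failed or, worse, silently misbehaved.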
🚀 Day 7/100: Leveling Up with Advanced Pandas! 🐼 🚀

One full week into my #100DaysOfML challenge! Today, I transitioned from just exploring data to actively modifying and manipulating it. I covered some advanced Pandas techniques that are absolute game-changers for feature engineering and data cleaning.

Here is a breakdown of today’s learning:
➕ Adding Columns: Learned how to inject new data using both the direct assignment method and the precise .insert() method for specific column placement.
🎯 Targeted Updates: Explored how to update specific, isolated values within a DataFrame without messing up the rest of the data.
⚡ Bulk Column Operations: This is where Pandas shines! I learned how to update an entire column at once (like applying a 5% increase across an entire 'Salary' column in one line of code!).
🧹 Refining Data: Reinforced my concepts on handling missing values and safely removing/dropping unnecessary columns.

Theory is great, but muscle memory is better. Tomorrow is going to be 100% dedicated to hands-on practice with these methods in VS Code! 💻✨

#100DaysOfML #MachineLearning #Pandas #DataScience #DataManipulation #Python #100DaysOfCode #TechJourney #Day7
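The techniques listed above can be sketched in one short script (the DataFrame contents and names are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Salary": [50000, 60000]})

# Adding a column by direct assignment (appends at the end).
df["Dept"] = ["IT", "HR"]

# Precise placement with .insert(position, column_name, values).
df.insert(1, "EmpID", [101, 102])

# Targeted update of one isolated value via .loc[row, column].
df.loc[0, "Dept"] = "Finance"

# Bulk column operation: a 5% raise across the whole column in one line.
df["Salary"] = df["Salary"] * 1.05

# Safely dropping an unnecessary column.
df = df.drop(columns=["Dept"])
```

After this runs, the columns are `Name`, `EmpID`, `Salary` in that order, with the raise applied to every row.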
Ever noticed how much time goes into just handling files and data every day?

I was stuck in a loop — opening multiple Excel files, cleaning data, fixing formats, updating sheets, and repeating the same steps daily. Easily 1.5–2 hours gone.

Then one simple thought hit me — what if this entire flow could run on its own?

So I built an automation using:
1. Python
2. Pandas (for data handling)
3. Openpyxl (for working with Excel files)
4. Built-in tools like datetime, pathlib, and logging for structure and tracking

Now, what used to take hours runs in just a few minutes.

More than saving time, it made me realize that a lot of “routine work” is just an automation waiting to happen.

Still learning, but definitely seeing work differently now.

#Python #Automation #DataAnalytics #Learning
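A minimal sketch of what such a pipeline can look like. The folder layout, function names, and cleaning rules here are assumptions for illustration (the actual script isn't shown in the post); the stack matches the one listed: pandas on top of openpyxl, plus datetime, pathlib, and logging.

```python
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd  # pandas uses openpyxl as its engine for .xlsx files

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("excel_automation")


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Drop fully empty rows and normalise column names."""
    df = df.dropna(how="all")
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df


def run(folder: str) -> None:
    """Clean every workbook in `folder` and write a dated copy next to it."""
    for path in Path(folder).glob("*.xlsx"):
        cleaned = clean_frame(pd.read_excel(path))
        out = path.with_name(f"{path.stem}_clean_{datetime.now():%Y%m%d}.xlsx")
        cleaned.to_excel(out, index=False)
        log.info("Wrote %s", out)
```

The pattern generalises: any "open file → fix → save" loop done daily is a candidate for the same structure.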
I built a complete 𝗨𝘀𝗲𝗱 𝗖𝗮𝗿 𝗣𝗿𝗶𝗰𝗲 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗼𝗿 from scratch, creating a full end-to-end pipeline that handles everything from raw data to a live application. Instead of relying on a pre-built dataset, I identified a unique problem and built my own data source using web scraping. My goal was to move beyond tutorials and mimic a real-world 𝗱𝗮𝘁𝗮 𝘀𝗰𝗶𝗲𝗻𝗰𝗲 workflow.

• 𝗦𝗰𝗿𝗮𝗽𝗶𝗻𝗴: Automated data collection to get real-time market prices.
• 𝗣𝗿𝗲𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴: Cleaning messy web data into a machine-learning-ready format.
• 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴: Training a robust regressor to find the patterns.
• 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁: Building a Flask web app to make the model accessible to anyone.

The Workflow: Scrape Data → Clean & Transform → Train Model → Deploy

Check out the full documentation and code on GitHub: https://lnkd.in/gAZp4iKq

#MachineLearning #DataScience #Python #Flask #WebScraping #PortfolioProject
🚀 Day 344 of solving 365 medium questions on LeetCode! 🔥

Today’s challenge: “89. Gray Code”

✅ Problem:
You are given an integer n. Your goal is to generate an n-bit Gray code sequence, which is an array of 2^n integers where every adjacent pair of numbers (including the first and last numbers) differs by exactly one bit in their binary representation.

✅ Approach (Bit Manipulation / The Formula):
You could solve this using backtracking or mirroring, but there is a mathematical cheat code that solves it instantly.
1. Find the size: for an n-bit sequence, there are exactly 2^n numbers. I used a bitwise left shift (1 << n) to calculate this size instantly.
2. The magic formula: the i-th number in a standard Gray code sequence is always i ^ (i >> 1). This takes the number, shifts its bits to the right by one, and applies a bitwise XOR against the original number.
3. List comprehension: I packed this entire logic into a single Python list comprehension that loops from 0 up to the calculated size, applying the formula to every index i.

✅ Key Insight:
Bitwise operations feel like black magic when you know the right formulas. Recognizing that Gray code has a direct integer-to-sequence mapping completely eliminates the need for messy recursive state-tracking. What looks like a complex combinatorial sequence problem is actually a one-line math trick!

✅ Complexity:
Time: O(2^n) — we must iterate to generate exactly 2^n elements.
Space: O(1) — excluding the output array, the generation uses strictly constant auxiliary memory.

🔍 Python solution attached!
🔥 Flexing my coding skills until recruiters notice!

#LeetCode365 #BitManipulation #Math #Python #ProblemSolving #DSA #Coding #SoftwareEngineering
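The attached solution isn't visible here, but the formula approach described above fits in two lines. A sketch (function name is illustrative) using the standard `i ^ (i >> 1)` mapping:

```python
def gray_code(n: int) -> list[int]:
    # 1 << n computes 2**n, the sequence length;
    # i ^ (i >> 1) maps index i directly to the i-th Gray code.
    return [i ^ (i >> 1) for i in range(1 << n)]
```

For n = 2 this yields `[0, 1, 3, 2]`: each adjacent pair, including the wrap-around from 2 back to 0, differs in exactly one bit.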
Sat down and started working on a real dataset (Bank Marketing), and honestly… it felt like a mix of confusing and exciting at the same time.

Here’s what I managed to do:
• Loaded the dataset using Pandas
• Tried to understand what the data actually looks like
• Checked data types (realized not everything is numbers 😅)
• Looked for missing values — and found that some are hidden as “unknown”
• Ran summary statistics to understand the data better
• Tried creating visualizations using Seaborn and Matplotlib
• Got errors… fixed them… learned from them (this was the real learning moment)

💡 One thing I understood today: data is not clean and ready — you have to explore, question, and fix things before doing any real analysis.

It wasn’t perfect, but it was a start. And I’m showing up again tomorrow.

#LearningInPublic #DataAnalyticsJourney #Python #BeginnerJourney #CodingNinjas #SkillefiedMentor
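The "missing values hidden as 'unknown'" discovery above is a classic trap in the Bank Marketing dataset. A sketch with invented toy values: `isna()` reports nothing until the sentinel strings are converted to real NaNs.

```python
import numpy as np
import pandas as pd

# Toy slice shaped like the Bank Marketing data (values invented for illustration):
# the gaps hide as the literal string "unknown", invisible to isna().
df = pd.DataFrame({"job": ["admin.", "unknown", "technician"],
                   "education": ["unknown", "tertiary", "unknown"]})

assert df.isna().sum().sum() == 0          # looks fully complete...

df = df.replace("unknown", np.nan)         # ...until the sentinels become real NaNs
missing = df.isna().sum()                  # now the hidden gaps show up per column
```

Only after this step do summary statistics and missing-value counts tell the truth.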
🔷 A simple train test split is not always enough. I learned this the hard way when my model looked great on paper and struggled on real data.

📌 Here is what nobody tells you about splitting data properly.

The basic split gives you two sets: training and testing. That works for simple projects. But what if you need to tune your model? You test different settings, pick the best one, and evaluate on the test set. The problem is that you have now indirectly used the test set to make decisions. It is no longer a fair judge.

This is where a three-way split becomes important.

🔹X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
🔹X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

Now you have three sets:
* Training set. The model learns here. 70 percent of your data.
* Validation set. You tune and compare models here. 15 percent.
* Test set. You evaluate the final model here. Once. Never again. 15 percent.

The test set is sacred. You look at it exactly one time, at the very end.

One more thing that most people miss: always stratify your split when your target column is imbalanced.

🔹train_test_split(X, y, stratify=y, test_size=0.2)

stratify=y makes sure both sets have the same proportion of each class. Without it you might end up with a training set that barely sees the minority class and a model that has no idea it exists.

The split is not a formality. It is a decision that shapes every result that follows. Get it right before you touch anything else.

❓What split ratio do you use for your projects and why?

#DataScience #MachineLearning #Python
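Putting the snippets above together into one runnable example, combining the three-way split with stratification (the toy data and its 90/10 class balance are invented to make the effect visible):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)          # imbalanced target: 90/10

# First cut: 70% train, 30% held out, preserving the class ratio.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Second cut: split the held-out 30% in half → 15% validation, 15% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)
```

With stratification, the training set keeps exactly 7 of the 10 minority samples (10% of 70); without `stratify=y`, that count is left to chance.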
While the world sleeps, the code keeps building. 🌙

It’s been a high-energy day with 83+ profile visits. People are asking: "What are you building in that 2024 repo?"

The answer is simple: the future of my workflow. I am documenting my journey of turning raw Python scripts into scalable, automated solutions. From smart profiling to interactive Plotly dashboards, I’m building a library that works for me, so I don't have to work twice.

If you haven’t seen it yet, come take a look and see why the tech community is stopping by my profile.

⭐ Support the work on GitHub: 🔗 https://lnkd.in/dGvJaB7a

#DataScience #Hustle #Coding #Python #Automation #Shafiq73 #GitHub