Just practiced Pandas and data cleaning hits different when you're working with real messy data. Covered data types, type conversion, handling missing values, replacing inconsistent entries, and using category dtype to save memory — FuelType column went from 11488 bytes to 1460 bytes just by changing the dtype. Notebook here 👉 https://lnkd.in/d3djYPvp #Python #Pandas #DataAnalysis #LearningInPublic
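A minimal sketch of the category-dtype saving mentioned above. The column contents here are made up for illustration (the notebook's actual FuelType data is not shown), so the byte counts will differ from the 11488 and 1460 quoted in the post.

```python
import pandas as pd

# Hypothetical stand-in for the FuelType column: a few labels repeated many times
fuel = pd.Series(["Petrol", "Diesel", "CNG", "Petrol"] * 250, name="FuelType")

# Memory used when every row stores its own string
as_text = fuel.memory_usage(index=False, deep=True)

# Memory used when rows store small integer codes that point at 3 categories
fuel_cat = fuel.astype("category")
as_category = fuel_cat.memory_usage(index=False, deep=True)

print(f"text: {as_text} bytes, category: {as_category} bytes")
```

The saving comes from low cardinality: the fewer distinct values a column has relative to its length, the more a category dtype helps.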
Optimizing Pandas Data Cleaning for Real-World Data
More Relevant Posts
-
Didn't know you could extract tables from a Word doc using Python until today. python-docx lets you loop through tables, pull cell data, and load it straight into a DataFrame. Spent some time cleaning it up — splitting on ':', transposing, fixing headers — but it worked. Also practiced groupby() and lambda functions inline. Small things but they make the code so much cleaner. Notebook here 👉 https://lnkd.in/dfTwrvqT #Python #Pandas #DataAnalysis #LearningInPublic
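The steps described above might look roughly like this. It is a sketch, not the notebook's code: the helper names and the assumption that each cell holds a `Key: Value` string are hypothetical, and `load_first_table` needs the python-docx package installed.

```python
import pandas as pd


def table_to_rows(table):
    """Return the text of every cell in a python-docx table, row by row."""
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def key_value_frame(lines):
    """Turn 'Key: Value' strings into a one-row DataFrame with keys as headers."""
    # Split on the first ':' only, and skip lines that have no ':' at all
    pairs = [line.split(":", 1) for line in lines if ":" in line]
    frame = pd.DataFrame(pairs, columns=["field", "value"])
    frame["field"] = frame["field"].str.strip()
    frame["value"] = frame["value"].str.strip()
    # Transpose so that each field name becomes a column header
    return frame.set_index("field").T.reset_index(drop=True)


def load_first_table(path):
    """Read the first table of a .docx file (assumes it contains 'Key: Value' cells)."""
    from docx import Document  # python-docx, imported lazily

    rows = table_to_rows(Document(path).tables[0])
    return key_value_frame([text for row in rows for text in row])
```

For example, `key_value_frame(["Name: Asha", "City : Pune"])` gives a single row with columns `Name` and `City`.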
-
Practiced Python functions today — *args, **kwargs, return types, and function types. The one thing that clicked — *args is for unknown number of values (stores as tuple), **kwargs is when you don't know the keys either (stores as dict). Simple but I was mixing them up before. Also learned there are actual types of functions — action, transformation, validation. https://lnkd.in/ducSzXzK #Python #LearningInPublic #DataAnalysis
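A quick way to see the tuple/dict distinction for yourself. The function name and arguments are invented for this example.

```python
def describe(*args, **kwargs):
    # args collects extra positional values into a tuple;
    # kwargs collects extra keyword arguments into a dict
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)


print(describe(1, 2, 3, unit="km", source="gps"))
# ('tuple', 'dict', 3, ['source', 'unit'])
```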
-
🚀 Day 12/30 of My LeetCode Journey (Python + SQL)
Consistency continues… and the concepts are getting sharper! 💻🔥
🔹 **SQL Problem of the Day** 👉 *Invalid Tweets*
Given a `Tweets` table, write a query to find tweet IDs where the content length is strictly greater than 15 characters.
💡 *Key Concept:* String functions like `LENGTH()` for validation.
🔹 **Python Problem of the Day** 👉 *Single Number*
Given an array where every element appears twice except one, find that single element.
💡 *Key Concept:* Bit manipulation using XOR for optimal O(n) time and O(1) space.
Loving how problem-solving is becoming more intuitive day by day ⚡
Day 12 done ✅
#LeetCode #30DaysChallenge #Python #SQL #CodingJourney #Consistency #ProblemSolving #Learning #BitManipulation
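One way the XOR idea for Single Number can be written, as a sketch rather than the poster's own submission. It relies on `x ^ x == 0` and `x ^ 0 == x`, so paired values cancel out and only the unpaired one remains.

```python
from functools import reduce
from operator import xor


def single_number(nums):
    # XOR every value together: duplicates cancel, the lone value survives
    return reduce(xor, nums, 0)


print(single_number([4, 1, 2, 1, 2]))  # 4
```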
-
Day 2 of my LeetCode journey 🚀
Today’s problem: Group Anagrams
This challenge was all about grouping strings that share the same characters. I approached it using a dictionary + hashing strategy in Python. For each word, I sorted its characters and used that as a key (converted into a tuple), ensuring all anagrams map to the same bucket.
Here’s the core logic I implemented:
▪️Traverse the list of strings
▪️Sort each string → convert to tuple → use as dictionary key
▪️Append original string to the corresponding group
▪️Finally, return all grouped values
This approach keeps the implementation clean and scalable.
Time Complexity:
▪️Sorting each string takes O(k log k) (where k = length of string)
▪️For n strings → O(n * k log k) overall
Space Complexity:
▪️O(n * k) for storing grouped anagrams
A solid step forward in understanding how hashing + transformations can simplify complex grouping problems. Staying consistent and leveling up daily 💪
#LeetCode #Day2 #Python #DSA #CodingJourney #ProblemSolving
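The four steps listed above can be sketched as follows. The post does not include its code, so this is one plausible reading of it, with `defaultdict` as an assumed convenience.

```python
from collections import defaultdict


def group_anagrams(words):
    groups = defaultdict(list)
    for word in words:
        # Sorted characters as a tuple: identical for every anagram of the word
        key = tuple(sorted(word))
        groups[key].append(word)
    return list(groups.values())


print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```

The tuple conversion matters because `sorted()` returns a list, and lists cannot be used as dictionary keys.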
-
We think document extraction should be simple. Fewer than 10 lines of Python to extract structured data from any document. Define your schema, send a file, get JSON back. Uncertain fields get flagged and you decide what to do with them. Learn how to define schemas: https://lnkd.in/g7TH8VmD
-
In real-world data workflows, simple Python list methods do a lot of the heavy lifting. append(), insert(), pop(), count(), index(), reverse(), clear() From quick data cleaning to handling transformations, these basics help keep code efficient and reliable. Strong fundamentals always scale. #Python #DataScience #Data
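A short walk through the list methods named above, using a made-up list of readings. Each comment shows the state after that line runs.

```python
readings = [3, 1, 3]

readings.append(7)          # [3, 1, 3, 7]     add to the end
readings.insert(1, 5)       # [3, 5, 1, 3, 7]  add at index 1
last = readings.pop()       # last == 7, list is [3, 5, 1, 3]
threes = readings.count(3)  # 2 occurrences of the value 3
where = readings.index(1)   # value 1 first appears at index 2
readings.reverse()          # [3, 1, 5, 3]     reversed in place
snapshot = list(readings)   # copy before emptying
readings.clear()            # []
```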
-
QuillSort: a data sorter, created by Isaiah Tucker. Most of the time, Python’s built-in sorted() and list.sort() are all you need. But if you ever try to sort a lot of data (millions to billions of values, big numeric logs, or giant SQL exports), you quickly run into a wall: RAM, speed, or both. So I built Quill-Sort (quill-sort on PyPI). ... https://lnkd.in/eHaFZyx4 (published Wed, 01 Apr 2026)
-
🚀 Day 6/30 of My LeetCode Journey (Python + SQL)
Consistency is slowly turning into confidence 💪📈
🔹 **Python Problem of the Day** 👉 *Plus One*
Given an integer represented as an array of digits, increment the number by one and return the resulting array.
💡 *Key Concept:* Handling carry from the last digit (especially edge cases like 9 → 10).
🔹 **SQL Problem of the Day** 👉 *Game Play Analysis I*
Given a table of player activity, write a query to find the first login date for each player.
💡 *Key Concept:* GROUP BY with MIN() to extract earliest dates.
Every day learning something new, refining logic, and improving speed ⚡
Day 6 done ✅
#LeetCode #30DaysChallenge #Python #SQL #CodingJourney #Consistency #ProblemSolving #Learning
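A sketch of the carry handling for Plus One, written from the problem statement above rather than from the poster's solution. It walks from the last digit backwards and stops as soon as a digit can absorb the increment.

```python
def plus_one(digits):
    result = list(digits)  # work on a copy so the input is left untouched
    for i in range(len(result) - 1, -1, -1):
        if result[i] < 9:
            result[i] += 1  # no carry needed, so we are done
            return result
        result[i] = 0       # 9 rolls over to 0 and the carry moves left
    # Every digit was 9 (for example 99 becomes 100): prepend the final carry
    return [1] + result


print(plus_one([1, 2, 9]))  # [1, 3, 0]
print(plus_one([9, 9]))     # [1, 0, 0]
```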
-
🚀 Day 85 of #100DaysOfLeetCode
🔍 Problem Solved: Ransom Note (LeetCode 383)
Today’s problem was all about efficiently checking whether one string can be constructed from another, a classic hashing / frequency counting concept.
⚡ What I Learned:
- Importance of frequency maps (hash tables)
- Writing optimized solutions over naive approaches
- How built-in methods can simplify logic but may impact performance
📊 Performance:
✅ Runtime: 0 ms (Beats 100%)
✅ Memory: Efficient usage
🔥 Takeaway: Small optimizations and choosing the right data structure can make a huge difference, even in easy problems.
#Day85 #LeetCode #CodingJourney #Python #DataStructures #ProblemSolving #100DaysOfCode
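A frequency-map take on Ransom Note, as a sketch. The post does not show its solution, so the use of `collections.Counter` here is an assumption about one reasonable approach.

```python
from collections import Counter


def can_construct(ransom_note, magazine):
    # Counter subtraction keeps only letters the note needs more of
    # than the magazine supplies; an empty result means nothing is missing
    shortfall = Counter(ransom_note) - Counter(magazine)
    return not shortfall


print(can_construct("aa", "aab"))  # True
print(can_construct("aa", "ab"))   # False
```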
-
Most beginners treat int64 and Int64 as the same. They’re not.
🔍 Quick insight:
• int64 → NumPy type ❌ Does NOT support missing values
• Int64 → Pandas nullable type ✅ Handles missing values (stored as pd.NA) in real-world datasets
💡 Why this matters: Real data is messy. Choosing the wrong data type can break your entire pipeline.
Small detail. Big impact.
#DataAnalytics #Python #Pandas #DataCleaning #LearningInPublic
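A small illustration of the difference, using invented values. Without an explicit dtype, pandas falls back to float64 when an integer column has a gap; with the nullable `Int64` dtype the values stay integers and the gap becomes `pd.NA`.

```python
import pandas as pd

values = [1, 2, None]

# Default inference: the missing entry forces a float64 column holding NaN
as_default = pd.Series(values)

# Nullable integer dtype: values remain integers, the gap is stored as pd.NA
as_nullable = pd.Series(values, dtype="Int64")

print(as_default.dtype, as_nullable.dtype)  # float64 Int64
print(as_nullable.sum())                    # 3, missing values are skipped
```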
Athira P