Working with messy real-world datasets taught me one thing: the cleaning step takes longer than the actual analysis.

So I spent the last few weeks building dfdoctor - an open-source Python library that audits your DataFrame, shows you what's broken and what to fix first, and helps you fix it systematically instead of manually.

The part I'm most proud of: 5 correlation methods (including Kendall τ and Phi-k) implemented from scratch in pure numpy - no scipy dependency anywhere. 164 tests. CI passing across Python 3.9–3.12.

Try it: pip install dfdoctor
https://lnkd.in/e-ChV6mE

#Python #OpenSource #DataEngineering #Pandas #EDA #DataScience
Introducing dfdoctor: Automated DataFrame Auditor and Cleaner
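The post names Kendall τ as one of the correlation methods written in pure numpy. As a rough illustration of what that can look like, here is a minimal sketch of the tau-a variant (no tie correction). It is not dfdoctor's actual code: the function name kendall_tau_a is made up for this example, and the pairwise approach needs O(n²) memory, so it only suits small inputs.

```python
import numpy as np


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Hypothetical helper for illustration only; ties are not corrected for.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")

    # Sign of every pairwise difference: +1, 0 or -1.
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])

    # The product is +1 for concordant pairs and -1 for discordant pairs.
    # The full matrix counts each unordered pair twice, hence the / 2.
    concordant_minus_discordant = np.sum(dx * dy) / 2
    total_pairs = n * (n - 1) / 2
    return float(concordant_minus_discordant / total_pairs)


perfect = kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40])
reverse = kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10])
mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

Here perfect is 1.0, reverse is -1.0, and mixed is 1/3 (two concordant pairs, one discordant).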
More Relevant Posts
-
Day 7 of my Python learning journey: Today I focused on classic array and string patterns, and tried to keep solutions clean and efficient.

What I solved:
- Two Sum using brute-force and hash map
- Valid Palindrome with two pointers
- Move Zeros with an in-place two-pointer approach
- Container With Most Water in O(n)

Big takeaway: Correctness first, clarity second, optimization third. Small design choices, like in-place updates vs extra arrays, really affect code quality.

GitHub link: https://lnkd.in/gGPw8_js

#Python #ProblemSolving #Algorithms #TwoPointers #LearningInPublic #CleanCode
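Two of the patterns listed above can be sketched in a few lines. This is one possible version, not the code from the linked repository; the function names two_sum and move_zeros are chosen for this example.

```python
def two_sum(nums, target):
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    seen = {}  # value -> index where it was first seen
    for i, value in enumerate(nums):
        j = seen.get(target - value)
        if j is not None:
            return j, i
        seen[value] = i
    return None


def move_zeros(nums):
    """Move all zeros to the end in place, keeping the order of the rest."""
    write = 0  # next position for a non-zero value
    for read in range(len(nums)):
        if nums[read] != 0:
            nums[write], nums[read] = nums[read], nums[write]
            write += 1
    return nums


pair = two_sum([2, 7, 11, 15], 9)
shifted = move_zeros([0, 1, 0, 3, 12])
```

The hash map turns the O(n²) brute-force search into a single O(n) pass, and the two-pointer swap avoids allocating a second list.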
-
Python Clarity Series – Episode 24
Topic: Late Binding in Loops (Functions)

⚠️ Advanced pitfall: Late binding in loops

funcs = []
for i in range(3):
    funcs.append(lambda: i)
for f in funcs:
    print(f())

Output: 2 2 2 ❗
👉 Expected: 0 1 2
👉 Got: same value
💡 Reason: The lambda captures the variable i, not its value at that moment, so every call looks up i after the loop has finished.
💡 Fix: funcs.append(lambda i=i: i)
💡 Rule: Default arguments are evaluated when the function is defined, so they capture the current value.

This is a classic interview trap.

#PythonAdvanced #CodingPitfalls #DeveloperLevel #python #clarity
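The pitfall and the fix can be checked side by side. The sketch below also shows functools.partial as an alternative way to bind the value early; that alternative is not mentioned in the post, and the helper names are invented for this example.

```python
from functools import partial


def make_late():
    # Each lambda looks up i when called, after the loop has ended.
    funcs = []
    for i in range(3):
        funcs.append(lambda: i)
    return funcs


def make_with_default():
    # The default argument is evaluated at definition time.
    funcs = []
    for i in range(3):
        funcs.append(lambda i=i: i)
    return funcs


def make_with_partial():
    # partial stores the current value of i as a bound argument.
    funcs = []
    for i in range(3):
        funcs.append(partial(lambda x: x, i))
    return funcs


late = [f() for f in make_late()]
fixed_default = [f() for f in make_with_default()]
fixed_partial = [f() for f in make_with_partial()]
```

late comes out as [2, 2, 2], while both fixed versions give [0, 1, 2].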
-
🐍 Week 17 – Refining My Python Skills 🐍

This was a shorter week due to some personal commitments, but I focused on implementing the quicksort algorithm. Like last week with merge sort, I wanted to deeply understand how quicksort works.

Here are the key concepts I worked on:
- Implemented a separate partition function to split around a pivot. In my approach, I used three buckets (left, middle, right) to handle duplicates more cleanly and avoid unnecessary comparisons.
- Created a quick_sort function to implement the main algorithm, recursively sorting the partitions until the base case was met.

After practicing merge sort, quicksort felt more intuitive, and the concepts connected more easily.

In Week 18, I plan to practice implementing the Luhn Algorithm.

#Python #CodingJourney #LearningInPublic
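A three-bucket quicksort along the lines described above might look like the following. This is a sketch, not the author's code: only the names partition and quick_sort come from the post, and picking the middle element as pivot is an assumption made for this example.

```python
def partition(items, pivot):
    """Split items into values below, equal to, and above the pivot."""
    left, middle, right = [], [], []
    for item in items:
        if item < pivot:
            left.append(item)
        elif item > pivot:
            right.append(item)
        else:
            middle.append(item)  # duplicates of the pivot land here
    return left, middle, right


def quick_sort(items):
    """Return a new sorted list; the input is left unchanged."""
    if len(items) <= 1:  # base case: nothing left to sort
        return list(items)
    pivot = items[len(items) // 2]  # assumed pivot choice
    left, middle, right = partition(items, pivot)
    # The middle bucket is already in place, so it is never recursed into.
    return quick_sort(left) + middle + quick_sort(right)


result = quick_sort([3, 6, 1, 3, 9, 1, 0])
```

Because equal elements go straight into the middle bucket, inputs with many duplicates do not trigger extra recursion.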
-
Beyond String Concatenation

When I started, I used to concatenate strings the old-school way. It was messy, prone to errors, and hard to read.

The Problem: Using + requires manual type conversion (like str(21)) and gets confusing with all the extra quotes and spaces.

Solution: F-strings
Introduced in Python 3.6, f-strings make your code:
✅ Readable: You see the full sentence structure
✅ Fast: They are generally faster than % formatting and str.format()
✅ Flexible: You can perform math or call methods directly inside { }

It's a small concept, but it's one of the easiest ways to make code look 10x more professional.

#Python #30DaysOfCode #BCA #LearningInPublic #Day21 #JECRC
Day 21/30
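A quick side-by-side comparison of the two styles; the name and values below are made up for illustration.

```python
name = "Asha"  # hypothetical example values
age = 21

# Old-school concatenation: manual str() and easy-to-miss spaces.
old_style = "My name is " + name + " and I am " + str(age) + " years old."

# f-string: the sentence reads the way it will print.
new_style = f"My name is {name} and I am {age} years old."

# Expressions and method calls work directly inside the braces.
next_year = f"Next year I will be {age + 1}."
shout = f"Hello, {name.upper()}!"

# Format specifiers go after a colon, here two decimal places.
price = f"Total: {19.5:.2f}"
```

Both old_style and new_style hold the same text; the f-string just gets there with less punctuation.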
-
Python Clarity Series – Episode 23
Topic: Floating Point Precision Issue

🤯 Why does this happen?
print(0.1 + 0.2)
Output: 0.30000000000000004 ❗

👉 This is NOT a Python bug. It's due to how floating-point numbers are stored in binary.

💡 Fix (when needed): round(0.1 + 0.2, 1)
Output: 0.3

💡 Concept: Most decimal fractions, such as 0.1, have no exact binary representation, so the computer stores the nearest approximation.

Important in:
✔ Financial calculations
✔ Data Science

Don't ignore this.

#PythonConcepts #FloatingPoint #RealWorldCoding #python #clarity
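Beyond round(), the standard library offers two other common ways to handle this, shown here as a small sketch: math.isclose for tolerant comparison and decimal.Decimal for exact decimal arithmetic. Neither is mentioned in the post itself.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2

# Direct equality fails because of the binary approximation.
naive_equal = (total == 0.3)

# Compare within a tolerance instead of testing exact equality.
close_enough = math.isclose(total, 0.3)

# Rounding, as in the post, works when a fixed precision is acceptable.
rounded = round(total, 1)

# Decimal built from strings keeps exact decimal values,
# which is the usual choice for money.
exact = Decimal("0.1") + Decimal("0.2")
```

Note that Decimal should be built from strings here: Decimal(0.1) would carry the binary approximation along with it.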
-
Day 109
Backtracking patterns are repeating again, and that's a good sign. #Day109

🧩 78. Subsets

How today went:
• Used recursion to explore all elements
• At each step, decide to include or skip the current element
• Append the current element → explore → then pop to backtrack
• Move to the next index and repeat

What I'm noticing: Subsets is one of the cleanest backtracking patterns:
→ choose → explore → undo

Another revision day, but clarity is improving. Consistency continues.

#LeetCode #DSA #Python #Backtracking #Recursion #LearningInPublic #Consistency
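The choose → explore → undo steps above can be sketched like this. It is one common way to write the include-or-skip version, not necessarily the exact solution from that day.

```python
def subsets(nums):
    """Return every subset of nums using include/skip backtracking."""
    result = []
    path = []  # the subset being built

    def backtrack(index):
        if index == len(nums):
            result.append(path.copy())  # snapshot, since path keeps changing
            return
        path.append(nums[index])  # choose: include the current element
        backtrack(index + 1)      # explore with it included
        path.pop()                # undo
        backtrack(index + 1)      # explore with it skipped

    backtrack(0)
    return result


all_subsets = subsets([1, 2, 3])
```

For three elements this produces 2³ = 8 subsets, from [1, 2, 3] down to the empty list.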
-
Most implementations of the State pattern in Python look very “clean”. Lots of small classes. A base interface. One class per state.

But if you’ve ever worked with one in a real project, you know the downside: transitions are scattered, behaviour is hard to see in one place, and adding new states often means touching multiple files.

In today’s video, I rebuild the State pattern in a very different way. Instead of relying on inheritance, I make the state machine explicit as data and use decorators to define transitions. The result is a small, reusable engine where the entire flow becomes visible at a glance.

If you’re interested in writing Python that’s easier to reason about and extend, this is a pattern worth understanding.

👉 Watch here: https://lnkd.in/e9Y3xGNF.

#python #softwaredesign #designpatterns #statemachine #cleancode
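To give a feel for the idea of a state machine held as data with decorator-defined transitions, here is a minimal sketch. It is not the code from the video: the StateMachine class, its transition and fire methods, and the draft/review/published states are all assumptions invented for this example.

```python
class StateMachine:
    """Tiny engine: transitions live in one dict instead of many classes."""

    def __init__(self, initial):
        self.state = initial
        # (source state, event) -> (target state, action to run)
        self.transitions = {}

    def transition(self, source, event, target):
        """Decorator that registers a function as a transition's action."""
        def decorator(func):
            self.transitions[(source, event)] = (target, func)
            return func
        return decorator

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {event!r} from {self.state!r}")
        target, action = self.transitions[key]
        action()
        self.state = target


machine = StateMachine("draft")
log = []


@machine.transition("draft", "submit", "review")
def on_submit():
    log.append("submitted")


@machine.transition("review", "approve", "published")
def on_approve():
    log.append("approved")


machine.fire("submit")
machine.fire("approve")
```

After the two events, machine.state is "published", and the whole flow can be read off machine.transitions in one place.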
-
Day 106
Some problems feel simple when the pattern clicks. #Day106

🧩 17. Letter Combinations of a Phone Number

How today went:
• Used a digit → letters map
• Built combinations using backtracking
• Maintained a string path at each step
• One recursive call per choice, with no need for complex state handling

What I realized: Once you understand the pattern:
→ choose a letter → move to next digit → build the path
Backtracking becomes very natural.

Simple problem, but great for building confidence.

#LeetCode #DSA #Python #Backtracking #Recursion #LearningInPublic #Consistency
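The steps above can be sketched as follows. This is one typical way to write it, using the standard phone keypad mapping; the names PHONE and letter_combinations are chosen for this example.

```python
# Standard phone keypad mapping (digits 2-9).
PHONE = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def letter_combinations(digits):
    """Return all letter strings the digit sequence could spell."""
    if not digits:
        return []
    result = []

    def backtrack(index, path):
        if index == len(digits):
            result.append(path)
            return
        for letter in PHONE[digits[index]]:
            # Strings are immutable, so passing path + letter needs no undo step.
            backtrack(index + 1, path + letter)

    backtrack(0, "")
    return result


combos = letter_combinations("23")
```

Building the path as a new string on each call is what keeps the state handling simple: there is nothing to pop on the way back.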
-
Day 102
Backtracking patterns are starting to repeat. #Day102

🧩 90. Subsets II

How today went:
• Similar to the basic Subsets problem
• Key difference: handling duplicates
• Sorting the array is important
• While iterating, skip duplicates to avoid repeating subsets

What clicked: Backtracking becomes easier when you:
• Recognize the base pattern
• Add constraints like duplicate handling

Same structure, new rule. That’s how patterns build.

#LeetCode #DSA #Python #Backtracking #Recursion #LearningInPublic #Consistency
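The sort-then-skip rule described above can be sketched like this. It is one common formulation, not necessarily the exact solution from that day; the function name subsets_with_dup is chosen for this example.

```python
def subsets_with_dup(nums):
    """Return all unique subsets of nums, which may contain duplicates."""
    nums = sorted(nums)  # sorting puts equal values next to each other
    result = []
    path = []

    def backtrack(start):
        result.append(path.copy())  # every path is a valid subset
        for i in range(start, len(nums)):
            # Skip a value already tried at this depth to avoid repeats.
            if i > start and nums[i] == nums[i - 1]:
                continue
            path.append(nums[i])  # choose
            backtrack(i + 1)      # explore
            path.pop()            # undo

    backtrack(0)
    return result


unique_subsets = subsets_with_dup([1, 2, 2])
```

The check uses i > start rather than i > 0, so a duplicate is only skipped as an alternative at the same depth and can still be added deeper in the recursion, which is how [2, 2] is reached.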
-
Most people use Pandas for EDA. 𝗩𝗲𝗿𝘆 𝗳𝗲𝘄 𝘂𝘀𝗲 𝗶𝘁 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁𝗹𝘆.

That’s the difference between spending hours exploring data and getting insights in minutes.

Over time, one thing has stood out to me: It’s not just about the insights - it’s about how efficiently you get there.

I’ve put together a quick reference:
📊 10 Pandas EDA Tricks that help:
• Write cleaner, more readable code
• Speed up analysis
• Build more reliable workflows

📌 Attached is a cheat sheet for easy reference.

𝗙𝗼𝗿 𝗮 𝗱𝗲𝘁𝗮𝗶𝗹𝗲𝗱 𝗯𝗿𝗲𝗮𝗸𝗱𝗼𝘄𝗻:
🔗 https://lnkd.in/gv6_TmUD

What’s one Pandas trick you use that saves you the most time?

#DataAnalytics #DataScience #Python #Pandas #EDA
https://github.com/Ajayvarmaramineni/dfdoctor