Introducing dfdoctor: Automated DataFrame Auditor and Cleaner

Working with messy real-world datasets taught me one thing: The cleaning step takes longer than the actual analysis. So I spent the last few weeks building dfdoctor - an open-source Python library that audits your DataFrame, tells you what’s wrong, and helps you fix it systematically instead of manually. It helps you quickly understand what’s broken and what to fix first. The part I'm most proud of: 5 correlation methods (including Kendall τ and Phi-k) implemented from scratch in pure numpy - no scipy dependency anywhere. 164 tests. CI passing across Python 3.9–3.12. Try it: pip install dfdoctor https://lnkd.in/e-ChV6mE #Python #OpenSource #DataEngineering #Pandas #EDA #DataScience

  • graphical user interface

To view or add a comment, sign in

Explore content categories