Automating CSV Join Key Discovery with a Python Tool

I kept running into the same issue while working with multiple datasets: figuring out which columns to use for JOINs was taking far more time than it should. So I built a small Python tool to handle it. It scans multiple CSV files and automatically finds the join keys between them.

The interesting part:
• It only considers meaningful key columns (IDs / ObjectIds)
• It ignores ordinary text columns like name, status, etc.
• It matches columns even when their names differ (_id, user_id, productId)
• It checks the full dataset instead of just a sample

The output is simple and clear, something like:

customers._id <> orders.user_id
books.book_id <> sales.product_id

This has made my data analysis much faster and cleaner, especially with messy or unfamiliar datasets. I'm still improving it, but it's already pretty useful.

If you've faced similar problems or have ideas to improve this, I'd love to hear your thoughts 👍

#Python #SQL #DataAnalytics #DataEngineering #Projects
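For anyone curious how this kind of matching could work, here is a minimal sketch using only Python's standard library. It is an illustration under assumptions, not the tool's actual code: the name pattern (ID_NAME), the "24 hex characters = ObjectId" value check, the 0.9 overlap threshold, the helper names (load_columns, is_key_candidate, find_join_keys) and the sample tables are all hypothetical. The idea is to read every row, keep only ID-like columns, and pair columns from different files whose value sets overlap, regardless of what the columns are called.

```python
import csv
import io
import re
from itertools import combinations

# Assumed name heuristic: "id", "_id", "user_id", "book_id", "productId" look like keys;
# "name", "status", "valid" do not.
ID_NAME = re.compile(r"^_?id$|_id$|[a-z]Id$")
# Assumed value heuristic: a MongoDB-style ObjectId is 24 hexadecimal characters.
OBJECT_ID = re.compile(r"^[0-9a-fA-F]{24}$")


def load_columns(text):
    """Read every row (no sampling) and collect the distinct non-empty values per column."""
    columns = {}
    for row in csv.DictReader(io.StringIO(text)):
        for col, value in row.items():
            # Skip overflow fields (col is None) and missing trailing cells (value is None).
            if col is None or value is None:
                continue
            value = value.strip()
            if value:
                columns.setdefault(col, set()).add(value)
    return columns


def is_key_candidate(col, values):
    """A column qualifies by an ID-like name or by holding only ObjectId-shaped values."""
    if ID_NAME.search(col):
        return True
    return bool(values) and all(OBJECT_ID.match(v) for v in values)


def find_join_keys(tables, min_overlap=0.9):
    """Pair key columns from different tables whose value sets mostly overlap.

    tables maps a table name to its CSV text; min_overlap is an assumed threshold.
    """
    candidates = {}
    for table, text in tables.items():
        for col, values in load_columns(text).items():
            if is_key_candidate(col, values):
                candidates[(table, col)] = values

    matches = []
    for ((t1, c1), v1), ((t2, c2), v2) in combinations(candidates.items(), 2):
        if t1 == t2:
            continue  # only look for joins between different files
        # Containment of the smaller set: a foreign key column usually covers
        # only part of the primary key column it references.
        ratio = len(v1 & v2) / min(len(v1), len(v2))
        if ratio >= min_overlap:
            matches.append(f"{t1}.{c1} <> {t2}.{c2}")
    return matches


# Hypothetical sample tables standing in for customers.csv, orders.csv, books.csv, sales.csv.
TABLES = {
    "customers": "_id,name,status\n"
                 "64a1f0c2e4b0a1b2c3d4e5f6,Alice,active\n"
                 "64a1f0c2e4b0a1b2c3d4e5f7,Bob,inactive\n",
    "orders": "order_id,user_id,amount\n"
              "1001,64a1f0c2e4b0a1b2c3d4e5f6,25.00\n"
              "1002,64a1f0c2e4b0a1b2c3d4e5f7,40.00\n"
              "1003,64a1f0c2e4b0a1b2c3d4e5f6,15.50\n",
    "books": "book_id,title\nB1,Dune\nB2,Emma\n",
    "sales": "sale_id,product_id,qty\nS1,B1,3\nS2,B2,1\n",
}

if __name__ == "__main__":
    for match in find_join_keys(TABLES):
        print(match)
```

Running it on the sample tables prints "customers._id <> orders.user_id" and "books.book_id <> sales.product_id". Matching on values rather than names is what lets user_id pair with _id, and measuring containment of the smaller set (instead of symmetric overlap) keeps a partially covered foreign key column from being missed. Very small or low-cardinality ID columns can still produce false positives, so the threshold is worth tuning on real data.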
