Optimize Pandas Dataframe Memory Usage with df.info() and df.memory_usage()

𝗪𝗼𝗿𝗸𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗹𝗮𝗿𝗴𝗲 𝗱𝗮𝘁𝗮𝘀𝗲𝘁𝘀 𝗶𝗻 𝗣𝗮𝗻𝗱𝗮𝘀 𝘁𝗮𝘂𝗴𝗵𝘁 𝗺𝗲 𝗼𝗻𝗲 𝘀𝗶𝗺𝗽𝗹𝗲 𝗹𝗲𝘀𝘀𝗼𝗻 — 𝗺𝗲𝗺𝗼𝗿𝘆 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗺𝗼𝗿𝗲 𝘁𝗵𝗮𝗻 𝘄𝗲 𝘁𝗵𝗶𝗻𝗸. In the beginning, I used to load dataframes without even thinking about how much memory they consume. Everything looked fine… until one day my script slowed down, and sometimes even crashed. That’s when I realized it’s not always about the data size, it’s about how efficiently we handle it. One simple habit that changed things for me is checking memory usage of a dataframe. In Pandas, you can do this very easily: df.info() This gives a quick summary of your dataframe, including memory usage. But if you want a more detailed view, you can use: df.memory_usage(deep=True) This shows how much memory each column is using. Adding deep=True helps you get accurate results, especially for object-type columns like strings. What I found interesting is that sometimes a few columns consume most of the memory. Especially object columns they silently take up a lot of space. Once you know where the memory is going, you can start optimizing: * Convert object columns to category if they have repeated values * Use smaller data types like int32 instead of int64 * Drop unnecessary columns early These small steps make a big difference, especially when working with large datasets. For me, this was a small learning, but very powerful. Now, before doing any heavy operations, I just take a few seconds to check memory usage and it saves me minutes (sometimes hours) later. If you’re working with Pandas, give this a try. It might look small, but it can completely change how your code performs. #BigData #Python #Pandas #DataAnalytics

  • diagram

To view or add a comment, sign in

Explore content categories