Analyzing GMail (Gsuite) with Python and Google Sheets

Analyzing GMail (Gsuite) with Python and Google Sheets

Earlier this week, I published this tweet:

I have a Gsuite account going back a few years for `kensodev.com`. I wanted to clean up my inbox a bit. Obviously, it's a very lengthy process if you go one by one, so I was looking for a way to group my email by the sender and just delete things in one big swoop.

For example, notifications@github.com is a no-brainer, right? But what else?

Google Export

Google has an export function. You can export your entire email archive. They will send you an `mbox` file that contains all of your email.

My file was 13G and contained 300,224 messages.

Exporting to CSV using Python

I wrote a very simple python script that loads the file and exports into a CSV

from mailbox import mbo
from csv import writer, writerow


if __name__ == "__main__":
    writer = writer(open("clean_mail.csv", "w"))
    for message in mbox("mailbox.mbox"):
        writerow([message["from"], message["subject"], message["date"]])
        

Importing to Google Sheets

After you have a CSV, you can import that into google sheets. Here, I encountered another problem. The format of the email was too long like this `Email Sender <email@sender.com>`.

This can be a problem for things like Github notifications, where the username is different but the email is the same.

Google sheets has support for regular expression, so I created a column with the formula:

=iferror(Regexextract(A1,"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}"),"")        

Applied that formula to all rows, which gave me a sheet that looks clean:

No alt text provided for this image

Enters a pivot table

Now, we can create a pivot table that counts the email addresses and sorts by them

No alt text provided for this image

And that's it, now I have a sorted sheet that has the most "noisy" emails at the top. I can create more rules, delete them, etc.

Happy hacking!


To view or add a comment, sign in

More articles by Avi Zurel

  • The 3 W’s: A Simple Way to Solve Problems Smarter

    In engineering, we’re constantly tackling problems. But sometimes, what looks like a simple request actually hides a…

  • Our Ephemeral dev environments -driven by Slack

    In one of my recent posts (https://www.linkedin.

  • Investing in multi-cloud

    One of the goals for us this year is to invest in multi-cloud deployments. There are multiple reasons for this, I…

    1 Comment
  • Note I shared internally after the Twitter hack

    If you live on earth and are in tech, you know that Twitter was hacked via social engineering and access to internal…

    1 Comment
  • How we deploy to production

    Yesterday, I was texting with a friend that asked me a question about the load on our ops team during COVID With remote…

    5 Comments
  • Managing a flexible cluster of micro-services - HOWTO in the wild

    We've been managing flexible clusters of micro-services for quite some time now. Over the years, we've learned some…

  • Dishes and Laundry

    I had a conversation today with one of my project managers. It made me think about this "dishes and laundry" term in…

    1 Comment
  • Adhesion

    I recently gave a short talk in our engineering all-hands. I got a ton of responses, leading me to think that this…

  • Managing CI Configurations At Scale - Globality

    Intro You often hear about scale issues when discussing capacity, traffic, users and other resource related topics and…

    3 Comments
  • Chat controlled RC car using a RaspberryPi

    I've been streaming myself coding on Twitch for the last couple of months. I wanted to build something really fun and…

    1 Comment

Others also viewed

Explore content categories