Analyzing GMail (Gsuite) with Python and Google Sheets
Earlier this week, I published this tweet:
I have a Gsuite account going back a few years for `kensodev.com`. I wanted to clean up my inbox a bit. Obviously, it's a very lengthy process if you go one by one, so I was looking for a way to group my email by the sender and just delete things in one big swoop.
For example, notifications@github.com is a no-brainer, right? But what else?
Google Export
Google has an export function. You can export your entire email archive. They will send you an `mbox` file that contains all of your email.
My file was 13G and contained 300,224 messages.
Exporting to CSV using Python
I wrote a very simple python script that loads the file and exports into a CSV
from mailbox import mbo
from csv import writer, writerow
if __name__ == "__main__":
writer = writer(open("clean_mail.csv", "w"))
for message in mbox("mailbox.mbox"):
writerow([message["from"], message["subject"], message["date"]])
Importing to Google Sheets
After you have a CSV, you can import that into google sheets. Here, I encountered another problem. The format of the email was too long like this `Email Sender <email@sender.com>`.
This can be a problem for things like Github notifications, where the username is different but the email is the same.
Google sheets has support for regular expression, so I created a column with the formula:
=iferror(Regexextract(A1,"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}"),"")
Applied that formula to all rows, which gave me a sheet that looks clean:
Enters a pivot table
Now, we can create a pivot table that counts the email addresses and sorts by them
And that's it, now I have a sorted sheet that has the most "noisy" emails at the top. I can create more rules, delete them, etc.
Happy hacking!