Spend Tracking and Analysis

Spend Tracking and Analysis

For the first few years after starting a job I barely kept track of spending. There were a couple of half-hearted attempts but I still could not accurately answer what proportion of income I was spending. Recently I made a more serious attempt: I tried to answer that, on a monthly basis, how much and what I am spending on.

One issue I had faced on previous attempts was aggregating the data from different accounts (checking, credit card statements in pdf form). This time around some online research on financial analysis tools threw up Mint.com. This provided a great starting point and I would highly recommend it for anyone looking to practice some fiscal responsibility:

You can connect all your financial accounts to this website and it beautifully aggregates all the money coming in/going out from them, along with categorizing each transaction. While it does have some analytical functionality I wanted more granularity and control so decided to use only its account aggregation functionality.

I was interested exclusively on the 'going out' part as that is what changes from month to month and is discretionary. The transactions can also be exported as a CSV file. To get an idea of the fields in the exported CSV, below is a sample transaction:



I started to pull this data at the end of each month from the website (a tedious process of going to the website, logging in, going to transactions page & saving the CSV) to do the analysis for the past month and while it looks like that would be it, there was a lot more tedious Excel manipulation to be done to gain useful insights.

For a start, I was interested only in the money going out so some filtration was required. I needed 'debit' transactions only ('credit' transactions are money coming into the account and had to be filtered out using the 'Transaction Type' field).

The 'Category' field was too granular and needed to be fixed. While I only cared about how much I spent eating out in a given month, there was no ready-made 'Eating Out' category but a bunch of subsets of that in the 'Category' field:

coffee shops, fast food,food & dining,restaurants

These needed to be grouped as 'Eating Out', for example. This grouping/categorization was what took the longest time, as Mint also makes some major mis-categorizations and I had to go over each transaction to find them.

Using Python however made the whole tedious process a matter of a single click. Code with explanations will be shared below which does all the data import, filtering, grouping, renaming etc.

There were some immediate benefits of doing this exercise. Just by going through the transactions without doing any pre-processing, I found out I was subscribed to Amazon Prime when I thought I had successfully unsubscribed from it (had gotten the trial month to watch 'Silicon Valley'). That membership was promptly canceled. I also came across some duplicate and faulty charges which I reversed with the relevant vendors.

Like I said downloading the data from Mint.com was a tedious process. Fortunately there is a Mint API for Python which reads data from Mint and stores it as a Python dataframe:

import mintapi
import datetime
import pandas as pd

mint = mintapi.Mint('username', 'password')
## Get transactions
transaction=mint.get_transactions() # as pandas dataframe
transaction.describe()
transaction.head()

Only 'debit'/expense transactions were needed and month, date, year fields also needed to be created from the 'Date' field:

debit_transaction = transaction[transaction['transaction_type'] == 'debit']


debit_transaction['Date'] =pd.to_datetime(debit_transaction['date'])
debit_transaction['Month']=debit_transaction['Date'].dt.month
debit_transaction['Year'] =debit_transaction['Date'].dt.year

The categories needed to be grouped properly or just renamed. After a lot of trial and error, I settled on the following high-level categories by combining the ones given by Mint (in brackets):

1) Eating Out (coffee shops, fast food, food & dining, restaurants)

2) Travel & Amusement (movies & dvds, hotel, air travel, travel, amusement, sports, sporting goods, entertainment)

3) Groceries & Shopping (groceries, shopping, books, clothing, gift, home improvement, newspapers & magazines, electronics & software,office supplies)

4) Business Services (shipping, personal care, hair, home services, service & parts, tuition)

5) Charity (gifts & donations, kids activities)

6) Bills & utilities (bills & utilities, mobile phone, internet, utilities, television)

7) Transport (parking, rental car & taxi, auto & transport, public transportation, service & parts, auto insurance)

8) Health (doctor, pharmacy)

filter_list = ['bills & utilities','mobile phone','internet','utilities','television']

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Bills & utilities'

 

 

filter_list = ['coffee shops', 'fast food', 'food & dining','restaurants']

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Eating Out'

 

 

filter_list = ['charity','gifts & donations','kids activities' ]

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Charity'

 

 

filter_list = ['movies & dvds', 'hotel','air travel', 'travel', 'amusement', 'sports' , 'sporting goods','entertainment' ]

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Travel & Amusement'

 

 

filter_list = ['parking', 'rental car & taxi','auto & transport','public transportation','service & parts','auto insurance']

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Transport'

 

 

filter_list = ['groceries','shopping','books','clothing','gift','home improvement','newspapers & magazines','electronics & software','office supplies']

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Groceries & shopping'

 

 

filter_list = ['shipping','personal care','hair','home services','tuition']

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Business services'

 

 

filter_list = ['doctor','pharmacy']

debit_transaction.loc[debit_transaction.category.isin(filter_list),'category']='Health'


Specific restaurants were miscategorized as 'grocery' or 'business services' (like we saw for Zareen's) and some grocery stores were categorized as 'business services'. These needed to be hard-coded by using the 'Description' field:

value_list = ["Zareen's Restaurant","Gulzaar Halaal","Vons"]
debit_transaction.loc[debit_transaction.description.isin(value_list),'category']='Eating Out'

value_list = ['Halal Meats']
debit_transaction.loc[debit_transaction.description.isin(value_list),'category']='groceries & shopping'

As I usually want to see data for current month and year I needed to store that in a variable to be able to filter later:

month=datetime.datetime.now().month
year=datetime.datetime.now().year
 

A lot of further pre-processing needed to be done:

1) Credit card bill payments needed to be removed as those expenses were already being counted under the debit transactions

2) Checks needed to be removed as I only use them for rent and that being non-discretionary spending I was not including in the analysis

3) Transactions from Paypal account needed to be removed as they were being double-counted, as can be seen below, since my Paypal is connected to my credit card :

4) Similarly transactions from Venmo (when it actually pulled money from my checking account, not usage of Venmo balance) were accounted for by my checking account. They needed to be removed as well. Probably not a good idea to connect Paypal and Venmo accounts to Mint in the first place.

Below is the code for all this and filtering by current month and year. That is all the pre-processing that is needed and the output can be exported to Excel:

debit_transaction_current_month_year = debit_transaction[(debit_transaction['Month']==month) & (debit_transaction['Year']==year) & (debit_transaction['category']!='credit card payment') & (debit_transaction['category']!='check')  & (debit_transaction['account_name']!='PayPal Account') & (debit_transaction['account_name']!='Venmo')]
## Exporting to Excel
debit_transaction_current_month_year.to_excel('C:\\Users\\mmujtaba\\Downloads\\month%s_transaction.xlsx' %month,index = False)

The same can be done to get the data for the whole year:

debit_transaction_current_year = debit_transaction[(debit_transaction['Year']==year) & (debit_transaction['category']!='credit card payment') & (debit_transaction['category']!='check')  & (debit_transaction['account_name']!='PayPal Account') & (debit_transaction['account_name']!='Venmo')]
## Exporting to Excel
debit_transaction_current_year.to_excel('C:\\Users\\mmujtaba\\Downloads\\2018_transactions.xlsx',index = False)

For some immediate insights, I found comparing 'Proportion of monthly budget spent' and 'Proportion of month passed' at a given time in the month to be useful as well as looking at the transactions for current month:

Total=debit_transaction_current_month_year['amount'].sum()



print("Monthly spending: %s" %(Total))
monthly_budget=***
## Monthly budget spent compared to month spent 
print("Monthly budget spent: %s" %((Total)/monthly_budget))
x=pd.to_datetime(debit_transaction_current_month_year['Date'])
print("Month spent: %s" %(max(x.dt.day)/30))

debit_transaction_current_month_year[['date','category','description','amount']]

This would give an output like this:



In summary, with a single click to run the code, I am able to see where I stand for the current month.

Further exploration can be done in Excel using the exported, processed dataset. Having already seen the spend for the current month, I wanted to see the monthly trend, something like this:

Note: this is dummy data

To analyze months with higher than usual spend, I wanted to see what categories contributed to that, so I broke out the spends by category and month and that was usually very illuminating. For July, we can see it was 'Travel & Amusement':



Sometimes a lump sum number for a month*category is different from what is expected. In that case we can always drill into the transaction-level Excel data for a month and category to check for errors and see it from the ground up.

Using this tool I feel has given me an unprecedented control on finances. Hope it helps : )

Mehdi it would be more interesting if this process could run weekly and send alerts (email/push notification) based on parameters.

Like
Reply

To view or add a comment, sign in

More articles by Mehdi Mujtaba

  • LinkedIn Recruiter Spam

    One somewhat frustrating thing about this platform is getting messages in Focused Inbox for contract roles despite…

    1 Comment
  • Creating Business Value as a Product Data Scientist

    While a Machine Learning Data Scientist might be able to create measurable business value by, say, improving product…

  • Djokovic's Tiebreak 'Lockdown Mode' - By the Numbers

    Delving deeper into Djokovic's recent French Open win, one stat was absolutely mind-blowing: not only did Djokovic win…

  • SQL Data Scientist Interview (Part 2)

    In the last article we went over SELF JOINS: Apart from them, there was another type of rather interesting and…

    1 Comment
  • SQL Data Scientist Interview (Part 1)

    SQL is the the most fundamental skill for a data analyst/data scientist-analytics/insights/product (whatever you want…

    1 Comment
  • Automating (aspects of) Job Search - LinkedIn Scraping

    In my last article I (automatically) went through my Gmail Inbox to find recruiter email addresses and send…

  • Automating (aspects of) Job Search

    After 2 years at Quotient Technology (formerly Coupons.com) I was recently back on the job market: One aspect of this…

  • Federer vs. Djokovic : A data-driven Analysis

    Since 2011 Djokovic has been the dominant player on the Mens’ Tour, which is an understatement. Not even the foremost…

  • San Jose to Honululu Airfare Analysis

    One of the perks of working on the West Coast is closer access to places like Hawaii and Alaska. Having already…

    2 Comments
  • Data Science Resources for Beginners

    In my coursework and projects I have had to search for many resources (practice datasets, intuitive explanations of…

    3 Comments

Others also viewed

Explore content categories