Pivot Table in Python
Drag & Drop Data Analysis in Python
Introduction
Exploratory Data Analysis in Python was typically done by using the Pandas library. If you're using Python for data analysis you’ll know that analyzing data with different situations wasn’t accessible, because you’ve to write separate code for various conditions (filters, aggregations).
In contrast with analyzing data in Tableau, we can just drag and drop columns into rows & columns to create different levels of aggregation as well as use the filters card to create drop-down menus to interactively filter the values in our table.
We all know how useful python is to clean data, write calculations and conduct analysis but sometimes, especially at the beginning of the analysis, we just want to quickly run through different levels of aggregations and filter conditions to answer a few business questions (understand the data in business perspective).
Therefore, I would usually start exploring the data with Tableau and Python interchangeably so that I could benefit from both tools.
With that said, it makes me think that what if Python could have a similar feature (drag & drop analysis) with Tableau? Then I came across a random post from my connections on LinkedIn sharing about this python library called ipypivot and pivottablejs.
Install Library
If you haven’t already, you’ll need to install the library.
Data & Resources
Data: GDrive Link
Medium: medium.com/@foocheechuan
Youtube: Chee-Chuan
Read Data
import pandas as pd
df = pd.read_excel(f'{filepath}/Global Superstore.xlsx')
df
Pivot Table (pivottablejs)
from pivottablejs import pivot_ui
pivot_ui(df)
Drag & Drop pivot table
If you’ve used Tableau (cross table) or Excel (pivot table) before, the idea is pretty similar. We need a Dimension (Categorical field) and a Measure (Numerical field).
We’ll need to drag a Dimension into the view and then select the aggregation method and measure field in the drop-down menu. Unlike the usual drag & drop analytics, that drags both dimension and measure into the view. If we drag Sales into the view, it will be treated as a dimension.
Problem with (pivottablejs)
There is one problem with the pivottablejs library. There are unexpected null values in my table when the raw data wasn’t supposed to have null values.
I changed the aggregation to count to understand what happened.
There are 51,2941 null values and 51,290 non-null values which means it's a duplication of my raw data (inclusive of column name).
I solved this problem thanks to the solution stated on this page.
To put it in my code, I modified the solution a little bit.
# We already defined df before this
# If you're starting new define your df first
def pivot_ui(df, **kwargs):
import pivottablejs
class _DataFrame(pd.DataFrame):
def to_csv(self, **kwargs):
return super().to_csv(**kwargs).replace("\r\n", "\n")
return pivottablejs.pivot_ui(_DataFrame(df), **kwargs)
pivot_ui(df)
Problem Solving
Now let’s answer some basic data exploration questions with this pivot table.
Question 1:
Which Sub-Categories have the top 3 highest total profit? For these Sub-Categories, are they in the same Category? Which Market in these Sub-Categories has the highest total profit?
Q1 Solution:
Drag Category into the vertical container followed by Sub-Category below it.
Drag Market into the horizontal container
Aggregation method changed to Sum
Measure choose Profit
Sort descending order vertically (Arrow beside aggregation drop-down menu)
Q1 Answer:
Copiers, Phones, and Bookcases have the top 3 highest total profits.
Copiers and Phones are in the Technology Category while Bookcases are in the Furniture Category.
Copiers and Phones have the highest total profit in the APAC market while bookcases have the highest total profit in the EMEA market.
Question 2:
Which market has the highest total profit in the “Furniture” category?
Q2 Solution:
Using the same pivot table removes the Sub-Category from the vertical container (drag it back to the left container).
Drag Market from the horizontal container to the vertical container.
Filter Category to “Furniture” -> Click on the arrow beside Category -> uncheck all check boxes except “Furniture”.
Q2 Answer:
The APAC market has the highest total profit for the furniture Category.
Video Tutorial
Final Words
Data analysis used to be a code-only profession. With technology improving day by day data analysis has become easier. Many see this opportunity and offer code-free saas tools for drag & drop analysis but charge a premium for it. This makes new learners hard to start their first project with limited resources. So I’ll try my best to provide free tutorials to enable new learners to learn easier and faster.
With that said, kindly leave your comment down below on what kind of tutorials you’d like me to do. I’ll keep creating content like this on my medium and youtube channel so follow me on these platforms.
Thank you for reading till the end. See you at the next one. :)