SQL + Pandas: The Ultimate Data Duo

SQL + Pandas: The Ultimate Data Duo

Who knew SQL and Pandas could team up to make data analysis this fast, fun, and a little addictive?

PANDASQL - pandasql lets you run SQL queries directly on your Pandas dataframes—so you get the power of SQL without leaving Python!

Article content

SQL is useful for easily filtering rows, aggregating data, or applying multi-condition logic. Python, on the other hand, offers advanced tools for statistical analysis and custom computations, as well as set-based operations, which extend beyond SQL’s capabilities. When used together, SQL simplifies data selection, while Python adds analytical flexibility and versatility.

1. How to Use pandasql in Google Colab?

To run pandasql Inside Colab, start with the following code :

Step 1: Install pandasql

Run this in a Colab cell:

!pip install pandasql        

Step 2: Import the necessary libraries

import pandas as pd
from pandasql import sqldf        

# Create a helper function for running queries

pysqldf = lambda q: sqldf(q, globals())        
# Example: Select all rows where Age > 30

query = "SELECT * FROM df WHERE Age > 30;"
result = pysqldf(query)
result        

# Real-World Project: Analyzing Uber Driver Performance Data

In this project, we analyze Uber driver performance data to evaluate and optimise bonus strategies, aiming to improve efficiency and driver engagement.

// Data Exploration and Analytics

Now, let’s explore the datasets. First, we will load the data.

// Initial Dataset Loading

Let’s load the dataset by using just pandas.

import pandas as pd
import numpy as np
df = pd.read_csv('dataset_2.csv')        

// Exploring the Data

Now let’s review the dataset.

df.head(10)        

The dataset includes columns such as driver name, number of trips completed, acceptance rate (percentage of trip requests accepted), total supply hours (hours spent online), and average rating. Before we dive into the analysis, it’s important to verify the column names, check the data types of each column, and identify any missing values. This ensures that we reference the data correctly and handle any data quality issues upfront, making our analysis accurate and reliable.

Article content
UBER Drivers Analysis

The dataset includes columns such as driver name, number of trips completed, acceptance rate (percentage of trip requests accepted), total supply hours (hours spent online), and average rating. Before starting the analysis, let’s also check the data types of each column and identify if there are any missing values

Article content
Checking the data types and null values

The dataset has five different columns, and there are no missing values. I have used both SQL and Python to answer the questions.

Question 1: Who Qualifies for Bonus Option 1?

$50 for each driver who is online at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame.

// Step 1: Filtering the Qualifying Drivers with SQL (pandasql)

In this step, I'm using pandasql.

In the following code, I have selected all drivers who meet the conditions for the Option 1 bonus using the WHERE clause and the An An AND operator for linking multiple conditions.

 

import pandas as pd
from pandasql import sqldf

# Who qualifies for Bonus
query = """
SELECT Name
FROM df
WHERE Hours_Worked >= 60
AND Rating >= 4.05
"""

# Run the query
result = sqldf(query, locals())
print(result)
        

// Step 2: Finishing in Pandas

Using the following Python code, I have calculated the total payout by multiplying the number of qualified drivers (using len()) by the $50 bonus per driver.

pay_out = 50*len(result)
print(f"payout_is :${pay_out:,}")
payout_is :$800        

Here, I’ve presented Q1 as an example. For the complete set of questions and detailed analysis, check out the full file.

Here, I've shown Q1 as an example. To explore the full set of questions and detailed analysis, check out the full file via the link.

 https://colab.research.google.com/drive/1GEbGwkCQGPr0O9tjk4BpKMYucCPW6TZm?usp=sharing



Thank you so much, Mohammad Amin Bigdeli. You were the very first to like my post, and it honestly made me so happy 😊. Your little gesture truly brightened my day!

To view or add a comment, sign in

More articles by Bhawna Kaushik

Others also viewed

Explore content categories