SQL + Pandas: The Ultimate Data Duo
Who knew SQL and Pandas could team up to make data analysis this fast, fun, and a little addictive?
pandasql lets you run SQL queries directly on your Pandas DataFrames, so you get the power of SQL without leaving Python!
SQL is useful for easily filtering rows, aggregating data, or applying multi-condition logic. Python, on the other hand, offers advanced tools for statistical analysis and custom computations that extend beyond SQL’s capabilities. Used together, SQL simplifies data selection, while Python adds analytical flexibility and versatility.
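To make that division of labor concrete, here is a minimal pandas-only sketch (toy data and column names invented for illustration): the multi-condition boolean mask mirrors a SQL WHERE clause, while the z-score is the kind of custom statistic that is easier to express on the Python side.

```python
import pandas as pd

# Toy data, illustrative only
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy", "Dee"],
    "Age": [25, 34, 41, 29],
    "Score": [80, 92, 75, 88],
})

# SQL:    SELECT * FROM df WHERE Age > 30 AND Score >= 80
# pandas: the same logic as a multi-condition boolean mask
selected = df[(df["Age"] > 30) & (df["Score"] >= 80)]

# A custom computation that is clumsy in plain SQL: z-scores
df["Score_z"] = (df["Score"] - df["Score"].mean()) / df["Score"].std()

print(selected["Name"].tolist())  # -> ['Ben']
```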
1. How to Use pandasql in Google Colab?
To run pandasql inside Colab, start with the following steps:
Step 1: Install pandasql
Run this in a Colab cell:
!pip install pandasql
Step 2: Import the necessary libraries
import pandas as pd
from pandasql import sqldf

# A small example DataFrame so the query below has something to run on
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cara"], "Age": [25, 34, 41]})

# Create a helper function for running queries
pysqldf = lambda q: sqldf(q, globals())

# Example: select all rows where Age > 30
query = "SELECT * FROM df WHERE Age > 30;"
result = pysqldf(query)
result
2. Real-World Project: Analyzing Uber Driver Performance Data
In this project, we analyze Uber driver performance data to evaluate and optimise bonus strategies, aiming to improve efficiency and driver engagement.
// Data Exploration and Analytics
Now, let’s explore the datasets. First, we will load the data.
// Initial Dataset Loading
Let’s load the dataset using plain pandas.
import pandas as pd
import numpy as np
df = pd.read_csv('dataset_2.csv')
// Exploring the Data
Now let’s review the dataset.
df.head(10)
The dataset includes columns such as driver name, number of trips completed, acceptance rate (percentage of trip requests accepted), total supply hours (hours spent online), and average rating. Before we dive into the analysis, it’s important to verify the column names, check the data types of each column, and identify any missing values. This ensures that we reference the data correctly and handle any data quality issues upfront, making our analysis accurate and reliable.
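These checks can be sketched in a few lines of pandas (toy rows and column names assumed from the description above; in the project itself they run on the DataFrame loaded from dataset_2.csv):

```python
import pandas as pd

# Toy stand-in for the Uber driver dataset (column names assumed)
df = pd.DataFrame({
    "Name": ["Abdul", "Abraham", "Adelina"],
    "Trips_Completed": [1, 12, 1],
    "Accept_Rate": [100, 83, 100],
    "Supply_Hours": [3, 5, 2],
    "Rating": [4.8, 4.7, 4.7],
})

print(df.columns.tolist())   # verify column names
print(df.dtypes)             # check the data type of each column
print(df.isnull().sum())     # count missing values per column
```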
The dataset has five different columns, and there are no missing values. I have used both SQL and Python to answer the questions.
Question 1: Who Qualifies for Bonus Option 1?
$50 for each driver who is online at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame.
// Step 1: Filtering the Qualifying Drivers with SQL (pandasql)
In this step, I'm using pandasql.
In the following code, I select all drivers who meet the conditions for the Option 1 bonus using the WHERE clause, with the AND operator linking the multiple conditions.
import pandas as pd
from pandasql import sqldf

# Who qualifies for the Option 1 bonus
# (column names assumed to match the dataset)
query = """
SELECT Name
FROM df
WHERE Supply_Hours >= 8
  AND Accept_Rate >= 90
  AND Trips_Completed >= 10
  AND Rating >= 4.7
"""

# Run the query
result = sqldf(query, locals())
print(result)
// Step 2: Finishing in Pandas
Using the following Python code, I have calculated the total payout by multiplying the number of qualified drivers (using len()) by the $50 bonus per driver.
pay_out = 50 * len(result)
print(f"The total payout is ${pay_out:,}")
The total payout is $800
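As a cross-check, the same qualification filter and payout can be written entirely in pandas with a boolean mask (toy rows and column names assumed for illustration):

```python
import pandas as pd

# Toy data with the assumed column names
df = pd.DataFrame({
    "Name": ["Abdul", "Abraham", "Adelina"],
    "Supply_Hours": [9, 10, 3],
    "Accept_Rate": [95, 88, 100],
    "Trips_Completed": [12, 15, 4],
    "Rating": [4.8, 4.9, 4.6],
})

# Bonus Option 1 conditions as a boolean mask
qualified = df[
    (df["Supply_Hours"] >= 8)
    & (df["Accept_Rate"] >= 90)
    & (df["Trips_Completed"] >= 10)
    & (df["Rating"] >= 4.7)
]

pay_out = 50 * len(qualified)
print(f"Payout: ${pay_out:,}")  # -> Payout: $50 (only Abdul qualifies here)
```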
Here, I’ve presented Q1 as an example. For the complete set of questions and detailed analysis, check out the full file.