Master SQL for Data Science and Business Analyst Interviews

Master SQL for Data Science and Business Analyst Interviews

Introduction

What do Meta, Google, Uber, Revolut, Amazon all have in common? They all have SQL rounds in Data Science and Business Analyst interviews. Mastering SQL is a crucial step on your journey. In this comprehensive guide, you will discover the SQL skills you need to succeed in these competitive fields.

Know the Most Common Functions

Filtering Your Data

WHERE

Filters rows based on a given condition.

SELECT *
FROM users
WHERE age > 25;        

Use-case: Select records of users who are older than 25.

AND & OR

Used to combine multiple conditions.

# Use both the conditions together
SELECT *
FROM users
WHERE age > 25
AND gender = 'male';

# At least one of the condition has to be true
SELECT *
FROM users
WHERE age > 25
OR age < 18;        

Use-case: Filter users based on multiple conditions like age and gender.

NOT

Negates a condition.

SELECT * FROM users WHERE NOT age > 25;        

Use-case: Find users who are 25 years old or younger.

Ranged Data Filtering

BETWEEN

Filters rows within a range.

SELECT * FROM users WHERE age BETWEEN 18 AND 25;        

Use-case: Find users whose ages fall between 18 and 25 (inclusive).

ANY & ALL

Used with subqueries to compare values.

SELECT *
FROM users
WHERE age = ANY
(SELECT age
FROM users
WHERE gender = 'female'); 

SELECT *
FROM users
WHERE age = ALL
(SELECT age
FROM users
WHERE gender = 'female');        

Use-case: Filter data based on conditions met in a subquery.

EXISTS

Checks for the existence of rows in a subquery.

SELECT *
FROM users
WHERE EXISTS
(SELECT *
FROM orders
WHERE users.id = orders.user_id);        

Use-case: Find users who have placed at least one order.

Grouping and Summarizing Your Data

GROUP BY

Groups rows that have the same values in specified columns.

SELECT
country,
COUNT(*)
FROM users
GROUP BY country;        

Use-case: Get a count of users from each country.

Aggregation Functions

AVG, COUNT, MAX, MIN, SUM

SELECT
country,
AVG(age),
COUNT(id),
MAX(height),
MIN(weight),
SUM(purchase_amount)

FROM transactions
GROUP BY country
HAVING COUNT(id) >= 100;        

Use-case: Generate summary statistics for transactions grouped by country.

HAVING

Filters groups based on a condition.

SELECT
country,
COUNT(*)
FROM users
GROUP BY country
HAVING COUNT(*) > 10;        

Use-case: Retrieve countries that have more than 10 users.

String Operations

SUBSTRING

Extracts part of a string.

SELECT
SUBSTRING('Hello, world!', 7, 5)
AS greeting;        

Use-case: Extract the word 'world' from the text 'Hello, world!'.

LOWER & UPPER

Converts a string to lowercase or uppercase.

SELECT
LOWER('HELLO, WORLD!') AS greeting;

SELECT UPPER('hello, world!') AS greeting;        

Use-case: Standardizing text data for case-insensitive comparisons.

More Advanced SQL Functions

Set Operations

1. Union / Union All

Combines the result sets of two or more queries into a single result set, eliminating duplicate rows. Union All includes duplicates.

SELECT name
FROM employees
WHERE department = 'HR'

UNION

SELECT name
FROM contractors
WHERE department = 'HR';        

Use-case: Retrieve all unique names (Union) or all names including duplicates (Union All) from both the employees and contractors tables where the department is 'HR'.

Data Coalescing

2. Coalesce

Returns the first non-NULL value in a list of expressions.

SELECT COALESCE(NULL, '', 'Hello', 'World');        

Use-case: Return the first non-NULL and non-empty string. In this example, it will return 'Hello'.

Date and Time Manipulation

3. CAST Function

SELECT CAST('2023-09-03' AS DATE);        

Use-case: Convert a string to a date type.Datetime and Date Format

Conditional Logic

4. Case When

Provides if-then-else logic to SQL.

SELECT name,
CASE WHEN age < 20 THEN 'Teenager'
WHEN age < 30 THEN 'Young Adult'
ELSE 'Adult'
END AS age_group
FROM users;        

Use-case: Categorize users into different age groups based on their age.

Advanced Querying Techniques

5. CTE and Subquery

Common Table Expressions (CTE) and Subqueries break down complex queries into simpler parts.

WITH age_data AS
( SELECT age,
COUNT(*)
FROM users
GROUP BY age )

SELECT age
FROM age_data
WHERE count > 1;        

Use-case: Both queries return ages that appear more than once in the users table. The first query uses a CTE for better readability and the second one uses a subquery.

Eliminating Duplicates

6. Distinct

Removes duplicate rows from the result set.

SELECT DISTINCT country FROM users;        

Use-case: Retrieve all unique countries from the users table.

What Interviewers Look For

  1. Accuracy: Your query should produce the correct output.
  2. Completion: Your code should be complete and executable.
  3. Explanation: Can you explain your query and the rationale behind it?
  4. Efficiency: Avoid suboptimal queries that could run slowly on large datasets. Compare and contrast different approaches when possible.
  5. Readability: Your code should be easy to understand. Include comments if necessary.

Front-Load Your SQL Prep

Many candidates who succeeded in interviews at companies like Meta, Google, and Uber put a significant emphasis on SQL from the beginning of their preparation. Aim to solve a minimum of 100 SQL problems to feel confident.

Final Thoughts

Mastering these SQL functions and knowing when to use them can significantly impact your problem-solving ability in data science and business analyst interviews. So, practice diligently to enhance your SQL skills.

Looking for a place to practice SQL? Check out this SQL practice pad 👉 InterviewQuery.

To view or add a comment, sign in

More articles by Federico Sciuca

Others also viewed

Explore content categories