SQL for Data Projects: Key Commands You Need
WSDA News | February 07, 2025
Structured Query Language (SQL) is an indispensable tool for data scientists and analysts. From retrieving data to performing complex aggregations, SQL helps unlock insights from raw data efficiently. If you're in data science, mastering a core set of SQL statements can save you time and improve the accuracy of your analysis.
In this guide, we’ll explore essential SQL queries that cover a wide range of data tasks. These queries form the foundation of most data projects, whether you're performing basic queries or working with large-scale data transformations.
1. Selecting Data (SELECT)
This is the most fundamental SQL statement, allowing you to fetch data from a table.
SELECT
*
FROM
customers;
What It Does: Retrieves all columns from the customers table. For more precise queries, specify the columns you want.
2. Filtering Rows (WHERE)
To narrow down your results, use the WHERE clause.
SELECT
name,
age
FROM
customers
WHERE
age > 30;
What It Does: Filters the dataset to include only customers older than 30. You can combine multiple conditions using AND or OR.
3. Sorting Results (ORDER BY)
Organizing data by specific columns helps when reviewing large datasets.
SELECT
name,
age
FROM
customers
ORDER BY
age DESC;
What It Does: Sorts customers by age in descending order. Use ASC for ascending order.
4. Grouping and Aggregation (GROUP BY)
You can summarize data with aggregate functions like COUNT(), SUM(), and AVG().
SELECT
city,
COUNT(*) AS customer_count
FROM
customers
GROUP BY
city;
What It Does: Counts the number of customers in each city.
5. Joining Tables (JOIN)
Combining data from multiple tables is crucial in data science projects.
SELECT
customers.name,
orders.order_date
FROM
customers
JOIN
orders ON customers.customer_id = orders.customer_id;
What It Does: Retrieves customer names along with their order dates by linking the customers and orders tables.
6. Handling Duplicates (DISTINCT)
Remove duplicates from your results with the DISTINCT keyword.
SELECT
DISTINCT city
FROM
customers;
What It Does: Returns a list of unique cities where customers are located.
7. Limiting Results (LIMIT)
Use LIMIT to restrict the number of rows returned by your query.
SELECT
*
FROM
customers
LIMIT
10;
What It Does: Fetches only the first 10 rows from the customers table.
Recommended by LinkedIn
8. Calculating Averages (AVG)
You can calculate summary statistics using aggregate functions.
SELECT
AVG(salary) AS average_salary
FROM
employees;
What It Does: Returns the average salary of all employees.
9. Finding Null Values (IS NULL)
Data cleaning often involves identifying missing values.
SELECT
*
FROM
orders
WHERE
order_date IS NULL;
What It Does: Finds all records where the order_date is missing.
10. Creating New Columns (CASE Statement)
Create conditional logic within your query using CASE.
SELECT
name,
CASE
WHEN age < 18 THEN 'Minor'
ELSE 'Adult'
END AS age_group
FROM
customers;
What It Does: Categorizes customers as either "Minor" or "Adult" based on their age.
11. Calculating Running Totals (Window Functions)
Window functions allow cumulative calculations across rows.
SELECT
order_id,
SUM(order_amount) OVER (ORDER BY order_date) AS running_total
FROM
orders;
What It Does: Calculates a running total of order amounts based on the order date.
12. Union Queries (UNION)
Use UNION to combine the results of multiple queries.
SELECT
name
FROM
customers_in_europe
UNION
SELECT
name
FROM
customers_in_america;
What It Does: Returns a combined list of customer names from both Europe and America, removing duplicates.
13. Deleting Data (DELETE)
If you need to remove specific records, use the DELETE statement carefully.
DELETE FROM
customers
WHERE
last_order_date < '2022-01-01';
What It Does: Deletes all customers who haven’t placed an order since January 1, 2022.
Tips for Writing Efficient SQL Queries
Conclusion
SQL is a powerful tool for data professionals, offering a wide array of commands to manipulate and analyze data. Mastering these core queries will enhance your productivity and help you tackle diverse data science challenges with confidence. Remember, consistent practice is key to becoming proficient.
Data No Doubt! Check out WSDALearning.ai and start learning Data Analytics and Data Science Today!