🔷 SQL Cheat Sheet for Data Engineers & Analysts 🔷

Mastering SQL is a must-have skill for anyone in data — whether you're working in analytics, backend, or data engineering. I’ve created this simple SQL cheat sheet covering all the essential concepts in one place:

✔️ Basic Commands (SELECT, INSERT, UPDATE, DELETE)
✔️ Filtering & Sorting Data
✔️ Joins (INNER, LEFT, RIGHT, FULL, CROSS)
✔️ Aggregations & Grouping
✔️ Subqueries & Set Operations
✔️ Indexing & Transactions
✔️ Views, Triggers & CTEs
✔️ Window Functions (RANK, ROW_NUMBER, etc.)
✔️ Date & Time Functions
✔️ Conditional Logic

💡 Whether you're preparing for interviews or working on real-world data pipelines, this will help you revise quickly. Save it for later and share it with someone who is learning SQL 🚀

#SQL #DataEngineering #MySQL #BigQuery #Database #Analytics #LearnSQL #TechLearning #DataAnalytics #DataEngineer #100DaysOfCode
Most people think SQL is just about writing queries. But the real difference comes from 𝗸𝗻𝗼𝘄𝗶𝗻𝗴 𝘁𝗵𝗲 𝗿𝗶𝗴𝗵𝘁 𝗽𝗮𝘁𝘁𝗲𝗿𝗻 𝗮𝘁 𝘁𝗵𝗲 𝗿𝗶𝗴𝗵𝘁 𝘁𝗶𝗺𝗲.

Over the years, I’ve seen one thing very clearly: the better your SQL patterns are, the better your thinking becomes as a Data Engineer. Whether you are building pipelines, debugging data issues, optimizing reports, or preparing for interviews, some SQL concepts come up again and again.

That’s why I put together this quick visual: Top 10 SQL Patterns Every Data Engineer Must Know. It covers **Joins, CTEs, Window Functions, Aggregations, Subqueries, CASE WHEN, Ranking Functions, Running Totals, Deduplication, and Date-based Analysis** — practical patterns we use in real projects when working with messy data, business logic, reporting needs, and performance challenges.

If your SQL foundation is strong, your data engineering work becomes much easier and much cleaner. A lot of people keep learning tools, but often better SQL itself can solve the problem faster.

Which SQL pattern do you use the most in your day-to-day work? For me, CTEs and Window Functions are absolute game changers.

Download the Data Engineering 𝗦𝗤𝗟 𝗞𝗜𝗧 here: https://lnkd.in/g_V8gDg3?
Join my Telegram channel here: https://lnkd.in/g88ic2Ja

#SQL #DataEngineering #DataEngineer #Analytics #ETL #BigData #Database #TechCareers #DataAnalytics #LearnSQL
SQL looks scary until you realize most real-world queries run on a handful of core concepts. Master these 20 SQL concepts and you’ll already be ahead of many aspiring data analysts/devs:

✅ SELECT
✅ WHERE
✅ JOIN
✅ GROUP BY
✅ ORDER BY
✅ Subqueries
✅ HAVING
✅ INSERT / UPDATE / DELETE
…and more.

Don’t try to learn everything in one day — build queries, break them, debug them, repeat. That’s how SQL actually sticks 🚀

Which SQL concept took you the longest to understand? For me, JOINs and Subqueries were the real boss fights 😅

♻ Follow Gautam Kumar for more data & interview insights

#SQL #DataAnalytics #DataEngineering #Database #LearningSQL #SQLQueries #TechSkills #Programming #CareerGrowth #DataAnalyst #SoftwareEngineering #BeginnersGuide
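These core clauses combine in almost every real query. Here is a minimal sketch (table names and sample data invented for illustration) using Python's built-in sqlite3 module, showing SELECT, JOIN, GROUP BY, HAVING, and ORDER BY working together:

```python
import sqlite3

# In-memory database with two toy tables (schema and data are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (1, 1, 50), (2, 1, 70), (3, 2, 20);
""")

# JOIN links the tables, GROUP BY aggregates per customer,
# HAVING filters on the aggregate, ORDER BY sorts the result.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) > 40
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Ana', 120.0)]
```

Note how the inner JOIN drops 'Cy' (no orders) and HAVING drops 'Ben' (total 20) — two different filters at two different stages of the query.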
Great visual summarizing core SQL concepts—from SELECT and JOIN to indexing and keys. In real-world data engineering, understanding when and how to use these effectively makes a huge difference in performance and data quality. Currently revisiting these fundamentals while working on dbt models—always good to strengthen the basics.
So true. I focus heavily on these core concepts in the SQL courses that I teach and offer. #SQL #AI #DataAnalytics #CareerSkills #statistics #research #dataanalytics #dataanalysis #career #careeradvice #sqlserver #jobsearch #programing #codingcommunity #datamanagement #tech #newproje
💬 SQL Challenge of the Day

Problem: Given a table "sales_data" with the columns order_id, customer_id, order_date, and revenue, write a SQL query to calculate the total revenue for each customer up to the current order date, including the current order.

Query:

```sql
SELECT
    order_id,
    customer_id,
    order_date,
    SUM(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS total_revenue
FROM sales_data;
```

Explanation: The SUM() window function with PARTITION BY customer_id and ORDER BY order_date turns the aggregate into a running total per customer — each row's total_revenue covers that customer's revenue up to and including the current order date.

Example: Consider the "sales_data" table:

| order_id | customer_id | order_date | revenue |
| 1 | 101 | 2022-01-01 | 100 |
| 2 | 101 | 2022-01-03 | 150 |
| 3 | 102 | 2022-01-02 | 200 |

The query would output:

| order_id | customer_id | order_date | total_revenue |
| 1 | 101 | 2022-01-01 | 100 |
| 2 | 101 | 2022-01-03 | 250 |
| 3 | 102 | 2022-01-02 | 200 |

#PowerBIChallenge #PowerInterview #LearnPowerBi #LearnSQL #TechJobs #DataAnalytics #DataScience #BigData #DataAnalyst #MachineLearning #Python #SQL #Tableau #DataVisualization #DataEngineering #ArtificialIntelligence #CloudComputing #BusinessIntelligence #Data
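The challenge's query and worked example can be verified end-to-end with Python's built-in sqlite3 module (window functions require SQLite 3.25+, which ships with modern Python builds); this is simply the post's own query run against its own sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (order_id INTEGER, customer_id INTEGER,
                         order_date TEXT, revenue REAL);
INSERT INTO sales_data VALUES
  (1, 101, '2022-01-01', 100),
  (2, 101, '2022-01-03', 150),
  (3, 102, '2022-01-02', 200);
""")

# Running total per customer, exactly as in the challenge query.
rows = conn.execute("""
    SELECT order_id, customer_id, order_date,
           SUM(revenue) OVER (PARTITION BY customer_id
                              ORDER BY order_date) AS total_revenue
    FROM sales_data
    ORDER BY order_id
""").fetchall()
for row in rows:
    print(row)
# (1, 101, '2022-01-01', 100.0)
# (2, 101, '2022-01-03', 250.0)
# (3, 102, '2022-01-02', 200.0)
```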
[𝗦𝗤𝗟 𝗖𝗛𝗔𝗟𝗟𝗘𝗡𝗚𝗘 #6]: 𝗧𝗵𝗲 "𝗗𝗮𝘁𝗮 𝗕𝘂𝗰𝗸𝗲𝘁𝗲𝗲𝗿"

Raw numbers are great, but for a Finance or Product team, a list of 10,000 transactions is just noise. To find patterns, we need to see the 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻. Are most of our users spending small amounts, or are we driven by "whales"? Today’s challenge is about building a frequency histogram, a must-have skill for any analyst performing exploratory data analysis (EDA).

𝗧𝗵𝗲 𝗦𝗲𝘁𝘂𝗽
You have a transactions table. The CFO wants a high-level summary of transaction volume across specific price ranges (buckets). Your task is to categorize every transaction and count how many fall into each range.

𝗧𝗵𝗲 𝗦𝗰𝗵𝗲𝗺𝗮:

```sql
CREATE TABLE transactions (
    txn_id INT,
    amount NUMERIC(10,2)
);
```

𝗧𝗵𝗲 𝗠𝗶𝘀𝘀𝗶𝗼𝗻
Write a query that groups transaction amounts into the following four buckets:
1️⃣ 0-100
2️⃣ 101-500
3️⃣ 501-1000
4️⃣ 1000+

𝗘𝘅𝗽𝗲𝗰𝘁𝗲𝗱 𝗢𝘂𝘁𝗽𝘂𝘁:
| 𝗯𝘂𝗰𝗸𝗲𝘁 | 𝗰𝗼𝘂𝗻𝘁 |
| 0-100 | 45 |
| 101-500 | 123 |
| 501-1000 | 67 |
| 1000+ | 12 |

𝗧𝗵𝗲 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗦𝗲𝘀𝘀𝗶𝗼𝗻:
There are a few ways to slice this. The most common is a CASE WHEN statement, but some dialects have specialized functions like WIDTH_BUCKET or floor-math tricks. How would you ensure the buckets appear in the correct order (numerical rather than alphabetical)? And how do you handle the upper boundaries to make sure no transaction is counted twice?

Drop your code in the comments! Tell us which SQL engine you’re using and your favorite trick for bucketing data. Let’s see those solutions!

#SQL #DataAnalysis #DataScience #DataEngineering #PostgreSQL #MySQL #BigQuery #CodingChallenge #Statistics #LearnSQL
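One possible solution, sketched in SQLite via Python's sqlite3 module (the sample data is invented, not the 45/123/67/12 counts from the expected output): mutually exclusive CASE branches with `<=` keep the upper boundaries from double-counting, and sorting by MIN(amount) keeps the buckets in numerical rather than alphabetical order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (txn_id INT, amount NUMERIC(10,2));
-- Invented sample data covering every bucket, including exact boundaries.
INSERT INTO transactions VALUES
  (1, 50), (2, 100), (3, 250), (4, 500), (5, 750), (6, 1000), (7, 5000);
""")

rows = conn.execute("""
    SELECT CASE
             WHEN amount <= 100  THEN '0-100'
             WHEN amount <= 500  THEN '101-500'
             WHEN amount <= 1000 THEN '501-1000'
             ELSE '1000+'
           END AS bucket,
           COUNT(*) AS count
    FROM transactions
    GROUP BY bucket
    ORDER BY MIN(amount)  -- numerical, not alphabetical, bucket order
""").fetchall()
print(rows)  # [('0-100', 2), ('101-500', 2), ('501-1000', 2), ('1000+', 1)]
```

Because the CASE branches are evaluated top to bottom, an amount of exactly 100 falls into '0-100' only, so no transaction is counted twice.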
I reviewed 200 SQL submissions from data engineering candidates last year. 90% had the same problem — and it wasn't wrong answers.

They were writing SQL to get results. Senior engineers write SQL their teammates can debug at 3am during an incident. That's the gap nobody talks about.

These are the 7 patterns that make the difference:

𝟬𝟭 — 𝗪𝗶𝗻𝗱𝗼𝘄 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 — stop writing subqueries that run once per row. SUM() OVER (PARTITION BY...) does it in one scan.
𝟬𝟮 — 𝗟𝗔𝗚 / 𝗟𝗘𝗔𝗗 — stop self-joining tables to compare rows. Two lines of window syntax replaces 12 lines of JOIN logic.
𝟬𝟯 — 𝗚𝗮𝗽𝘀 & 𝗜𝘀𝗹𝗮𝗻𝗱𝘀 — date minus ROW_NUMBER creates a constant for consecutive dates. This one pattern solves 80% of streak problems.
𝟬𝟰 — 𝗖𝗼𝗻𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗮𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗶𝗼𝗻 — COUNT(DISTINCT CASE WHEN channel='paid' THEN user_id END) gives you a full pivot in one scan, zero PIVOT syntax.
𝟬𝟱 — 𝗦𝗺𝗮𝗿𝘁 𝗱𝗲𝗱𝘂𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 — never SELECT DISTINCT in production. ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) encodes your business rule.
𝟬𝟲 — 𝗥𝗲𝗰𝘂𝗿𝘀𝗶𝘃𝗲 𝗖𝗧𝗘 — org trees, hierarchies, graph traversal. Always add WHERE depth < N. Without it, cyclic data crashes your job every time.
𝟬𝟳 — 𝗦𝗲𝘀𝘀𝗶𝗼𝗻𝗶𝘀𝗮𝘁𝗶𝗼𝗻 — LAG detects the inactivity gap. Cumulative SUM assigns the session ID. Two window functions. One scan. No self-join.

The real insight: every one of these replaces a slow, hard-to-read subquery or self-join with a single readable window function. 𝗧𝗵𝗮𝘁 𝗶𝘀 𝘄𝗵𝗮𝘁 𝘀𝗲𝗻𝗶𝗼𝗿𝘀 𝗿𝗲𝘃𝗶𝗲𝘄 𝗳𝗼𝗿. 𝗡𝗼𝘁 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗻𝗲𝘀𝘀. 𝗥𝗲𝗮𝗱𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲.

Save this image before your next SQL interview or code review. Which of these 7 do you still reach for last — and which one completely changed how you write SQL? Drop it in the comments 👇

#DataEngineering #SQL #DataEngineer #WindowFunctions #SQLInterview
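Pattern 05 (smart deduplication) is the easiest of the seven to try out on a toy dataset. Here is a minimal sketch — table name, columns, and data are invented — run in SQLite through Python's sqlite3 module, keeping only the most recently updated row per id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, email TEXT, updated_at TEXT);
INSERT INTO users VALUES
  (1, 'old@example.com', '2024-01-01'),
  (1, 'new@example.com', '2024-06-01'),
  (2, 'b@example.com',   '2024-03-01');
""")

# ROW_NUMBER() encodes the business rule "latest updated_at wins";
# SELECT DISTINCT cannot express a preference between duplicate ids.
rows = conn.execute("""
    WITH ranked AS (
        SELECT id, email,
               ROW_NUMBER() OVER (PARTITION BY id
                                  ORDER BY updated_at DESC) AS rn
        FROM users
    )
    SELECT id, email FROM ranked WHERE rn = 1 ORDER BY id
""").fetchall()
print(rows)  # [(1, 'new@example.com'), (2, 'b@example.com')]
```

Changing the ORDER BY inside the window (say, to `updated_at ASC`) changes which duplicate survives, which is exactly the business rule the pattern makes explicit.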
💬 SQL Challenge of the Day

Problem: You have a table named "sales_data" that contains the following columns:
- order_id (unique identifier for each order)
- order_date (date of the order)
- product_id (unique identifier for each product)
- quantity (number of units sold for the product in the order)
- revenue (revenue generated by the product in the order)

Write a SQL query to calculate the cumulative revenue for each product over time, ordered by order_date in ascending order.

Query:

```sql
SELECT
    order_date,
    product_id,
    SUM(revenue) OVER (PARTITION BY product_id ORDER BY order_date) AS cumulative_revenue
FROM sales_data
ORDER BY product_id, order_date;
```

Explanation: The SUM() window function calculates the cumulative revenue for each product: PARTITION BY product_id restarts the total for every product, and ORDER BY order_date accumulates revenue in date order, giving a running total per product.

Example: Consider the "sales_data" table:

| order_id | order_date | product_id | quantity | revenue |
| 1 | 2022-01-01 | A | 2 | 100 |
| 2 | 2022-01-02 | A | 1 | 50 |
| 3 | 2022-01-01 | B | 3 | 150 |
| 4 | 2022-01-03 | A | 2 | 120 |

The output of the query (sorted by product_id, then order_date, as the final ORDER BY specifies) would be:

| order_date | product_id | cumulative_revenue |
| 2022-01-01 | A | 100 |
| 2022-01-02 | A | 150 |
| 2022-01-03 | A | 270 |
| 2022-01-01 | B | 150 |

#PowerBIChallenge #PowerInterview #LearnPowerBi #LearnSQL #TechJobs #DataAnalytics #DataScience #BigData #DataAnalyst #MachineLearning #Python #SQL #Tableau #DataVisualization #DataEngineering #ArtificialIntelligence #CloudComputing #BusinessIntelligence #Data
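As with any window-function query, it is worth confirming the frame behaviour on a toy dataset. The example above reproduces exactly in SQLite (3.25+ for window functions) through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (order_id INTEGER, order_date TEXT,
                         product_id TEXT, quantity INTEGER, revenue REAL);
INSERT INTO sales_data VALUES
  (1, '2022-01-01', 'A', 2, 100),
  (2, '2022-01-02', 'A', 1, 50),
  (3, '2022-01-01', 'B', 3, 150),
  (4, '2022-01-03', 'A', 2, 120);
""")

# Cumulative revenue per product, as in the challenge query.
rows = conn.execute("""
    SELECT order_date, product_id,
           SUM(revenue) OVER (PARTITION BY product_id
                              ORDER BY order_date) AS cumulative_revenue
    FROM sales_data
    ORDER BY product_id, order_date
""").fetchall()
for row in rows:
    print(row)
# ('2022-01-01', 'A', 100.0)
# ('2022-01-02', 'A', 150.0)
# ('2022-01-03', 'A', 270.0)
# ('2022-01-01', 'B', 150.0)
```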
People think SQL problems are new ❌ They’re not 🔸 They’re the same real-world patterns repeating again and again 🔁

Here’s the twist 👇 The same logic works in SQL and the Spark DataFrame API ⚡

🔷 Duplicate records 🔁 Same student marked twice → SQL: GROUP BY + HAVING → Spark: groupBy + count + filter
🔷 Second highest salary 🥈 Runner-up in a race → SQL: subquery / window → Spark: dense_rank()
🔷 Top 3 salaries 🏆 Top performers → SQL: ORDER BY + LIMIT → Spark: orderBy + limit
🔷 Revenue per product 💰 Which item earns most → SQL: SUM + GROUP BY → Spark: groupBy + agg
🔷 No department ❌ Missing relationships → SQL: LEFT JOIN + NULL → Spark: left join + isNull
🔷 Loyal customers 🤝 Never returned items → SQL: NOT IN / NOT EXISTS → Spark: left anti join
🔷 Orders per customer 📊 Visit frequency → SQL: COUNT → Spark: groupBy + count
🔷 Joined in 2023 📅 New employees → SQL: EXTRACT(YEAR) → Spark: year()
🔷 Avg order value 📈 Spending behavior → SQL: AVG → Spark: avg()
🔷 Latest order 🕒 Last interaction → SQL: MAX(date) → Spark: max()

Same logic. Two implementations.

The real skill? 🧠 Not SQL. Not Spark. Understanding patterns once and applying them everywhere 🚀 That’s how you move from writing queries to building scalable data systems 🔥

#dataengineering #sql #pyspark #bigdata #datapipelines #learningjourney #careergrowth
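The SQL half of the "loyal customers" row can be sketched with NOT EXISTS, whose semantics Spark's left anti join mirrors. Tables and data below are invented for illustration, run in SQLite through Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (customer_id INTEGER, order_id INTEGER);
CREATE TABLE returns (customer_id INTEGER, order_id INTEGER);
INSERT INTO orders  VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO returns VALUES (2, 20);
""")

# NOT EXISTS keeps customers with no matching row in returns,
# the same result a left anti join would give in Spark.
rows = conn.execute("""
    SELECT DISTINCT o.customer_id
    FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM returns r
                      WHERE r.customer_id = o.customer_id)
    ORDER BY o.customer_id
""").fetchall()
print(rows)  # [(1,), (3,)]
```

NOT EXISTS is generally safer than NOT IN here: NOT IN returns no rows at all if the subquery contains a NULL.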
Great cheat sheet: clean, concise, and very useful for both beginners and experienced data professionals. Definitely a handy reference for day-to-day SQL work.