[𝗦𝗤𝗟 𝗖𝗛𝗔𝗟𝗟𝗘𝗡𝗚𝗘 #6]: 𝗧𝗵𝗲 "𝗗𝗮𝘁𝗮 𝗕𝘂𝗰𝗸𝗲𝘁𝗲𝗲𝗿"

Raw numbers are great, but for a Finance or Product team, a list of 10,000 transactions is just noise. To find patterns, we need to see the 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻. Are most of our users spending small amounts, or are we driven by "whales"? Today’s challenge is about building a frequency histogram, a must-have skill for any analyst performing exploratory data analysis (EDA).

𝗧𝗵𝗲 𝗦𝗲𝘁𝘂𝗽
You have a transactions table. The CFO wants a high-level summary of transaction volume across specific price ranges (buckets). Your task is to categorize every transaction and count how many fall into each range.

𝗧𝗵𝗲 𝗦𝗰𝗵𝗲𝗺𝗮:

```sql
CREATE TABLE transactions (
    txn_id INT,
    amount NUMERIC(10,2)
);
```

𝗧𝗵𝗲 𝗠𝗶𝘀𝘀𝗶𝗼𝗻
Write a query that groups transaction amounts into the following four buckets:
1️⃣ 0-100
2️⃣ 101-500
3️⃣ 501-1000
4️⃣ 1000+

𝗘𝘅𝗽𝗲𝗰𝘁𝗲𝗱 𝗢𝘂𝘁𝗽𝘂𝘁:
| 𝗯𝘂𝗰𝗸𝗲𝘁 | 𝗰𝗼𝘂𝗻𝘁 |
| 0-100 | 45 |
| 101-500 | 123 |
| 501-1000 | 67 |
| 1000+ | 12 |

𝗧𝗵𝗲 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗦𝗲𝘀𝘀𝗶𝗼𝗻:
There are a few ways to slice this. The most common is a CASE WHEN expression, but some dialects have specialized functions like WIDTH_BUCKET or floor-math tricks. How would you ensure the buckets appear in numerical rather than alphabetical order? And how do you handle the upper boundaries to make sure no transaction is counted twice?

Drop your code in the comments! Tell us which SQL engine you’re using and your favorite trick for bucketing data. Let’s see those solutions!

#SQL #DataAnalysis #DataScience #DataEngineering #PostgreSQL #MySQL #BigQuery #CodingChallenge #Statistics #LearnSQL
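One possible answer to the challenge, sketched as a runnable Python script using the built-in sqlite3 module (the sample rows and their counts are made up for illustration — only the schema comes from the post). Ordering the buckets by MIN(amount) is one trick for getting numeric rather than alphabetical order, and testing the upper bounds in ascending order keeps every row in exactly one bucket:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INT, amount NUMERIC(10,2))")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 50), (2, 99.99), (3, 250), (4, 501), (5, 800), (6, 1500)],
)

# CASE branches are evaluated top-down, so checking `amount <= bound`
# against ascending bounds assigns each row to exactly one bucket.
# ORDER BY MIN(amount) sorts buckets numerically, not alphabetically.
rows = conn.execute("""
    SELECT CASE
             WHEN amount <= 100  THEN '0-100'
             WHEN amount <= 500  THEN '101-500'
             WHEN amount <= 1000 THEN '501-1000'
             ELSE '1000+'
           END AS bucket,
           COUNT(*) AS cnt
    FROM transactions
    GROUP BY bucket
    ORDER BY MIN(amount)
""").fetchall()
print(rows)  # [('0-100', 2), ('101-500', 1), ('501-1000', 2), ('1000+', 1)]
```

Note that PostgreSQL and standard SQL would need `GROUP BY` to repeat the CASE expression (or use a subquery/CTE); SQLite and MySQL accept the column alias directly.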
Roshan Kumar Sharma’s Post
More Relevant Posts
I used to avoid SQL window functions like the plague. Whenever I needed to compare a single row to a total (like calculating a percentage), I’d write these massive, messy subqueries or self-joins. It worked, but it was a nightmare to read and even worse to debug. Then I finally learned OVER() — and my code got 10x cleaner overnight. 🚀

If you’re still using subqueries for basic comparisons, here is the breakdown using the "Classic Models" dataset that finally made it "click" for me:

🔍 The Problem: "The Revenue Contribution"
Imagine you have a payments table. You want to see: what % of total company revenue does each individual check represent?

The "Old" Way: you’d have to write a subquery just to get that grand-total revenue figure before you could divide. It’s bulky and easy to break.

The Window Function Way:

```sql
SELECT
    customerNumber,
    checkNumber,
    amount,
    ROUND(amount * 100.0 / SUM(amount) OVER (), 4) AS pct_of_total_revenue
FROM payments
ORDER BY amount DESC
LIMIT 10;
```

Clean. Readable. One query. No extra joins required.

💡 The Magic: PARTITION BY
This is where the real power kicks in.
SUM() OVER () → total across the entire table (as seen above).
SUM() OVER (PARTITION BY customerNumber) → total per customer.

That one keyword — PARTITION — is a game changer. It allows you to group data for calculations without losing the granular detail of each individual row.

I’m currently documenting my full data analytics journey — covering SQL, Python, Power BI, and everything in between. If you’re leveling up your data skills too, let's connect and build this together. 🤝

💬 What’s one SQL concept that finally "clicked" and changed how you write queries? Let's swap tips in the comments! ⬇️

#SQL #DataAnalytics #DataAnalyst #WindowFunctions #LearningInPublic #DataScience #MySQL #DataTips #CareerTransition
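The OVER () vs. PARTITION BY distinction above can be seen side by side in a small runnable sketch (Python's sqlite3, with three invented payment rows standing in for the Classic Models data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (customerNumber INT, checkNumber TEXT, amount REAL)"
)
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", [
    (103, 'HQ336336', 6066.78),
    (103, 'JM555205', 14571.44),
    (112, 'BO864823', 14191.12),
])

# Same SUM(), two different windows: the empty OVER () spans the whole
# table, while PARTITION BY restarts the total for each customer.
rows = conn.execute("""
    SELECT customerNumber,
           amount,
           SUM(amount) OVER ()                            AS grand_total,
           SUM(amount) OVER (PARTITION BY customerNumber) AS customer_total
    FROM payments
    ORDER BY customerNumber, amount
""").fetchall()
for r in rows:
    print(r)
```

Every row keeps its own `amount` while also carrying both totals — the detail-without-losing-aggregates trick the post describes.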
🔷 SQL Cheat Sheet for Data Engineers & Analysts 🔷

Mastering SQL is a must-have skill for anyone in data — whether you're working in analytics, backend, or data engineering. I’ve created this simple SQL cheat sheet covering all the essential concepts in one place:

✔️ Basic Commands (SELECT, INSERT, UPDATE, DELETE)
✔️ Filtering & Sorting Data
✔️ Joins (INNER, LEFT, RIGHT, FULL, CROSS)
✔️ Aggregations & Grouping
✔️ Subqueries & Set Operations
✔️ Indexing & Transactions
✔️ Views, Triggers & CTEs
✔️ Window Functions (RANK, ROW_NUMBER, etc.)
✔️ Date & Time Functions
✔️ Conditional Logic

💡 Whether you're preparing for interviews or working on real-world data pipelines, this will help you revise quickly.

Save it for later and share with someone who is learning SQL 🚀

#SQL #DataEngineering #MySQL #BigQuery #Database #Analytics #LearnSQL #TechLearning #DataAnalytics #DataEngineer #100DaysOfCode
SQL window functions changed how I think about data.

Before I learned them, I was writing subqueries for everything. Clunky. Repetitive. Hard to read. Then I discovered window functions, and the same logic became cleaner, faster, and easier for anyone to follow.

The one I kept reaching for: ROW_NUMBER()
It assigns a unique rank to each row within a group. Simple idea. Powerful in practice.

Real example: find the most recent order per customer.

Without window functions:
→ Write a subquery to get the max date per customer
→ Join it back to the original table
→ Hope nothing breaks

With ROW_NUMBER():
→ Partition by customer
→ Order by date descending
→ Filter where row = 1

Same result. Half the code. Much easier to explain to a colleague.

I used this constantly when building SQL pipelines, pulling the latest record per entity from multi-source business data. It saved time and made my queries reviewable.

If you're writing SQL regularly and haven't touched window functions yet, ROW_NUMBER() is where I'd start. Small function. Big shift in how you think.

Which SQL concept clicked everything into place for you? Drop it below 👇

#SQL #DataAnalytics #DataScience #LearningInPublic
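The three ROW_NUMBER() steps above can be sketched end to end. A minimal runnable version using Python's sqlite3 — the `orders` table and its rows are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_id INT, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10, '2024-01-05'),
    (1, 11, '2024-03-09'),
    (2, 12, '2024-02-20'),
])

# Partition by customer, order newest-first, keep rank 1: the most
# recent order per customer, with no self-join or correlated subquery.
rows = conn.execute("""
    WITH ranked AS (
        SELECT customer_id, order_id, order_date,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date DESC
               ) AS rn
        FROM orders
    )
    SELECT customer_id, order_id, order_date
    FROM ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 11, '2024-03-09'), (2, 12, '2024-02-20')]
```

The CTE wrapper is needed because `rn` can't be filtered in the same query's WHERE clause — window functions are evaluated after WHERE.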
💬 SQL Challenge of the Day

Problem: You have a table named "sales_data" that contains the following columns:
- order_id (unique identifier for each order)
- order_date (date of the order)
- product_id (unique identifier for each product)
- quantity (number of units sold for the product in the order)
- revenue (revenue generated by the product in the order)

Write a SQL query to calculate the cumulative revenue for each product over time, ordered by order_date in ascending order.

Query:
```sql
SELECT
    order_date,
    product_id,
    SUM(revenue) OVER (PARTITION BY product_id ORDER BY order_date) AS cumulative_revenue
FROM sales_data
ORDER BY product_id, order_date;
```

Explanation: The query uses the SUM() window function to calculate the cumulative revenue for each product. The PARTITION BY clause splits the data by product_id, and the ORDER BY inside OVER sorts each partition by order_date, turning the sum into a running total per product.

Example: Consider the "sales_data" table:

order_id | order_date | product_id | quantity | revenue
1 | 2022-01-01 | A | 2 | 100
2 | 2022-01-02 | A | 1 | 50
3 | 2022-01-01 | B | 3 | 150
4 | 2022-01-03 | A | 2 | 120

The output of the query (sorted by product_id, then order_date, per the final ORDER BY) would be:

order_date | product_id | cumulative_revenue
2022-01-01 | A | 100
2022-01-02 | A | 150
2022-01-03 | A | 270
2022-01-01 | B | 150

#PowerBIChallenge #PowerInterview #LearnPowerBi #LearnSQL #TechJobs #DataAnalytics #DataScience #BigData #DataAnalyst #MachineLearning #Python #SQL #Tableau #DataVisualization #DataEngineering #ArtificialIntelligence #CloudComputing #BusinessIntelligence #Data
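The worked example can be verified directly — here is the same table and query run through Python's sqlite3 (SQLite 3.25+ supports window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_data (
        order_id INT, order_date TEXT, product_id TEXT,
        quantity INT, revenue INT
    )
""")
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?, ?)", [
    (1, '2022-01-01', 'A', 2, 100),
    (2, '2022-01-02', 'A', 1, 50),
    (3, '2022-01-01', 'B', 3, 150),
    (4, '2022-01-03', 'A', 2, 120),
])

# Running total of revenue per product, exactly as in the challenge.
rows = conn.execute("""
    SELECT order_date, product_id,
           SUM(revenue) OVER (
               PARTITION BY product_id ORDER BY order_date
           ) AS cumulative_revenue
    FROM sales_data
    ORDER BY product_id, order_date
""").fetchall()
print(rows)
# [('2022-01-01', 'A', 100), ('2022-01-02', 'A', 150),
#  ('2022-01-03', 'A', 270), ('2022-01-01', 'B', 150)]
```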
💬 SQL Challenge of the Day

Problem: Given a table "sales_data" with the columns order_id, customer_id, order_date, and revenue, write a SQL query to calculate the total revenue for each customer up to the current order date, including the current order.

Query:
```sql
SELECT
    order_id,
    customer_id,
    order_date,
    SUM(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS total_revenue
FROM sales_data;
```

Explanation: The query uses the SUM() window function with PARTITION BY to split the data by customer_id and ORDER BY to sort each partition by order_date. This produces a running total of revenue per customer, up to and including each order.

Example: Consider the "sales_data" table:

order_id | customer_id | order_date | revenue
1 | 101 | 2022-01-01 | 100
2 | 101 | 2022-01-03 | 150
3 | 102 | 2022-01-02 | 200

The query would output:

order_id | customer_id | order_date | total_revenue
1 | 101 | 2022-01-01 | 100
2 | 101 | 2022-01-03 | 250
3 | 102 | 2022-01-02 | 200

#PowerBIChallenge #PowerInterview #LearnPowerBi #LearnSQL #TechJobs #DataAnalytics #DataScience #BigData #DataAnalyst #MachineLearning #Python #SQL #Tableau #DataVisualization #DataEngineering #ArtificialIntelligence #CloudComputing #BusinessIntelligence #Data
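One subtlety worth knowing about this pattern: with an ORDER BY but no explicit frame, the default frame is RANGE UNBOUNDED PRECEDING to CURRENT ROW, so rows that tie on order_date are "peers" and all receive the combined total. A sketch with Python's sqlite3 and an invented duplicate-date row shows the difference, and how an explicit ROWS frame (plus a tiebreaker column) restores a strict row-by-row running total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (order_id INT, customer_id INT, order_date TEXT, revenue INT)"
)
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", [
    (1, 101, '2022-01-01', 100),
    (2, 101, '2022-01-01', 150),   # same date as order 1
    (3, 101, '2022-01-02', 200),
])

# Default RANGE frame: the two same-date orders are peers, so both
# show the combined 250 before the total moves on to 450.
range_rows = conn.execute("""
    SELECT order_id,
           SUM(revenue) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS total
    FROM sales_data ORDER BY order_id
""").fetchall()

# Explicit ROWS frame with order_id as a tiebreaker: strict running
# total, one row at a time (100, 250, 450).
rows_rows = conn.execute("""
    SELECT order_id,
           SUM(revenue) OVER (
               PARTITION BY customer_id ORDER BY order_date, order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS total
    FROM sales_data ORDER BY order_id
""").fetchall()
print(range_rows)  # [(1, 250), (2, 250), (3, 450)]
print(rows_rows)   # [(1, 100), (2, 250), (3, 450)]
```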
𝐇𝐢𝐝𝐝𝐞𝐧 𝐒𝐐𝐋 𝐠𝐞𝐦𝐬

We’ve all heard it: "𝑮𝒂𝒓𝒃𝒂𝒈𝒆 𝑰𝒏, 𝑮𝒂𝒓𝒃𝒂𝒈𝒆 𝑶𝒖𝒕." You can build the most stunning Power BI report in the world, but if your underlying SQL queries are messy, your insights will be too. As a Data Analyst, I’ve found that the best place to fix "dirty" data is at the source.

While SELECT * gets the job done, mastering these 5 hidden SQL gems is what separates a basic query from a production-ready data pipeline:

1️⃣ COALESCE: Stop fearing NULL values. This is my go-to for replacing missing data with defaults (like 'N/A' or 0) in a single line. It keeps your calculations from breaking downstream.

2️⃣ Window Functions: Ever tried to calculate a running total or a rank using just GROUP BY? It’s a headache. RANK(), LEAD(), and LAG() are absolute magic for row-level analysis without complex self-joins.

3️⃣ CTEs (Common Table Expressions): If your query has five nested subqueries, it’s time for a CTE. It makes your code modular, readable, and—most importantly—easy for your teammates to debug.

4️⃣ TRIM & REGEX: Raw text is rarely perfect. Using TRIM to kill leading spaces or REGEXP_REPLACE for surgical text cleaning saves hours of manual cleanup in Excel later.

5️⃣ Temp Tables: When the logic gets heavy, I use temp tables as my "sandbox." They let you break down complex transformations into manageable steps without altering the master database.

𝐓𝐡𝐞 𝐠𝐨𝐚𝐥 𝐢𝐬𝐧'𝐭 𝐣𝐮𝐬𝐭 𝐭𝐨 𝐰𝐫𝐢𝐭𝐞 𝐜𝐨𝐝𝐞 𝐭𝐡𝐚𝐭 𝐰𝐨𝐫𝐤𝐬; 𝐢𝐭'𝐬 𝐭𝐨 𝐰𝐫𝐢𝐭𝐞 𝐜𝐨𝐝𝐞 𝐭𝐡𝐚𝐭'𝐬 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐚𝐧𝐝 𝐦𝐚𝐢𝐧𝐭𝐚𝐢𝐧𝐚𝐛𝐥𝐞.

I’m curious—what’s the one SQL function that saved your life during a massive data migration or cleanup project? Let’s swap some "Query Secrets" in the comments! 👇

#SQL #DataAnalytics #Database #DataCleaning #BusinessIntelligence #DataEngineering #CodingTips #DataAnalystLife #DataAnalyst #Tips
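Gems 1️⃣ and 3️⃣ combine naturally: a CTE that COALESCEs missing values into a default before the main query aggregates. A minimal sketch via Python's sqlite3, with a made-up `customers` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, 'EMEA'), (2, None), (3, 'APAC')])

# COALESCE returns its first non-NULL argument, so missing regions
# fall back to 'N/A'; the CTE keeps the cleanup step named and readable.
rows = conn.execute("""
    WITH cleaned AS (
        SELECT id, COALESCE(region, 'N/A') AS region
        FROM customers
    )
    SELECT region, COUNT(*)
    FROM cleaned
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('APAC', 1), ('EMEA', 1), ('N/A', 1)]
```

Without the COALESCE, the NULL region would form its own group that renders as a blank in most reporting tools — exactly the downstream breakage the post warns about.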
One of the biggest hurdles in moving from Basic to Intermediate SQL is mastering the Subquery.

As I’ve been diving deeper into SQL for Data Engineering, I’ve realized that a subquery isn’t just "code inside a code"—it’s a completely independent mini-task that SQL runs to give your main query the data it needs.

Here is the breakdown of how I’m using them to build cleaner logic:

🔹 Subqueries in WHERE (The Filter): Used to filter rows based on a value you don't know yet—like finding employees who earn more than the AVG salary.

🔹 Subqueries in SELECT (The Calculation): Acts like a "lookup" tool. Great for pulling in a specific count or calculation from another table for every row in your result set.

🔹 Subqueries in FROM (The Temporary Table): This is where the real Data Engineering magic happens. You can pre-aggregate or "clean" data in a subquery before treating it as a source for your main query.

The Golden Rule: A subquery is an island. 🏝️ It doesn't know what the outer query is doing unless you specifically link them (Correlated Subqueries). To get an "apples-to-apples" comparison, you have to be explicit with your filters in both places!

Mastering these "queries within queries" is a game-changer for keeping data accurate and preventing "fan-out" issues during complex joins.

Next up on my roadmap: Common Table Expressions (CTEs) and Window Functions. 🚀

What’s your preference for complex logic: Nested Subqueries or CTEs? Let's discuss below! 👇

#SQL #DataEngineering #Database #LearningJourney #DataAnalytics #DataArchitecture #CodingTips
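The "subquery in WHERE" case from the breakdown — employees earning above the average — in a runnable sketch via Python's sqlite3 (table and rows invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [('Ana', 40000), ('Ben', 60000), ('Cy', 80000)])

# The inner query is the "island": it computes the average (60000)
# independently, then the outer query filters against that value.
rows = conn.execute("""
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
""").fetchall()
print(rows)  # [('Cy', 80000)]
```

Note why a plain `WHERE salary > AVG(salary)` fails: aggregates aren't allowed in WHERE, so the average has to be produced by its own independent query first.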
🔥 SQL That Actually Makes You Stand Out

If you’re learning SQL, don’t just memorize syntax - understand how things work in real scenarios.

MUST-KNOW SQL TERMINOLOGIES:
SELECT → Pick the columns you need
FROM → Choose your data source
WHERE → Filter rows before grouping
GROUP BY → Aggregate data into groups
HAVING → Filter after aggregation
ORDER BY → Sort your results
LIMIT → Control how much data you return

🔗 JOINS (Most Asked in Interviews):
JOIN → Combine tables
INNER JOIN → Only matching records
LEFT JOIN → All from left + matches
RIGHT JOIN → All from right + matches
FULL JOIN → Everything from both sides

DATA COMBINATION:
UNION → Merge & remove duplicates
UNION ALL → Merge & keep duplicates

ADVANCED (Game-Changers):
CASE WHEN → Add logic inside SQL
WINDOW FUNCTIONS → Analyze across rows
ROW_NUMBER() → Unique row ranking
RANK() vs DENSE_RANK() → Handle ties smartly
CTE (WITH) → Clean, readable queries
SUBQUERY → Query inside a query

Pro Tip: Knowing when to use these > just knowing what they are.

Save this. Revisit it. Practice it. Consistency is what turns basics into real skill.

#SQL #DataEngineering #DataAnalytics #LearnSQL #TechSkills #CareerGrowth #DataScience #Programming #Analytics #BusinessAnalytics #DataAnalyst #DataDriven #DataVisualization #BigData #ETL #DataWarehouse #Dashboarding #PowerBI #Tableau #PythonForData #AnalyticsEngineering #DataSkills
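The "handle ties smartly" line deserves a concrete look. A sketch via Python's sqlite3 (invented `scores` table) showing all three ranking functions on the same tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [('a', 90), ('b', 90), ('c', 80)])

# ROW_NUMBER needs a tiebreaker (player) to be deterministic;
# RANK and DENSE_RANK order by score alone so the tie is visible.
rows = conn.execute("""
    SELECT player,
           ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num,  -- 1, 2, 3
           RANK()       OVER (ORDER BY score DESC) AS rnk,              -- 1, 1, 3
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk         -- 1, 1, 2
    FROM scores
    ORDER BY row_num
""").fetchall()
print(rows)  # [('a', 1, 1, 1), ('b', 2, 1, 1), ('c', 3, 3, 2)]
```

In one line: ROW_NUMBER is always unique, RANK skips numbers after a tie, DENSE_RANK never leaves gaps.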
📌 Advanced SQL Cheat Sheet (For Real Projects & Interviews)

If basic SQL is done, this is what actually matters 👇

🔹 Execution Order (Most Important)
FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
👉 That’s why you can’t use window functions in WHERE.

🔹 COUNT Differences
COUNT(*) → counts all rows
COUNT(column) → ignores NULLs

🔹 Handling Duplicates
GROUP BY + HAVING → identify
ROW_NUMBER() → remove duplicates safely

🔹 Window Functions (Game Changer)
Used when GROUP BY is not enough. Examples:
• Top N per group
• Running totals
• Ranking

🔹 ROW_NUMBER vs RANK vs DENSE_RANK
ROW_NUMBER → unique rank
RANK → skips numbers after ties
DENSE_RANK → no gaps

🔹 JOIN Mistakes (Very Common)
👉 Wrong joins = wrong data. Always check:
• Row count before & after join
• Duplicate keys
• Join condition

🔹 Subquery vs CTE
CTE → readable & reusable
Subquery → quick but messy in complex logic

🔹 EXISTS vs IN
EXISTS → faster for large data
IN → okay for small datasets

🔹 CASE vs WHERE
CASE → for transformation
WHERE → for filtering

🔹 NULL Logic (Tricky)
= NULL ❌ (won’t work)
IS NULL ✔

🔹 Performance Tips
• Avoid SELECT *
• Use indexes wisely
• Filter early (WHERE)
• Use proper joins

🔹 Real Analyst Thinking
Before writing SQL, ask:
• What is the business question?
• What defines this metric?
• Is my data clean?

💡 SQL is not about writing long queries. It’s about writing correct and efficient logic.

🎯 Save this if you're preparing for real-world SQL, not just basics.

#SQL #AdvancedSQL #DataAnalytics #DataAnalyst #SQLTips #DataEngineering #BusinessIntelligence #Analytics #LearnSQL #TechCareers
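Two of the trickiest items above — NULL logic and the execution-order rule about window functions in WHERE — can be demonstrated in a few lines with Python's sqlite3 (behavior sketched against SQLite; other engines agree on the NULL semantics, though error messages differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

# '= NULL' is never true: any comparison with NULL yields NULL,
# which WHERE treats as false. IS NULL is the correct test.
eq_null = conn.execute("SELECT COUNT(*) FROM t WHERE v = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]
print(eq_null, is_null)  # 0 1

# WHERE runs before SELECT in the logical execution order, so a
# window function in WHERE is rejected outright by the engine.
caught = False
try:
    conn.execute("SELECT v FROM t WHERE ROW_NUMBER() OVER () = 1")
except sqlite3.OperationalError:
    caught = True
print("window function in WHERE rejected:", caught)
```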
Most people think they know SQL… until real-world queries hit 😅

Joins, aggregations, nested queries — that’s where things get tricky.

I recently came across a powerful SQL guide that simplifies everything from basics to advanced concepts. Think of it as your go-to cheat sheet:

✔️ Master the core: SELECT, INSERT, UPDATE, DELETE
✔️ Filtering like a pro with WHERE & logical operators
✔️ Clear understanding of INNER, LEFT & RIGHT JOINs
✔️ Handle data effortlessly using AVG(), COUNT(), SUM()

Whether you're a beginner or aiming to level up as a Data Analyst or Developer, this can seriously boost your SQL game.

If you want it:
1️⃣ Like this post
2️⃣ Comment “SQL”
3️⃣ Follow for more valuable tech resources 🔔

Let’s build strong data skills together 🚀

#SQL #DataScience #Coding #WebDevelopment #Database #CareerGrowth