I spent two hours debugging a query yesterday, and the culprit was a single character. Someone had written COUNT(column_name) instead of COUNT(*).

If you work with SQL, this is one of those silent data killers that can quietly derail your entire reporting structure 🙁

COUNT(*) counts every single row in your dataset, period. COUNT(column_name) counts only the rows where that specific column is NOT NULL. It entirely ignores the blanks.

Because the column in question was an optional field, their "Total Customer Count" wasn't actually the total. It was just the count of customers who happened to fill out that optional field.

Nobody noticed for three months. The dashboard never threw an error. The line chart still went up and to the right. Everything looked perfectly fine. But behind the scenes, half the company was making strategic decisions based on an accidentally filtered dataset.

It's a brutal reminder of two crucial rules in data:

1. Know your aggregates. Always double-check how your SQL functions handle NULL values. They will not warn you when they drop data.
2. Up doesn't mean accurate. Don't blindly trust a dashboard just because the trends look positive.

Anyway, I pushed the fix. Of course, now the baseline numbers look worse!

#DataAnalytics #SQL #DataEngineering #Analytics #DataQuality #TechTruths
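If you want to see the gap for yourself, here's a minimal sketch; the table and column names are made up for illustration:

```sql
-- Hypothetical customers table with a nullable optional_field
SELECT
    COUNT(*)                         AS total_rows,     -- every row, NULL or not
    COUNT(optional_field)            AS non_null_rows,  -- silently skips NULLs
    COUNT(*) - COUNT(optional_field) AS null_rows       -- the rows that vanished
FROM customers;
```

If the first two numbers diverge, your "total" was never a total.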
You don't have time to find NULLs column by column. Here's a simpler way:

One of the biggest pains for a data analyst who cares about data quality is checking for NULLs manually, especially when datasets are large and deadlines are tight.

A simpler way to identify NULL values in a dataset is COUNT(*) - COUNT(column).

COUNT(*): counts all rows (NULL or not)
COUNT(column): counts only non-NULL values

Subtract one from the other, and there you go: you have the number of NULLs in each column.

Once you have the NULLs mapped, you can treat them according to your business rules with COALESCE().

⚠️ IMPORTANT: In this example, I used averages to fill missing values in delivery_days, but this is not a standard rule. Sometimes using averages makes sense (e.g., when the value exists but wasn't recorded). Other times, it can distort your analysis. Always choose your approach based on the context.

What other tricks do you use to make your EDA/data checks faster? Leave them in the comments 👇

📌 Found it useful? Save it for later.

#SQLTips #DataAnalytics #DataScience #SQL #Analytics #BusinessIntelligence #DataEngineer #LearnSQL
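A minimal sketch of the pattern, assuming a hypothetical orders table with a nullable delivery_days column (names taken from the description above):

```sql
-- Count NULLs per column in one pass
SELECT
    COUNT(*) - COUNT(delivery_days) AS null_delivery_days,
    COUNT(*) - COUNT(customer_id)   AS null_customer_ids
FROM orders;

-- Then fill delivery_days with the column average,
-- but only if that fits your business rules (see the caveat above)
SELECT
    order_id,
    COALESCE(delivery_days,
             (SELECT AVG(delivery_days) FROM orders)) AS delivery_days_filled
FROM orders;
```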
Hot take: 90% of analysts overcomplicate their SQL. Here are the 8 functions that cover 80% of real analytics work:

1️⃣ ROW_NUMBER(): rank rows within a group
→ Use it: find the latest record per customer

2️⃣ LAG() / LEAD(): compare the current row to the previous or next one
→ Use it: month-over-month change without a self-join

3️⃣ SUM() OVER(): running totals without collapsing rows
→ Use it: cumulative revenue that still shows each day

4️⃣ CASE WHEN: conditional logic inline
→ Use it: segment customers by behaviour in one query

5️⃣ DATE_TRUNC(): truncate timestamps to week, month, quarter
→ Use it: group daily data into monthly trends instantly

6️⃣ COALESCE(): replace NULLs with a fallback value
→ Use it: clean up messy source data before aggregating

7️⃣ COUNT(DISTINCT): unique counts, not total rows
→ Use it: actual active users, not just session counts

8️⃣ WITH (CTE): readable, reusable query logic
→ Use it: break a 200-line monster into human-readable steps

Most dashboards I've built in 9 years? These 8 functions did the heavy lifting. A worked sketch of #1 is right below.

Save this. Your future Monday-morning self will thank you.

Which one do you use most, and which one took you longest to actually get?

#SQL #DataAnalytics #BI #DataAnalyst #Analytics
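To make #1 (and #8) concrete, a hedged sketch of the "latest record per customer" pattern; the orders table and its columns are assumed names:

```sql
-- Latest order per customer using ROW_NUMBER() inside a CTE
WITH ranked AS (
    SELECT
        customer_id,
        order_id,
        order_date,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id   -- restart numbering for each customer
            ORDER BY order_date DESC   -- newest order gets rn = 1
        ) AS rn
    FROM orders
)
SELECT customer_id, order_id, order_date
FROM ranked
WHERE rn = 1;  -- keep only the most recent row per customer
```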
Your data can pass NULL checks and still be completely wrong.

Day-to-day data work can get messy. Even after removing or treating NULLs and duplicates, your dataset might still not be reliable. That's where validity checks come in.

Some errors happen earlier in the pipeline, so the data reaches your database with values that don't make sense:
❌ negative amounts
❌ unrealistic ages
🚨 inconsistent dates (like a churn date before registration)

This kind of noise can distort your analysis and erode stakeholders' trust.

A simple way to catch these issues is WHERE NOT. You define what should be true according to your business logic and flag everything that breaks it. It's like saying: "If any of these conditions are not met, there's a problem. Show me those rows."

What's your go-to trick to validate data faster? Leave it in the comments 👇

📌 Found this useful? Save it for later.

#SQLTips #DataAnalytics #DataScience #SQL #Analytics #BusinessIntelligence #DataEngineer #LearnSQL
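A minimal sketch of that pattern, assuming a customers table with amount, age, registration_date, and churn_date columns (all names hypothetical):

```sql
-- Flag rows that violate business rules; only the bad rows come back.
-- Note: assumes NULLs were already treated upstream, since a NULL
-- comparison evaluates to UNKNOWN and would not be flagged here.
SELECT *
FROM customers
WHERE NOT (
        amount >= 0                           -- no negative amounts
    AND age BETWEEN 0 AND 120                 -- realistic ages only
    AND (churn_date IS NULL
         OR churn_date >= registration_date)  -- churn can't precede registration
);
```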
⚠️ Your SQL JOIN is silently duplicating data… and you don't even know it.

This is one of the most common mistakes I see in data pipelines 👇

🔥 **The Problem**

You write a simple JOIN like this:

SELECT *
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id;

Looks correct, right? ❌

But if `customers` has duplicate records…
👉 Your data will multiply
👉 Metrics will be wrong
👉 Dashboards = misleading

💡 **Why this happens**

JOINs don't "match once." They match **ALL possible combinations**. So: 1 row × 3 matching rows = 3 rows 😬

✅ **How to fix it**

✔ Ensure uniqueness before the JOIN:

SELECT *
FROM orders o
JOIN (
    SELECT DISTINCT customer_id, customer_name
    FROM customers
) c
  ON o.customer_id = c.customer_id;

✔ Or use aggregation:

SELECT customer_id, MAX(customer_name) AS customer_name
FROM customers
GROUP BY customer_id;

🚀 **Pro Tips:**
✔ Always check row counts before & after a JOIN
✔ Validate uniqueness of keys
✔ Compare COUNT(*) vs COUNT(DISTINCT key)
✔ Don't blindly trust source tables

📌 **Golden Rule:** "If your JOIN increases row count unexpectedly, something is wrong."

💬 Have you ever debugged a data issue that turned out to be a JOIN problem?

#SQL #DataEngineering #Databricks #BigQuery #DataQuality #Analytics #SQLTips
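For the "validate uniqueness" tip, a quick check might look like this (a sketch, assuming the same hypothetical customers table):

```sql
-- If these two numbers differ, customer_id is not unique
SELECT
    COUNT(*)                    AS total_rows,
    COUNT(DISTINCT customer_id) AS distinct_keys
FROM customers;

-- Or list the offending keys directly
SELECT customer_id, COUNT(*) AS dup_count
FROM customers
GROUP BY customer_id
HAVING COUNT(*) > 1;
```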
Day 5 of my Data Analytics journey

I'm starting to realize something important: writing SQL queries is easy. Writing good queries is not.

Understanding when to use JOINs, how to filter efficiently, and how tables are structured makes all the difference.

Still early, but I'm focusing on building strong fundamentals instead of rushing ahead. Consistency continues.

#SQL #DataAnalytics #LearningJourney
Day 7: Mastering the Magic of SQL Window Functions!

Today was a game-changer in my SQL journey. I dived deep into window functions, and I finally understand why they are a data analyst's best friend. Unlike standard aggregate functions, they let me look at individual rows while still calculating totals, averages, or rankings across a specific "window" of data.

Key takeaways from today:

PARTITION BY: like a GROUP BY that doesn't hide your rows. It breaks the data into logical chunks (e.g., grouping by department) so the function calculates values within those specific groups.

OVER(): the "magic wand" that tells SQL, "treat this as a window function." It defines exactly which rows the function should look at.

RANK(): perfect for finding the "Top N" items. It assigns a rank to each row based on a specific order.

Window frames (the "rules"): I explored how to define moving windows using:
→ n PRECEDING: looking back at previous rows
→ CURRENT ROW: including the row we're on
→ n FOLLOWING: peeking ahead

Why does this matter? This is how we calculate running totals, moving averages, and year-over-year growth effortlessly. It's the difference between seeing a flat table and seeing the story and trends within the data. (A small sketch is below.)

Feeling more confident with every query!

#SQL #DataAnalytics #LearningJourney #WomenInTech #DataScience #ContinuousLearning #SQLWindowFunctions
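A hedged sketch of the running-total and moving-average patterns described above; daily_sales and its columns are assumed names:

```sql
SELECT
    sale_date,
    revenue,
    -- Running total: everything from the start up to and including this row
    SUM(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    -- 7-day moving average: this row plus the 6 rows before it
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_sales;
```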
🚀 𝐓𝐡𝐞 𝐏𝐨𝐰𝐞𝐫 𝐨𝐟 𝐒𝐐𝐋 𝐋𝐢𝐞𝐬 𝐢𝐧 𝐭𝐡𝐞 𝐒𝐦𝐚𝐥𝐥𝐞𝐬𝐭 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐬

Behind every clean dashboard and accurate insight, there's one common step: data preparation. And when it comes to handling text data, SQL string functions do more than just basic operations… they bring structure to chaos.

Using functions like 𝐓𝐑𝐈𝐌(), 𝐒𝐔𝐁𝐒𝐓𝐑𝐈𝐍𝐆(), 𝐋𝐄𝐅𝐓(), 𝐚𝐧𝐝 𝐑𝐈𝐆𝐇𝐓(), you can:
✔ Eliminate inconsistencies
✔ Extract only what matters
✔ Standardize raw text into usable data

💡 These are not just functions; they are the foundation of reliable analysis.

#SQL #DataAnalytics #DataCleaning #DataAnalyst #Analytics #LearnSQL
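A small sketch of what that cleanup can look like in practice; raw_customers and its columns are hypothetical, and LEFT/RIGHT availability varies by SQL dialect:

```sql
SELECT
    TRIM(customer_name)           AS customer_name,  -- strip stray whitespace
    LEFT(postal_code, 5)          AS postal_prefix,  -- keep the leading 5 characters
    RIGHT(phone_number, 4)        AS phone_last4,    -- keep the trailing 4 characters
    SUBSTRING(email FROM 1 FOR 3) AS email_prefix    -- standard form; many dialects
                                                     -- also accept SUBSTRING(email, 1, 3)
FROM raw_customers;
```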
I once spent hours debugging a "data issue." Turns out… there was no issue.

A stakeholder reached out: "The numbers in the dashboard are wrong."

So I started checking everything:
→ SQL logic
→ Data pipeline
→ Source tables

Everything matched. Still, the numbers felt "off" to them.

Then I asked one question: "What number are you expecting?"

That's when it clicked. They were comparing this month's partial data with last month's complete data. Same metric. Different context.

The data wasn't wrong. The comparison was.

Since then, before debugging anything, I ask:
→ What exactly looks wrong?
→ What are you comparing it with?
→ Is the context the same?

Because sometimes the problem isn't in the data. It's in the expectation.

Have you ever chased a "data issue" that turned out to be a misunderstanding?

#Data #SQL #DataAnalyst #Skills #Analyst
I inherited a table with 340 columns. Nobody knew what half of them meant.

Column names like "is_flg_cust_actv_v2", written by someone who left 2 years ago. Analysts spent more time figuring out how to query the data than actually using it.

That's what missing data modeling looks like in production.

─────────────────

Data modeling is deciding how to structure data so it's:
→ Fast to query
→ Easy to understand
→ Consistent across every team

The most important concept: GRAIN.

The grain = what one row represents. One row per order? Per order line item? Per click event?

Get it wrong → every report built on that table is subtly wrong. I've seen teams rebuild entire models from scratch because they got the grain wrong early.

Define your grain before you write a single line of SQL. (A quick grain check is sketched below.)

─────────────────

What's the worst data model you've ever inherited? 👇

#DataEngineering #DataModeling #SQL #Analytics #LearningInPublic #DataEngineer
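One hedged way to test a table's grain empirically; the orders table and order_id key are assumptions, so swap in whatever you believe the grain key is:

```sql
-- If the table's grain really is "one row per order",
-- these two counts must be equal
SELECT
    COUNT(*)                 AS total_rows,
    COUNT(DISTINCT order_id) AS distinct_orders
FROM orders;

-- A mismatch means the true grain is finer (e.g., order line items),
-- and every per-order metric built on this table needs a rethink.
```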
𝗚𝗥𝗢𝗨𝗣 𝗕𝗬 𝗖𝗵𝗮𝗻𝗴𝗲𝗱 𝗛𝗼𝘄 𝗜 𝗜𝗻𝘁𝗲𝗿𝗽𝗿𝗲𝘁 𝗗𝗮𝘁𝗮

Earlier, I used to group data without thinking much, just adding columns to GROUP BY because SQL required it. The query worked. But the insight didn't.

That's when it clicked: 𝗚𝗿𝗼𝘂𝗽𝗶𝗻𝗴 𝗶𝘀 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗮 𝘁𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝘀𝘁𝗲𝗽. 𝗜𝘁 𝗱𝗲𝗳𝗶𝗻𝗲𝘀 𝗵𝗼𝘄 𝘁𝗵𝗲 𝗯𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝘀𝗲𝗲𝘀 𝘁𝗵𝗲 𝗱𝗮𝘁𝗮.

Before writing GROUP BY, I now ask: 𝗔𝘁 𝘄𝗵𝗮𝘁 𝗹𝗲𝘃𝗲𝗹 𝘀𝗵𝗼𝘂𝗹𝗱 𝘁𝗵𝗶𝘀 𝗱𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗯𝗲 𝗺𝗮𝗱𝗲?

For example, if the goal is to understand region-level performance:

SELECT region, SUM(revenue) AS total_revenue
FROM sales_transactions
GROUP BY region;

Earlier, I would group by too many columns. That broke the summary and made the output too detailed to interpret. (See the contrast sketched below.)

𝗦𝗮𝗺𝗲 𝗱𝗮𝘁𝗮. 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗴𝗿𝗼𝘂𝗽𝗶𝗻𝗴. 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗯𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝘀𝘁𝗼𝗿𝘆.

What changed for me:
→ GROUP BY defines the level of analysis
→ Every non-aggregated column in SELECT affects that grouping level
→ Wrong grouping leads to misleading insights
→ More columns ≠ better analysis

"I use GROUP BY to control the aggregation level so the output aligns with the business question."

That's when SQL stopped being syntax and started becoming analysis.

#SQL #DataAnalytics #DataAnalyst #BusinessAnalysis #AnalyticsThinking #LearnSQL #DataCareer
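To make the "too many columns" contrast concrete, a hedged sketch using the same hypothetical sales_transactions table (the extra columns are made up):

```sql
-- Over-grouped: every extra column fragments the summary,
-- so the output is near-raw detail instead of a regional answer
SELECT region, city, store_id, product_id, SUM(revenue) AS total_revenue
FROM sales_transactions
GROUP BY region, city, store_id, product_id;

-- Right-grained for a region-level decision: one row per region
SELECT region, SUM(revenue) AS total_revenue
FROM sales_transactions
GROUP BY region;
```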