🚀 SQL Server Execution Plan — The Secret Behind Every Slow Query

Most developers try to optimize queries by guessing. But the real answer is always in one place:
👉 The Execution Plan

Let's break it down in a practical way 👇

🔍 Estimated vs Actual Execution Plan

Estimated Execution Plan
→ Generated before query execution
→ Based on statistics
→ Shows what SQL Server thinks will happen

Actual Execution Plan
→ Generated after execution
→ Shows real data (rows processed, actual cost)
→ Helps identify real performance issues

💡 Rule: If your query is slow → always check the Actual Execution Plan

⚙️ Key Operators You MUST Understand

1. Nested Loops (🔁)
→ Best for small datasets
→ Works like: for each row in table A → look up matching rows in table B
✅ Fast when: one input is small and proper indexes exist
❌ Bad when: both inputs are large → becomes very slow

2. Hash Match (🧠)
→ Used for large, unsorted inputs
→ Builds a hash table in memory
✅ Good for: large joins with no supporting indexes
❌ Downside: high memory usage

3. Merge Join (🔀)
→ Requires both inputs sorted on the join key
→ Very efficient for large datasets
✅ Fast when: both inputs are already sorted (via an index)
❌ Problem: an explicit Sort step can itself be expensive

🤔 Why Does SQL Server Choose One Over Another?

The SQL Server Query Optimizer decides based on:
→ Table size
→ Available indexes
→ Data distribution (statistics)
→ Estimated cost

👉 It always picks the plan with the lowest estimated cost.

But here's the catch:
⚠️ If statistics are outdated → SQL Server can choose the wrong plan.

🐢 Real Example: Fixing a Slow Query

Problem: a query was taking 8–10 seconds.

SELECT o.OrderId, c.CustomerName
FROM Orders o
JOIN Customers c ON o.CustomerId = c.Id
WHERE c.City = 'Lahore'

🔍 Execution Plan Analysis:
→ Operator used: Hash Match
→ Reason: no index on Customers.City, so SQL Server scanned the entire table 😬

✅ Fix: added an index:

CREATE INDEX IX_Customers_City ON Customers(City);

⚡ Result:
→ Execution plan switched to Nested Loops
→ Query time reduced: 👉 10 sec → 200 ms

💡 Key Takeaways
✔ Don't guess — read the execution plan
✔ Indexes can completely change performance
✔ Hash Match isn't inherently bad; it depends on the scenario
✔ Always compare Estimated vs Actual

🔥 Pro Tip
Next time your API is slow:
👉 Open the execution plan
👉 Look for: table scans, high-cost operators, missing indexes
That's where the real problem is.

💬 If you found this useful, comment "PLAN" and I'll share more real-world SQL optimization tricks.

#SQLServer #DatabasePerformance #QueryOptimization #ExecutionPlan #SQLTips #BackendDevelopment #DotNet #SoftwareEngineering #TechCareers #Developers #Programming #CodeOptimization #APIPerformance #DataEngineering #SQLTuning #PerformanceTuning #LearnSQL #CodingTips #TechCommunity #LinkedInTech
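The "read the plan, add the index, re-check the plan" loop from the example can be sketched in a few lines. This uses SQLite (via Python's built-in sqlite3) as a stand-in for SQL Server, with EXPLAIN QUERY PLAN playing the role of the execution plan in SSMS; it reduces the example to its single-table core, and the table/index names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT)")
con.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(i, f"c{i}", "Lahore" if i % 100 == 0 else "Other") for i in range(10_000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable step,
    # e.g. "SCAN Customers" vs "SEARCH Customers USING INDEX ..."
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT CustomerName FROM Customers WHERE City = 'Lahore'"
before = plan(query)   # no index on City: full table scan
con.execute("CREATE INDEX IX_Customers_City ON Customers(City)")
after = plan(query)    # same query, plan flips to an index search

print(before)
print(after)
```

The mechanics differ from SQL Server's optimizer, but the workflow is the same: inspect the plan, add the missing index, confirm the operator changed.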
🚨 Database Series #21: Functions in SQL Server

Ever written the same SQL logic over and over again? Same calculation… same formatting… same business rule…
That's not just inefficient — it's a maintainability nightmare.

This is where Functions in Microsoft SQL Server come in.

🔍 Core Concept
A function is a reusable block of SQL logic that returns a value.
Think of it as: ➡️ "Write once, reuse everywhere"

There are two main types:

1️⃣ Scalar Functions
Return a single value.
Use case:
✔ Calculations
✔ Formatting
✔ Business rules

Example:

CREATE FUNCTION dbo.GetFullName
(
    @FirstName VARCHAR(50),
    @LastName VARCHAR(50)
)
RETURNS VARCHAR(100)
AS
BEGIN
    RETURN @FirstName + ' ' + @LastName
END

Usage:

SELECT dbo.GetFullName('John', 'Doe')

2️⃣ Table-Valued Functions (TVFs)
Return a table.
Use case:
✔ Encapsulating queries
✔ Reusable datasets

Example:

CREATE FUNCTION dbo.GetActiveUsers()
RETURNS TABLE
AS
RETURN
(
    SELECT Id, Name FROM Users WHERE IsActive = 1
)

Usage:

SELECT * FROM dbo.GetActiveUsers()

🧠 Deterministic vs Non-Deterministic
This is where things get interesting:

✔ Deterministic function
→ Same input = same output
→ Predictable
→ Its results can be indexed (e.g. via a persisted computed column)
Examples: string concatenation, mathematical calculations

❌ Non-deterministic function
→ Same input may produce a different output
→ Depends on external state
Examples: GETDATE(), NEWID()

🧩 Visual Diagram

Functions
│
├── Scalar → returns 1 value
│   Example: FullName
│
└── Table-Valued → returns a table
    Example: ActiveUsers

Behavior
│
├── Deterministic → predictable ✅
└── Non-Deterministic → unpredictable ⚠️

⚠️ Common Mistake
Using scalar functions inside SELECT on large datasets:

SELECT dbo.GetFullName(FirstName, LastName) FROM Users

🚨 This can kill performance because the function executes row by row.

✅ Practical Takeaway
Use functions when:
✔ Logic needs reuse
✔ You want cleaner queries

Avoid:
❌ Heavy scalar functions in large queries
❌ Hidden performance costs

Prefer:
➡ Inline table-valued functions for better performance

💬 Let's Discuss
Have you ever used a function that silently degraded your query performance? Or do you avoid scalar functions completely?

🔜 Next in the series: Triggers — Automation or Hidden Danger?
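The "write once, reuse everywhere" idea can be tried without a SQL Server instance. SQLite has no CREATE FUNCTION, so this sketch registers the same GetFullName logic through Python's sqlite3.create_function; the Users table is illustrative. It also illustrates the warning above: a registered scalar function like this runs once per row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Register a scalar function named GetFullName taking 2 arguments
con.create_function("GetFullName", 2, lambda first, last: f"{first} {last}")

con.execute("CREATE TABLE Users (FirstName TEXT, LastName TEXT)")
con.execute("INSERT INTO Users VALUES ('John', 'Doe'), ('Ada', 'Lovelace')")

# The same logic is now reusable in any query
rows = con.execute("SELECT GetFullName(FirstName, LastName) FROM Users").fetchall()
print(rows)   # [('John Doe',), ('Ada Lovelace',)]
```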
Top 10 SQL Queries Every Data Engineer Uses Daily 👇

1⃣ 𝗦𝗘𝗟𝗘𝗖𝗧 + 𝗪𝗛𝗘𝗥𝗘
↳ Retrieve specific rows from a table based on conditions
↳ The foundation of every query you will ever write
↳ Master filters before anything else

2⃣ 𝗚𝗥𝗢𝗨𝗣 𝗕𝗬
↳ Aggregate data into meaningful groups
↳ Counts, sums, averages by category
↳ Where raw data starts becoming business insight

3⃣ 𝗝𝗢𝗜𝗡 (𝗜𝗡𝗡𝗘𝗥 𝗝𝗢𝗜𝗡)
↳ Combine matching records from two or more related tables
↳ Only returns rows that exist on both sides
↳ The default join for most analytical work

4⃣ 𝗟𝗘𝗙𝗧 𝗝𝗢𝗜𝗡
↳ Return all records from the left table
↳ Plus any matches from the right
↳ Perfect for finding gaps and missing data

5⃣ 𝗢𝗥𝗗𝗘𝗥 𝗕𝗬
↳ Sort query results ascending or descending
↳ Critical for ranked analysis and readability
↳ Pair with LIMIT to get the top or bottom N rows

6⃣ 𝗟𝗜𝗠𝗜𝗧 / 𝗧𝗢𝗣
↳ Control exactly how many rows your query returns
↳ Essential for testing queries on large tables
↳ Saves compute and speeds up exploration

7⃣ 𝗖𝗔𝗦𝗘 𝗪𝗛𝗘𝗡
↳ Add conditional if/then logic directly inside a query
↳ Categorize, bucket, or flag rows on the fly
↳ The SQL equivalent of an if/else statement

8⃣ 𝗪𝗶𝗻𝗱𝗼𝘄 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 (𝗥𝗢𝗪_𝗡𝗨𝗠𝗕𝗘𝗥)
↳ Rank, number, or analyze rows within defined partitions
↳ Does not collapse your result set like GROUP BY
↳ Separates intermediate SQL users from advanced ones

9⃣ 𝗖𝗧𝗘 (𝗖𝗼𝗺𝗺𝗼𝗻 𝗧𝗮𝗯𝗹𝗲 𝗘𝘅𝗽𝗿𝗲𝘀𝘀𝗶𝗼𝗻)
↳ Write cleaner, reusable query blocks with WITH
↳ Makes complex logic easier to read and debug
↳ Replaces messy nested subqueries

🔟 𝗦𝘂𝗯𝗾𝘂𝗲𝗿𝘆
↳ Nest one query inside another
↳ Great for advanced filtering and complex logic
↳ Use when a CTE would be overkill

𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆:
↳ SQL mastery is not about memorizing syntax
↳ It is about knowing which query solves which problem
↳ Fluency comes from pattern recognition, not memorization

𝗣𝗿𝗼 𝘁𝗶𝗽: Good data engineers do not write the fanciest queries. They write the simplest query that gets the job done.

Which SQL pattern took you the longest to truly understand? 👇

🔗 Useful links
PostgreSQL Official Tutorial → https://lnkd.in/ebjMjRUE
MySQL 8.0 Reference Manual → https://lnkd.in/eYvvD6Sf
freeCodeCamp SQL for Analytics (Free Course) → https://lnkd.in/ermU3E24
Harvard University CS50 SQL (freeCodeCamp YouTube) → https://lnkd.in/e-rxk3Sd

♻️ Repost to help someone who works with data
📌 P.S: I post FREE Data Engineering and AI resources everyday! Subscribe to my newsletter -> https://lnkd.in/emXYKQw4
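Patterns 8 and 9 above combine naturally; here is a runnable sketch of "latest order per customer" using a CTE plus ROW_NUMBER, via Python's built-in sqlite3 (window functions need SQLite ≥ 3.25, bundled with modern Python). The schema and data are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INT, order_day INT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [("alice", 10, 1), ("alice", 30, 3), ("bob", 20, 2), ("bob", 5, 1)])

latest = con.execute("""
    WITH ranked AS (                          -- CTE: readable building block
        SELECT customer, amount,
               ROW_NUMBER() OVER (PARTITION BY customer
                                  ORDER BY order_day DESC) AS rn
        FROM orders                           -- window fn: no row collapse
    )
    SELECT customer, amount FROM ranked
    WHERE rn = 1                              -- keep only each customer's latest
    ORDER BY customer
""").fetchall()

print(latest)   # [('alice', 30), ('bob', 20)]
```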
🧠 Ever wondered how SQL Server decides HOW to run your query?

You write:

SELECT * FROM Orders WHERE CustomerId = 10

But SQL Server has multiple ways to execute it:
👉 Index Seek
👉 Table Scan
👉 Different join strategies

So how does it choose? Welcome to the world of the SQL Server Query Optimizer 👇

⚙️ What is the Query Optimizer?
👉 It's the brain of SQL Server.
Its job: find the most efficient way to execute your query.
💡 Not always the fastest plan…
👉 The plan with the lowest estimated cost.

🔍 What Happens Under the Hood?
When you run a query, SQL Server goes through these steps:

1. Parsing
👉 Checks syntax & converts the query into a logical tree

2. Optimization (the magic happens here ✨)
👉 SQL Server generates multiple candidate execution plans
It evaluates:
→ Table size
→ Indexes
→ Statistics
→ Join methods
👉 Then assigns a cost to each plan

3. Plan Selection
👉 Picks the plan with the lowest estimated cost

4. Execution
👉 Runs the chosen plan
💡 This is what you see in the execution plan

🤔 Why Does SQL Server Sometimes Make Bad Decisions?
Because it depends on statistics.
If stats are outdated or inaccurate, SQL Server may choose:
❌ A table scan instead of an index seek
❌ The wrong join type

⚡ Real Example

Query:

SELECT * FROM Orders WHERE CustomerId = 10

🔍 Scenario 1: Index exists, few matching rows
👉 Plan: ✔ Index Seek (fast)

🔍 Scenario 2: No index, large table
👉 Plan: ❌ Table Scan (slow)

🔥 Scenario 3 (interesting): Index exists, but SQL Server estimates many rows will match
👉 Plan: ❌ Table Scan (even though an index exists 😬)

💡 Key Concepts You Should Know
👉 Cost-based optimization
👉 Cardinality estimation (row-count prediction)
👉 Plan caching (reusing execution plans)
💡 These directly affect performance

🐢 Real-World Insight
Ever seen this?
👉 A query that is fast sometimes, slow other times
💡 Reason: a cached execution plan, different parameters, wrong estimation

🔥 Pro Tips
✔ Keep statistics updated
✔ Create proper indexes
✔ Always check the execution plan
✔ Don't blindly trust SQL Server — it can be wrong

💡 Final Thought
SQL Server is powerful… but it's not magic.
👉 It makes decisions based on the data it knows.
If that data is wrong… 👉 your performance will be too.

💬 Comment "OPTIMIZER" and I'll break down execution plan operators next.

#SQLServer #QueryOptimizer #DatabasePerformance #ExecutionPlan #SQLTips #BackendDevelopment #DotNet #SoftwareEngineering #PerformanceTuning #DataEngineering #Developers #Programming #SystemDesign #CodingTips #TechCareers #TechCommunity #LearnSQL
📊 SQL Concepts I Revisited Recently — and Why They Matter More Than You Think!

As someone working in the tech space, I always knew SQL was important. But recently, when I sat down to truly practice it, I realised how many small but critical mistakes we make without even noticing.

Here are 13 concepts that genuinely changed how I think about writing queries:

1️⃣ NULL is special
Never compare NULL with = — the comparison evaluates to UNKNOWN, so the row is silently filtered out. Always use IS NULL or IS NOT NULL.

2️⃣ WHERE vs HAVING
WHERE filters rows before grouping. HAVING filters after grouping and can use aggregate functions. Mixing these up is one of the most common SQL mistakes!

3️⃣ GROUP BY logic
Always group by the column you want to summarise across — not the unique ID. Grouping by a unique ID defeats the entire purpose!

4️⃣ Finding duplicates
GROUP BY + HAVING COUNT(*) > 1 is your best friend here. Simple but powerful.

5️⃣ Finding missing records
LEFT JOIN from the table you want all records from + IS NULL on a right-table column — this pattern will save you in real investigations!

6️⃣ Aggregate functions belong in HAVING
Never use SUM, COUNT or AVG in WHERE — they are only evaluated after grouping happens.

7️⃣ Subqueries
When you can't answer something in one step, break it into two. The inner query runs first; the outer query uses its result.

8️⃣ CASE WHEN
SQL's version of IF/ELSE. Cleaner than you think and incredibly useful for handling NULLs and conditional logic.

9️⃣ SELECT with GROUP BY
When using GROUP BY, your SELECT can only contain the grouped columns or aggregate functions. No other individual columns allowed!

🔟 COUNT vs COUNT(DISTINCT)
COUNT(column) counts every non-NULL value, including duplicates. COUNT(DISTINCT column) counts only unique values. Know when to use which!

1️⃣1️⃣ JOIN direction matters
Always start your JOIN from the table you want all records from — the direction changes your entire result set!

1️⃣2️⃣ When to use which JOIN
Want all records from one side → LEFT JOIN
Want only matching records → INNER JOIN

1️⃣3️⃣ Finding missing records using JOINs
LEFT JOIN on the main table + IS NULL on a right-table column = a powerful pattern to find records that don't exist in another table!

SQL is one of those skills that seems simple on the surface but has so many nuances underneath. The best way to learn it? Practice with real-world scenarios, not just textbook examples.

What's the one SQL concept that took you the longest to truly understand? Drop it in the comments! 👇

#SQL #DataAnalysis #TechLearning #FinTech #CareerGrowth #LearningInPublic
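Concepts 1 and 5/13 above are easy to verify hands-on. This sketch runs them in Python's built-in sqlite3 (tables and names invented for illustration): '= NULL' matches nothing while IS NULL works, and LEFT JOIN + IS NULL finds customers with no orders.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT);
    INSERT INTO customers VALUES (1, 'alice', NULL), (2, 'bob', 'b@x.com');
    INSERT INTO orders VALUES (100, 2);
""")

# Concept 1: '=' against NULL silently matches nothing
wrong = con.execute("SELECT name FROM customers WHERE email = NULL").fetchall()
right = con.execute("SELECT name FROM customers WHERE email IS NULL").fetchall()

# Concepts 5/13: LEFT JOIN + IS NULL on the right side finds the gap
no_orders = con.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()

print(wrong, right, no_orders)   # [] [('alice',)] [('alice',)]
```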
⚠️ DAY 14/15 — SQL TRAP: No Index vs Index

Same query. Same table. Same result.
One takes 120 seconds. The other takes 0.1 seconds.
The difference? One line of code. 👇

🎯 The Situation:
Your table has 1 million employees. You search for one employee by name.
No error. The query runs. The result comes back. But it takes 2 whole minutes. 😵
Your dashboard is freezing. Users are complaining. Your manager is asking questions.
All because of one missing piece of setup.

🧠 Here's what's happening:
Without an index, SQL has no idea where your data lives. So it checks every single row. One by one. All 1 million of them.
This is called a FULL TABLE SCAN. Every. Single. Row. Checked. Even when only 1 row matches.
With an index, SQL knows exactly where Alice is. It jumps directly to her row. 1 row scanned. Done.

✅ The Numbers:
No index → rows scanned = 1,000,000 → time ≈ 120 seconds ❌
With index → rows scanned = 1 → time ≈ 0.1 seconds ✅
1200x faster. Just from one line:

CREATE INDEX idx_name ON employees(name)

💡 Real-Life Example:
Without an index = finding a word in a book by reading every single page from start to finish. 😵
With an index = opening the book's index at the back, finding the word, jumping straight to that page. ✅
Same book. Same word. One way is just exhausting.

📌 Save This Rule:
→ No index = full table scan = checks every row = very slow
→ Index = direct lookup = jumps to the result = very fast
→ Always index columns used in WHERE, JOIN, ORDER BY
→ Too many indexes slow down INSERT and UPDATE — balance is key
→ Use the EXPLAIN command to check whether your query is using an index

🔑 One Line to Remember:
No index = reads the entire table. Index = jumps directly to the answer.
On small tables you won't notice. On production data with millions of rows, this is the difference between a fast app and a broken one.

💬 Real Talk:
Most slow dashboards and freezing reports in real companies are not caused by bad queries. They are caused by missing indexes.
This is one of the first things senior engineers check when a query is running slow. 😎

🙋 Have you ever had a query that worked fine in testing but became super slow in production? A missing index was probably the reason! Comment below 👇

Follow for Day 15 tomorrow — the FINAL day of the series! 🚀

#SQL #SQLForBeginners #DataAnalytics #LearnSQL #SQLTips #DataEngineering #InterviewPrep
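The experiment above is easy to rerun yourself, shrunk down and using SQLite through Python's built-in sqlite3 (the absolute numbers differ from a production SQL Server, but the shape of the result doesn't): same query, timed with and without an index on name.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO employees VALUES (?, ?)",
                [(i, f"emp{i}") for i in range(200_000)])

def timed(n=20):
    # Run the lookup n times and return total elapsed seconds
    start = time.perf_counter()
    for _ in range(n):
        con.execute("SELECT id FROM employees WHERE name = 'emp199999'").fetchall()
    return time.perf_counter() - start

scan_time = timed()                                      # full table scan
con.execute("CREATE INDEX idx_name ON employees(name)")  # the one line
seek_time = timed()                                      # direct index lookup

print(f"no index: {scan_time:.4f}s   with index: {seek_time:.4f}s")
```

Running EXPLAIN QUERY PLAN on the SELECT before and after the CREATE INDEX shows the scan-to-seek flip that causes the timing gap.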
🚀 Handling Large Data in MS SQL Server — Practical Optimization Strategies

Working with millions (or billions) of records in MS SQL Server? Performance issues like slow queries, blocking, and deadlocks are common, but avoidable. Here are some battle-tested optimization techniques I use in real-world systems 👇

🔹 1. Indexing is Everything
→ Create covering indexes for frequently used queries
→ Use filtered indexes for selective data
→ Avoid over-indexing (it slows down writes)

👉 Example:

CREATE NONCLUSTERED INDEX IX_Email_MessageId
ON TR_SES_EmailDetails(MessageId)
INCLUDE (EventType, CreatedDate);

🔹 2. Process Data in Batches (Never a Full Load)

❌ Bad:

UPDATE Table SET ...

✅ Good:

WHILE 1 = 1
BEGIN
    UPDATE TOP (50000) Table SET ... WHERE IsProcessed = 0;
    IF @@ROWCOUNT = 0 BREAK;
END

✔ Reduces locks
✔ Improves concurrency
✔ Prevents log growth

🔹 3. Avoid Long Transactions
→ Keep transactions short and focused
→ Commit frequently in batch operations
→ Long transactions = deadlocks + blocking

🔹 4. Use Proper Isolation Levels
→ Default: READ COMMITTED
→ For heavy read systems, consider READ COMMITTED SNAPSHOT
→ NOLOCK (only when safe)

🔹 5. Optimize Joins & Data Types
→ Ensure matching data types across joins
→ Avoid functions on columns in the WHERE clause:

-- Bad
WHERE LEFT(MessageId, 10) = 'ABC'
-- Good
WHERE MessageId = 'ABC...'

🔹 6. Partition Large Tables
→ Split massive tables by date or range
→ Improves query performance & maintenance

🔹 7. Reduce TempDB & Logging Pressure
→ Use minimal logging where possible
→ Drop temp tables early
→ Avoid unnecessarily large temp datasets

🔹 8. Monitor & Tune Continuously
Use:
→ Execution plans
→ Query Store
→ DMVs (sys.dm_exec_query_stats)

🔹 9. Handle Deadlocks Gracefully
Deadlocks are normal in high-load systems.
✔ Use retry logic
✔ Maintain a consistent table access order
✔ Reduce transaction scope

🔹 10. Think Like a System, Not Just a Query
Performance is not just SQL:
→ Use queues (RabbitMQ, Service Bus)
→ Offload heavy processing to background jobs
→ Design event-driven systems

💡 Real Insight: In one system processing 7M+ records, just switching to batch processing, proper indexing, and breaking up transactions
👉 reduced runtime from hours to minutes.

🔥 Key Takeaway: "Fast SQL is not about writing complex queries. It's about designing smart data flow."

#SQLServer #DatabaseOptimization #PerformanceTuning #BackendDevelopment #DotNet #Microservices #TechLeadership
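The batch-update loop from tip #2 translates to other engines too. Here's a sketch in Python + SQLite: SQL Server's UPDATE TOP (50000) becomes an Id-limited subquery, and @@ROWCOUNT becomes cursor.rowcount. The loop shape (small batch, commit, stop when nothing changed) is the point; table and batch size are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emails (Id INTEGER PRIMARY KEY, IsProcessed INT DEFAULT 0)")
con.executemany("INSERT INTO Emails (IsProcessed) VALUES (?)", [(0,)] * 2500)

BATCH = 1000
batches = 0
while True:
    cur = con.execute("""
        UPDATE Emails SET IsProcessed = 1
        WHERE Id IN (SELECT Id FROM Emails WHERE IsProcessed = 0 LIMIT ?)
    """, (BATCH,))
    con.commit()              # commit per batch: keeps transactions short
    if cur.rowcount == 0:     # equivalent of IF @@ROWCOUNT = 0 BREAK
        break
    batches += 1

remaining = con.execute(
    "SELECT COUNT(*) FROM Emails WHERE IsProcessed = 0").fetchone()[0]
print(batches, remaining)   # 3 batches (1000 + 1000 + 500), 0 rows left
```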
How Clustered Index Works in SQL Server with Examples

Indexes in SQL Server are vital for improving data retrieval performance. Today, let's dive deep into Clustered Indexes and understand how they work with examples.

Understanding the B-Tree Structure
SQL Server internally uses a Balanced Tree (B-Tree) structure for indexes. In a clustered index, the leaf nodes contain the actual table data, unlike non-clustered indexes. This structure allows SQL Server to quickly locate data.
Visual representation: the root node leads to intermediate nodes, which finally point to leaf nodes containing the actual table rows.

What is a Clustered Index?
A clustered index determines the physical order of data in a table. Since the leaf nodes store the actual data, a table can have only one clustered index. By default, SQL Server creates a clustered index when a primary key is defined. Tables with a clustered index are called clustered tables. Without one, rows are stored in an unordered structure (a heap).

Example: Clustered Index on an Employee Table

CREATE TABLE Employee
(
    Id INT PRIMARY KEY,
    Name VARCHAR(50),
    Salary INT,
    Gender VARCHAR(10),
    City VARCHAR(50),
    Dept VARCHAR(50)
);

Here, the Id column becomes a clustered index automatically due to the primary key.

Insert some data in random order:

INSERT INTO Employee VALUES (3,'Pranaya',4500,'Male','New York','IT')
INSERT INTO Employee VALUES (1,'Anurag',2500,'Male','London','IT')
INSERT INTO Employee VALUES (4,'Priyanka',5500,'Female','Tokyo','HR')
INSERT INTO Employee VALUES (5,'Sambit',3000,'Male','Toronto','IT')
INSERT INTO Employee VALUES (7,'Preety',6500,'Female','Mumbai','HR')
INSERT INTO Employee VALUES (6,'Tarun',4000,'Male','Delhi','IT')
INSERT INTO Employee VALUES (2,'Hina',500,'Female','Sydney','HR')

Fetch the data:

SELECT * FROM Employee;

Notice the rows come back sorted by Id even though they were inserted randomly. This demonstrates how a clustered index organizes the data physically.

Can We Create Multiple Clustered Indexes?
No. Since a clustered index defines the physical storage order, a table can have only one clustered index. Attempting to create another clustered index on, say, the Salary column will throw an error:

CREATE CLUSTERED INDEX IX_Employee_Salary ON Employee(Salary)
-- Error: Cannot create more than one clustered index on table 'Employee'

Composite Clustered Index
SQL Server allows clustered indexes on multiple columns, called composite clustered indexes.

Example:
1. Drop the primary key index (via SQL Server Management Studio, not directly via DROP INDEX).
2. Create a composite clustered index:

CREATE CLUSTERED INDEX IX_Employee_Gender_Salary
ON Employee(Gender DESC, Salary ASC);

3. Fetch the data:

SELECT * FROM Employee;

✅ Key Takeaways:
→ A clustered index stores the actual data at the leaf nodes.
→ A table can have only one clustered index.
→ A primary key automatically creates a clustered index by default.
→ Composite clustered indexes allow ordering by multiple columns.
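The "insert in random order, read back sorted" effect can be reproduced outside SQL Server. SQLite tables (via Python's sqlite3) are also B-trees keyed on the INTEGER PRIMARY KEY, so they behave like a clustered table here: a full scan walks the tree in key order. (Reliance on scan order without ORDER BY is demo-only; in production, always state the order explicitly.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is SQLite's clustering key (the rowid)
con.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(3, 'Pranaya'), (1, 'Anurag'), (4, 'Priyanka'),
                 (5, 'Sambit'), (7, 'Preety'), (6, 'Tarun'), (2, 'Hina')])

# Rows were inserted out of order, but the table B-tree stores them by key
ids = [row[0] for row in con.execute("SELECT Id FROM Employee")]
print(ids)   # [1, 2, 3, 4, 5, 6, 7]
```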
I ran into something interesting the other day that reminded me why understanding your database engine matters more than just knowing SQL. 🤔

I had a query running perfectly fine on MS SQL Server. The same query, moved to Databricks SQL: boom, error. Here's a scenario many data professionals might have faced, and knowing this one difference might save you hours of debugging.

You write a query with SELECT DISTINCT and ORDER BY a column that isn't in your SELECT list. It runs fine on MS SQL Server, but it's a big 'NO' for Databricks.

MS SQL Server code example:

SELECT DISTINCT
    AccountNo,
    CAST(MODIFY_TIME AS DATE) AS TransactionDate,
    Amount
FROM TransactionTable
ORDER BY MODIFY_TIME

Databricks code example (incorrect version):

SELECT DISTINCT
    AccountNo,
    DATE(MODIFY_TIME) AS TransactionDate,
    Amount
FROM TransactionTable
ORDER BY MODIFY_TIME

Why???

𝗠𝗦 𝗦𝗤𝗟 𝗦𝗲𝗿𝘃𝗲𝗿 → Permissive
MS SQL Server allows you to ORDER BY a column that is not in the SELECT list even when using SELECT DISTINCT, as long as it can resolve the column unambiguously. It internally handles the sorting before applying DISTINCT, so it doesn't complain. In other words, MS SQL Server processes the query like this internally:
1. FROM & JOINs
2. WHERE filtering
3. ORDER BY MODIFY_TIME ← applied before DISTINCT
4. SELECT DISTINCT columns
So it sorts first, then removes duplicates, and doesn't require the ORDER BY column to be in the SELECT list.

𝗗𝗮𝘁𝗮𝗯𝗿𝗶𝗰𝗸𝘀 𝗦𝗤𝗟 → Strict
Databricks follows the stricter ANSI SQL standard. According to ANSI SQL rules: when SELECT DISTINCT is used, all columns in ORDER BY must appear in the SELECT list. This is because Databricks/Spark SQL processes things differently:
1. FROM & JOINs
2. WHERE filtering
3. SELECT DISTINCT columns ← applied first
4. ORDER BY ← must work with the already-distinct result set

Since MODIFY_TIME (the raw TIMESTAMP) was never selected, Databricks doesn't know how to sort by it after DISTINCT is applied, hence the error. This isn't a bug, but a fundamental difference in how each engine interprets the SQL standard.

𝗧𝗵𝗲 𝘀𝗮𝗳𝗲𝘀𝘁 𝗳𝗶𝘅 𝘁𝗵𝗮𝘁 𝘄𝗼𝗿𝗸𝘀 𝗲𝘃𝗲𝗿𝘆𝘄𝗵𝗲𝗿𝗲:
→ Use the column alias: ORDER BY TransactionDate
→ Use the expression: ORDER BY DATE(MODIFY_TIME) / ORDER BY CAST(MODIFY_TIME AS DATE)
→ Use the position: ORDER BY 2

Any of these alternatives works across BOTH platforms because they reference what's actually in the SELECT list.

The lesson? Write SQL that's portable, especially if you're working in multi-platform data environments.

Have you ever been caught off guard by a database engine behaving differently than expected? Drop it in the comments 👇

#Databricks #SQLServer #SparkSQL #DataEngineering #CloudMigration #SQLTips #Analytics #DataArchitecture
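The three portable fixes can be sanity-checked against a third engine. This sketch runs them on SQLite through Python's sqlite3: alias, expression, and position all reference what's in the SELECT list, so all three parse and produce the same ordering. The sample data is invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TransactionTable (AccountNo INT, MODIFY_TIME TEXT, Amount INT)")
con.executemany("INSERT INTO TransactionTable VALUES (?, ?, ?)",
                [(1, '2024-02-01 09:00:00', 50),
                 (2, '2024-01-15 12:00:00', 75)])

base = """SELECT DISTINCT AccountNo, date(MODIFY_TIME) AS TransactionDate, Amount
          FROM TransactionTable ORDER BY {}"""

by_alias = con.execute(base.format("TransactionDate")).fetchall()   # fix 1: alias
by_expr  = con.execute(base.format("date(MODIFY_TIME)")).fetchall() # fix 2: expression
by_pos   = con.execute(base.format("2")).fetchall()                 # fix 3: position

print(by_alias)   # earliest TransactionDate first, all three variants agree
```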
Next up: SQL… and wait, is it just me or does this feel easier than Excel? 💻

Not saying Excel isn't important, it really is. But SQL? It almost feels… simpler? 👀 Maybe it's the structure, or maybe it's how everything just clicks once you start understanding it.

So after setting up SQL Server and Microsoft SQL Server Management Studio, we got into the foundations: what SQL is, why analysts use it, and where it fits in real-world work. And honestly, it started making sense very quickly.

SQL is mainly used for storing and managing data in relational databases, but more importantly, it's about accessing and working with data directly. No waiting for exports. No relying on pre-built reports. Just you… and the data.

What really stood out to me is how practical SQL is for analysts; it allows you to pull exactly what you need, clean and prepare data for analysis, and even combine multiple tables using joins to get a full picture, like linking customers to their purchase behavior. And once you write your queries, you can reuse them or automate tasks like daily reports, which saves a lot of time.

The more I learned, the more I understood why SQL is such a big deal:
• It's actually easy to understand once you get the basics
• You can access and analyze data directly
• It allows you to audit and replicate your work easily
• You can work across multiple tables at once
• And it helps answer more complex questions than dashboards alone

We also went deeper into:
• Data types
• SQL commands and how they're categorized
• Keys and different types of keys
• Creating relationships between tables

And that's where things got really interesting… because you start to see that data is not just stored, it's connected.

Another thing that stood out to me is how SQL compares to tools like Excel. Where Excel can struggle with large datasets, SQL is built to handle millions (even billions) of records efficiently. It's optimized for speed, so queries run fast, and results are reliable. It also brings a level of transparency and integrity; your queries are clear, reproducible, and less prone to hidden errors than spreadsheets. Plus, it enforces structure, so you're working with clean, consistent data, not guesswork.

One mindset shift for me was this: a database isn't just where data is stored. It's a system designed for structured access, analysis, and decision-making. And SQL is the language that makes that possible.

We're not just using tools anymore… we're learning how to think with data at scale 💻✨

Akinyemi Feyisara

#DataAnalysisJourney #SQL #DataAnalytics #GrowthJourney
Just completed a focused deep dive into MySQL – the backbone of so many data analytics workflows! Here's what stood out:

🔹 OLTP vs OLAP
• OLTP (Online Transaction Processing) → Built for real-time transactions. Normalized databases, lightning-fast inserts/updates, high concurrency. Think order processing, banking apps.
• OLAP (Online Analytical Processing) → Optimized for complex analysis. Often denormalized (star/snowflake schemas), perfect for aggregations, dashboards, and business intelligence.

MySQL shines in OLTP but can power OLAP workloads with smart indexing, partitioning, and proper design.

🔹 SQL Command Categories
• DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE → Define & modify structure
• DQL (Data Query Language): SELECT → Retrieve data
• DML (Data Manipulation Language): INSERT, UPDATE, DELETE → Manipulate records
• DCL (Data Control Language): GRANT, REVOKE → Manage permissions
• TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT → Handle transactions safely

🔹 Key MySQL Data Types for Data Analytics
• Numeric: INT/BIGINT (IDs & counts), DECIMAL (financial precision), FLOAT/DOUBLE (scientific/approximate calculations)
• String: VARCHAR/TEXT (descriptions & logs), JSON (flexible semi-structured data)
• Date/Time: DATE, DATETIME, TIMESTAMP (critical for time-series, trend analysis & reporting)

These fundamentals have sharpened my database design and query optimization skills for real-world analytics projects.

What's one MySQL trick or data type you rely on most in analytics? Share in the comments 👇 Let's learn together!

Harshitha K Ranjith Kalivarapu Rakesh Viswanath Krishna Mantravadi Upendra Gulipilli

#MySQL #SQL #DataAnalytics #DataEngineering #OLTP #OLAP #DatabaseDesign #LearningInPublic #flm #frontlineedutech
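The TCL row above ("handle transactions safely") is worth seeing in action. This minimal sketch uses SQLite via Python's sqlite3 as a stand-in engine (MySQL's COMMIT/ROLLBACK semantics are the same for this demo): a rolled-back transaction leaves the data untouched.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / ROLLBACK ourselves
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INT)")
con.execute("INSERT INTO accounts VALUES (1, 100)")

con.execute("BEGIN")  # start an explicit transaction (TCL)
con.execute("UPDATE accounts SET balance = balance - 999 WHERE id = 1")
con.execute("ROLLBACK")  # undo everything since BEGIN

balance = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)   # 100: the rolled-back UPDATE never took effect
```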