SQL Tutorial: Creating new columns with data transformations 👇

So far in this series we've covered querying, aggregating, and grouping data. But what if the column you need doesn't exist yet? That's where data transformation comes in: creating new columns from existing ones using formulas, applied row by row.

🔹 Basic arithmetic transformation

Keep all existing columns with SELECT *, then add your formula:

SELECT *,
       ad_spend_usd * 0.92 AS ad_spend_eur
FROM campaign_spend;

Every row gets its own calculated value. One formula, applied consistently across the entire dataset.

🔹 Calculating ratios

Ratios divide one value by another to reveal efficiency and performance, the kind of insight raw numbers hide.

SELECT *,
       100 * paid_orders / paid_sessions AS pcvr,
       ad_spend / paid_orders AS cpo
FROM campaign_performance
ORDER BY campaign_day;

(If both columns are integers, write 100.0 * paid_orders instead; PostgreSQL integer division discards the remainder, so the ratio would silently truncate to a whole number.)

🔹 Always handle division by zero

If any row has a zero denominator, your query breaks. Use NULLIF to convert zeros to NULL before dividing: dividing by NULL returns NULL, not an error.

SELECT *,
       100 * paid_orders / NULLIF(paid_sessions, 0) AS pcvr,
       ad_spend / NULLIF(paid_orders, 0) AS cpo
FROM campaign_performance;

NOTE: BigQuery has SAFE_DIVIDE(), which does the same thing more cleanly. PostgreSQL users stick with NULLIF.

🔹 One thing most tutorials skip

Ratios become unreliable with small sample sizes. A conversion rate of 50% sounds impressive, until you realise it's based on 1 order from 2 visits. One fewer order and it's 0%. The math is correct, but the insight is misleading. Always check your denominator size before drawing conclusions from a ratio.

Next up: more advanced transformation techniques.

#SQL #PostgreSQL #DataAnalysis #LearningInPublic
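The NULLIF pattern above is easy to verify yourself. Here's a minimal, runnable sketch using Python's sqlite3 with invented toy rows (the table and column names follow the post; the data is made up). The second row would raise a division error without NULLIF; with it, the ratios simply come back as NULL:

```python
import sqlite3

# Toy stand-in for campaign_performance; data is invented for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE campaign_performance (paid_sessions INT, paid_orders INT, ad_spend REAL)")
con.executemany("INSERT INTO campaign_performance VALUES (?, ?, ?)",
                [(200, 10, 50.0),   # normal row
                 (0, 0, 25.0)])     # zero denominators: would break a plain division

rows = con.execute("""
    SELECT 100.0 * paid_orders / NULLIF(paid_sessions, 0) AS pcvr,
           ad_spend / NULLIF(paid_orders, 0)              AS cpo
    FROM campaign_performance
""").fetchall()
print(rows)  # [(5.0, 5.0), (None, None)]
```

SQLite happens to return NULL rather than erroring on division by zero, but the NULLIF pattern is the portable habit: it behaves identically on engines like PostgreSQL that do raise an error.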
More Relevant Posts
🚀 SQL Journey – Day 26 to Day 28: Deep Dive into Subqueries

Over the last three days, I explored one of the most powerful and widely used SQL concepts: subqueries. These concepts helped me understand how to break complex problems into simpler, logical steps.

🔹 Day 26 – Introduction to Subqueries
A subquery is a query inside another query, used for intermediate calculations and filtering.

💼 Real-world perspective:
• Which product performs well?
• Which store performs well?
• Which customers perform well?
👉 Subqueries help answer these by comparing values with averages or totals.

✔ Learned about:
• Correlated subqueries (row-wise execution)
• Non-correlated subqueries (single execution)

🔹 Day 27 – Types of Subqueries (Based on Result Type)
• Scalar subquery → returns a single value
• Row subquery → returns one row, multiple columns
• Table subquery → returns multiple rows & columns

💡 Practice focus:
✔ Customers spending above average
✔ Filtering & counting results dynamically

🔹 Day 28 – Correlated vs Non-Correlated Subqueries
• Correlated subquery → depends on the outer query → executes for each row → used for detailed comparisons
• Non-correlated subquery → independent → executes once → faster and more efficient

📊 Key learnings:
✔ Breaking complex queries into smaller steps
✔ Dynamic filtering using subqueries
✔ Choosing the right type improves performance
✔ Strong foundation for real-world analytics

💡 Key takeaway: Subqueries are essential for solving real-world business problems, especially when dealing with comparisons, filtering, and analytical queries.

🔥 Consistency + Practice = Mastery in SQL

#SQL #DataAnalytics #LearningJourney #Subqueries #SQLPractice #Database #TechLearning #30DaysOfSQL #AIStudent 🚀
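The "customers spending above average" exercise mentioned above can be sketched in a few lines. This uses Python's sqlite3 with an invented orders table; it combines two of the subquery types from the post: a derived table (subquery in FROM) computes per-customer totals, and a non-correlated scalar subquery supplies the average to compare against:

```python
import sqlite3

# Invented demo data: which customers spent more than the average customer total?
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("ana", 40.0), ("ben", 10.0), ("ana", 60.0), ("cay", 20.0)])

rows = con.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > (SELECT AVG(t)                    -- scalar subquery: runs once
                          FROM (SELECT SUM(amount) AS t    -- derived table: per-customer totals
                                FROM orders
                                GROUP BY customer))
    ORDER BY customer
""").fetchall()
print(rows)  # [('ana', 100.0)]  -- only ana's total (100) beats the average (~43.3)
```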
Window functions are where SQL goes from functional to powerful. Most analysts use them for RANK and ROW_NUMBER and stop there. Here's what's worth actually understanding:

PARTITION BY is not GROUP BY. GROUP BY collapses rows. PARTITION BY adds a column while keeping every row. This distinction matters constantly: retention curves, running totals, period-over-period comparisons, session attribution.

LAG and LEAD are underused. If you've ever written a self-join to compare a row to the row before it, LAG eliminates that. Cleaner query, one table scan instead of two.

Frames matter more than most people realize. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives you a running total. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives you a rolling 7-row window (a rolling 7-day window if there's exactly one row per day). The default frame depends on whether you have ORDER BY. Leaving it implicit is a bug waiting to happen.

QUALIFY (in BigQuery, Snowflake, DuckDB) lets you filter on a window function result without a subquery. It's genuinely one of the nicest quality-of-life additions to modern SQL dialects.

The hardest thing about window functions isn't the syntax. It's learning to recognize the class of problems they solve: anything involving ordering, adjacency, or comparing a row to a set of related rows. Once that click happens, you'll find yourself reaching for them constantly.

What's the window function use case that surprised you the most when you first figured it out?
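Two of the claims above, LAG replacing a self-join and the explicit running-total frame, fit in one tiny runnable sketch (Python's sqlite3, invented daily_sales data; SQLite has supported window functions since 3.25):

```python
import sqlite3

# Invented demo table: one row per day.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_sales (day INT, amount INT)")
con.executemany("INSERT INTO daily_sales VALUES (?, ?)", [(1, 10), (2, 30), (3, 20)])

rows = con.execute("""
    SELECT day,
           amount,
           amount - LAG(amount) OVER (ORDER BY day) AS delta,  -- vs previous row, no self-join
           SUM(amount) OVER (ORDER BY day
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW)         AS running_total
    FROM daily_sales
""").fetchall()
print(rows)  # [(1, 10, None, 10), (2, 30, 20, 40), (3, 20, -10, 60)]
```

The first row's delta is NULL because there is no previous row, which is exactly the edge case a self-join version forces you to handle with an outer join.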
🚀 Day 29 of My SQL Journey – Subqueries (Based on Location)

Today I focused on understanding how subqueries can be used based on their location within SQL queries. Instead of just learning concepts, I explored where exactly subqueries fit in real-world scenarios.

🔹 Subquery in SELECT – used to display calculated values alongside each row
🔹 Subquery in WHERE – helps filter data based on conditions
🔹 Subquery in FROM (derived table) – creates temporary tables for further analysis
🔹 Subquery in HAVING – filters grouped results using aggregate conditions

💡 Key takeaway: Subqueries make complex problems easier by breaking them into smaller steps and placing logic exactly where it's needed.

📊 Practicing these concepts is helping me think more analytically and write more efficient SQL queries.

#SQL #LearningJourney #DataAnalytics #Database #SQLQueries #Subqueries #TechSkills #StudentDeveloper

🚀 Day 30 of my SQL Journey – Subqueries Mastery!

30 days down, 10 more to go! This journey has been incredibly rewarding, and today I dived deeper into advanced SQL concepts using real LeetCode problems.

🔍 What I worked on:
• Identifying employees whose managers left the company
• Finding top 3 salaries per department using ranking functions
• Discovering the most connected users through aggregation
• Analyzing movie ratings with joins + subqueries

💡 Key learnings:
• Subqueries simplify complex filtering logic
• Correlated subqueries handle row-level dependencies
• Ranking functions like RANK() and DENSE_RANK() are powerful
• Real-world problems require combining multiple SQL concepts

Only 10 days left; excited to finish strong! 💯

#SQL #DataAnalytics #LeetCode #LearningJourney #Consistency
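To make the "subquery in SELECT" location concrete, here's a runnable sketch using Python's sqlite3. The sales table and its two rows are invented for the demo; the scalar subquery computes the grand total once, and its value appears alongside every row, which is exactly the "calculated values alongside each row" use:

```python
import sqlite3

# Invented demo data for a percent-of-total column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 60), ("south", 40)])

rows = con.execute("""
    SELECT region,
           amount,
           100 * amount / (SELECT SUM(amount) FROM sales) AS pct_of_total  -- subquery in SELECT
    FROM sales
    ORDER BY region
""").fetchall()
print(rows)  # [('north', 60, 60), ('south', 40, 40)]
```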
🚀 SQL Window Functions – Deep Dive (Hands-on Practice)

Taking my SQL skills further by working on window functions using real datasets 👇

🔹 What are window functions?
They perform calculations across a set of rows while retaining individual rows (unlike GROUP BY).
OVER() → defines the window
PARTITION BY → like GROUP BY (without collapsing rows)
ORDER BY → defines order within the partition

🔹 1. ROW_NUMBER() – unique row number per group

SELECT product, unitPrice,
       ROW_NUMBER() OVER (PARTITION BY product ORDER BY unitPrice DESC) AS row_num
FROM samples.bakehouse.sales_transactions;

🔹 2. RANK() – same rank for ties (with gaps)

SELECT product, unitPrice,
       RANK() OVER (PARTITION BY product ORDER BY unitPrice DESC) AS price_rank
FROM samples.bakehouse.sales_transactions;

(Note: RANK and DENSE_RANK are reserved keywords in many dialects, so avoid using them as bare column aliases.)

🔹 3. DENSE_RANK() – same rank for ties (no gaps)

SELECT product, unitPrice,
       DENSE_RANK() OVER (PARTITION BY product ORDER BY unitPrice DESC) AS price_dense_rank
FROM samples.bakehouse.sales_transactions;

🔹 4. SUM() OVER() – total per partition

SELECT product, unitPrice,
       SUM(unitPrice) OVER (PARTITION BY product) AS sum_over
FROM samples.bakehouse.sales_transactions;

🔹 5. AVG() OVER() – average per partition

SELECT product, unitPrice,
       AVG(unitPrice) OVER (PARTITION BY product) AS avg_over
FROM samples.bakehouse.sales_transactions;

🔹 6. LAG() / LEAD() – previous & next values (add an ORDER BY inside OVER so "previous" and "next" are well defined)

SELECT product, unitPrice,
       LAG(unitPrice) OVER (PARTITION BY product ORDER BY unitPrice) AS prev_price
FROM samples.bakehouse.sales_transactions;

SELECT product, unitPrice,
       LEAD(unitPrice) OVER (PARTITION BY product ORDER BY unitPrice) AS next_price
FROM samples.bakehouse.sales_transactions;

💡 Key learnings:
• Window functions are powerful for analytics & trend analysis
• Unlike GROUP BY, they don't reduce rows
• Widely used in ranking, running totals, comparisons

🔥 Most asked interview use case:
👉 Top N records per group using ROW_NUMBER()

#SQL #DataAnalytics #DataEngineering #WindowFunctions #PySpark #Databricks #Learning #100DaysOfCode
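Since "Top N per group with ROW_NUMBER()" is the interview classic named above, here's a self-contained sketch of it using Python's sqlite3. The bakery-style table mirrors the post's columns, but the rows are invented: number rows within each partition, then keep only the first two:

```python
import sqlite3

# Invented rows in the shape of the post's sales_transactions table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_transactions (product TEXT, unitPrice REAL)")
con.executemany("INSERT INTO sales_transactions VALUES (?, ?)",
                [("croissant", 3.0), ("croissant", 4.0), ("croissant", 2.5),
                 ("bagel", 1.5), ("bagel", 2.0)])

# Top 2 prices per product: ROW_NUMBER restarts at 1 for each partition.
rows = con.execute("""
    SELECT product, unitPrice
    FROM (
        SELECT product, unitPrice,
               ROW_NUMBER() OVER (PARTITION BY product
                                  ORDER BY unitPrice DESC) AS rn
        FROM sales_transactions
    )
    WHERE rn <= 2
    ORDER BY product, unitPrice DESC
""").fetchall()
print(rows)  # [('bagel', 2.0), ('bagel', 1.5), ('croissant', 4.0), ('croissant', 3.0)]
```

The wrapping subquery is needed because WHERE cannot reference a window function directly; in dialects with QUALIFY you could drop it.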
🚀 The SQL Roadmap: From Zero to Expert

To truly master SQL, you must progress through these core layers:
• The foundation: understand DDL (Data Definition) for managing structures like tables, and DML (Data Manipulation) for handling the data itself.
• Querying & filtering: mastering SELECT, WHERE, and logical operators like AND/OR to extract exactly what you need.
• Aggregations & grouping: using functions like SUM(), AVG(), and COUNT() with GROUP BY to generate summary statistics.
• Advanced joins: moving beyond INNER JOIN to master LEFT, RIGHT, and FULL OUTER joins for complex data relationships.

💡 Pro-Level Concepts to Ace Your Interview
If you want to stand out, focus on these advanced topics often asked by top tech companies:
• Window functions: RANK(), DENSE_RANK(), and LEAD/LAG allow powerful calculations across rows without collapsing your data.
• CTEs vs. subqueries: Common Table Expressions (CTEs) are often more readable for complex, multi-step queries.
• Performance optimization: understanding indexes (clustered vs. non-clustered) to speed up data retrieval.

🧠 Can You Answer These?
Interviewers love conceptual questions to test your depth. Do you know the difference between:
• WHERE vs. HAVING? (Row-level vs. aggregate filtering.)
• DELETE vs. TRUNCATE? (Logged row removal vs. fast table clearing.)
• UNION vs. UNION ALL? (Removing duplicates vs. keeping them for speed.)

🛠️ Practice Resources
Knowledge is nothing without practice. Check out these platforms:
Beginner: W3Schools, SQLBolt, SQLZoo.
Intermediate/Expert: LeetCode (Top 50 SQL Plan), DataLemur, and HackerRank.

SQL isn't just about writing code; it's about solving problems and uncovering insights.

What SQL concept took you the longest to "click"? Let's discuss in the comments! 👇

👉 Follow: Dinesh Sahu

#SQL #DataScience #DataEngineering #InterviewPrep #TechCareers #DatabaseManagement #CareerGrowth
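The UNION vs. UNION ALL question above is the easiest of the three to demonstrate empirically. A two-line experiment in Python's sqlite3 (no table needed, just literal SELECTs) shows UNION deduplicating and UNION ALL keeping every row:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# UNION removes duplicates (and pays a dedup cost to do it).
rows_union = con.execute(
    "SELECT 1 AS n UNION SELECT 1 UNION SELECT 2 ORDER BY 1").fetchall()

# UNION ALL keeps every row, which is why it's faster when duplicates are fine.
rows_union_all = con.execute(
    "SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 ORDER BY 1").fetchall()

print(rows_union)      # [(1,), (2,)]
print(rows_union_all)  # [(1,), (1,), (2,)]
```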
🚀 Day 83 of My 100 Days Data Analysis Journey

At some point, you stop just writing SQL… and you start reading the database like a system. Today's focus wasn't just on queries; it was on understanding structure, context, and accuracy while working inside a real environment.

Working with a dataset like this (the films table), a few things become very clear:

🔹 Column awareness matters
It's not enough to know SELECT. You need to understand what each column represents: title, release_year, rental_rate, length. Each one tells a different story.

🔹 Precision over guesswork
One small mistake in a column name or reference… and your entire query breaks. SQL is strict, and that's what makes it powerful.

🔹 Reading query results properly
Getting 900+ rows back isn't the goal. The real skill is knowing:
• What you're looking for
• Why it matters
• How to refine it

🔹 Environment matters
Using tools like MySQL Workbench isn't just about writing queries; it's about:
• Navigating schemas
• Understanding table structures
• Interpreting outputs and errors

What this reinforces for me: SQL is not just about writing queries. It's about understanding data at a deeper level: structure, relationships, and meaning. Every day, it becomes less about "learning syntax" and more about thinking like a data analyst.

#SQL #DataAnalytics #LearningInPublic #MySQL #DataSkills #TechJourney #100DaysOfCode
⏱️ I once wrote a SQL query that took around 12 minutes to run. The output was correct. But the execution was flawed.

I remember staring at the loading screen thinking: is this normal? I discussed it with teammates, searched online for better approaches, and rewrote the same query. The result came back in under 5 minutes.

That moment taught me something I had been ignoring: getting the right answer is only half the job. Getting it efficiently is the other half.

Here are a few small things that helped me write faster queries:

● Filter early, not late: apply WHERE conditions before joins whenever possible. Less data entering a join means less work for the database. For instance, think of joining an orders table with a customers table and then filtering for orders placed in 2024. Moving that date filter before the join reduces the data going into the join significantly.

● Avoid SELECT *: only retrieve the columns you actually need. Pulling unnecessary columns wastes memory and processing time, especially on wide tables. For instance, a product table with 60+ columns queried with SELECT * just to display a name and price is doing a lot of unnecessary work. Selecting product_name, price is all that is needed.

● Joins become expensive on large data: joins are powerful, but they are not free. The larger the tables, the more work the compute unit has to do. Reducing rows before joining can improve performance a lot. For instance, joining a raw transactions table with 8 million rows to a users table before filtering is expensive. Filter down to the relevant users first, then join the smaller result.

● Check indexes before rewriting everything: if the column used in your filter or join does not have an index, the database may need to scan the entire table row by row. In many cases, adding the right index can improve speed significantly. Example scenario: imagine a library with no catalog system. To find one book, you would have to walk through every single shelf. An index is that catalog: it gives the database a shortcut to find rows without scanning the entire table.

Query optimization is one of those topics where the more you learn, the more you realize there is still left to understand.

What is a SQL lesson that took you longer than it should have to learn? Curious to hear 👇

#SQL #QueryOptimization #Analytics #DataScience #Programming
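You can watch the "library catalog" effect directly by asking the database for its query plan before and after adding an index. A small sketch using Python's sqlite3 (the orders table and index name are invented; EXPLAIN QUERY PLAN is SQLite-specific, while PostgreSQL and MySQL use EXPLAIN):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column is a human-readable step.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 42"

before = plan(query)  # reports a full scan of orders
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # now reports a search using idx_orders_customer

print(before)
print(after)
```

The exact wording varies by SQLite version, but the shift from a SCAN step to a SEARCH ... USING INDEX step is the shelf-walk turning into a catalog lookup.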
🚀 Day 32 & Day 33 of My SQL Journey

These two days were packed with some powerful SQL concepts, from recursion to performance optimization! 💡

🔍 Day 32 – Recursive CTEs & Hierarchical Queries
Dived into how recursion works in SQL by splitting problems into a base case + a recursive step.

📌 Explored hierarchy-related concepts (Oracle's CONNECT BY syntax) like:
• START WITH
• CONNECT BY PRIOR
• LEVEL
• SYS_CONNECT_BY_PATH
• CONNECT_BY_ROOT

🧠 Practiced problems:
1️⃣ Generated numbers from 1 to N using recursion
2️⃣ Identified multiple missing values in a sequence

This really helped me understand how SQL can handle tree-like and hierarchical data structures efficiently 🌳

⚡ Day 33 – Indexing, OLAP vs OLTP & Industry Insights

📌 Indexing (intro): indexes improve query performance by reducing the time needed to fetch data, like a shortcut to locate records quickly.

📊 OLTP vs OLAP:
• OLTP (Online Transaction Processing):
– Handles real-time operations (insert/update/delete)
– Fast and optimized for transactions
• OLAP (Online Analytical Processing):
– Used for analysis and reporting
– Works on large datasets with complex queries

🏢 Industry perspective:
🚀 Startup companies: cost high, time less, efficiency high; tools: BigQuery, Snowflake
🏢 Product-based MNCs: cost medium, time medium, efficiency medium; tools: PostgreSQL, DB query tools
🏢 Service-based MNCs: cost low, time more; tools: Oracle

📌 Views (intro): a view is a virtual table created using a query; it helps simplify complex queries and improves reusability.

💪 Slowly building a strong foundation in SQL, one concept at a time!

#SQL #LearningJourney #Day32 #Day33 #RecursiveCTE #Indexing #OLAPvsOLTP #Database #Coding
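Both Day 32 practice problems, generating 1 to N and finding missing values in a sequence, fit in one recursive CTE. Here's a runnable sketch in Python's sqlite3 with an invented seq table (sqlite uses the standard WITH RECURSIVE form rather than Oracle's CONNECT BY):

```python
import sqlite3

# Invented sequence with gaps at 3, 5, and 6.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE seq (n INT)")
con.executemany("INSERT INTO seq VALUES (?)", [(1,), (2,), (4,), (7,)])

rows = con.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 1                                            -- base case
        UNION ALL
        SELECT n + 1 FROM nums                              -- recursive step
        WHERE n < (SELECT MAX(n) FROM seq)
    )
    SELECT n FROM nums
    WHERE n NOT IN (SELECT n FROM seq)                      -- keep only the gaps
    ORDER BY n
""").fetchall()
print(rows)  # [(3,), (5,), (6,)]
```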
How do you get good at complex data manipulation in SQL?

Imagine being able to make informed business decisions. And write easy-to-understand SQL. That is what SQL proficiency is. The expectation from an advanced SQL practitioner is not just the ability to answer complex questions, but the ability to answer complex questions with easy-to-understand SQL.

1. Master the "Logical Order of Execution" 🧠
SQL doesn't run in the order it's written. The SELECT statement is actually one of the last things the engine processes.
The flow: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.
Why it matters: once you realize the WHERE clause happens before your aliases are created, your "column not found" errors disappear.

2. Think in "Windows," Not Just "Groups" 🪟
GROUP BY is a sledgehammer; it collapses everything. Window functions (OVER, PARTITION BY) are a scalpel.
Want a running total? Use a window. Need the "top 3 sales per region"? Use DENSE_RANK(). Comparing this month to last month? LAG() is your best friend.

3. Modularize with CTEs (Common Table Expressions) 🧱
If your query looks like a 200-line "spaghetti code" nest of subqueries, it will break. Use WITH statements to break your logic into steps.
Step A: clean the data. Step B: join the sets. Step C: final aggregation.
Your future self (and your teammates) will thank you for the readability.

4. Solve the "Hard" Problems 🧩
You don't get better by doing simple joins. You get better by tackling:
• Gaps and islands: finding sequences of consecutive data.
• Pivoting: turning "long" data into "wide" reports manually.
• Self-joins: managing hierarchical data (like org charts).

Complex SQL isn't about knowing more commands; it's about knowing how to structure your logic before you even touch the keyboard.

#SQL #DataEngineering #DataAnalytics #BusinessIntelligence #DataScience #CodingTips
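Gaps and islands, the first of the "hard" problems listed above, has a classic solution worth seeing once: subtract ROW_NUMBER() from the value itself, and the difference is constant within each consecutive run. A runnable sketch in Python's sqlite3 with an invented logins table (finding streaks of consecutive login days):

```python
import sqlite3

# Invented data: login days 1-3 and 6-7 are consecutive streaks; 10 stands alone.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (day INT)")
con.executemany("INSERT INTO logins VALUES (?)", [(1,), (2,), (3,), (6,), (7,), (10,)])

rows = con.execute("""
    WITH marked AS (
        SELECT day,
               day - ROW_NUMBER() OVER (ORDER BY day) AS grp  -- constant within each island
        FROM logins
    )
    SELECT MIN(day) AS start_day, MAX(day) AS end_day, COUNT(*) AS len
    FROM marked
    GROUP BY grp
    ORDER BY start_day
""").fetchall()
print(rows)  # [(1, 3, 3), (6, 7, 2), (10, 10, 1)]
```

Note how the query is modularized with a CTE exactly as point 3 suggests: one step marks the islands, the next aggregates them.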
🚀 Day 85 of My 100 Days Data Analysis Journey

What makes this kind of resource powerful for beginners is simple: it doesn't just teach SQL commands, it shows how everything connects.

If SQL ever felt overwhelming… it's not because it's complex, it's because it wasn't structured properly. That's why resources like a well-organized SQL cheat sheet change everything. Instead of scattered syntax, it brings clarity to what actually matters:

🔹 Core query structure
Understanding how SELECT, FROM, and WHERE work together: the true foundation of every query.

🔹 Filtering & conditions
Using operators, LIKE, BETWEEN, and logical conditions to refine data with precision.

🔹 Sorting & limiting results
ORDER BY and LIMIT: simple, but essential for making outputs meaningful.

🔹 Aggregations & grouping
COUNT, SUM, AVG, paired with GROUP BY and HAVING; turning raw data into insights.

🔹 Joins & relationships
INNER, LEFT, RIGHT JOIN; where SQL moves from single tables to real-world data connections.

🔹 DDL vs DML
Understanding the difference between structuring data (CREATE, ALTER) and working with it (SELECT, INSERT, UPDATE, DELETE).

And once that connection clicks, SQL becomes less about memorizing… and more about thinking clearly with data.

If you find this helpful, kindly repost to share it with others.

#SQL #DataAnalytics #LearningInPublic #DataSkills #TechJourney #100DaysOfCode #Databases
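The aggregation layer above hinges on one distinction worth internalizing: WHERE filters rows before grouping, HAVING filters the groups afterwards. A tiny runnable example using Python's sqlite3 with an invented products table:

```python
import sqlite3

# Invented demo data: two bread rows, one cake row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (category TEXT, price REAL)")
con.executemany("INSERT INTO products VALUES (?, ?)",
                [("bread", 2.0), ("bread", 4.0), ("cake", 10.0)])

rows = con.execute("""
    SELECT category, COUNT(*) AS n, AVG(price) AS avg_price
    FROM products
    WHERE price > 1.0          -- row-level filter, runs before grouping
    GROUP BY category
    HAVING COUNT(*) >= 2       -- group-level filter, runs after aggregation
    ORDER BY category
""").fetchall()
print(rows)  # [('bread', 2, 3.0)]  -- cake is dropped by HAVING, not WHERE
```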
All SQL code from these posts is saved to my GitHub as I go: 🔗 https://github.com/aucampr/sql And if you want to know more about my work and experience: 🌐 https://ruanaucamp.me/