Python is great for data science.
But using it to clean data is overkill.
A popular YouTube tutorial shows how to clean SurveyMonkey data using Python and Pandas; it took the developer 1 hour.
The same transformation in Power Query?
5 minutes.
Most data analysts don't realize Excel can do this.
They assume Python is the only serious option for data cleaning.
But Power Query has been built into Excel since 2010, and it handles transformations like unpivoting, merging, grouping, and calculated columns without writing a single line of code.
In this video, I walk through the exact same dataset and show you how to clean it 12x faster using Power Query.
If you've been putting off learning Python just to clean data, you don't need to.
Watch the video and download the practice file: https://lnkd.in/d7E3TiDU
❓Do you use Python or Power Query for data cleaning?
#Excel #Python #DataCleaning
Python is amazing for data science, but using it to clean data is like using a chainsaw to cut a birthday cake. In this popular YouTube tutorial by Shashank Kalanithi, which has been viewed over three and a half million times, he shows how to clean SurveyMonkey data using Python and Pandas. This is a real job he did for a client, and it took him one hour. To be clear, Shashank does everything correctly: his process is solid, and he's great at explaining it. But here's the thing: there's a much faster way to do this. A way that's fully automated, 100% repeatable, and requires zero coding. Many data analysts don't realize it's possible, but Power Query in Excel can do the same transformation in just 5 minutes. No scripting, no debugging, just a simple, efficient process that anyone can use. In this video I'll show you how to clean the same dataset 12 times faster, so you can work smarter, not harder, and even outbid Python pros for jobs like this. Looking at the data, you can see the first problem: we need to fix the headers, because they're split over two rows and they should be in a single row. If I scroll across, you can also see the questions are split into individual columns, with additional columns for each response. Plus, the question numbers are only listed for the first response, meaning you can't just concatenate these two rows together, because we won't know which question number each response relates to. So we need to fill these empty cells first. If I scroll back to the left toward the front, we have some empty columns, which Shashank removed for privacy; we don't need them in the final output anyway. Speaking of the final output, let's take a look. The client wanted the data in a vertical or tabular layout, where each question, sub-question, and answer are in their own columns, with additional columns for the count of unique respondents per question and the number of respondents that gave the same answer for each question and sub-question.
With this layout we can easily analyse and visualise the data in Excel, Tableau, or Power BI. Now, going back to the original data, the first thing we need to do is fill in these blank cells in the first row and then concatenate the two headers together into one. Shashank actually used Excel to do this, transposing them with copy and paste into separate columns. He then wrote formulas to copy down the question numbers, which you can see there, and he repeated this for the second row of headers, which actually wasn't required. Finally, he concatenated these two columns together into one header, which he copied and pasted back as his single row of headers. Interestingly, these steps took 10 minutes, and they're the most watched part of the whole video. So let me show you how to use Excel's built-in Power Query tool, which has been around since Excel 2010, to transform this data into a tabular layout in a fraction of the time. I'll start by going to cell A1 and pressing Ctrl+A to select all the data. Then, on the Data tab of the ribbon, we're going to go to From Table/Range. Excel wants to know if my data has headers. Well, it does, but they're split over two rows, so they're no good to me. I'm going to leave that unchecked and click OK. This opens the Power Query window, and I have a preview of the data. We'll collapse the Queries pane to give ourselves a little more space, and then, on the Transform tab, I'm going to transpose, so we can see our header rows are now in separate columns. The first thing I need to do is fill down these null or empty cells, which we can do via the Fill drop-down, then Down. Holding down Shift, I'm going to select columns one and two, and then, on the Add Column tab, we're going to merge the columns into one. Here I can choose a separator. Shashank used a colon, so I'm going to go with that; if you want to use a hyphen, you can choose Custom. The new column name doesn't matter, so I'll leave it as Merged.
Click OK, and if we scroll across to the far right, there's my single column of headers. Now, the reason I'm merging them via the Add Column menu is so that the separator isn't included on these first seven rows. You can see here there's no value in column two, so there's no point in having a colon after these labels in column one. If you merged them via the Transform tab's Merge Columns, you'd end up with the colon there as well. So the trick is Add Column, then Merge Columns; that way you don't get the redundant colon on these first seven rows. With that done, I'm going to right-click and move it to the beginning. Now I don't need these columns anymore, so, holding Shift to select them, press Delete. And now all I need to do is go to Transform and transpose it back. You can see there's my single row of headers. Let's promote them with Use First Row as Headers, and that job's done. That wraps up the data cleaning Shashank did with Excel copy and paste and formulas. From here onward, Shashank used Python; I'm going to continue with Power Query. I'll start by tidying up the columns I don't need. We don't need Start Date through to Custom Data 1, so, holding Shift to select them, press Delete. That's done. Next, let's rename these demographic data columns to make them more succinct and easier to work with. There are two division columns, which we'll differentiate with Division Primary and Division Secondary, and then we have Position, Generation, Gender, Tenure, and finally Employment Type. Next, I need to unpivot these question and response columns, but first notice that some cells without answers contain null and some contain blanks. I need them all to be consistent, so I'm going to go back and select the question and response columns: holding Shift, scroll to the far right and select the last column, and then, on the Transform tab, use Replace Values. Here I want to replace null values with blanks.
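For comparison, the two-row header fix described above (fill the question numbers down, then concatenate the two rows with a colon) can also be sketched in Pandas. The values here are made up for illustration; only the shape of the problem matches the video:

```python
import pandas as pd

# Hypothetical two-row header: question numbers appear only once per
# question block (mirroring the SurveyMonkey export layout)
row1 = pd.Series(["Q1", None, None, "Q2", None])
row2 = pd.Series(["Strongly agree", "Agree", "Disagree", "Yes", "No"])

# Fill the question numbers forward, then join with a colon separator,
# the same logic as Power Query's Fill Down + Merge Columns
headers = row1.ffill() + ":" + row2
print(headers.tolist())
# ['Q1:Strongly agree', 'Q1:Agree', 'Q1:Disagree', 'Q2:Yes', 'Q2:No']
```

Either way, the point is the same: one fill operation plus one concatenation replaces the ten minutes of manual transposing and formula-writing.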
If I leave the null values there, when I unpivot, it's going to omit those rows that have no answers. So let's scroll back to the far left, and then I'm going to select the columns that I don't want to unpivot (that's all of this demographic data and the Respondent ID), then right-click and Unpivot Other Columns. By unpivoting other columns, if in future I have survey data with a different number of questions and answers, I won't have to make any changes to this query in order to reuse it. Loving how easy Power Query makes data cleaning? This is just the beginning. If you want to stay ahead of the curve and master automation, check out my Power Query course. You'll learn how to eliminate hours of manual work with hands-on practical lessons, plus get personal support and mentoring from me, so you can stop wasting time on tedious tasks. You'll find the link in the description and pinned comment, so join today and take your Excel skills to the next level. So that's my data unpivoted. Now all I need to do is count the number of respondents per question, and for that I need to extract the question number into its own column while also maintaining this question and response column. So I'm going to go to Add Column, Extract, Text Before Delimiter (remember, our delimiter is the colon) and click OK. Now, if I scroll to the right, there's my question column, and there's my question and response column. Let me just left-click and drag this so that we have Question, then Question and Response, then Answer. Let's rename these: Question, Question & Sub-question, and this one is Answer. OK, we're done with the data cleaning. The final task is to add columns for the number of respondents with the same answer and the number of respondents for each question. But first, let's call this query Unpivoted Data, and then let's expand the Queries pane. I'm going to right-click this query and reference it. Let's scroll across to the right.
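For readers coming from the Pandas side, the Unpivot Other Columns step corresponds to melt, and the Text Before Delimiter step to a string split. A minimal sketch with hypothetical column names and toy data, not the actual survey file:

```python
import pandas as pd

# Toy stand-in for the survey table: id/demographic columns kept wide,
# question columns to be unpivoted (names are illustrative)
df = pd.DataFrame({
    "Respondent ID": [1, 2],
    "Gender": ["F", "M"],
    "Q1:Agree?": ["Yes", "No"],
    "Q2:Rating": ["5", ""],
})

# Pandas equivalent of Power Query's "Unpivot Other Columns": every
# column not listed in id_vars becomes an attribute/value pair
long = df.melt(id_vars=["Respondent ID", "Gender"],
               var_name="Question & Sub-question", value_name="Answer")

# Equivalent of "Extract Text Before Delimiter" with a colon delimiter
long["Question"] = long["Question & Sub-question"].str.split(":").str[0]
print(long)
```

Melting rather than hardcoding column names gives the same reusability benefit described above: a new survey with more questions needs no changes.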
I need to filter out the blank answers so I don't incorrectly count them as responses. Then I want to select just the Question column and, holding down Ctrl, the Respondent ID column. Right-click, Remove Other Columns, and with the Question column selected, we're going to go to the Home tab and use Group By. It's grouping by the Question column; the new column name doesn't matter, but the operation needs to be Count Distinct Rows. Click OK, and there's my distinct count of respondents per question. Let's rename this query Number of Respondents. And we'll repeat for our same-answer count, so right-click, Reference. Again, we need to filter out the blank answers, then click OK. Now here I need the Question & Sub-question column, the Answer column and, holding Ctrl, the Respondent ID. Right-click, Remove Other Columns, and again we're going to Group By. This time we need an advanced grouping, because we need to add a grouping for the Answer. The new column name can be Count, and the operation is Count Distinct Rows. Click OK, and there's the number of employees who gave the same answer for each question and sub-question. Let's quickly rename this Same Answer Count. And now I'm ready to bring the Number of Respondents and Same Answer Count into my unpivoted data. To do this, I'm going to Merge Queries as New. The first table is my Unpivoted Data, and then I'm going to merge it with the Number of Respondents. Scrolling across, I need to select the columns that match, so it's the Question column here and the Question column here. The join kind will be a left outer join, that is, all the records from the first table and only matching records from the second table. Click OK, and over on the far right we have a new column containing a table. Clicking on the expand button, I'm going to bring in the Count column, and we'll use the original column name as the prefix, which just saves me time renaming it, because what I want to name it is Number of Respondents.
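The two Group By steps above both use Count Distinct Rows, which in Pandas terms is a groupby with nunique on the respondent ID. A sketch with made-up rows to show the shape of both counts:

```python
import pandas as pd

# Illustrative unpivoted survey rows (values are hypothetical)
long = pd.DataFrame({
    "Respondent ID": [1, 1, 2, 2, 3],
    "Question": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "Question & Sub-question": ["Q1:A", "Q2:B", "Q1:A", "Q2:B", "Q1:A"],
    "Answer": ["Yes", "5", "Yes", "", "No"],
})

# Filter out blank answers first, so they aren't counted as responses
answered = long[long["Answer"] != ""]

# "Count Distinct Rows" per question: distinct respondent IDs
respondents = answered.groupby("Question")["Respondent ID"].nunique()

# Same-answer count: distinct respondents per (sub-question, answer) pair,
# i.e. the advanced grouping with two group-by columns
same_answer = (answered
               .groupby(["Question & Sub-question", "Answer"])
               ["Respondent ID"].nunique())
print(respondents)
print(same_answer)
```

Filtering the blanks before grouping matters in both tools; otherwise an empty string would be counted as a legitimate answer.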
So we'll just delete Count on the end and press Enter. That's that one done. Let's repeat: we're going to remain in this new query, called Merge1, and I'm just going to Merge Queries, not Merge Queries as New. Merge1 is our first query, and then I want to bring in my Same Answer Count. This time I need to match the Question & Sub-question and, holding Ctrl, the Answer column with the Question & Sub-question and, holding Ctrl, the Answer column. You'll notice this is column one and this is column two; likewise, column one and column two. Again we're using a left outer join, so click OK. Clicking on the expand button here, I just want the Count, and we'll use the original column name. Double-click, get rid of Count on the end, and that's that done. Let's rename this query Final, and I'm ready to load the data back to Excel for the client. So I'm going to Close & Load To, which is just going to allow me to choose where to load the data. I'm going to choose Only Create Connection, because I don't want to load all of the queries; I only want this final one loaded. So I'm going to right-click it, Load To, and we're going to pop it in a table on a new worksheet. And there we have the cleaned data, ready for the client to use. If I go to the very last row, you can see it's 17,029, which includes the header row. If we go back to Shashank's data, let's just check he had the same number of rows: 17,029. So, job done in a fraction of the time and with zero coding. But remember I said at the beginning that this was 100% repeatable? Well, if we go to the Data tab and open up Queries & Connections, let's double-click to open the query editor. On the right-hand side, in the Applied Steps pane, you can see Power Query has recorded every transformation step, much like a macro recorder. So if I want to reuse these queries for a new SurveyMonkey dataset, I can simply point these queries to a new file and then close the query editor.
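Both merges above use the left outer join kind, which in Pandas is merge with how="left". A minimal sketch of the first merge, with stand-in tables rather than the real queries:

```python
import pandas as pd

# Stand-ins for the Unpivoted Data and Number of Respondents queries
unpivoted = pd.DataFrame({"Question": ["Q1", "Q1", "Q2"],
                          "Answer": ["Yes", "No", "5"]})
respondents = pd.DataFrame({"Question": ["Q1", "Q2"],
                            "Number of Respondents": [3, 1]})

# Left outer join: all rows from the first table,
# only matching rows from the second
final = unpivoted.merge(respondents, on="Question", how="left")
print(final)
```

The second merge works the same way, just matching on two key columns (pass a list to `on`), exactly as the two column-pair selections in the merge dialog.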
On the Data tab of the ribbon, I can click Refresh All, and Power Query will apply all of those transformations to the new data. Or, if there are changes to the dataset I'm connected to here, I can simply go to my Final query and refresh to get those updates. And of course, anything connected to this data, like reports, charts, pivot tables, etc., will automatically pick up the updates. Pretty cool. But this is just the start. Power Query can automate so much more than just survey data cleaning. In my next video I'll show you six more ways to use Power Query to save hours of manual work, so click here to watch now and take your Power Query skills even further. I'll see you there.
I agree with your point: for some specific tasks, Power Query and Excel are the optimal solution, especially for companies that already use MS Office. #DataCleaning #Excel #PowerQuery
I agree. Excel has gotten a bad rap because I don't think a lot of people even know about Power Query. I started learning about it in 2014 and I have used it ever since. It's awesome.
Power Query when the source data are Excel files, but if it's coming from a database, most of the data work should really be done in SQL, due to the inherent performance and scalability limitations of MS products.
Many Thanks 👍👍 for posting this Very Educational Tutorial. Great to see how messy Data can be cleaned in a short amount of time using Power Query. Keep Up The Excel-lent Work 💪💪.
I created this simple SQL and Python cheat sheet to quickly revise the most important concepts every data analyst should know.
From querying data in SQL to analyzing it with Pandas, this covers the essentials in one place.
Save it for later & share with someone learning data analytics.
#DataAnalysis #SQL #Python #Pandas #DataScience #Learning #Analytics
🚀 Time Series Analysis in SQL & Python — Real-World Challenges & Solutions
Time series calculations in SQL can be surprisingly frustrating…
At first, it feels simple, but once you start working on real business problems, things get tricky:
When to use > vs >=
Defining last 7 days vs last week correctly
Identifying users who haven’t ordered in the last 30 days
Rolling vs calendar-based calculations
Even a small mistake in date logic can completely change your insights.
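To make the "last 7 days" vs "last week" pitfall concrete, here is a small Pandas sketch with a fixed, made-up date so the example is reproducible. The same distinction applies in SQL with `>=` against a rolling cutoff versus calendar-week boundaries:

```python
import pandas as pd

# Hypothetical "today", pinned so the example is deterministic
today = pd.Timestamp("2024-03-15")  # a Friday
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-03-04", "2024-03-09", "2024-03-11", "2024-03-14"])
})

# "Last 7 days": a rolling window ending today
last_7_days = orders[orders["order_date"] > today - pd.Timedelta(days=7)]

# "Last week": the previous *calendar* week, Monday through Sunday
week_start = today.normalize() - pd.Timedelta(days=today.dayofweek + 7)
week_end = week_start + pd.Timedelta(days=7)
last_week = orders[(orders["order_date"] >= week_start)
                   & (orders["order_date"] < week_end)]

print(len(last_7_days), len(last_week))  # different rows, different answers
```

Same data, two defensible definitions, different row counts: exactly the kind of ambiguity worth pinning down with the business before writing the query.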
While working with product and sales teams, I came across multiple such scenarios where accurate time-based logic was critical for decision-making.
👉 To organize my learning, I’ve created a small project where I’ve documented:
Practical SQL time-based problems
Clear and correct approaches
Python (Pandas) validation using Jupyter Notebook
📂 I’ve shared:
SQL queries
Jupyter Notebook
A quick reference guide
on my GitHub:
👉 https://lnkd.in/gn5kg-xh
I’ll continue adding more real-world tasks as I come across them while working on different use cases.
👉 Follow me for more practical tasks and insights like this.
#SQL #Python #DataAnalytics #TimeSeries #DataScience #BusinessAnalytics #LearningInPublic #Analytics
Week 14 (notes)
Python Pandas Essentials for Data Analysis ✨
🐍 Python + Pandas = Powerful Data Analysis
Here are some fundamental Pandas operations that every data analyst should know:
📌 1. View First Rows: Use head() to display the first 5 rows of a dataset.
df.head()
📌 2. View Last Rows: Use tail() to display the last 5 rows.
df.tail()
📌 3. Statistical Summary: Get quick insights like count, mean, std, min, and max using:
df.describe()
📌 4. Select Single Column
df['Name']
📌 5. Select Multiple Columns
df[['Name', 'Age']]
📌 6. Add New Column
df['Salary'] = df['Age'] * 1000
📌 7. Basic Filtering: Filter rows based on a condition:
df[df['Age'] > 25]
💡 Pandas makes data cleaning and analysis fast, simple, and efficient.
#Python #Pandas #DataAnalysis #Data #Aspiring #LinkedInLearning #100DaysOfCode #Analytics #CareerTransition #Techdatacommunity #LearningJourney
This cheat sheet helped me understand how SQL, Python, and Excel work together in Data Analytics 📊
As a beginner, I am learning how to:
• Query data using SQL 🗄️
• Analyze data using Python 🐍
• Work with data in Excel 📈
Step by step, I am improving my skills and building projects 🚀
#DataAnalytics #SQL #Python #Excel #LearningJourney
🚀 Day 1/20 — Python for Data Engineering
From SQL to Python: The Next Step
After spending time with SQL, I realized something:
👉 SQL helps us query data
👉 But real-world data engineering needs more than that.
We need to:
process data
transform data
move data across systems
That’s where Python comes in.
🔹 Why Python?
Python helps us go beyond querying:
✅ Process data from multiple sources
✅ Build data pipelines
✅ Automate workflows
✅ Handle large datasets efficiently
🔹 Simple Example
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
👉 From raw file → usable data in seconds
🔹 SQL vs Python (Simple View)
SQL → Get the data
Python → Work with the data
Together, they form the foundation of data engineering.
💡 Quick Summary
SQL is where data access begins.
Python is where data engineering truly starts.
💡 Something to remember
SQL gets the data.
Python makes the data useful.
#Python #DataEngineering #DataAnalytics #LearningInPublic #TechLearning #Databricks
Understanding the difference between Excel, SQL, and Python is very important in Data Analytics 📊
Here’s a simple comparison I created to understand how these tools are used for different tasks 💡
As a Data Analytics learner, I am currently building my skills in:
• Excel 📈
• SQL 🗄️
• Python 🐍
This helped me get a clear idea of when and where to use each tool 🚀
🔹Which tool do you use the most in your work? 🤔
#DataAnalytics #SQL #Python #Excel #LearningJourney
Most analysts know SQL.
Most analysts know Python.
Very few know how to combine them efficiently.
That’s why many stay average.
Here are a few things I wish I learned earlier:
In SQL:
→ WHERE cannot filter aggregated results
If you're filtering grouped data, use HAVING.
→ Window functions save messy subqueries
Use RANK(), ROW_NUMBER(), SUM() OVER() for ranking and running totals.
→ LAG() and LEAD() beat self-joins
Comparing current vs previous period?
One line does what multiple joins often can’t.
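One way to sanity-check LAG/LEAD logic is to reproduce it in Pandas, where the equivalent is shift(). A tiny sketch with made-up monthly figures:

```python
import pandas as pd

# Hypothetical monthly revenue; SQL's LAG(revenue) OVER (ORDER BY ...)
# corresponds to .shift(1) on a frame already in the right order
rev = pd.DataFrame({"month": ["Jan", "Feb", "Mar"],
                    "revenue": [100, 120, 90]})

rev["prev_revenue"] = rev["revenue"].shift(1)
rev["change"] = rev["revenue"] - rev["prev_revenue"]
print(rev)
```

The self-join alternative would duplicate the table and join each row to the previous period's row, which is both slower and easier to get wrong; the window function (or shift) is one line.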
In Python:
→ Do not load unnecessary data
Filter in SQL before bringing it into pandas.
→ Avoid for loops in pandas
Vectorized operations are significantly faster; even .apply usually beats an explicit row loop.
→ Stop hardcoding dates
Use datetime so your scripts stay dynamic and reusable.
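A quick sketch of the loop-vs-vectorized point, with toy numbers; the two approaches give the same result, but the vectorized one runs as a single column operation instead of a Python-level loop per row:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [2, 3, 1]})

# Slow pattern: an explicit Python loop over rows
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized: one operation over whole columns
df["total"] = df["price"] * df["qty"]

assert df["total"].tolist() == totals_loop  # same result, far faster at scale
```

The date point works the same way: compute cutoffs like `pd.Timestamp.today() - pd.Timedelta(days=30)` instead of hardcoding a literal, and the script stays correct tomorrow.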
The real power comes when you combine both:
→ Pull data with SQL
→ Transform it in Python
→ Push results back with to_sql()
That workflow alone will make you more efficient than most analysts around you.
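The pull-transform-push workflow described above can be sketched end to end. This uses an in-memory SQLite database as a stand-in for a real warehouse; table and column names are invented for the example:

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for a real database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("East", 100.0), ("East", 50.0), ("West", 75.0)])

# 1. Pull with SQL: filter and aggregate in the database, not in pandas
df = pd.read_sql("SELECT region, SUM(amount) AS total "
                 "FROM orders GROUP BY region", conn)

# 2. Transform in Python
df["total_pct"] = 100 * df["total"] / df["total"].sum()

# 3. Push results back with to_sql()
df.to_sql("region_totals", conn, index=False, if_exists="replace")
print(pd.read_sql("SELECT * FROM region_totals", conn))
```

Doing the aggregation in step 1 rather than after loading is the "do not load unnecessary data" rule from the SQL section in action.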
Knowing SQL or Python is useful.
Knowing how to use both together is what separates strong analysts from average ones.
#DataAnalytics #SQL #Python #AnalyticsEngineering #CareerGrowth