Power Query - Remove empty rows and columns
Problem:
How can I remove these empty rows (or columns) in Excel?
Have you ever asked yourself this before? Then this article is for you. I am going to showcase the approach that I have developed through my trial and error experiment of using Power Query (If you have never used Power Query, I would highly recommend to check this out first).
I have been there, done that before.
I have encountered a number of occasions where I need to remove empty rows or columns from the data that I have pulled into Power Query. Sometimes, it is as easy as identifying the key columns that I will need to pull from the original data source and selecting only those rows that have the data in those columns. But in other cases where the data is not either well structured/formatted in a particular way, the process of removing empty rows or columns gets much more difficult than applying the simple method since there is no way to know which columns or fields hold the key information or sometimes even no header information is available. It is particularly true when you are pulling the data from an Excel spreadsheet. This is because Excel embraces the free-form data structure, allowing any input without any constraints or data validation rules. For instance, you can type a text in Column A and skip Column B, typing another text in Column C. Or you type in a date in the first row and a number in the second row of the same column. That is perfectly fine in Excel.
I found a cure.
So, how can we remove empty rows or columns from those free-form data? I believe that I have come up with a simple approach to tackle this problem. Let's start with removing empty rows. It turned out to be achievable by a few lines in Power Query. See below.
// ## power query
// ## how to remove empty rows
let
source = Any_Data, // insert any data source (table) here
columns = Table.ColumnNames(source), // column headers
empty_row = Record.FromList(
List.Transform({1..List.Count(columns)}, each null),
columns
), // create empty row record
remove_empty_rows = Table.RemoveMatchingRows(
source, { empty_row }
) // finally, remove all empty rows
in
remove_empty_rows
You can check out my comment in the snippet above. But I will go line by line about what this query does. The actual query starts from line 3, empty_row. I am creating a list of null records for all columns/fields. (This is a little cheat but I am using List.Transform to achieve the result that I want here. I bet that there is a much smarter way to do the same thing). By using this list of null records, it is just as easy as using Table.RemoveMatchingRows function to remove empty rows from the source data.
Removing empty columns is just a little bit advanced version of the query used for removing empty rows. See below.
// ## power query
// ## how to remove empty columns
let
source = Any_Data, // insert any data source (table) here
columns = Table.ColumnNames(source), // column headers
transpose = Table.Transpose(source), // transpose table (flipping columns and rows)
table_to_rows = Table.ToRows(transpose), // converting table to list of lists (rows)
non_null_count = List.Transform(
table_to_rows,
List.NonNullCount
), // tranform list, counting non null items in list (meaning non null rows)
empty_columns = List.PositionOfAny(
non_null_count,
{ 0 },
Occurrence.All
), // identify all columns with empty rows
remove_empty_columns = Table.RemoveColumns(
source,
List.Transform(
empty_columns,
each columns{_}
)
) // finally, remove all empty columns
in
remove_empty_columns
As a first glance, the query looks completely different from the one for removing empty rows. But I think you will know that the approach is very much the same. The only major difference is that this time we are actually figuring out which columns are empty. When removing empty rows, we have assumed that the empty row has no data in the row. So that generalization made it easier to get around the issue. Perhaps, you could apply the same logic and achieve the same outcome. However, the query above actually went an extra mile, identifying all empty columns then removing them from the source data. Let me walk you through the query line by line. Like the previous query, the actual query starts from the third row, transpose. I have flipped the columns and rows in the source data here. I did this because I want to convert the rows into a list of values which is achieved by line 4, table_to_rows. Now you can see that I have converted a list of values into the count of non-null values (line 5: non_null_count) then located all the empty columns by using List.PositionOfAny function (line 6: empty_columns). The final touch is to remove all those columns from the source data (line 7: remove_empty_columns). I have transformed a list of empty columns which holds the positions of empty columns (meaning that, for instance, if Column1 is empty, then it returns 0 as a position of the column. Remember, a list is a zero-based ordered sequence of values in Power Query).
Question?
Let me know if you have any question about the queries presented in this article. I am more than welcome to explain about them or the functions used in the queries. Also, I would like to point out that you could turn both queries into the custom function in Power Query. Which may come in handy when you want to quickly apply both queries in the query that you have already built to transform the data.