Programming tip: using try/except for peace of mind

In a data engineering project, when we deliver features step by step, we can come across situations where we need to modify existing tables - for all the 'future oriented' architectures and designs we create, we always make trade offs and new use cases force us to make changes.

To handle such situations, we create table modification scripts and execute them on QA and Prod (most times, on Dev, we execute the commands directly). If the release cycle is long, it can happen that we have requests to modify the same table multiple times - either because such requests are ad-hoc / new or because we had to prioritize one modification over the other.

In such situations, modification scripts can be painful - primarily because we now have a script that contains multiple modification commands. When we execute the script, it fails for the modifications that have been performed in an earlier story / sprint. The solution is to either create one script for each change or comment the earlier modification and execute the new change. Both options are equally painful to manage.

Recently, I decided to use our trusty try/except construct to overcome the pain of file management.

Let us consider example 1 - We need to add a column in one story. We have created a script and executed it, thereby adding the column to the table. In another story, we need to add one more column. To ensure we do not get errors and we also do not have to spend extra efforts, we can enclose each column addition command inside a try/except block.

You will want to know how this helps . . .

When we execute the column addition script and the column does not exist, the command goes through without error. When the same command is executed and the column is found to be present, we get an exception. We can check the exception message. If it indicates 'column exists' or something similar, we are ok to continue. If the error is something else, we rise the exception and do not proceed.

try:
    qs = """
    alter table catalog.schema.table
        add column new_column STRING comment 'comment';
    """
    spark.sql(qs)
except Exception as ex:
    etext = str(ex)
    if "[FIELD_ALREADY_EXISTS]" in etext:
        print("Column present, nothing to do")
    else:
        raise        

This way, even if we execute the same script multiple times, it will not fail if the column is available in the table.

We can follow same approach for other table operations - say dropping a column.

try:
    qs = """
    alter table catalog.schema.table
        drop column existing_column;
    """
    spark.sql(qs)
except Exception as ex:
    etext = str(ex)
    if "A column, variable, or function parameter with name" in etext and 'cannot be resolved' in etext:
        print("Column not present, nothing to do")
    else:
        raise        

#try_except #table_operation

To view or add a comment, sign in

More articles by Bipin Patwardhan

Explore content categories