Data Processing Made Simple - Your First Data Process with Maestro

Last week I wrote about why I built Neudesic Maestro - today I want to show it in action. Maestro was built to be a better execution configuration system. I will highlight how Maestro does a lot of the work for you, leaving your data processes simple and easy to understand.

Let's create a common data process - extracting data from a SQL database and saving the data as parquet files. I will be connecting to an AdventureWorks database I have created, hosted on an Azure SQL Database. And I will be using Microsoft Fabric pipelines to perform the actual data movement.

Creating our Maestro Module

Metadata-driven processes always start with execution configuration data. With Maestro, we can build modules (aligns to data processes), module properties (aligns to process parameters), and module items (aligns to process executions). The application gives us pages to create modules and module properties.
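
To make the hierarchy concrete, here is a simplified Python model of modules, module properties, and module items. This is purely my own illustration - the real Maestro schema lives in SQL Server, and these class and field names are assumptions, not Maestro's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ModuleProperty:
    """A process parameter: name, type, default value, ordering, and category."""
    name: str
    prop_type: str = "string"        # string | integer | decimal | boolean | code
    default_value: str = ""
    order: int = 0
    category: str = "configuration"  # configuration | runtime

@dataclass
class Module:
    """A data process: owns its properties and its items (executions)."""
    name: str
    properties: list = field(default_factory=list)
    items: list = field(default_factory=list)

    def add_item(self, item_name):
        # New items start from each property's default value.
        values = {p.name: p.default_value for p in self.properties}
        self.items.append({"name": item_name, "values": values})
        return self.items[-1]
```

The key relationship: properties are defined once on the module, and every module item gets a value slot for each property, seeded from the default.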

First, I add a new module to perform the SQL extraction to parquet:

[Image: The module to power our SQL extraction to parquet files data process.]

Next, I add a module property for each process parameter that I will use in my pipeline:

[Image: The module properties that will be passed to our dynamic pipeline.]

Maestro provides us with some options as we define these module properties:

  • Type: Maestro supports strings, integers, decimals, booleans, and code. The code type allows users to edit the value in a large code screen.
  • Default Value: This allows us to specify an initial value for the property for existing and new module items.
  • Order: Execution configuration data is easiest to think of and read in tabular format. This lets us put each property in an order that is easy to understand.
  • Category: Supports configuration and runtime. Most properties are configuration, meaning the user sets and maintains the value. Runtime is useful for properties like a watermark, which automated processes typically control.
  • Actions: If we add a property and no longer need it, we can delete it. Note that everything in Maestro is a soft delete. We can get the property back easily if needed.

I find it very common to start with an idea of the needed parameters and then change them as I build out my process. Maestro makes it easy to rename, remove, or add module properties. What if our pipeline needs to change in the future? It's as easy as updating the module properties in Maestro to match the pipeline change.

Creating a Module Item

Now that we have defined our module, we need to define one or more module items. At runtime, these items are passed to our dynamic pipeline - extracting data from tables and placing that data into files. Let's create a single module item in Maestro for the SalesLT.Customer table in the AdventureWorks database, starting with a review of our module properties:

  • SourceQuery - This is the query that we will execute on the AdventureWorks database
  • FabricWorkspaceId - Our pipeline will allow us to define our lakehouse target dynamically, so we need to give the pipeline a workspace ID.
  • TargetLakehouse - This will be the GUID of the lakehouse we want to store our parquet file in. Note that Fabric does not yet support names when supplying a value dynamically.
  • TargetDirectory - This is the folder structure we will use to store the file.
  • TargetFileName - The name of the parquet file we are creating.
  • SchemaName - The source schema name. This is used to build our query expression.
  • TableName - The source table name. This is used to build our query expression.
  • SelectColumnList - The columns we wish to select. This is used to build our query expression.
  • WatermarkColumnName - The column name used in our incremental query. This is used to build our query expression.
  • Watermark - The watermark value. This is used to build our query expression.

And here are the values for our SalesLT.Customer module item:

  • SourceQuery - {DefaultSqlServerQuery}
  • FabricWorkspaceId - {FabricWorkspaceId}
  • TargetLakehouse - {BronzeLakehouseGuid}
  • TargetDirectory - AdventureWorks/SalesLT/
  • TargetFileName - Customer.parquet
  • SchemaName - SalesLT
  • TableName - Customer
  • SelectColumnList - *
  • WatermarkColumnName - ModifiedDate
  • Watermark - (not set as it has not yet executed)

Looking at these values, you are probably surprised by the query, workspace, and lakehouse values. What's with the curly braces? Those are Maestro tokens. Let's dig deeper into tokens and building expressions with Maestro.

Maestro Tokenization

A core principle of Maestro is that it should make things easier for our developers. I found that data engineers would usually keep values in execution configuration systems very basic - "SELECT * FROM table" is a good example. And it's for good reason. We have a system that will execute thousands of activities nightly. If we use complex statements, maintenance may increase. Do you want to update thousands of queries when you find an issue with your pattern?

That's where Maestro comes in. It is a smarter execution configuration system. It allows us to "push left" - the system should be able to do most of the thinking so our processes get those values back and run with them. This reduces process complexity tremendously. One way that Maestro is smarter is through its tokenization system. Within our module items, you can use tokens - text surrounded by curly braces - to indicate that you want to perform some kind of find and replace. Token types include:

  • Environment Variables - Do you have a value that may need to change environment-to-environment? Create an environment variable and reference it in your module item. Now the value remains the same in each environment (i.e., easy promotion), but the runtime value will be what it should be, depending on the environment you are in.
  • Constants - Do you need to build a file path that includes date parts, like years, months, etc.? Maestro will replace these tokens with the proper value using the current execution time.
  • Other Module Properties - Do you want to reference another module property of the same module? Maestro can use other properties in its token system as well. This allows us to build expressions. This is a very powerful feature.
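
Conceptually, token resolution is a recursive find-and-replace across those three sources. Here is a rough Python sketch of the idea - my own simplification for illustration; the real resolution happens inside Maestro's stored procedures, and the function and constant names here are assumptions:

```python
import re
from datetime import datetime

TOKEN = re.compile(r"\{([^{}]+)\}")

def resolve(value, env_vars, properties, now=None):
    """Repeatedly replace {Token} markers using environment variables,
    date-part constants, and sibling module properties."""
    now = now or datetime.now()
    constants = {"Year": f"{now.year:04d}",
                 "Month": f"{now.month:02d}",
                 "Day": f"{now.day:02d}"}

    def lookup(match):
        name = match.group(1)
        for source in (env_vars, constants, properties):
            if name in source:
                return str(source[name])
        return match.group(0)  # unknown tokens pass through unchanged

    for _ in range(10):  # depth cap guards against circular references
        resolved = TOKEN.sub(lookup, value)
        if resolved == value:
            break
        value = resolved
    return value
```

The loop is the important part: an environment variable can itself contain tokens that reference module properties, so the sketch keeps substituting until nothing changes. That is what "evaluating tokens at many levels" looks like.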

Let's take a look at our SQL extraction query. It is stored as an environment variable named DefaultSqlServerQuery.

DECLARE @Watermark DATETIME2; SET @Watermark = TRY_CONVERT(DATETIME2, NULLIF('{watermark}', ''));
DECLARE @WatermarkColumnName SYSNAME; SET @WatermarkColumnName = NULLIF('{watermarkColumnName}', '');

IF @Watermark IS NULL
BEGIN
    SELECT {SelectColumnList} FROM {SchemaName}.{TableName};
END
ELSE IF (@WatermarkColumnName IS NOT NULL)
BEGIN
    DECLARE @sql NVARCHAR(MAX);
    SET @sql = N'SELECT {SelectColumnList} FROM {SchemaName}.{TableName} WHERE ' + QUOTENAME(@WatermarkColumnName) + ' >= DATEADD(HOUR, {DefaultIncrementalHours}, @Watermark);';
    EXEC sp_executesql @sql, N'@Watermark DATETIME2', @Watermark;
END
ELSE
BEGIN
    RAISERROR('No valid option found. Please ensure the watermark column name is set properly.', 16, 1);
END;

A few things you may notice:

  • This query is much more advanced than a simple "SELECT * FROM SalesLT.Customer". It performs both full and incremental loads. It even throws an error with a descriptive message if conditions aren't right.
  • This environment variable has tokens in it! Yes, this is what makes Maestro very powerful. It will loop and evaluate tokens at many levels.
  • When this is applied to our module item, the module item has properties such as SchemaName, TableName, etc. Those will get evaluated, and the resulting query will use "SalesLT", "Customer", etc.

This is far more enabling for data engineers than simple execution configuration systems like CSV files:

  • This single script allows for both incremental and full loads. Once the full load is completed, the incremental load will automatically be selected (provided a watermark column has been set).
  • I've seen data engineers have to come up with their own solutions for full load vs incremental loads. (1) Sometimes they have 2 properties - a query for full load, and a query for incremental load. (2) Or they may create more conditions in their pipelines to determine what the query should be.
  • Maestro "pushed left" and gave us the answer. Now we don't need to manage additional processing in our pipelines and notebooks.

This is very powerful, but can also be tricky. Once you go a few levels deep, it may be difficult to know what will be returned to our pipeline. To help with this, Maestro has a resolve feature on each module item. This shows the values that are returned by the uspGetModuleItem procedure. It looks like this:

[Image: The resolved module item - this is what will be available to our processing tool.]

Now that we have entered and validated what Maestro will return, let's build out our pipeline.

Building our Data Extraction Pipeline

The big question many of you are probably asking is - how do I get the data out of Maestro and leverage it in my pipeline? Maestro provides stored procedures that act as our interface into the system. Users do not need to - and should not - write custom queries to retrieve data from the system.

To get the module item information from Maestro, we run the below procedure:

EXECUTE [NeudesicMaestroRuntime].[uspGetModuleItem]
    @ModuleItemName = 's2pl.adv.SalesLT.Customer',
    @EnvironmentName = 'Dev';        

This procedure will return the exact same properties as the previous image - the module item resolution feature within Maestro. This makes it easy to know exactly what will be returned to whatever data processing tool I use.

Let's take a look at the Fabric pipeline.

[Image: Fabric pipeline for copying data from a SQL source to a Fabric Lakehouse.]

Maestro allows our data processes to follow simple templates. This pipeline does the following:

  • Retrieve the Module Item Data - First, we need to get the runtime values so we can use them in the pipeline. Our pipeline has the ModuleItemName and EnvironmentName parameters, which we will pass to our lookup activity.
  • Perform the Extraction / Copy - Next, we use the module item's properties - query, lakehouse ID, filename, etc. - to perform the data copy activity.
  • Update the Watermark - Last, we update the watermark value of the module item. This ensures ongoing incremental data extraction.
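
The three steps above can be sketched end-to-end with an in-memory simulation. This is purely illustrative - the real pipeline calls Maestro's stored procedures and a Fabric copy activity, and the names below are my stand-ins:

```python
# In-memory stand-in for Maestro's module item store; in reality this
# lives in SQL Server behind stored procedures.
module_items = {"s2pl.adv.SalesLT.Customer": {"Watermark": ""}}

def run_pipeline(item_name, source_rows):
    """Sketch of the pipeline template: look up the module item, copy the
    rows the watermark admits, then advance the watermark for the next run."""
    item = module_items[item_name]                   # 1. retrieve module item data
    watermark = item["Watermark"]
    copied = [r for r in source_rows                 # 2. perform the extraction / copy
              if not watermark or r["ModifiedDate"] > watermark]
    if copied:                                       # 3. update the watermark
        item["Watermark"] = max(r["ModifiedDate"] for r in copied)
    return copied
```

Running this twice against the same source rows shows the incremental behavior: the first run copies everything and records the newest ModifiedDate, and the second run copies nothing new.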

You may be wondering - how do we reference the module item properties? It looks like this:

@activity('MaestroModuleItem').output.firstRow.TargetFileName        

Very clean, and very simple. Adding new module properties is just as easy: add the property in Maestro, and then use it in the pipeline. No need to create new pipeline objects like variables, parameters, etc.

The pipeline itself is also easy to understand. No extra steps for limited find and replace. No branching logic for full load vs incremental loads. No coming up with a custom solution for handling watermark values. Maestro makes it easy to build data processes.

Wrap Up

As a developer, I've found that building data processes with Maestro is fast and easy. Processes that would have taken days can be done in hours. I don't have to spend any time building out my execution configuration system. Maestro's templates also make it incredibly easy. I just copy the template and update the activities as needed. I spend my time focused on building out the data platform - not on the execution configuration system.

We have just scratched the surface of what an advanced execution configuration system can do. In the upcoming weeks we will explore scaling our data platform with Maestro. Bulk updates, environment promotion, ADO integration, automated module item creation. We will also look at how we can leverage a copilot to maintain data in our Maestro system.

Next week we will dig deeper into scaling this pipeline. We will add logging to the pipeline and leverage Maestro's bulk update system to create dozens of module items. We will wrap it up with promoting our module items to the production environment. See you next week!
