Building Scalable Data Engineering Solution Using Microsoft Fabric

I wanted to share our journey of using Microsoft Fabric for our Data Engineering solution—starting from simple shortcuts to data sources to developing a full-fledged DE framework that others can easily adopt, deploy and use with minimal onboarding effort.

Through this process, we addressed various data engineering challenges, and I want to share our key learnings at each stage.

Source Data Access

One of the biggest expenses in any data engineering solution is bringing source data into local systems. The volume and complexity of the data impact costs, maintenance efforts, and ingestion strategies.

By leveraging shortcuts, we eliminated significant overhead and streamlined data access without worrying about ingestion complexities. As of today, Fabric supports shortcuts for:

  • OneLake
  • Amazon S3
  • Azure Data Lake Gen2
  • Azure Blob Storage
  • Google Cloud Storage
  • Dataverse

However, our project deals with data from 40+ different sources, including Azure Data Lake, Dynamics CRM, SQL Server, and Cosmos DB. While Fabric doesn't provide direct shortcuts for some of these systems, we devised workarounds to access data without writing ingestion pipelines:

  • Azure Data Lake: Fabric provides a direct shortcut option, allowing seamless access.
  • Dynamics CRM: Since data is not directly accessible through shortcuts, we used Fabric Link to publish CRM data into Fabric OneLake via a simple Power Apps setup—a 10-minute process that makes CRM data immediately available in the Fabric Lakehouse.
  • SQL Server: For this common OLTP data store, we used Mirrored Azure SQL Database to replicate data into Fabric. Once mirrored, tables are accessible using the standard path: /<database_name>.mountedrelationaldatabase/Tables/<Schema>/<Table_name>
  • Cosmos DB: Similar to SQL Server, we utilized Mirrored Azure Cosmos DB, enabling effortless replication into Fabric.
  • Other Systems (Oracle, Snowflake, etc.): Similar mirroring strategies can be applied to access these sources efficiently.
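Once a shortcut or mirrored database is in place, the data is simply another path in OneLake. Below is a minimal sketch of reading it from a Fabric notebook; the database, schema, table, and folder names are placeholders, and depending on how the Lakehouse is attached you may need the full abfss:// OneLake URL rather than a relative path.

# Fabric PySpark notebook: read data surfaced through mirroring and shortcuts.
# "SalesDB", "dbo", "Orders" and "landing_zone" are placeholder names.

# Table replicated via Mirrored Azure SQL Database, following the
# /<database_name>.mountedrelationaldatabase/Tables/<Schema>/<Table_name> convention.
orders_df = spark.read.format("delta").load(
    "/SalesDB.mountedrelationaldatabase/Tables/dbo/Orders"
)

# Files exposed through an ADLS Gen2 shortcut attached to the default Lakehouse.
raw_df = spark.read.parquet("Files/landing_zone/customers/")

display(orders_df.limit(10))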

Compute & Transformations

One of the toughest challenges in ETL processing is orchestrating table dependencies. Traditional DE solutions involve complex pipelines, metadata management, or code-based dependency tracking—leading to high maintenance overhead when dependencies evolve.

Instead, Fabric introduces a smarter approach using:

  • mssparkutils.notebook.runMultiple() – allowing dynamic, parallel execution of notebooks without rigid pipeline configurations. Learn more
  • Automated DAG generation – by scanning notebook code and dynamically constructing dependency graphs at runtime. Since DAGs are built dynamically, there's no need for extensive metadata maintenance.
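As an illustration, runMultiple() accepts a DAG object that describes the notebooks, their parameters, and their dependencies. A minimal sketch follows; the notebook names, arguments, and timeouts are placeholders.

# Fabric notebook: orchestrate dependent notebooks in parallel from a DAG object.
dag = {
    "activities": [
        {
            "name": "stage_orders",              # unique activity name
            "path": "stage_orders",              # notebook to run
            "timeoutPerCellInSeconds": 600,
            "args": {"load_date": "2024-01-01"},
        },
        {
            "name": "transform_orders",
            "path": "transform_orders",
            "timeoutPerCellInSeconds": 600,
            "dependencies": ["stage_orders"],    # runs only after stage_orders succeeds
        },
    ],
    "timeoutInSeconds": 43200,                   # overall timeout for the whole run
    "concurrency": 20,                           # max notebooks running in parallel
}

mssparkutils.notebook.runMultiple(dag)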

We can also build the DAG objects at commit time using DevOps/Git build pipelines, eliminating the cost of constructing the DAG on every run and reducing redundancy; a rough sketch of this follows below.
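One way to do this is a small script in the build pipeline that scans notebook sources for a dependency annotation and writes the DAG out as a build artifact. The sketch below assumes an illustrative # DEPENDS_ON: comment convention and notebooks exported as .py files; neither is a built-in Fabric feature.

import json
import pathlib
import re

# Illustrative convention: a notebook declares its upstream notebooks with a
# comment such as "# DEPENDS_ON: stage_customers, stage_orders".
DEPENDS = re.compile(r"#\s*DEPENDS_ON:\s*(.+)")

def build_dag(notebook_dir: str) -> dict:
    activities = []
    for nb in sorted(pathlib.Path(notebook_dir).glob("*.py")):
        match = DEPENDS.search(nb.read_text())
        deps = [d.strip() for d in match.group(1).split(",")] if match else []
        activities.append({
            "name": nb.stem,
            "path": nb.stem,
            "timeoutPerCellInSeconds": 600,
            "dependencies": deps,
        })
    return {"activities": activities, "concurrency": 10}

if __name__ == "__main__":
    # The orchestrator notebook loads this artifact at runtime instead of
    # rebuilding the dependency graph on every execution.
    print(json.dumps(build_dag("notebooks"), indent=2))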

The DAG also lets us assign a batch of notebooks to a specific Spark pool. By identifying and grouping notebooks into batches, we took advantage of this feature to reduce Azure spend.

Data Validations

Data validation is a key component of data processing: it ensures the data is accurate and of high quality.

The traditional approach is to create a specific set of validations per table and execute them after the corresponding table data is processed.

A better option is to make this metadata-driven: create generic validation rules and configure, per table, which rules to apply.
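As an example of the metadata-driven pattern, generic checks can be written once and driven by per-table configuration. The rule names and the config layout below are illustrative only.

from pyspark.sql import DataFrame, functions as F

# Generic, reusable checks; each returns the number of offending rows.
RULES = {
    "not_null": lambda df, col, _: df.filter(F.col(col).isNull()).count(),
    "unique":   lambda df, col, _: df.count() - df.select(col).distinct().count(),
    "in_range": lambda df, col, p: df.filter(~F.col(col).between(p["min"], p["max"])).count(),
}

# Per-table metadata; in practice this would live in a config table or JSON file.
VALIDATIONS = {
    "orders": [
        {"rule": "not_null", "column": "order_id", "params": None},
        {"rule": "unique",   "column": "order_id", "params": None},
        {"rule": "in_range", "column": "quantity", "params": {"min": 1, "max": 1000}},
    ],
}

def validate(table: str, df: DataFrame) -> list[dict]:
    results = []
    for cfg in VALIDATIONS.get(table, []):
        failed = RULES[cfg["rule"]](df, cfg["column"], cfg["params"])
        results.append({"table": table, **cfg, "failed_rows": failed})
    return results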

One limitation of both approaches is that the set of validation rules is fixed: if a table demands a new type of validation, we have to write new, often complex, code to implement it.

 AI-Powered Data Validations

Traditional validation methods rely on predefined rule sets, limiting flexibility.

A more scalable approach is AI-driven validation using prompt-based validation rules. Instead of maintaining static lists, we allow Fabric AI functions to interpret prompts and dynamically generate validation scenarios. Prompts can act as metadata.

This technique makes validation far more adaptable than manually coding rule sets. Check out this article for more details: Validating Data with Natural Language in Microsoft Fabric
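A minimal sketch of the idea is shown below, assuming the pandas .ai accessor and the generate_response function described in the Fabric AI functions documentation; verify the exact package and method names against your runtime version before relying on it.

import pandas as pd

# AI functions are available on recent Fabric Spark runtimes; earlier runtimes
# may require installing the preview packages first.
customers = pd.DataFrame({
    "email":   ["a@contoso.com", "not-an-email"],
    "country": ["US", "Atlantis"],
})

# The prompt is the only "metadata": adding a new validation means adding a
# new prompt string, not writing new rule code.
validation_prompt = (
    "Look at the email and country values in this row. "
    "Reply VALID if both look plausible, otherwise briefly describe the issue."
)

customers["validation_result"] = customers.ai.generate_response(validation_prompt)
print(customers)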

 

Security

With the combination of lakehouse permissions and OneLake data access roles, we can secure data at the entity level. By creating a role per entity and assigning security groups to those entity-level roles, we achieved granular entity-level security.

Data Masking and Data Redaction

Beyond securing entities, it is also crucial to mask and redact sensitive data where applicable. Masking and redaction ensure sensitive values are not readable while the table structure is retained.

Fabric provides options to redact data using Fabric AI functions and the Presidio libraries. Using these, we could redact personally identifiable information (PII) from entities. Learn more
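As an example of the Presidio route, the analyzer/anonymizer pair can be wrapped in a small helper and applied to sensitive columns; the sample text and the <REDACTED> placeholder are illustrative.

# pip install presidio-analyzer presidio-anonymizer
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact(text: str) -> str:
    """Replace detected PII (names, emails, phone numbers, ...) with a placeholder."""
    findings = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(
        text=text,
        analyzer_results=findings,
        operators={"DEFAULT": OperatorConfig("replace", {"new_value": "<REDACTED>"})},
    ).text

# Could be wrapped in a Spark UDF to redact a column of a Lakehouse table.
print(redact("Contact John Smith at john.smith@contoso.com or 425-555-0100."))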


Semantic Model Support & Validation

Semantic model validation is a crucial step to make sure end users see valid metrics. Performing validations on semantic models is generally challenging, but the Great Expectations (GX) framework with Fabric makes it straightforward to validate semantic models at runtime.

Using Great Expectations, we can validate semantic models at various levels, including datasets, metrics, and tables.

With GX, we can separate context (rule) creation from execution: the context can be created during development and executed at runtime. Check out this tutorial
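A minimal sketch of the pattern is below, using semantic link (sempy) to pull semantic model data into pandas and GX's legacy gx.from_pandas validator API (newer GX releases use the fluent context API instead); the dataset, table, and measure names are placeholders.

# Requires the semantic-link (sempy) and great_expectations packages on the runtime.
import sempy.fabric as fabric
import great_expectations as gx

# Pull a semantic model table and an aggregated measure into pandas.
sales = fabric.read_table("Sales Model", "FactSales")
revenue = fabric.evaluate_measure("Sales Model", measure="Total Revenue", groupby_columns=["Region"])

# Table-level expectations (the "context" authored at development time).
sales_validator = gx.from_pandas(sales)
sales_validator.expect_column_values_to_not_be_null("OrderKey")
sales_validator.expect_column_values_to_be_between("Quantity", min_value=1, max_value=10_000)

# Metric-level expectation on the evaluated measure.
revenue_validator = gx.from_pandas(revenue)
revenue_validator.expect_column_values_to_be_between("Total Revenue", min_value=0)

print(sales_validator.validate().success, revenue_validator.validate().success)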

 Logging & Monitoring

To ensure system health and pipeline performance, we integrated Azure Log Analytics, combining system logs and application execution insights in one place.
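As an illustration, per-notebook execution records can be pushed into a Log Analytics custom table with the azure-monitor-ingestion client; the data collection endpoint, rule ID, stream name, and log fields below are placeholders.

# pip install azure-monitor-ingestion azure-identity
from datetime import datetime, timezone
from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient

client = LogsIngestionClient(
    endpoint="https://<data-collection-endpoint>.ingest.monitor.azure.com",
    credential=DefaultAzureCredential(),
)

# One record per notebook run; the schema must match the stream declared in the DCR.
client.upload(
    rule_id="<data-collection-rule-immutable-id>",
    stream_name="Custom-PipelineRuns_CL",
    logs=[{
        "TimeGenerated": datetime.now(timezone.utc).isoformat(),
        "Notebook": "transform_orders",
        "Status": "Succeeded",
        "DurationSeconds": 312,
    }],
)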

For real-time monitoring, we explored:

  1. Azure Managed Grafana – A powerful dashboarding tool with direct KQL-based queries against Log Analytics. Beyond dashboarding, Grafana provides Teams and email integrations for notifications. It also provides ICM integration (via ADX), so ICM tickets can be created automatically when errors occur during pipeline execution.
  2. Fabric Data Activator – Enables automated alerts when failures occur, using real-time monitoring and triggers within Fabric. Learn more


We will discuss each of these items in detail, along with how we bundled all of these features into a deployable solution, in upcoming posts. Stay tuned!!


Conclusion

Our journey with Microsoft Fabric transformed our data engineering solution—from simple shortcuts to a scalable, automated DE framework.

By adopting Fabric's shortcuts, dynamic transformations, AI-driven validations, and real-time monitoring, we built a cost-efficient, adaptable data pipeline while reducing development overhead.

If you are working with Microsoft Fabric, I’d love to hear your insights and experiences! Let’s connect and discuss innovative ways to optimize data engineering workflows.


