Microsoft Fabric! A loader app created using the Spark Copilot, PySpark and the Fabric Lakehouses... (Part 3: The Silver Layer!)

Welcome back to part three of this series. Let's first recap what happened so far:

Part 1: We used Copilot to create functions and methods in a PySpark Notebook in Microsoft Fabric to get data from files into the Bronze Layer of our setup.

Part 2: We created a Fabric Data Factory Pipeline to get parameters from a .csv file (this could also be a SQL DB or another source) and start the Notebook from part 1 with the parameters read from that file injected at runtime.

And we have only just begun.

In this part we are going to examine two different topics:

  1. We want to create shortcuts from the Bronze layer into the Silver layer programmatically.
  2. The Silver layer will hold a Data Vault structure for the data that we load. I want to achieve an insert-only core analytical layer that allows easy maintenance and provides standard methods to keep the data consistent and clean even when loads fail.

Off we go. Let's start with a new notebook to create the shortcuts as a bridge between the layers, bringing in the Bronze layer tables so they are available to work on. (Sure, this is not strictly necessary, but we also wanted to experiment and build that muscle for moments when it will be, like creating a workspace for a team and providing them with a bunch of shortcuts to tables to give them a head start.)

But back to the shortcuts and how to create them. There is a growing collection of REST APIs that you can use to programmatically manipulate Fabric infrastructure like

  • Automating workspace management
  • Managing data pipelines
  • Automating security and access control
  • Resource management, monitoring and optimization
  • Automated deployment

You'll find the API reference here: Microsoft Fabric REST API references - Microsoft Fabric REST APIs | Microsoft Learn.

Another (newer) method to interact with and manage Fabric programmatically is the Command Line Interface (CLI): Fabric command line interface - Microsoft Fabric REST APIs | Microsoft Learn, which also helps admins and developers perform tasks in the Fabric realm.

First, as in the Bronze loader notebook, we'll set up some parameters in an explicit cell to be able to catch values from the calling job, such as my Fabric Data Factory Pipeline from part 2.

[Screenshot: parameter cell of the shortcut notebook]
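Since the notebook itself isn't provided for download, here is a minimal sketch of what such a parameter cell could look like. All parameter names except p_workspaceName (which appears later in this walkthrough) are my own assumptions:

```python
# Parameter cell ("Toggle parameter cell" in Fabric). The Data Factory
# pipeline from part 2 overrides these default values at runtime.
# Names below are illustrative assumptions, not the author's exact ones.
p_workspaceName = "MyFabricWorkspace"   # workspace holding the Lakehouses
p_sourceLakehouse = "Bronze"            # Lakehouse the shortcut will point at
p_targetLakehouse = "Silver"            # Lakehouse where the shortcut is created
p_tableName = "customers"               # Bronze table to surface in Silver
```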

The second step in this notebook is a function that receives the return value from a REST call and extracts the ID for the object name that we're looking for. Let's ask Copilot again for this:

[Screenshot: Getting the item id for the Fabric item by name.]

Nice work again, buddy. With this function we can now just pass the name of an item that we are searching for and get back the ID that we need for the REST API call to create the shortcut.
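A minimal sketch of such a lookup helper, assuming the usual shape of Fabric "list" REST responses, which wrap their results in a "value" array of objects carrying "id" and "displayName" (the function name is my own choice, not necessarily what Copilot produced):

```python
def get_item_id(response_json, item_name):
    """Return the id of the Fabric item whose displayName equals item_name.

    response_json is the parsed JSON payload of a Fabric 'list' REST call
    (workspaces, items, ...); returns None when no item matches.
    """
    for item in response_json.get("value", []):
        if item.get("displayName") == item_name:
            return item["id"]
    return None
```

With this in place, `get_item_id(resp.json(), "Silver")` hands back the ID we need for the shortcut call.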

This is the thought sequence of what we need to go through:

  1. Get the WorkspaceID for the workspace name that has been provided via the parameter p_workspaceName
  2. Get the Lakehouses that are present in the given workspace represented by the target_workspaceId
  3. Get the itemID of the Lakehouse given by the target Lakehouse name (in Fabric speak this is the target that the shortcut points at, where the data that the shortcut displays resides)
  4. Create the URL for the post
  5. Create the JSON body for the post
  6. Create the shortcut

With the function that we have created above this would look something like:

[Screenshot: Receiving all the necessary item ids to prepare the REST API call.]

And with those IDs we create the POST URL as described (OneLake Shortcuts - Create Shortcut - REST API (Core) | Microsoft Learn):

[Screenshot: The POST URL that we will run against the Fabric workspace.]

Finally run the POST:

[Screenshot: Running the POST against the Fabric REST API.]

And there we are in the Silver Lakehouse:

[Screenshot: New shortcut in the Silver Lakehouse.]

Again, I don't provide the notebook for download. Have some fun and try Copilot yourself. You'll be surprised how fast you will have the solution.

But let's move on to the second step of our plan, create the Data Vault structure from the source data, and start a new notebook.

Remember Data Vault? Dan Linstedt introduced the Data Vault methodology in the 1990s, with formal publications and adoption gaining traction in the early 2000s. It was designed to address scalability, auditability, and agility in enterprise data warehousing.

Main Building Blocks:

  • Hubs: Contain unique business keys (e.g., CustomerID) with minimal attributes.
  • Links: Represent relationships between hubs (e.g., Customer-Product).
  • Satellites: Store descriptive and historical data related to hubs or links, including timestamps and source tracking.

Why am I targeting this methodology? In one of my bigger Data Warehouse projects we faced the challenge that the nightly load window was filling up, and it was obvious it would soon no longer be enough. The other challenge we ran into was the data quality of the source. The old warehouse was pretty messed up, and cleaning up after yet another failed job was a lengthy and ugly task with lots of manual SQL to be run by the maintenance team.

With Data Vault we succeeded in building an insert-only core warehouse structure that also benefited from the great performance SQL Server delivered. We went from sequenced pipelines to parallel loading of the hubs, the satellites and the links. With history of any depth manifesting in the satellites and the canonical structure of the tables, we were also able to create easy-to-understand SQL scripts as tools for the maintenance teams in case of failure or inconsistency. And yes, of course the different table structures led to more complex SQL and views from the presentation layer to the marts, but the warehouse server was able to digest those quite well.

Let's go and examine how we can build a generic Spark notebook that takes the needed parameters and forms a Data Vault structure in our Silver Lakehouse from the sources that we load in the Bronze layer.

The parameters that are needed so far:

[Screenshot: Parameters for the Data Vault structures.]

We get the table to work on, the primary key, all the attributes that we want to see in the hub and the satellite and a source identifier to be able to audit the content later.
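A minimal sketch of that parameter cell; the names are my own assumptions derived from the description above, not necessarily the author's:

```python
# Parameter cell for the Data Vault loader notebook.
# All names are illustrative assumptions matching the description above.
p_tableName = "customers"                   # Silver shortcut table to work on
p_businessKey = ["CustomerID"]              # primary key column(s)
p_attributes = ["Name", "City", "Segment"]  # attributes for hub and satellite
p_sourceId = "crm_export"                   # audit: where the content came from
```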

And then my friend and co-coder comes to the rescue again:

[Screenshot: Prompt used to create the Data Vault Hub table function.]

And the result:

[Screenshot: Function to create a Data Vault Hub table from the parameters.]

Not sure how you feel about this, and for many people it is of course already daily routine, but I am still quite excited about these helpers.

The finish line is right in front of us now. Another instruction to the Copilot to create a function for the Data Vault satellite table that will complete the hub from above:

[Screenshot: Prompt to create the Data Vault Satellite table function.]

And the resulting function:

[Screenshot: Function to create the Data Vault Satellite table from the parameters.]

Well, Hub and Satellite functions are available now. Data Vault, though, uses one more table type. Let's create a function for a link table to complete the loader and to be able to reflect the facts of an analytical model.

This time I'll do a learning check and see what I have learned from the other functions I created so far, and, showing some sportsmanship, go without the Copilot.

To be able to call this one separately I'll create another notebook for it. With a separate notebook we can provide separate parameters and also run it in parallel with the notebook from above.

With all the typical cells like before we'll have the following parameters:

[Screenshot: Parameters to create the Link structures of the Data Vault.]

First we'll create the link table function. My parameters are:

  • A data frame with the data that should finally be loaded to the link table and the according satellite.
  • The source table name from the silver layer. This is one of the shortcuts that we have created above. 
  • The business key that defines the row in the link table. This might be one or more columns.
  • A source id that points back to where the data is coming from.
  • An array of table names and columns that will be used to create the hash keys that link to other tables. The columns should be present in the actual list of attributes so that we can create all the needed information straight away. These can again be one or more columns.

The function will:

  • Select all the attributes needed into a new data frame.
  • Create a hash key from the column(s) that are provided in the business_key parameter and add it to the data frame.
  • Create a column with the load timestamp and add it to the data frame.
  • Create a column for the source_id and add it to the data frame.
  • Iterate through the list of key columns and for each provided table create a hash key from the columns provided and add it to the data frame.
  • Append the new data frame to the table that will be generated from the source table parameter and the postfix "_link".
  • Finally return the new data frame.

[Screenshot: Function to create Data Vault Link tables from parameters.]

And now finally create the Link-Satellite:

[Screenshot: Function to create Data Vault Link Satellite tables from parameters.]

We need to create the business key as we did for the link table, add the attributes to the data frame, and again add the audit information.

The work that we then finally need to do is as follows:

[Screenshot: Using the functions and doing the work.]

Adding these notebooks to the Data Factory pipeline works exactly as in Part 2. I trust you to be able to repeat those steps, add a new parameter file or table, and run these two notebooks.

Now, with the work that we have done above, we are able to create shortcuts from our tables in the Bronze layer and build simple Data Vault structures from these tables as our central analytics model. We can of course adjust the basic process to reflect more sophisticated requirements as needed.

In the next article it will be time to create the presentation layer, or Gold layer. We will examine how to control the PySpark job and even inject transformation function strings from our metadata repository and run them to create the presentation layer model. Since we are using Fabric in this example, we will create a Star-Join schema to optimize the semantic model for use with Power BI.

We will examine two alternative ways of providing the transformations and the models:

  1. Using SQL strings from the metadata repository and running them against the Silver Layer.
  2. Using PySpark code.

Until then, dear reader, maybe try this yourself. As already mentioned, I don't provide the material, not because I want to provoke, but to encourage you to use the Fabric Copilot in the Spark experience and see the magic you can do with this feature.

Thx for reading!

Here are the other parts of the series:

Part 1: The Bronze Layer, the fundament

Part 2: The control flow, orchestrating the load

More articles by Patrik Borosch
