Integrating Microsoft Fabric and Databricks Unity Catalog
This is a short guide on how to integrate Microsoft Fabric with Databricks Unity Catalog as it works today.
Introduction
Although they are different platforms, both Databricks and Microsoft Fabric embrace the Lakehouse concept (Databricks, Fabric) and use Delta Parquet as the basis for the physical data layer.
With OneLake shortcuts in Microsoft Fabric you can work directly with tables defined in the Databricks Unity Catalog as if they were tables stored in the Fabric OneLake.
Setup
To follow along you need a Microsoft Fabric Lakehouse. You can either use an existing one or create a new one. I would suggest having one Lakehouse for each schema in the Unity Catalog.
For the Databricks part you need Unity Catalog enabled for your account and a metastore. You can then create catalogs in this metastore from within your Databricks workspace, as well as schemas in those catalogs and tables in those schemas.
You should create your catalog with an external location pointing to a path in an Azure Data Lake Storage Gen2 account.
Let's assume I have a catalog called "healthcare" with a schema called "clinicaltrialssilver" containing the tables "subject", "adverse_events" and "measurements".
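As a sketch, this setup can be expressed as Databricks SQL DDL. The catalog and schema names follow the example above; the ADLS Gen2 storage path is a placeholder you would replace with your own external location.

```python
# Hypothetical DDL for creating the example catalog with an external location.
# In a Databricks notebook you would run each statement with spark.sql(stmt);
# the abfss path below is a placeholder, not a real storage account.
ddl_statements = [
    "CREATE CATALOG IF NOT EXISTS healthcare "
    "MANAGED LOCATION 'abfss://data@mystorageaccount.dfs.core.windows.net/healthcare'",
    "CREATE SCHEMA IF NOT EXISTS healthcare.clinicaltrialssilver",
]

for stmt in ddl_statements:
    print(stmt)
```

Outside of a Databricks runtime this only prints the statements; inside a notebook, `spark.sql(stmt)` would execute them against the metastore.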
Create a shortcut to a Databricks Unity Catalog table
To work with those tables from Fabric you can easily create shortcuts. This way you can work with the data as if it were stored in the Fabric OneLake, but it's actually only a reference and no data is moved.
To do so, go to your Fabric Lakehouse and create a table shortcut.
Choose ADLS Gen2 and set up a connection. The only missing piece is the path where the Unity Catalog table's delta files are actually stored.
To find it, either use the Databricks API, CLI or SDK, or navigate to the Catalog/Data section within your Databricks workspace and check the details section of the corresponding table.
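For example, the Unity Catalog REST API exposes a table's metadata, including its storage location. A minimal sketch using only the Python standard library; the workspace host, access token and table name are placeholders for illustration:

```python
import json
import urllib.request


def table_details_url(host: str, full_table_name: str) -> str:
    """Build the Unity Catalog REST endpoint for a table's metadata.

    full_table_name is "catalog.schema.table",
    e.g. "healthcare.clinicaltrialssilver.subject".
    """
    return f"https://{host}/api/2.1/unity-catalog/tables/{full_table_name}"


def get_table_storage_location(host: str, token: str, full_table_name: str) -> str:
    """Fetch the abfss path of the table's delta files from the API response."""
    req = urllib.request.Request(
        table_details_url(host, full_table_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        details = json.load(resp)
    # "storage_location" holds the same abfss:// path shown in the
    # table's details pane in the workspace UI.
    return details["storage_location"]
```

Calling `get_table_storage_location("adb-<id>.azuredatabricks.net", "<token>", "healthcare.clinicaltrialssilver.subject")` against a real workspace would return the abfss path needed for the shortcut.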
(The code for the API calls can also be found in this repo: DaSenf1860/fabricdatabricksunitycatalog (github.com))
Convert the abfss path to the corresponding https path and you can complete the shortcut setup. (I've done the path conversion programmatically in my code.)
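That conversion is a small, mechanical transformation: an abfss URI has the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`, while the shortcut dialog expects `https://<account>.dfs.core.windows.net/<container>/<path>`. A sketch of a helper (not the exact code from my repo):

```python
from urllib.parse import urlparse


def abfss_to_https(abfss_path: str) -> str:
    """Convert an abfss:// URI to the equivalent https:// DFS endpoint URL.

    abfss://<container>@<account>.dfs.core.windows.net/<dir>
    becomes
    https://<account>.dfs.core.windows.net/<container>/<dir>
    """
    parsed = urlparse(abfss_path)
    if parsed.scheme != "abfss":
        raise ValueError(f"Not an abfss path: {abfss_path}")
    # netloc is "<container>@<account>.dfs.core.windows.net"
    container, _, account_host = parsed.netloc.partition("@")
    return f"https://{account_host}/{container}{parsed.path}"


# Example (placeholder account name):
# abfss_to_https("abfss://data@myaccount.dfs.core.windows.net/healthcare/clinicaltrialssilver/subject")
# -> "https://myaccount.dfs.core.windows.net/data/healthcare/clinicaltrialssilver/subject"
```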
Now you can see the table in your Fabric Lakehouse.
As both the table definition in the Fabric Lakehouse and the Unity Catalog table point to the same delta table in ADLS, they are exactly the same. They even share delta-specific features such as an enabled change feed.
Outlook
To avoid doing this process table by table, you can use the Databricks API, CLI or SDKs to quickly extract the table details for every table in a specific schema, or even in a whole catalog, and create shortcuts for all of them.
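As a sketch of that bulk approach: the Unity Catalog REST API can list all tables of a schema in one call, and from each table's storage location you can derive the shortcut target. The endpoint and field names below follow the public Unity Catalog API; the actual shortcut creation is left out since it depends on the Fabric tooling you use.

```python
import json
import urllib.request


def list_schema_tables(host: str, token: str, catalog: str, schema: str) -> list:
    """List all tables of a schema via GET /api/2.1/unity-catalog/tables."""
    url = (
        f"https://{host}/api/2.1/unity-catalog/tables"
        f"?catalog_name={catalog}&schema_name={schema}"
    )
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("tables", [])


def shortcut_plan(tables: list) -> list:
    """Map each table to the name and target path a OneLake shortcut needs."""
    return [
        {"name": t["name"], "target": t["storage_location"]}
        for t in tables
        # entries without a storage location (e.g. views) cannot be shortcut
        if "storage_location" in t
    ]


# Example with a stubbed API response instead of a live workspace:
# shortcut_plan([{"name": "subject", "storage_location": "abfss://..."}])
```

Looping `shortcut_plan` over every schema of a catalog gives you the complete list of shortcuts to create for that catalog.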
As soon as the Fabric APIs offer full capabilities in that regard, you will be able to sync your Unity Catalog schemas with your Fabric Lakehouses and keep both metadata layers constantly in sync, without moving any data and without additional effort.
Considering the big benefits of harmonizing both experiences, hopefully we will soon get functionality to automatically sync Fabric Lakehouses with Databricks schemas, both on the metadata layer and on the permissions layer.
Reap the benefits
Integrating both experiences, we get the best of the two: all the state-of-the-art data engineering capabilities of Databricks together with the analytical experience of Fabric. Being able to use Unity Catalog tables seamlessly from Fabric, and the other way around, gives you the full flexibility to choose the best tool for each task while still having a unified experience.
Potentially we could see architectures like the following with close to zero integration effort.
Comments
This is an insightful guide! I'm curious about the potential challenges you foresee when integrating Unity Catalog with Microsoft Fabric. Given the architectural concerns mentioned by others, how do these impact long-term strategy?
Is it possible to access OneLake delta tables through DBX catalog? As in:
1. In OneLake: create a delta table
2. In OneLake: create an internal shortcut to the delta table location
3. Grant access on DBX and OneLake to a service principal
4. In Databricks: create a new catalog with the aforementioned OneLake shortcut as storage location
5. In Databricks: access the OneLake delta table through the DBX catalog schema and manage further re-sharing from within DBX
Thanks for the article! But as I understand it, this doesn't work yet if the ADLS Gen2 account lies behind a firewall? The new managed private endpoints might fix the issue, but they are only available for F SKUs.
I don't think Unity Catalog provides capabilities for data engineering tools other than Databricks to leverage it. Microsoft Purview can connect to Unity for the purposes of scanning/crawling metadata. What you've described here isn't actually an integration between Fabric and Unity Catalog. All you've done is created a connection between Fabric and the underlying stored data in ADLS. The difference is not trivial and has significant impact to a multitude of architectural pillars including security and access control, privacy and risk management, metadata and lineage tracking, compute management and even cost management to name a few.