From the course: Implementing Data Engineering Solutions Using Microsoft Fabric (DP-700) Cert Prep by Microsoft Press
Optimizing a lakehouse table - Microsoft Fabric Tutorial
Optimizing a lakehouse table. When we write data into a lakehouse Delta table, this affects the underlying parquet files. Because parquet files are immutable, inserting or updating data adds new parquet files, and we can run into what's known as the small file problem: with continued data modification statements against the lakehouse Delta table, lots of small parquet files get created. This hurts read operations on the table, because a smaller number of larger files is preferable to a larger number of smaller files. Now, we can run maintenance commands against the lakehouse table, and we start with OPTIMIZE. What OPTIMIZE does is rewrite the smaller parquet files into fewer, larger files. This helps with compression of the data inside the parquet files and with read optimization; we're looking at roughly 128 MB to 1 GB per parquet file as an optimal size. We can also add an extra compression step…
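As a sketch of what the transcript describes, the compaction step can be run from a Fabric notebook SQL cell or the lakehouse SQL surface. The table name `sales` here is a hypothetical placeholder; the `VORDER` clause is Fabric's additional V-Order write optimization, shown as one example of an extra compression step on top of plain compaction.

```sql
-- Compact small parquet files into fewer, larger ones
-- (roughly 128 MB to 1 GB per file is the target range).
OPTIMIZE sales;

-- Optionally apply V-Order as well, Fabric's extra
-- compression/sorting step for faster reads.
OPTIMIZE sales VORDER;
```

After running OPTIMIZE, subsequent reads scan fewer, larger files, which is what relieves the small file problem.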
Contents

- Learning objectives (22s)
- Optimizing a lakehouse table (6m 56s)
- Optimizing a data factory pipeline (5m 2s)
- Optimizing a warehouse (5m 43s)
- Optimizing Eventstreams and Eventhouses (4m 43s)
- Optimizing Spark performance (8m 29s)
- Optimizing query performance (5m 37s)
- Quiz (2m 30s)