From the course: Implementing Data Engineering Solutions Using Microsoft Fabric (DP-700) Cert Prep by Microsoft Press


Optimizing a lakehouse table

Optimizing a lakehouse table. When we write data into a lakehouse Delta table, this affects the underlying parquet files. Because parquet files are immutable, inserting or updating data adds new parquet files, and we can run into what's known as the small file problem: with continued data modification statements against the lakehouse Delta table, lots of small parquet files can be created. This affects read operations on the table, because a smaller number of larger files is preferable to a larger number of smaller files. We can run maintenance commands against the lakehouse table, starting with OPTIMIZE. What OPTIMIZE does is rewrite the smaller parquet files into fewer, larger files. This helps with compression of the data inside the parquet files and with read optimization; we're looking at roughly 128 MB to 1 GB per parquet file as an optimal size. We can also add an extra compression step…
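As a minimal sketch of how this might look in a Fabric notebook (the table name "sales" is hypothetical, and spark is the Spark session a notebook provides), OPTIMIZE can be run through Spark SQL or through the Delta Lake Python API:

from delta.tables import DeltaTable

# Compact the small parquet files behind the table into fewer, larger files.
# "sales" is a hypothetical table name; spark is the notebook-provided session.
spark.sql("OPTIMIZE sales")

# Equivalent compaction call through the Delta Lake Python API
DeltaTable.forName(spark, "sales").optimize().executeCompaction()

Either form performs the same file compaction; the SQL statement is convenient in a notebook cell, while the Python API is easier to embed in scheduled maintenance code.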
