Workshop - 08/03/24

Building Modern Data Platform for Analytics

I want to thank my friend Ed Pollack for asking me to speak at SQL Saturday Albany. This full-day pre-conference workshop is a bargain for professionals who want to learn about the Modern Data Platform (MDP). You get breakfast, lunch, and my half decade of experience creating Modern Data Platforms in Azure for various clients who want to do analytics.


Here are the details that will be covered that day. Please sign up for the workshop using Eventbrite. Hope to see you this summer!


Many companies are placing their corporate information into data lakes in the cloud. Since storage costs are cheap, the amount of data stored in the lake can easily exceed the amount of data seen in a typical relational database. Regardless of the types of files in the data lake, there is always a need to transform the raw data files into refined data files for analytics, machine learning, and/or AI.


The Delta Lakehouse design uses a medallion (bronze, silver, and gold) architecture for data quality. We can abstract the read and write actions in Spark to create dynamic notebooks that process data files. Data pipelines can be used to bring remote data into the lake as well as to orchestrate data processing. A metadata-driven design allows the inputs to the dynamic notebooks to be stored in a central place.
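
To make the metadata-driven idea concrete, here is a minimal sketch in Python. It is not the workshop's actual code: the `LoadSpec` fields, the lake paths, and the `build_plan` and `run` helpers are all hypothetical names chosen for illustration. Each metadata row is turned into generic read and write arguments, so one notebook can process many files.

```python
from dataclasses import dataclass, field


@dataclass
class LoadSpec:
    """One hypothetical metadata row describing a bronze-to-silver hop."""
    source_path: str
    source_format: str
    target_path: str
    write_mode: str = "overwrite"
    read_options: dict = field(default_factory=dict)


# Hypothetical metadata; in practice it might live in a control table or file.
METADATA = [
    LoadSpec("/lake/bronze/customers", "csv", "/lake/silver/customers",
             read_options={"header": "true"}),
    LoadSpec("/lake/bronze/orders", "parquet", "/lake/silver/orders",
             write_mode="append"),
]


def build_plan(spec: LoadSpec) -> dict:
    """Turn one metadata row into generic reader and writer arguments."""
    if spec.write_mode not in {"overwrite", "append"}:
        raise ValueError(f"unsupported write mode: {spec.write_mode}")
    return {
        "read": {"format": spec.source_format,
                 "path": spec.source_path,
                 "options": dict(spec.read_options)},
        "write": {"format": "delta",
                  "path": spec.target_path,
                  "mode": spec.write_mode},
    }


def run(spec: LoadSpec, spark) -> None:
    """Apply a plan with a Spark session (assumes a Delta-enabled cluster)."""
    plan = build_plan(spec)
    read, write = plan["read"], plan["write"]
    df = (spark.read.format(read["format"])
          .options(**read["options"])
          .load(read["path"]))
    df.write.format(write["format"]).mode(write["mode"]).save(write["path"])


# Building the plans needs no cluster; run() is only called inside a notebook.
plans = [build_plan(spec) for spec in METADATA]
```

Adding a new file to the lake then means adding one metadata row rather than writing another notebook. In a Spark notebook, calling `run(spec, spark)` for each row would perform the actual load.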


The most important part of a modern data platform is security. Microsoft Entra, formerly known as Azure Active Directory, can be used to secure the files in storage. This security layer is used in both the Apache Spark and Serverless SQL pools. Designers use a variety of tools for reporting. The Serverless SQL Pool turns data lake files into read-only database tables. While the demos in this course are Azure specific, the concepts can be used with any cloud service.


Lessons:

  1. Infrastructure deployment (storage, key vault, Databricks, Synapse)
  2. Create a service principal for services
  3. Create medallion zones + assign rights
  4. Introduction to Data Factory pipelines
  5. How to create a hybrid design
  6. Working with different sources (database, file shares, REST APIs)
  7. Hard coding vs metadata design
  8. Full vs incremental load patterns
  9. Configuring clusters + storage for security
  10. Writing data engineering notebooks
  11. Orchestrating pipelines with Data Factory
  12. Creating a presentation layer with Synapse Serverless Pools
  13. Connecting to Synapse with Power BI

There are fewer than 24 days before the pre-conference sessions for SQL Saturday Albany. Don't miss out on training from IT professionals who have been using the technology for years. The cost includes training and food. Hope to see you in my class!
