Passing the Databricks Data Engineer Associate exam

Last week, I passed the Databricks Data Engineer Associate (v2) exam (woo!). Here’s a little bit about what I learnt.

What’s covered

The exam score you receive at the end is broken down into five categories. They are listed below, along with the question subjects I remember from each section:

  • Databricks Lakehouse Platform: the benefits of a Lakehouse (batch & streaming, structured + semi-structured, ACID compliance); the general ‘architecture’ (i.e. what’s in the Control Plane, what’s in the Data Plane)
  • ETL with Spark SQL and Python: lots of coding syntax questions, both SQL (e.g. MERGE, DELETE FROM, UNION…) and PySpark stuff
  • Incremental Data Processing: the benefits of Auto Loader (incremental processing with ease, idempotency) and how it works under the hood (Spark Structured Streaming, checkpointing and write-ahead logs)
  • Production Pipelines: Delta Live Tables (features of the UI, data quality enforcement options), and the Jobs UI (orchestrating tasks, schedules, retries, inputting parameters into the Jobs)
  • Data Governance: controlling access, and the different ways to do so (e.g. GRANT with privileges such as USAGE, SELECT, MODIFY, ALL PRIVILEGES). Unity Catalog was not covered in v2.
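To make the "incremental processing" and "idempotency" ideas from the list above a bit more concrete, here is a toy sketch in plain Python. It is not the Auto Loader API and does not use Spark: the function name `process_new_files`, the `landing` folder and the `checkpoint.json` file are all made up for illustration. It only shows the general idea of a checkpoint, where each run records which files it has handled so that a rerun picks up only new arrivals.

```python
import json
import tempfile
from pathlib import Path


def process_new_files(source_dir: Path, checkpoint_file: Path) -> list:
    """Return the names of files not yet recorded in the checkpoint, then record them.

    Hypothetical helper for illustration only; a real pipeline would also
    read the files and write their contents to a target table.
    """
    # Load the names handled by earlier runs (empty set on the first run).
    if checkpoint_file.exists():
        seen = set(json.loads(checkpoint_file.read_text()))
    else:
        seen = set()

    # Only files that have not been seen before count as new work.
    new_files = sorted(
        p.name for p in source_dir.iterdir() if p.is_file() and p.name not in seen
    )

    # Persist progress so that rerunning does not reprocess anything.
    checkpoint_file.write_text(json.dumps(sorted(seen | set(new_files))))
    return new_files


def demo() -> tuple:
    """Run the helper three times against a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "landing"
        source.mkdir()
        checkpoint = Path(tmp) / "checkpoint.json"

        (source / "a.json").write_text("{}")
        (source / "b.json").write_text("{}")
        first = process_new_files(source, checkpoint)   # both files are new
        second = process_new_files(source, checkpoint)  # rerun: nothing new

        (source / "c.json").write_text("{}")
        third = process_new_files(source, checkpoint)   # only the new arrival
        return first, second, third


if __name__ == "__main__":
    print(demo())
```

The second call returning nothing is the idempotency part: running the same job again with no new input does no extra work. As far as I understand it, Auto Loader keeps this kind of state in the stream's checkpoint location rather than in a hand-rolled JSON file.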

How I prepared

Everyone learns differently. Personally, I focused on two things: practical experience, and practice papers.

In terms of experience, the most helpful resource was the Databricks practice notebooks (available on Git). I cloned these into Databricks, worked through them, and spent plenty of time in the labs. The notebooks are easy to follow and contain good explanations, and most importantly, the labs helped to ingrain knowledge beyond what following videos or reading docs does.

In terms of past papers, at the time of writing Databricks only has one available. I also purchased these five additional papers from Udemy, which I recommend. The Udemy papers aren’t perfect: they’re littered with grammatical and spelling errors, some questions are worded too vaguely to answer properly, and they go into unnecessary levels of detail in some areas. But some of the questions were almost exactly what came up in my exam, so I think it’s worth the purchase.

My loose structure was:

  • Work through the Databricks notebooks
  • Take a practice exam, and read docs where I’d gone wrong
  • Take another practice exam…
  • And, finally, take the actual exam.

I’m a fairly obsessive note-taker, so during this learning process I accumulated this heap of notes. Skim-reading them several times in the lead-up to the exam helped to reinforce my learning. Here’s the link to those notes, in case they help.

Hope this helps. If you’re taking the exam, good luck!