Cracking the Databricks Spark Developer certification

I took the Databricks Spark developer certification exam (Python) yesterday and passed with 70%. This exam is tougher than the Spark certification exams from Cloudera and MapR. More than 80% of the questions were code snippets with multiple correct answers. Here are my recommendations and tips for anyone planning to take it.

 

  1. Databricks recommends the e-book "7 Steps for a Developer to Learn Apache Spark" for preparation; it is a good starting point.
  2. Be well versed in most of the RDD and DataFrame APIs, such as map, flatMap, filter, SparkSession, DataFrameReader/DataFrameWriter, DataFrame, Row/Column, Spark SQL functions, and Window.
  3. Know the default storage levels for RDDs and DataFrames, and the details of the other storage levels.
  4. Spark internals concepts, including driver, executors, cores, jobs, stages, tasks, partitions, shuffling, and wide and narrow transformations.
  5. A very good source of information on Spark internals is the YouTube video by Sameer Farooqui (link)
  6. Structured Streaming API for the Kafka source and sink
  7. Go through the details of the Pipeline (transformers and estimators) in the Spark ML workflow.
  8. One question on GraphFrames with the BFS algorithm
  9. One question on the most efficient code for reading a CSV file and converting it to Parquet
  10. Multiple questions on broadcast joins and accumulators
  11. Questions on identifying actions and transformations
  12. Questions on the most efficient code with the least data shuffling
  13. One question on the Catalyst optimizer and Tungsten encoder
  14. One question on whether predicate pushdown is possible in a given code snippet
  15. Questions on default parallelism and the number of partitions for a dataset
  16. coalesce and repartition
  17. Defining and registering UDFs in Spark
  18. JVM heap memory when caching DataFrames
  19. Performance of the Python, Java, and Scala APIs in Spark 2.x with Catalyst and Tungsten versus performance in Spark 1.x
  20. Structured Streaming (link)
  21. SparkSQL: A Compiler from Queries to RDDs, Spark Summit East talk by Sameer Agarwal (link)
  22. Tuning and Debugging Apache Spark (link)
  23. Structuring Apache Spark 2.0: SQL, DataFrames, Datasets and Streaming, by Michael Armbrust (link)
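
Several of the topics above (internals, identifying actions and transformations, and minimising shuffles) come down to recognising which operations are lazy transformations, which are actions, and which transformations usually force a shuffle. Below is a small plain-Python study aid, not Spark code and not an exam question. The names `OPERATIONS`, `classify`, and `count_shuffles` are made up for illustration, and the narrow/wide labels describe the usual case only (for example, a join can avoid a shuffle when one side is broadcast or both sides are already co-partitioned).

```python
# Study aid: common Spark operations tagged as (kind, dependency).
# "narrow" = each output partition depends on one input partition (no shuffle);
# "wide"   = output partitions depend on many input partitions (shuffle).
# Labels describe the usual case; they are a simplification, not Spark's API.
OPERATIONS = {
    "map": ("transformation", "narrow"),
    "flatMap": ("transformation", "narrow"),
    "filter": ("transformation", "narrow"),
    "select": ("transformation", "narrow"),
    "withColumn": ("transformation", "narrow"),
    "union": ("transformation", "narrow"),
    "coalesce": ("transformation", "narrow"),   # merges partitions without a full shuffle
    "repartition": ("transformation", "wide"),  # always a full shuffle
    "distinct": ("transformation", "wide"),
    "groupByKey": ("transformation", "wide"),
    "reduceByKey": ("transformation", "wide"),
    "orderBy": ("transformation", "wide"),
    "join": ("transformation", "wide"),         # unless broadcast or co-partitioned
    "collect": ("action", None),
    "count": ("action", None),                  # RDD/DataFrame count(), not groupBy().count()
    "take": ("action", None),
    "first": ("action", None),
    "reduce": ("action", None),
    "foreach": ("action", None),
    "show": ("action", None),
    "saveAsTextFile": ("action", None),
}


def classify(op):
    """Return the (kind, dependency) pair for a known operation name."""
    return OPERATIONS[op]


def count_shuffles(chain):
    """Count the wide transformations in a linear chain of operations."""
    return sum(1 for op in chain if OPERATIONS[op][1] == "wide")


chain = ["map", "filter", "reduceByKey", "collect"]
print(classify("flatMap"))    # ('transformation', 'narrow')
print(count_shuffles(chain))  # 1
```

When comparing two answer options for "least shuffling", counting the wide transformations in each chain this way is a reasonable first check.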

 

So, that's it from my side. Do let me know if you want more details or need any clarification. Best of luck with your preparation, and happy learning.

Congrats. Well articulated points

Does anyone know if this is gonna be transferable to the new Associate one? As an entry requirement to the higher level specialised certs? (yet to be released)

Congrats and well done! Any specific reason you chose the Databricks certification over Cloudera? It would be interesting to know your point of view on which one should be preferred: Cloudera, Databricks, or another vendor?

Thanks for sharing, Gautam! Super useful.

More articles by Kumar Gautam
