Data Engineering on GCP - done (almost)
Completed the 5-course Data Engineering on Google Cloud Platform program on Coursera. Here is what I liked the most:
- Dataproc - managed Hadoop and Spark. You really can stand up a Hadoop cluster in minutes and start working on the solution, and when you no longer need it you can shut it down. Your Spark code works just fine on Dataproc.
- Serverless ML - lets you train and use your TensorFlow models without having to provision and support ML infrastructure.
- ML APIs - an amazingly fast way to use Google's own pre-trained models to build various data science applications, from image recognition to sentiment analysis.
- DataStudio - helps you build reports and dashboards against batch and streaming data.
- DataLab - Jupyter-like notebooks in GCP. No need to install anything; a one-line command gives you a notebook quickly. Python 2/3 kernels are supported out of the box, but not Scala or Java.
Here is what I liked less:
- Dataflow - a scalable serverless platform for building batch and streaming data pipelines, doing in-flight processing and ingesting data into BigQuery (Google's data warehouse) and BigTable (Google's low-latency NoSQL data store). I found the Dataflow API somewhat non-intuitive and hard to learn, especially around windowing and triggering. Take this code for example (it handles late-arriving results): you won't be able to make sense of it without reading the Dataflow API docs, with all these "After", "With", "Past", "Allowed", etc. Imagine a more complex use case and the code quickly becomes unreadable with all these chained transformations.
- BigTable - a scalable, distributed NoSQL key-value store. BigTable offers HBase API access, but its interactive query capabilities are limited: you need to write Java/JSON-like code on the console to query it, which is unproductive, and the query semantics are a bit hard to learn. No SQL support, obviously no indexing, and no easy way to query data on a non-key column. Of course it's fast and low-latency when you always query by the ordered key.
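To make the windowing and triggering point about Dataflow a bit more concrete, here is a toy model in plain Python of what those knobs roughly mean. It is not the Dataflow API (the names in the post likely refer to Java SDK calls such as `AfterWatermark.pastEndOfWindow()` and `withAllowedLateness`), and everything in it is my own assumption for illustration: the `process` function, the 60-second fixed windows, the 30-second allowed lateness, and a watermark that is simply the largest event time seen so far.

```python
from collections import defaultdict

WINDOW_SIZE = 60        # seconds; fixed (tumbling) windows, assumed for the sketch
ALLOWED_LATENESS = 30   # seconds after a window's end during which late data still counts


def window_start(event_time):
    """Start of the fixed window that contains event_time."""
    return event_time - event_time % WINDOW_SIZE


def process(events):
    """Toy model of windowed sums with on-time and late firings.

    events: iterable of (event_time, value) pairs in ARRIVAL order.
    Returns a list of (window_start, kind, payload) where kind is
    "on-time" or "late" (payload = running sum for the window) or
    "dropped" (payload = the discarded value).
    """
    sums = defaultdict(int)
    fired = set()      # windows that already produced their on-time result
    firings = []
    watermark = 0      # simplification: largest event time seen so far

    for event_time, value in events:
        start = window_start(event_time)
        end = start + WINDOW_SIZE
        if watermark >= end + ALLOWED_LATENESS:
            # Too late: the window is closed for good.
            firings.append((start, "dropped", value))
        else:
            sums[start] += value
            if start in fired:
                # Window already fired once; emit an updated (late) result.
                firings.append((start, "late", sums[start]))
        watermark = max(watermark, event_time)
        # Fire every window whose end the watermark has now passed.
        for s in sorted(sums):
            if s not in fired and watermark >= s + WINDOW_SIZE:
                fired.add(s)
                firings.append((s, "on-time", sums[s]))
    return firings


if __name__ == "__main__":
    arrivals = [(10, 1), (50, 2), (65, 5), (55, 3), (130, 7), (20, 4)]
    for firing in process(arrivals):
        print(firing)
```

The event at time 55 arrives after the first window has fired but within the allowed lateness, so it produces a second, updated result; the event at time 20 arrives far too late and is dropped. The real service estimates the watermark from the source rather than taking a maximum, so treat this only as a mental model.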
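The BigTable trade-off (fast by ordered key, painful by anything else) can also be sketched with a toy in-memory table. This is plain Python, not the BigTable or HBase client; the class `ToySortedTable`, its methods, and the `sensor#date` row keys are hypothetical names I made up for the illustration.

```python
import bisect


class ToySortedTable:
    """Rows kept sorted by row key, as a stand-in for a key-ordered store."""

    def __init__(self, rows):
        # rows: dict mapping row_key (str) -> dict of column -> value
        self._rows = dict(rows)
        self._keys = sorted(self._rows)

    def scan_prefix(self, prefix):
        """Key-range read: binary-search to the first match, then read contiguous rows."""
        i = bisect.bisect_left(self._keys, prefix)
        matches = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches

    def filter_column(self, column, value):
        """Non-key lookup: there is no index, so every row has to be inspected."""
        return [k for k in self._keys if self._rows[k].get(column) == value]


if __name__ == "__main__":
    table = ToySortedTable({
        "sensor1#2017-01-01": {"temp": 20},
        "sensor1#2017-01-02": {"temp": 25},
        "sensor2#2017-01-01": {"temp": 20},
        "sensor3#2017-01-01": {"temp": 30},
    })
    print(table.scan_prefix("sensor1#"))     # touches only the matching key range
    print(table.filter_column("temp", 20))   # walks the whole table
```

Reading by key prefix only touches the matching slice of the sorted keys, while filtering on a column value has to look at every row, which is why row key design matters so much for this kind of store.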
Taking a break to digest, rest and prepare for the final exam!