From the course: PySpark Essential Training: Introduction to Building Data Pipelines
Cloud services
- [Instructor] So far, we've talked about how to build infrastructure to run PySpark jobs almost from scratch. But what if you don't want to deal with all the setup, configuration, and infrastructure yourself? That's where cloud services come in. There are a handful of platforms that let you run PySpark jobs without having to manage your own Spark cluster. Some of them are super flexible and give you full control, while others are more opinionated, all-in-one platforms that take care of most of the heavy lifting so you can focus on writing code and getting results. Databricks is one of the most popular platforms for running PySpark. It was founded by the original creators of Apache Spark, and it's designed to make big data and machine learning workflows much easier to manage. Databricks provides a collaborative notebook environment, auto-scaling Spark clusters, built-in data connectors, and excellent performance tuning out of the box. It's especially useful for teams working together on…