Streamlining Databricks Project Development with RevoData Asset Bundle Templates


Looking to deploy a fully configured development environment for your Databricks Asset Bundles—complete with strictly enforced coding standards, CI/CD pipelines, pre-commit hooks, and example pipelines? Try RevoData Asset Bundle Templates today:

databricks bundle init https://github.com/revodatanl/revo-asset-bundle-templates         

Introduction

Setting up a new development environment often feels like reinventing the wheel: repetitive tasks, boilerplate code, and tedious configuration. This not only drains productivity but also introduces inconsistencies and potential errors.

Databricks Asset Bundles (DABs) offer a powerful solution to streamline this process. They allow you to create fully customized templates that enforce your organization's unique standards, configurations, and best practices.

At RevoData, we have taken this concept further by designing our own Asset Bundle Templates – and we are open-sourcing them for everyone to use. Our templates elevate project setup by enforcing the highest coding standards and integrating essential tools, enabling you to hit the ground running when starting new projects.


What Are Databricks Asset Bundles?

DABs provide an infrastructure-as-code approach to managing Databricks projects. They make it easier to:

  • Manage complex projects with deployments across multiple environments
  • Implement CI/CD pipelines, streamlining your deployment processes
  • Simplify new project setup, reducing time spent on initial configurations
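For illustration, a minimal databricks.yml along these lines defines a bundle and its target workspaces – note that the bundle name and workspace hosts below are placeholders, not values from our template:

```yaml
# Hypothetical minimal databricks.yml - name and hosts are placeholders
bundle:
  name: my_project

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-dev-workspace.cloud.databricks.com
  prd:
    mode: production
    workspace:
      host: https://my-prd-workspace.cloud.databricks.com
```

The same bundle definition is then deployed to each target in turn, which is what makes the multi-environment flow described below possible.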


High-level overview of project development and deployment via CI/CD pipelines with DABs – taken from the official Databricks documentation

What this means in practice becomes clear from the image above, taken from Databricks' own documentation. Developers work in a local environment, fully customized to their preferences. Once an asset is done, say an ingestion pipeline, it is integrated into the DAB and deployed to the Development workspace, where it is tested and configured further. Once it meets the acceptance criteria, it is checked into version control and deployed onwards to the Staging workspace and ultimately to the Production workspace via CI/CD pipelines. Used correctly, the same assets get redeployed to the various environments without code duplication, and nothing untested ends up in production.

Recognizing the challenges in setting up such an intricate environment, we have developed our own RevoData Asset Bundle Templates. Our templates serve as a versatile starting point for new projects: integrating a fully configured developer toolset, pre-commit hooks, example pipelines ready for deployment, and several CI/CD pipelines—including a semantic release pipeline that automatically tags releases and generates a CHANGELOG for you.


Introducing RevoData Asset Bundle Templates

Our Asset Bundle Templates simplify setting up a consistent local development environment and strictly enforce coding style, making it easy to start new projects while maintaining high-quality standards. Our approach is highly opinionated and might take some getting used to - you WILL become annoyed - but we believe that following these recommendations will make you a better programmer in the end!

We have integrated modern development tools and are particularly fond of the new, fast ones written in Rust (such as Ruff and uv – we love the Astral team!), alongside essentials like mypy – which, slow as it can be, adds unequivocal value to the developer experience.

To make sure that the standards we set in the local development environment are maintained throughout the codebase, we included a combination of pre-commit hooks and a CI pipeline that we suggest you set as a prerequisite for merging.
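As a hedged sketch, a pre-commit configuration wiring up such hooks might look like the following – the hook selection and pinned revs here are illustrative, and the template ships its own .pre-commit-config.yaml:

```yaml
# Illustrative .pre-commit-config.yaml excerpt (revs are examples; pin your own)
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff            # linting
      - id: ruff-format     # formatting
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy            # static type checking
```

Running the same hooks again in CI means a bypassed local hook still gets caught before merge.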


Getting Started

Initialize the bundle with a single command:

databricks bundle init https://github.com/revodatanl/revo-asset-bundle-templates        

If this does not work, you might need to update your Databricks CLI 🤓

RevoData Asset Bundle Templates

You'll be guided through setup prompts to configure your project. Pick a Python package manager: the blazingly fast uv or the classic Poetry. Specify your git client – we support GitHub and Azure DevOps (GitLab support is currently under development). Lastly, specify your project type: opt for a fully configured Databricks Asset Bundle including various example pipelines, or for a lean Python-only setup (without any Databricks files). If you decide to include the example pipelines and jobs, note how we implemented best practices in examples that are easy to customize and extend further.


Our template depends heavily on the provided Makefile for tasks. If Make is not installed, you are gonna have a bad time: you will need to manually run the commands listed in the Makefile.

We currently recommend Visual Studio Code, since our bundle comes with some neat default settings for it. Other IDEs will probably also work well but you’d have to configure those yourself.


Setting Up

Run the following command to install necessary tools and set up the environment:

make setup        

This command ensures all essential tools are installed:

  • Homebrew: For managing packages on macOS and Linux.
  • Git: Version control system.
  • Python Environment: Matching the latest Databricks Runtime LTS version.
  • Package Manager: uv or Poetry, based on your choice.
  • Databricks CLI: For those developers simply cloning an existing repository.

Subsequently, all project dependencies are installed, a virtual environment is created and activated, Git is configured, and the pre-commit hooks are installed to immediately enforce code quality.


Cleaning Up

To deactivate and remove the virtual environment, lock files, and caches, simply run:

make clean        

Deploy and Destroy with Ease

To deploy the asset bundle to the appropriate Databricks workspace, run:

make deploy_*        

To remove everything that came with the bundle from the Databricks workspace, run:

make destroy_*         

Note: by default, the * in the commands above can be replaced with dev or prd.
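For illustration, targets like these typically wrap the Databricks CLI's bundle commands. A hypothetical sketch – consult the generated Makefile for the actual recipes:

```makefile
# Hypothetical Makefile excerpt (recipes are tab-indented)
deploy_dev:
	databricks bundle deploy --target dev

destroy_dev:
	databricks bundle destroy --target dev
```

Wrapping the CLI in Make targets keeps the commands identical for every developer and for the CI/CD pipelines.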


Standout Features

1. Modern Python Packaging

Use uv or Poetry as your package manager for efficient dependency management and packaging, rather than trying to keep a requirements.txt file buildable and up to date. Good luck with that 😬

2. Strict Code Quality Enforcement

Strongly opinionated linters, formatters, and type checkers uphold code quality standards:

  • Ruff: Linting and formatting
  • mypy: Static type checking
  • SQLFluff: SQL code linting and formatting
  • pydoclint: Docstring linting
  • Bandit: Security assessment

All tools are configured in the pyproject.toml file, keeping the repository clean and clear. We maintain code quality in the code base by utilizing pre-commit hooks and CI pipelines.
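To give an idea, configuring these tools in pyproject.toml might look like the following sketch – the rule selection and settings here are illustrative, not the template's actual configuration:

```toml
# Illustrative pyproject.toml excerpt - not the template's actual settings
[tool.ruff.lint]
select = ["E", "F", "I"]   # pycodestyle errors, pyflakes, import sorting

[tool.mypy]
strict = true              # maximum type-checking strictness

[tool.sqlfluff.core]
dialect = "databricks"     # lint SQL against the Databricks dialect
```

Centralizing all tool settings in one file means there is a single place to look when a check fails, locally or in CI.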

3. Integrated CI/CD

Support for various Git clients with pre-configured workflows for:

  • Continuous Integration: Automated testing and linting on code pushes
  • Semantic Release: Automated release versioning and CHANGELOG generation based on commit messages that strictly adhere to the Angular Commit Message Conventions
  • Bundle Deployment: example deployment pipelines, currently nested under the Revo Modules

Azure DevOps and GitHub are supported, and we are working on getting GitLab in the mix as well.
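To illustrate the convention, here is a hedged sketch of how Angular-style commit prefixes typically map to version bumps – the exact mapping depends on how the semantic release tooling is configured:

```shell
# Sketch of the usual Angular-convention -> semver mapping (assumed defaults)
bump_for() {
  case "$1" in
    *"BREAKING CHANGE"*|feat!*) echo "major" ;;  # breaking change
    feat*)                      echo "minor" ;;  # new feature
    fix*|perf*)                 echo "patch" ;;  # bug fix / performance
    *)                          echo "none"  ;;  # docs, chore, ci, ...
  esac
}

bump_for "feat: add GitLab support"   # minor
bump_for "fix: handle empty config"   # patch
```

Because the version bump is derived mechanically from the commit message, the convention has to be followed strictly – which is exactly why the templates enforce it.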

4. Revo Modules

Revo Modules are custom add-ons for additional functionalities. As of now, the Bundle Deployment pipeline can be added by running:

make module        

Recognizing the wealth of possibilities that these Revo Modules can bring, we will make them one of our main focus areas as we develop the bundle templates further. Some examples that will be nested under Revo Modules in the future:

  • Standardized ingestion patterns
  • Deployment of Unity Catalog system schemas
  • Metadata-driven ingestion pipelines leveraging pushcart
  • FinOps tools
  • Alerts and notifications (e.g. Slack integration)

5. Testing

To run a full test suite using pytest, including generating a coverage report, run:

make test        

6. Auto-generated Documentation

Last but definitely not least: generate amazingly beautiful and comprehensive project documentation by running:

make docs        

Our project uses MkDocs to generate comprehensive HTML documentation from Markdown files. In addition, we use pdoc3 to auto-generate HTML documentation from the docstrings of modules and tests. Lastly, the coverage report is embedded in the documentation as well.
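As an illustrative sketch, a minimal MkDocs configuration looks along these lines – the site name and theme below are placeholders, and the template ships its own mkdocs.yml:

```yaml
# Hypothetical mkdocs.yml sketch - the template's actual config differs
site_name: my_project
docs_dir: docs
theme:
  name: material   # assumes mkdocs-material is installed
```

Running make docs then builds the site, with the pdoc3 output and coverage report placed alongside the Markdown pages.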


Conclusion

RevoData Asset Bundle Templates offer a robust foundation for new Databricks projects, encapsulating our best practices and automating routine tasks. By standardizing the development environment and integrating carefully selected essential tools, your team can focus on delivering value quickly rather than configuring setups or bickering about code standards.

The templates are fully open-source, and we invite everyone to leverage them to enhance efficiency and elevate your projects today.


Get Involved

  • Adopt the Templates: Kickstart your next Databricks project with our templates today.
  • Provide Feedback: Share your experiences and suggestions to help us improve our templates.
  • Contribute: Even better! Follow the guidelines that can be found in the repo.


Acknowledgments

This project is a result of collaborative efforts to improve our development processes. Special thanks to the RevoData team who provided input and tested the templates during development.



A bundle of (Data)bricks in front of a Lakehouse.


More articles by Thomas Brouwer
