Streamlining Databricks Project Development with RevoData Asset Bundle Templates


Looking to deploy a fully configured development environment for your Databricks Asset Bundles—complete with strictly enforced coding standards, CI/CD pipelines, pre-commit hooks, and example pipelines? Try RevoData Asset Bundle Templates today:

databricks bundle init https://github.com/revodatanl/revo-asset-bundle-templates         

Introduction

Setting up a new development environment often feels like reinventing the wheel: repetitive tasks, boilerplate code, and tedious configuration. This not only drains productivity but also introduces inconsistencies and potential errors.

Databricks Asset Bundles (DABs) offer a powerful solution to streamline this process. They allow you to create fully customized templates that enforce your organization's unique standards, configurations, and best practices.

At RevoData, we have taken this concept further by designing our own Asset Bundle Templates – and we are open-sourcing them for everyone to use. Our templates elevate project setup by enforcing the highest coding standards and integrating essential tools, enabling you to hit the ground running when starting new projects.


What Are Databricks Asset Bundles?

DABs provide an infrastructure-as-code approach to managing Databricks projects. They make it easier to:

  • Manage complex projects with deployments across multiple environments
  • Implement CI/CD pipelines, streamlining your deployment processes
  • Simplify new project setup, reducing time spent on initial configurations
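For illustration, a minimal databricks.yml along these lines defines a bundle and its target workspaces – note that the bundle name and workspace hosts below are placeholders, not values from our template:

```yaml
# Hypothetical minimal databricks.yml - name and hosts are placeholders
bundle:
  name: my_project

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-dev-workspace.cloud.databricks.com
  prd:
    mode: production
    workspace:
      host: https://my-prd-workspace.cloud.databricks.com
```

The same bundle definition is then deployed to each target in turn, which is what makes the multi-environment flow described below possible.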


High-level overview of project development and deployment via CI/CD pipelines with DABs – taken from the official Databricks documentation

What this means in practice becomes clear from the image above, taken from Databricks' own documentation. Developers work in a local environment, fully customized to their preferences. Once an asset is done, say an ingestion pipeline, it is integrated into the DAB and deployed to the Development workspace, where it is tested and configured further. Once it meets the acceptance criteria, it is checked into version control and deployed onwards to the Staging workspace and ultimately to the Production workspace via CI/CD pipelines. Used correctly, the same assets get redeployed to the various environments without code duplication, and nothing untested ends up in production.

Recognizing the challenges in setting up such an intricate environment, we have developed our own RevoData Asset Bundle Templates. Our templates serve as a versatile starting point for new projects: integrating a fully configured developer toolset, pre-commit hooks, example pipelines ready for deployment, and several CI/CD pipelines—including a semantic release pipeline that automatically tags releases and generates a CHANGELOG for you.


Introducing RevoData Asset Bundle Templates

Our Asset Bundle Templates simplify setting up a consistent local development environment and strictly enforce coding style, making it easy to start new projects while maintaining high-quality standards. Our approach is highly opinionated and might take some getting used to - you WILL become annoyed - but we believe that following these recommendations will make you a better programmer in the end!

We have integrated modern development tools and are particularly fond of the new, fast ones written in Rust (such as Ruff and uv – we love the Astral team!), alongside essentials like mypy – which, slow as it can be, adds unequivocal value to the developer experience.

To make sure that the standards we set in the local development environment are maintained throughout the codebase, we included a combination of pre-commit hooks and a CI pipeline that we suggest you set as a prerequisite for merging.
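As a hedged sketch, a pre-commit configuration wiring up such hooks might look like the following – the hook selection and pinned revs here are illustrative, and the template ships its own .pre-commit-config.yaml:

```yaml
# Illustrative .pre-commit-config.yaml excerpt (revs are examples; pin your own)
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff            # linting
      - id: ruff-format     # formatting
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy            # static type checking
```

Running the same hooks again in CI means a bypassed local hook still gets caught before merge.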


Getting Started

Initialize the bundle with a single command:

databricks bundle init https://github.com/revodatanl/revo-asset-bundle-templates        

If this does not work, you might need to update your Databricks CLI 🤓

RevoData Asset Bundle Templates

You'll be guided through setup prompts to configure your project. Pick a Python package manager: the blazingly fast uv or the classic Poetry. Specify your git client – we support GitHub and Azure DevOps (GitLab support is currently under development). Lastly, specify your project type: opt for a fully configured Databricks Asset Bundle including various example pipelines, or for a lean Python-only setup (without any Databricks files). If you decide to include the example pipelines and jobs, note how we implemented best practices in examples that are easy to customize and extend further.


Our template depends heavily on the provided Makefile for tasks. If Make is not installed, you are gonna have a bad time: you will need to manually run the commands listed in the Makefile.

We currently recommend Visual Studio Code, since our bundle comes with some neat default settings for it. Other IDEs will probably also work well but you’d have to configure those yourself.


Setting Up

Run the following command to install necessary tools and set up the environment:

make setup        

This command ensures all essential tools are installed:

  • Homebrew: For managing packages on macOS and Linux.
  • Git: Version control system.
  • Python Environment: Matching the latest Databricks Runtime LTS version.
  • Package Manager: uv or Poetry, based on your choice.
  • Databricks CLI: For those developers simply cloning an existing repository.

Subsequently, all project dependencies are installed, a virtual environment is created and activated, Git is configured, and the pre-commit hooks are installed to immediately enforce code quality.


Cleaning Up

To deactivate and remove the virtual environment, lock files, and caches, simply run:

make clean        

Deploy and Destroy with Ease

To deploy the asset bundle to the appropriate Databricks workspace, run:

make deploy_*        

To remove everything that came with the bundle from the Databricks workspace, run:

make destroy_*         

Note: by default, the * in the commands above can be replaced with dev or prd.
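For illustration, targets like these typically wrap the Databricks CLI's bundle commands. A hypothetical sketch – consult the generated Makefile for the actual recipes:

```makefile
# Hypothetical Makefile excerpt (recipes are tab-indented)
deploy_dev:
	databricks bundle deploy --target dev

destroy_dev:
	databricks bundle destroy --target dev
```

Wrapping the CLI in Make targets keeps the commands identical for every developer and for the CI/CD pipelines.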


Standout Features

1. Modern Python Packaging

Use uv or Poetry as your package manager for efficient dependency management and packaging, rather than trying to keep a requirements.txt file buildable and up to date. Good luck with that 😬

2. Strict Code Quality Enforcement

Strongly opinionated linters, formatters, and type checkers uphold code quality standards:

  • Ruff: Linting and formatting
  • mypy: Static type checking
  • SQLFluff: SQL code linting and formatting
  • pydoclint: Docstring linting
  • Bandit: Security assessment

All tools are configured in the pyproject.toml file, keeping the repository clean and clear. We maintain code quality in the code base by utilizing pre-commit hooks and CI pipelines.
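To give an idea, configuring these tools in pyproject.toml might look like the following sketch – the rule selection and settings here are illustrative, not the template's actual configuration:

```toml
# Illustrative pyproject.toml excerpt - not the template's actual settings
[tool.ruff.lint]
select = ["E", "F", "I"]   # pycodestyle errors, pyflakes, import sorting

[tool.mypy]
strict = true              # maximum type-checking strictness

[tool.sqlfluff.core]
dialect = "databricks"     # lint SQL against the Databricks dialect
```

Centralizing all tool settings in one file means there is a single place to look when a check fails, locally or in CI.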

3. Integrated CI/CD

Support for various Git clients with pre-configured workflows for:

  • Continuous Integration: Automated testing and linting on code pushes
  • Semantic Release: Automated release versioning and CHANGELOG generation based on commit messages that strictly adhere to the Angular Commit Message Conventions
  • Bundle Deployment: example deployment pipelines, currently nested under the Revo Modules

Azure DevOps and GitHub are supported, and we are working on getting GitLab in the mix as well.
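To illustrate the convention, here is a hedged sketch of how Angular-style commit prefixes typically map to version bumps – the exact mapping depends on how the semantic release tooling is configured:

```shell
# Sketch of the usual Angular-convention -> semver mapping (assumed defaults)
bump_for() {
  case "$1" in
    *"BREAKING CHANGE"*|feat!*) echo "major" ;;  # breaking change
    feat*)                      echo "minor" ;;  # new feature
    fix*|perf*)                 echo "patch" ;;  # bug fix / performance
    *)                          echo "none"  ;;  # docs, chore, ci, ...
  esac
}

bump_for "feat: add GitLab support"   # minor
bump_for "fix: handle empty config"   # patch
```

Because the version bump is derived mechanically from the commit message, the convention has to be followed strictly – which is exactly why the templates enforce it.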

4. Revo Modules

Revo Modules are custom add-ons for additional functionalities. As of now, the Bundle Deployment pipeline can be added by running:

make module        

Recognizing the wealth of possibilities that these Revo Modules can bring, we will make them one of our main focus areas as we develop the bundle templates further. Some examples that will be nested under Revo Modules in the future:

  • Standardized ingestion patterns
  • Deployment of Unity Catalog system schemas
  • Metadata-driven ingestion pipelines leveraging pushcart
  • FinOps tools
  • Alerts and notifications (e.g. Slack integration)

5. Testing

To run a full test suite using pytest, including generating a coverage report, run:

make test        

6. Auto-generated Documentation

Last but definitely not least: generate amazingly beautiful and comprehensive project documentation by running:

make docs        

Our project uses MkDocs to generate comprehensive HTML documentation from Markdown files. In addition, we use pdoc3 to auto-generate HTML documentation from the docstrings of modules and tests. Lastly, the coverage report is embedded in the documentation as well.
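As an illustrative sketch, a minimal MkDocs configuration looks along these lines – the site name and theme below are placeholders, and the template ships its own mkdocs.yml:

```yaml
# Hypothetical mkdocs.yml sketch - the template's actual config differs
site_name: my_project
docs_dir: docs
theme:
  name: material   # assumes mkdocs-material is installed
```

Running make docs then builds the site, with the pdoc3 output and coverage report placed alongside the Markdown pages.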


Conclusion

RevoData Asset Bundle Templates offer a robust foundation for new Databricks projects, encapsulating our best practices and automating routine tasks. By standardizing the development environment and integrating carefully selected essential tools, your team can focus on delivering value quickly rather than configuring setups or bickering about code standards.

The templates are fully open-source, and we invite everyone to leverage them to enhance efficiency and elevate your projects today.


Get Involved

  • Adopt the Templates: Kickstart your next Databricks project with our templates today.
  • Provide Feedback: Share your experiences and suggestions to help us improve our templates.
  • Contribute: Even better! Follow the guidelines that can be found in the repo.


Acknowledgments

This project is a result of collaborative efforts to improve our development processes. Special thanks to the RevoData team who provided input and tested the templates during development.



A bundle of (Data)bricks in front of a Lakehouse.


More articles by Thomas Brouwer
