Streamlining Databricks Project Development with RevoData Asset Bundle Templates
Looking to deploy a fully configured development environment for your Databricks Asset Bundles—complete with strictly enforced coding standards, CI/CD pipelines, pre-commit hooks, and example pipelines? Try RevoData Asset Bundle Templates today:
databricks bundle init https://github.com/revodatanl/revo-asset-bundle-templates
Introduction
Setting up a new development environment often feels like reinventing the wheel—repetitive tasks, boilerplate code, and tedious configuration. This not only drains productivity but also introduces inconsistencies and potential errors.
Databricks Asset Bundles (DABs) offer a powerful solution to streamline this process. DABs allow you to create fully customized templates that enforce your organization's unique standards, configurations, and best practices.
At RevoData, we have taken this concept further by designing our own Asset Bundle Templates – and we are open-sourcing them for everyone to use. Our templates elevate project setup by enforcing the highest coding standards and integrating essential tools, enabling you to hit the ground running when starting new projects.
What Are Databricks Asset Bundles?
DABs provide an infrastructure-as-code approach to managing Databricks projects, making it easier to version, test, and deploy your code and configuration across workspaces.
What this means in practice becomes clear from the image above, taken from Databricks' own documentation. Developers work in a local development environment, fully customized to their preferences. Once an asset is done, say an ingestion pipeline, it is integrated into the DAB and deployed to the Development workspace, where it is tested and configured further. Once it meets the acceptance criteria, it is checked into version control and deployed onwards to the Staging workspace and, ultimately, to the Production workspace via CI/CD pipelines. Used correctly, the same assets are redeployed to the various environments without code duplication, and nothing untested ends up in production.
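This multi-workspace flow is driven by target definitions in the bundle's databricks.yml. A minimal sketch, assuming two workspaces (the bundle name and workspace URLs below are placeholders, not RevoData's actual configuration):

```yaml
# Minimal databricks.yml sketch; bundle name and workspace hosts are placeholders
bundle:
  name: my_project

targets:
  dev:
    mode: development   # deployed assets are prefixed per user and easy to tear down
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prd:
    mode: production    # locked-down deployment for the Production workspace
    workspace:
      host: https://prd-workspace.cloud.databricks.com
```

The same job and pipeline definitions are then deployed to either workspace simply by switching the target, which is what makes the "no code duplication" promise work.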
Recognizing the challenges in setting up such an intricate environment, we have developed our own RevoData Asset Bundle Templates. Our templates serve as a versatile starting point for new projects: integrating a fully configured developer toolset, pre-commit hooks, example pipelines ready for deployment, and several CI/CD pipelines—including a semantic release pipeline that automatically tags releases and generates a CHANGELOG for you.
Introducing RevoData Asset Bundle Templates
Our Asset Bundle Templates simplify setting up a consistent local development environment and strictly enforce coding style, making it easy to start new projects while maintaining high-quality standards. While our approach is highly opinionated and might take some getting used to (you WILL become annoyed), we believe that following these recommendations will make you a better programmer in the end!
We have integrated modern development tools and are particularly fond of the new, fast ones written in Rust (such as Ruff and uv—we love the Astral team!) alongside essentials like mypy, which, although quite slow, adds unequivocal value to the developer experience.
To make sure that the standards we set in the local development environment are maintained throughout the codebase, we include a combination of pre-commit hooks and a CI pipeline that we suggest you make a prerequisite for merging.
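A pre-commit setup of this kind usually looks something like the following sketch; the hook selection matches the tools mentioned above, but the revisions and exact hooks are illustrative, not the template's actual configuration:

```yaml
# Illustrative .pre-commit-config.yaml; revisions and hook choices are examples
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff          # linting
      - id: ruff-format   # formatting
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy          # static type checking
```

Once installed, these hooks run on every commit, so style violations never make it into the repository in the first place; the CI pipeline then re-runs the same checks as a safety net.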
Getting Started
Initialize the bundle with a single command:
databricks bundle init https://github.com/revodatanl/revo-asset-bundle-templates
If this does not work, you might need to update your Databricks CLI 🤓
You'll be guided through setup prompts to configure your project. Pick a Python package manager: the blazingly fast uv or the classic Poetry. Specify your Git client—we support GitHub and Azure DevOps (GitLab support is under development). Lastly, specify your project type: opt for a fully configured Databricks Asset Bundle including various example pipelines, or a lean Python-only setup (without any Databricks files). If you decide to include the example pipelines and jobs, note how we implemented best practices in our examples, which are easy to customize and extend.
We currently recommend Visual Studio Code, since our bundle comes with some neat default settings for it. Other IDEs will likely work well too, but you'd have to configure them yourself.
Setting Up
Run the following command to install necessary tools and set up the environment:
make setup
This command ensures all essential tools are installed.
Subsequently, all project dependencies are installed, a virtual environment is created and activated, Git is configured, and the pre-commit hooks are installed to immediately enforce code quality.
Cleaning Up
To deactivate and remove the virtual environment, lock files, and caches, simply run:
make clean
Deploy and Destroy with Ease
To deploy the asset bundle to the appropriate Databricks workspace, run:
make deploy_*
To remove everything that came with the bundle from the Databricks workspace, run:
make destroy_*
Note: the * in the commands above can (by default) be replaced with either dev or prd.
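Under the hood, targets like these typically wrap the Databricks CLI. A hypothetical sketch of the corresponding Makefile rules (the template's actual targets may differ):

```make
# Hypothetical Makefile rules wrapping the Databricks CLI; names are illustrative
deploy_dev:
	databricks bundle deploy --target dev

deploy_prd:
	databricks bundle deploy --target prd

destroy_dev:
	databricks bundle destroy --target dev

destroy_prd:
	databricks bundle destroy --target prd
```

Wrapping the CLI in make targets keeps the deploy and destroy commands identical for every developer, regardless of which target they point at.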
Standout Features
1. Modern Python Packaging
Utilize uv or Poetry as your package manager for efficient dependency management and packaging, rather than trying to keep a requirements.txt file buildable and up to date. Good luck with that 😬
2. Strict Code Quality Enforcement
Strongly opinionated linters, formatters, and type checkers (such as Ruff and mypy) uphold code quality standards.
All tools are configured in the pyproject.toml file, keeping the repository clean and clear. We maintain code quality in the code base by utilizing pre-commit hooks and CI pipelines.
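For illustration, the tool configuration in pyproject.toml might look like the fragment below; the specific settings and rule selections are examples, not the template's exact configuration:

```toml
# Example tool sections in pyproject.toml; the selected rules are illustrative
[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["E", "F", "I", "D"]   # pycodestyle, pyflakes, import sorting, docstrings

[tool.mypy]
strict = true
ignore_missing_imports = true
```

Centralizing this in pyproject.toml means the pre-commit hooks, the CI pipeline, and every developer's editor all read the same single source of truth.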
3. Integrated CI/CD
Support for various Git clients with pre-configured workflows: Azure DevOps and GitHub are supported, and we are working on adding GitLab to the mix as well.
4. Revo Modules
Revo Modules are custom add-ons for additional functionalities. As of now, the Bundle Deployment pipeline can be added by running:
make module
Recognizing the wealth of possibilities that Revo Modules can bring, extending them will be one of our main focus areas as we develop our bundle templates further.
5. Testing
To run a full test suite using pytest, including generating a coverage report, run:
make test
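As a sketch of what such a suite contains: plain pytest-style functions with assertions, discovered and run automatically by `make test`. The function under test here is hypothetical, not part of the template:

```python
# Hypothetical unit under test and its pytest-style test; names are illustrative
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32


def test_celsius_to_fahrenheit():
    # pytest collects any function named test_* and reports failed assertions
    assert celsius_to_fahrenheit(0.0) == 32.0
    assert celsius_to_fahrenheit(100.0) == 212.0
```

The coverage report generated alongside the test run shows which lines of the package these assertions actually exercised.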
6. Auto-generated Documentation
Last but definitely not least: generate amazingly beautiful and comprehensive project documentation by running:
make docs
Our project uses MkDocs to generate comprehensive HTML documentation from markdown files. In addition, we use pdoc3 to auto-generate HTML documentation from the docstrings of modules and tests. Lastly, the coverage report is embedded in the documentation as well.
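A MkDocs setup of this shape is typically driven by a small mkdocs.yml at the project root; a sketch, where the site name and navigation entries are placeholders rather than the template's actual configuration:

```yaml
# Illustrative mkdocs.yml; site name and nav entries are placeholders
site_name: My Project
nav:
  - Home: index.md
  - API Reference: api.md   # could point at the pdoc3-generated pages
  - Coverage: coverage.md   # could embed the coverage report
```

Running `make docs` would then render these markdown sources into a browsable HTML site.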
Conclusion
RevoData Asset Bundle Templates offer a robust foundation for new Databricks projects, encapsulating our best practices and automating routine tasks. By standardizing the development environment and integrating these carefully selected essential tools, your team can focus on delivering value quickly rather than configuring setups or bickering about code standards.
The templates are fully open-source, and we invite everyone to leverage them to enhance efficiency and elevate your projects today.
Get Involved
Acknowledgments
This project is a result of collaborative efforts to improve our development processes. Special thanks to the RevoData team who provided input and tested the templates during development.