Rust for Data Engineering


Capturing data is a fundamental part of today's data engineering. We do it with programs and pipelines deployed on technologies ranging from #DataFactory to more complex data warehouse analytics platforms and lakehouses, such as #Databricks.

Today we call this data ingestion.

Layers and layers of ingested data to be filtered, organized and saved for analysis. As more data becomes available, we need faster and more scalable ingestion processes to keep up. Pipelines get slower, take longer to run and cost more. We quickly turn to coding to fix these things, only to end up with more complex pipelines and less accurate data. What's wrong?

Capturing the RIGHT data is complex and requires more thinking than fetching and building big lakes of data. Since 2007, when I started Data Recorders and crossed paths with Prof. Neil Gunther, this has been one of my goals: finding ways to filter and capture the right data to solve specific problems. Data recorders have ingestion capabilities built in: they can capture, filter and prepare the output for various purposes. But that's not enough. You need more.

You need to think, plan and build a strategy for getting the right data. Three main things you need:

► Plan and write down WHAT needs to be done

► Think and develop the logic or algorithm for HOW to do it

► Then DO IT, using the right programming language

Enter #Rust.

Today's Data Engineering world is built on Python and Java/Scala. Everyone is consuming these. But as soon as your data pipelines need to process more data and consume more computing resources, you will need to rethink. You will need to start fresh from the logic level, write a technical spec of your pipeline, and select the best programming language to scale up.

That's how Rust meets Data Engineering. Rust is a well-known systems programming language, popular for its performance, safety, and concurrency features. On these merits, Rust is the right programming language for many data engineering tasks, such as data ingestion and transformation.

You don't need to change your cluster type on the Databricks platform to handle more data, and pay more; you need a smarter, more efficient ingestion pipeline, written in Rust, to achieve that.

Rust advantages

  • Memory Safety: Rust’s ownership system ensures that memory errors are caught at compile time, eliminating whole classes of runtime crashes. There is no need for a garbage collector.
  • Concurrency: With its lightweight concurrency model and strict compile-time checks, Rust makes it easier to write concurrent programs that are both safe and efficient, a critical need in data-intensive applications.
  • Performance: Rust’s performance is comparable to C’s, making it suitable for high-throughput data processing tasks while using fewer computing resources.
  • Excellent Build and Package Manager: Cargo is Rust’s integrated package manager and build system, handling dependency management, compilation, testing, documentation generation, and package publishing. It provides everything you need for the complete software development cycle.
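The safety and concurrency points above can be sketched in a few lines of plain std Rust. This is a minimal illustration, not production pipeline code: scoped threads each borrow a disjoint chunk of the data, and the borrow checker proves at compile time that no chunk can be freed or mutated while another thread reads it.

```rust
use std::thread;

// Sum a slice in parallel. Each scoped thread borrows its own chunk;
// the compiler guarantees the slice outlives every thread, so there
// is no data race and no garbage collector involved.
fn parallel_sum(data: &[i64], n_threads: usize) -> i64 {
    let chunk_len = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<i64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<i64> = (1..=1_000).collect();
    println!("sum = {}", parallel_sum(&data, 4)); // prints sum = 500500
}
```

If a thread tried to hold on to its chunk past the scope, or two threads tried to mutate the same chunk, the program would simply not compile; that is the compile-time safety the bullet points describe.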

Why Rust for Data Engineering?

Rust has excellent capabilities for processing large amounts of data thanks to its efficient memory management. Combined with its performance and concurrency features, this makes Rust a very good candidate for writing:

  • Real-time data pipelines
  • Complex transformation and business logic pipelines
  • Output capabilities for data lakes or data lakehouses
  • Very efficient pipelines that consume less CPU and memory
  • Easily maintainable data pipelines, managed with Cargo

There are a number of ready-made crates (libraries) for data engineering, such as Polars, DataFusion, Delta Lake, and Parquet.
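Pulling these crates into a project is a one-line Cargo operation each. A sketch of a Cargo.toml for such a pipeline might look like the following; the version numbers and feature flags are illustrative placeholders, so check crates.io for the current releases before copying:

```toml
[package]
name = "ingestion-pipeline"   # hypothetical project name
version = "0.1.0"
edition = "2021"

[dependencies]
# DataFrame engine with a lazy query API
polars = { version = "0.41", features = ["lazy", "csv"] }
# SQL query engine
datafusion = "42"
# Delta Lake table read/write
deltalake = "0.21"
# Low-level Parquet file support
parquet = "53"
```

This is exactly the Cargo advantage from the list above: dependencies, builds, tests, and docs all flow from this one file.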

Rust is here to stay for data engineering tasks, alongside Python and Java. I look forward to seeing how quickly companies like #Databricks and others will support Rust on their platforms.

For the rest of us: start with your technical specs, and write down your logic and algorithm before coding anything. When ready, give #Rust a try.

Merry Xmas

Espoo 2025 Dec 20
