Data Engineering Best Practices
Photo by Emiliano Arano (https://www.pexels.com/photo/white-clouds-over-body-of-water-1683812/)



The undercurrents of Data Engineering

“Smooth seas never made skilled sailors.” — Franklin D. Roosevelt

In the last blog, we learned what data engineers really do. In this blog, we are going to explore how they do the things they do (or, practically speaking, how they should be doing those things).

I think my last blog was clear about this: data engineering doesn’t have a cool-kid hype like training LLMs or finetuning diffusion models to make Drake wear a sari. What it does have, however, is everything you need to make all of those things work.

The cool LLMs and ML algorithms don’t work if there’s no data, or if the data is crappy. The work of data engineers is not just to dump data from disparate sources into a target system so it can be gobbled up by these cool applications. Yes, I admit it involves a lot of data dumping, but there is a set of principles that have to be followed so the dumping is done securely and reliably, and so the dumping system survives a long time.

Without further ado, let’s talk about those principles:

1. Security: The Least You Can Do

One of the most common anti-patterns in early data teams is: “Just give them admin access. We’ll clean it up later.” (Spoiler: we never do!)

In data engineering, security isn’t just a checkbox, it’s a mindset. When pipelines touch sensitive data like credit cards, healthcare info, internal emails, you can’t afford a “YOLO” mentality.

Here’s the golden rule: Least privilege, always. Give people only the permissions they need, only for the time they need them. Just like you don’t hand over your Netflix password to someone who only wants to watch a movie trailer, don’t give root access to someone who just needs to check a column in a table.

And for the data engineers themselves:

  • Use temporary credentials.
  • Hide secrets and env variables.
  • Learn how IAM, encryption, and network protocols work because when things go wrong, you become the incident report.
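The “hide secrets and env variables” point above can be sketched in a few lines. This is a minimal illustration (the variable names `DB_USER` and `DB_PASSWORD` are hypothetical): read credentials from the environment and fail loudly when one is missing, instead of hardcoding secrets in pipeline code.

```python
import os

def get_db_credentials() -> dict:
    """Read database credentials from environment variables
    instead of hardcoding them in the pipeline source."""
    try:
        return {
            "user": os.environ["DB_USER"],
            "password": os.environ["DB_PASSWORD"],
        }
    except KeyError as missing:
        # Fail fast with a clear message rather than limping
        # along with a half-configured connection.
        raise RuntimeError(f"Missing required secret: {missing}") from None
```

In practice you would layer a secrets manager (Vault, AWS Secrets Manager, etc.) behind this, but the principle is the same: secrets live outside the code.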

2. Data Management

“Data management” sounds like one of those overly broad corporate terms, like “strategic synergies” or “robust alignment.” But underneath that umbrella lives half of a data engineer’s job.

It includes:

  • Data modeling (Should this be a dimension or a fact?)
  • Data quality (Is NULL a valid value or a sad mistake?)
  • Warehousing (Is this a lakehouse now?)
  • Governance (Who owns this field called user_name_2_backup_new?)
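The data-quality bullet above (“Is NULL a valid value or a sad mistake?”) can be made concrete with a tiny check. This is a sketch, not a full framework: it counts missing values per required column so you can decide whether those NULLs are expected or a broken upstream load.

```python
def null_report(rows: list[dict], required: list[str]) -> dict:
    """Count NULL/missing values per required column --
    a minimal data-quality check to run after each load."""
    report = {col: 0 for col in required}
    for row in rows:
        for col in required:
            if row.get(col) is None:
                report[col] += 1
    return report
```

Tools like dbt tests or Great Expectations do this at scale, but the underlying idea is exactly this loop.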

3. Data Architecture: Design Like You’re Going to Rebuild It Anyway

Architecture in data engineering is not just “how the tables are joined.” It’s about designing for today with tomorrow in mind.

Good architecture is flexible. It makes reversible decisions, prepares for failures, and chooses components that work well with other teams, not just yours.

Here are some principles that you may want to tattoo on your Teams channel:

  • Plan for failure. (Things will break. The goal is not “never break” but “recover gracefully.”)
  • Loosely couple systems. (Let them dance, not tangle.)
  • Prioritize security. (Even over speed sometimes.)
  • Embrace FinOps. (Design systems that do the job without draining your AWS credits like a slot machine.)

And most importantly:

  • Lead with ideas, not just tickets. Communicate and collaborate with all of the stakeholders and the data team while designing because architecture is a shared vision, not just a diagram in Lucidchart.
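The “plan for failure” principle above can be sketched as a retry wrapper. This is a minimal illustration, assuming a flaky task you can safely re-run: instead of letting one transient hiccup kill the pipeline, retry with exponential backoff and only give up after a few attempts.

```python
import time

def run_with_retries(task, attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky callable with exponential backoff:
    recover gracefully instead of never breaking."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: fail loudly, not silently
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Note the caveat: retries are only safe for idempotent tasks, which is one more reason loosely coupled, idempotent steps are worth designing for.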

4. DataOps

Imagine if software engineers didn’t have Git, CI/CD, or Jira (okay, maybe Jira is debatable). That’s what data engineering often looked like five years ago.

Enter DataOps, a set of practices that bring agility, collaboration, and reliability to data products.

What does it actually mean?

a. Automation:

  • Use orchestration tools like Airflow or Dagster.
  • Build CI/CD for data pipelines.
  • Automate tests for data quality and schema changes.
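The “automate tests for schema changes” bullet can be sketched without any framework. This is a toy check (the expected-schema dict is an assumption for illustration): validate that each row has the expected columns with the expected types, and return a list of problems that a CI job could assert is empty.

```python
def check_schema(rows: list[dict], expected: dict) -> list[str]:
    """Automated schema check: verify each column holds the
    expected type; an empty result means the check passed."""
    problems = []
    for i, row in enumerate(rows):
        for col, typ in expected.items():
            if col not in row:
                problems.append(f"row {i}: missing column {col!r}")
            elif row[col] is not None and not isinstance(row[col], typ):
                problems.append(f"row {i}: {col!r} is not {typ.__name__}")
    return problems
```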

b. Observability:

  • Know when your data looks weird.
  • Monitor freshness, nulls, duplicates, unexpected spikes.
  • Keep logs and create systems to audit the changes.
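Two of the observability checks above, freshness and duplicates, fit in a few lines each. This is a minimal sketch (the 24-hour threshold and the key column are assumptions you would tune per table):

```python
from datetime import datetime, timedelta

def is_stale(last_loaded: datetime, now: datetime,
             max_age_hours: int = 24) -> bool:
    """Freshness check: has the table gone too long without a load?"""
    return now - last_loaded > timedelta(hours=max_age_hours)

def duplicate_keys(rows: list[dict], key: str) -> set:
    """Return any key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        (dupes if value in seen else seen).add(value)
    return dupes
```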

c. Incident Response:

  • Alerts that matter.
  • Create dashboards that explain if anything went wrong.
  • A culture that doesn’t shoot the messenger.
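“Alerts that matter” can be as simple as a routing rule. This is a hypothetical sketch: page a human only when a critical check fails, and let everything else land in a daily digest instead of drowning the on-call person in noise.

```python
def should_page(failed_checks: list[str], critical: set[str]) -> bool:
    """Page only for critical failures; the rest goes to a digest."""
    return any(check in critical for check in failed_checks)
```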

In short, DataOps helps you trust your data even when you’re not staring at it.

5. Orchestration: The Invisible Conductor

You can have the best scripts and transformations, but if they don’t run in the right order with the right dependencies, you’ll spend your evenings debugging why no sales were loaded for the day (did anyone check whether the products data loaded correctly first?).

Good orchestration:

  • Automates dependencies.
  • Fails loudly and clearly.
  • Plays well with retries, alerts, and dashboards.
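The “automates dependencies” point is, at its core, a topological sort of a task graph, which is exactly what tools like Airflow and Dagster compute for you. As a sketch, Python’s standard-library `graphlib` can produce a valid run order from a mapping of each task to its upstream tasks (the task names below are made up):

```python
from graphlib import TopologicalSorter

def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Given task -> upstream tasks, return an execution order
    that respects every dependency (upstreams always run first)."""
    return list(TopologicalSorter(deps).static_order())
```

If the graph has a cycle, `TopologicalSorter` raises a `CycleError`, which is the orchestration equivalent of failing loudly and clearly.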

Bad orchestration? It’s why someone’s Sunday evening gets ruined.

6. Software Engineering: Because SQL Alone Won’t Save You

There was a time when data engineering was just SQL and cron jobs. Those times are over.

Modern data engineers write a lot of code: Python, Spark, dbt, APIs, tests. And that code needs to be:

  • Readable
  • Reusable
  • Modular
  • Testable

In other words, you need to know software engineering. Version control, environment management, testing, code reviews. They’re not optional anymore.


Final Thoughts: The Part You Don’t See

Most of what makes a great data engineer is not what shows up on the analytics dashboards and reports. It’s the habits, the design decisions, the trade-offs, the duct-taped late-night hotfixes that later get replaced with clean abstractions.

So next time someone asks what data engineers do, you can say: “We move data from A to B, yes. But we do it securely, reliably, observably, and in a way that won’t haunt our successors in 2027.”


This is the second blog in a series about data engineering concepts and technologies. If you’ve ever wanted to see what a data engineer actually does, instead of just reading job descriptions full of buzzwords, stick around.

Until then, may your data be clean and your DAGs never fail.

Let’s get connected here or on Medium.
