Data Engineering Best Practices
The undercurrents of Data Engineering
“Smooth seas never made skilled sailors.” — Franklin D. Roosevelt
In the last blog, we looked at what data engineers really do. In this one, we'll explore how they do the things they do (or, practically speaking, how they should be doing them).
I think my last blog was clear about this: data engineering doesn't have the cool-kid hype of training LLMs or fine-tuning diffusion models to make Drake wear a sari. What it does have, however, is everything you need to make all of those things work.
The cool LLMs and ML algorithms don't work if there's no data, or if the data is crappy. A data engineer's job is not just to dump data from disparate sources into a target system so it can be gobbled up by these cool applications. Yes, I admit it involves a lot of data dumping, but there is a set of principles to follow so that the dumping is done securely and reliably, and so that the dumping system survives for a long time.
Without further ado, let’s talk about those principles:
1. Security: The Least You Can Do
One of the most common anti-patterns in early data teams is: “Just give them admin access. We’ll clean it up later.” (Spoiler: we never do!)
In data engineering, security isn't just a checkbox; it's a mindset. When pipelines touch sensitive data like credit cards, healthcare info, or internal emails, you can't afford a "YOLO" mentality.
Here's the golden rule: Least privilege, always. Give people only the permissions they need, only for the time they need them. Just like you don't hand over your Netflix password to someone who only wants to watch a movie trailer, don't give root access to someone checking a column in a table.
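Least privilege can be sketched as an explicit role-to-permission mapping with a deny-by-default check. The role names and permission strings below are illustrative assumptions, not from any real system:

```python
# A minimal sketch of least-privilege access checks: a role holds only
# the permissions it was explicitly granted, and everything else is denied.
ROLE_PERMISSIONS = {
    "analyst": {"read:sales_table"},
    "pipeline": {"read:raw_events", "write:sales_table"},
    "admin": {"read:*", "write:*"},
}

def is_allowed(role: str, action: str, resource: str) -> bool:
    """Return True only if the role explicitly holds the permission."""
    perms = ROLE_PERMISSIONS.get(role, set())
    return f"{action}:{resource}" in perms or f"{action}:*" in perms

# An analyst can read the sales table, but cannot write to it.
print(is_allowed("analyst", "read", "sales_table"))   # True
print(is_allowed("analyst", "write", "sales_table"))  # False
```

The key design choice is deny-by-default: an unknown role or an ungranted action simply returns `False`, rather than requiring you to enumerate what each role is forbidden from doing.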
And for the data engineers themselves:
- Never hard-code credentials in pipeline code; use a secrets manager.
- Prefer short-lived, role-based credentials over long-lived keys.
- Encrypt sensitive data both in transit and at rest.
2. Data Management
“Data management” sounds like one of those overly broad corporate terms, like “strategic synergies” or “robust alignment.” But underneath that umbrella lives half of a data engineer’s job.
It includes:
- Data governance: who owns which data, and who is accountable for it.
- Data quality: making sure the data is accurate, complete, and timely.
- Metadata and lineage: knowing where data came from and how it was transformed.
- Data lifecycle: retention, archival, and deletion policies.
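One slice of data management, lineage tracking, can be sketched as recording a small metadata record alongside every load, so you can later answer "where did this table come from?". The field and table names here are illustrative assumptions:

```python
# A minimal sketch of capturing lineage metadata next to a load.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    target_table: str
    source_tables: list
    transformed_by: str
    # Record when the load happened, in UTC, at creation time.
    loaded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = LineageRecord(
    target_table="analytics.daily_sales",
    source_tables=["raw.orders", "raw.products"],
    transformed_by="sales_pipeline.py",
)
print(record.target_table, "<-", record.source_tables)
```

In practice this record would be written to a metadata store or catalog; the point is that lineage is captured by the pipeline itself, not reconstructed later from memory.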
3. Data Architecture: Design Like You’re Going to Rebuild It Anyway
Architecture in data engineering is not just “how the tables are joined.” It’s about designing for today with tomorrow in mind.
Good architecture is flexible. It makes reversible decisions, prepares for failures, and chooses components that work well with other teams, not just yours.
Here are some principles that you may want to tattoo on your Teams channel:
- Make reversible decisions: pick components you can swap out later.
- Plan for failure: design as if every dependency will eventually go down.
- Build loosely coupled systems: teams should be able to change their piece without breaking yours.
And most importantly: architect for change. The best architecture is not the cleverest one; it's the one you can evolve without a rewrite.
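A reversible decision can be sketched as a thin abstraction over the storage backend: pipeline code depends on a small interface, so swapping the backend later doesn't touch callers. The class and method names below are illustrative assumptions:

```python
# A minimal sketch of a "reversible decision": hide the storage backend
# behind a small interface so it can be swapped without changing pipelines.
from abc import ABC, abstractmethod

class Storage(ABC):
    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def read(self, key: str) -> bytes: ...

class LocalStorage(Storage):
    """Today's choice: in-memory. Tomorrow: object storage, same interface."""
    def __init__(self):
        self._blobs = {}
    def write(self, key, data):
        self._blobs[key] = data
    def read(self, key):
        return self._blobs[key]

def load_report(store: Storage) -> bytes:
    # Pipeline code depends only on the interface, never the backend.
    return store.read("daily_report")
```

Replacing `LocalStorage` with a cloud-backed implementation would then be a one-class change, which is exactly what makes the original storage decision reversible.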
4. DataOps
Imagine if software engineers didn’t have Git, CI/CD, or Jira (okay, maybe Jira is debatable). That’s what data engineering often looked like five years ago.
Enter DataOps, a set of practices that bring agility, collaboration, and reliability to data products.
What does it actually mean?
a. Automation: Deploy pipeline changes through CI/CD instead of manual scripts, so every change is tested and repeatable.
b. Observability: Monitor your data and your pipelines (freshness, volume, schema) so you find problems before your stakeholders do.
c. Incident Response: When something breaks (and it will), have a clear process for detecting, communicating, and fixing it.
In short, DataOps helps you trust your data even when you’re not staring at it.
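The observability idea can be sketched as a data quality gate: validate a batch before loading it, and fail loudly instead of silently shipping bad data. The column names and checks below are illustrative assumptions:

```python
# A minimal sketch of a pre-load data quality check.
def validate_batch(rows: list) -> list:
    """Return a list of human-readable issues; empty means the batch is OK."""
    issues = []
    if not rows:
        issues.append("batch is empty")
    for i, row in enumerate(rows):
        if row.get("order_id") is None:
            issues.append(f"row {i}: missing order_id")
        if (row.get("amount") or 0) < 0:
            issues.append(f"row {i}: negative amount")
    return issues

batch = [{"order_id": 1, "amount": 9.99}, {"order_id": None, "amount": -5}]
problems = validate_batch(batch)
if problems:
    # In a real pipeline this would raise, page someone, or open an incident.
    print("Data quality check failed:", problems)
```

Returning a list of issues (rather than raising on the first one) means the alert shows everything wrong with the batch at once, which shortens incident response.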
5. Orchestration: The Invisible Conductor
You can have the best scripts and transformations, but if they don't run in the right order with the right dependencies, you'll spend your evenings debugging why no sales were loaded for the day (have you checked whether the products data was loaded correctly today?).
Good orchestration:
- Runs tasks in the right order, based on explicit dependencies.
- Retries transient failures instead of waking you up.
- Alerts when something genuinely needs a human.
- Makes backfills and reruns boring, not terrifying.
Bad orchestration? It’s why someone’s Sunday evening gets ruined.
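The core guarantee an orchestrator provides, tasks run only after their dependencies, boils down to a topological ordering of the DAG. Here is a minimal sketch using Python's standard library; the task names are illustrative:

```python
# A minimal sketch of dependency ordering: "load_sales" must wait for both
# upstream loads, and "build_report" must wait for "load_sales".
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dag = {
    "load_products": set(),
    "load_orders": set(),
    "load_sales": {"load_products", "load_orders"},
    "build_report": {"load_sales"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # upstream tasks always appear before their dependents
```

Real orchestrators (Airflow, Dagster, Prefect, and friends) layer scheduling, retries, and alerting on top, but this ordering is the invisible-conductor part.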
6. Software Engineering: Because SQL Alone Won’t Save You
There was a time when data engineering was just SQL and cron jobs. Those times are over.
Modern data engineers write a lot of code: Python, Spark, dbt, APIs, tests. And that code needs to be:
- Version-controlled, so changes are reviewable and revertible.
- Tested, so refactors don't silently break the numbers.
- Reviewed, so knowledge doesn't live in one person's head.
- Reproducible, so it runs the same on your laptop and in production.
In other words, you need to know software engineering. Version control, environment management, testing, code reviews. They’re not optional anymore.
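Testing a transformation looks exactly like testing any other software. A minimal sketch, with an assumed (hypothetical) normalization helper:

```python
# A minimal sketch of unit-testing a data transformation.
def normalize_email(raw: str) -> str:
    """Lowercase and strip whitespace so joins on email don't silently miss."""
    return raw.strip().lower()

# Plain asserts here for brevity; in a real repo these would live in a
# pytest file and run in CI on every change.
assert normalize_email("  Alice@Example.COM ") == "alice@example.com"
assert normalize_email("bob@example.com") == "bob@example.com"
```

The payoff is the same as in application code: when someone later rewrites the pipeline, the tests catch the silent breakages before the dashboards do.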
Final Thoughts: The Part You Don’t See
Most of what makes a great data engineer is not what shows up on the analytics dashboards and reports. It’s the habits, the design decisions, the trade-offs, the duct-taped late-night hotfixes that later get replaced with clean abstractions.
So next time someone asks what data engineers do, you can say: “We move data from A to B, yes. But we do it securely, reliably, observably, and in a way that won’t haunt our successors in 2027.”
This is the second blog in a series about data engineering concepts and technologies. If you've ever wanted to see what a data engineer actually does, instead of just reading job descriptions full of buzzwords, stick around.
Until then, may your data be clean and your DAGs never fail.
Let’s get connected here or on Medium.