Data engineering is not software engineering
Data is more important than ever. Yet many data initiatives fail to deliver on their promises. One reason is a lack of appreciation for the technical work needed behind the scenes to make those initiatives a success.
Data success won’t come from vision and architecture alone, or from choosing the right platforms and tools, no matter what promises the industry and our architects are trying to sell us. Nor can better software engineering solve our data challenges. To be successful, you need skilled data analysts, data management specialists, and data engineers (some prefer the term analytics engineers; I will use data engineering in this article).
This is the first article in a series about what it takes to do data engineering successfully and to be a professional data engineer.
In my experience, there is too little appreciation of what makes a good data engineer. Data engineering is a distinct profession that requires its own skill set, ways of working, tools and principles.
Unfortunately, many organizations treat data engineering as just another kind of software engineering. This leads to wrong choices: placing data engineering under software engineering, imposing software development standards on data engineering, using tooling that is not tailored to data projects, running inefficient processes or hiring the wrong people.
Importantly, data engineering is a completely different discipline. Data engineers are not software developers. They act as data analysts, business analysts, designers, data modelers, DevOps specialists, testers and developers, and they work closely with business users, often much more closely than typical software engineering teams do. Some general principles from software engineering are certainly useful for data engineering, but they need to be applied with care.
In this article I look at the differences between data engineering and software engineering, to clarify what sets them apart and what data engineering specifically needs.
Purpose and requirements
The purpose of software engineering is to create software according to specifications. This applies regardless of whether the specifications are all ready in advance or developed in an agile way as part of the process. Simply put: you can have your software finished, tested and rolled out to the users. If there are no bugs, you don’t need to change anything until there are new requirements.
This is never the case in data engineering, because the data you are working with is created and managed outside your influence. Your data application (a pipeline, a data product, a complex model) is always part of a chain, sometimes a very long one. This means that you inherit all the issues upstream, and that your application is never finished. Never. There will always be data modelling changes upstream, changes in the business meaning of the data, new ways of working, data issues; the list is endless.
This means that the basic principles of data engineering are completely different from the principles of software engineering. Let’s have a look at the differences.
End-product
Software engineering has a clear end-product: working software. For a successful development process, users need to have an idea of what the software should do, even if it is a bit vague in the beginning.
Data engineering usually delivers intermediate products, for example a reporting environment that is easy to use for a data visualization team or business analysts. The business should have an idea of what data they need and how to create value from it. But virtually no user can tell you at the start of the project what the dashboards will look like. At best they have a generic description of what kind of data they want to be able to use and of the purpose of the reporting initiative.
How to use the data, and the actual format of reports and dashboards, will evolve once users have access to the data. This is the only way to be successful with data. And that is perfectly fine, as long as you keep it in mind and adjust your process accordingly. So we must work in a very agile way, and we must accept and accommodate many small changes all the time.
Another important aspect is that in data engineering we need to build for flexibility. We don’t want our data pipelines and data products to fail on every small change or issue elsewhere in the chain.
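To illustrate what building for flexibility can look like, here is a minimal sketch in Python. It assumes records arrive as dictionaries and that we know which target columns are required and which are optional; the names `REQUIRED`, `OPTIONAL` and `conform_record` are hypothetical. The idea: fail loudly only when something essential is missing, and absorb small upstream changes (a new column, a dropped optional column) without breaking the pipeline.

```python
import logging

logger = logging.getLogger("pipeline")

# Hypothetical target schema: columns that must be present vs. columns that may be missing.
REQUIRED = {"customer_id", "order_date"}
OPTIONAL = {"amount", "channel"}


def conform_record(record: dict) -> dict:
    """Map an upstream record onto the target schema without failing on small changes."""
    keys = set(record)

    missing_required = REQUIRED - keys
    if missing_required:
        # A missing key column is a real break in the chain: stop and report it.
        raise ValueError(f"missing required columns: {sorted(missing_required)}")

    unexpected = keys - REQUIRED - OPTIONAL
    if unexpected:
        # New upstream columns are logged for follow-up instead of treated as errors.
        logger.warning("ignoring unexpected columns: %s", sorted(unexpected))

    # Optional columns that upstream stopped sending become None instead of crashing the load.
    return {col: record.get(col) for col in sorted(REQUIRED | OPTIONAL)}
```

For example, a record with an extra `discount` field and no `amount` or `channel` is still loaded, with a warning about `discount` and `None` in the two optional columns. Where exactly to draw the line between "tolerate and log" and "fail" is a judgment call that depends on how the data is used downstream.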
Requirements and refinement
In software development there is a relatively clear distinction between design/refinement and development. In data engineering, that distinction is much harder to make.
If you try to refine the complete specifications of a data product (pipeline, model or dashboard) upfront, you will often end up with a slow and painful process. I have seen Scrum teams spend two days on refinement and then two hours on development. That cannot be the right way. My advice is to treat the assessment of the requirements and potential data quality issues as the refinement phase, and working out the details, such as source-target mappings, as the development phase. That restores the balance, but it might feel awkward to a software development team.
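To make "working out source-target mappings" more concrete, here is a minimal sketch in Python. It assumes the mapping is kept as plain data (target column, source column, transformation) that the data engineer fills in and adjusts during development; the source columns `cust_no`, `ord_dt` and `amt_eur` and the helper `apply_mapping` are hypothetical.

```python
from datetime import date
from typing import Any, Callable

# Hypothetical source-target mapping: target column -> (source column, transformation).
# Filling in and correcting entries like these is development work, not upfront refinement.
MAPPING: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("cust_no", lambda v: str(v).strip()),
    "order_date": ("ord_dt", date.fromisoformat),
    "amount_eur": ("amt_eur", lambda v: round(float(v), 2)),
}


def apply_mapping(source_row: dict) -> dict:
    """Build one target row from one source row using the mapping above."""
    return {
        target: transform(source_row[source])
        for target, (source, transform) in MAPPING.items()
    }
```

Keeping the mapping as data rather than scattering it through transformation code means that an upstream rename or a changed business rule is usually a one-line edit, which fits the constant stream of small changes described earlier.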
Measuring success
The measure of success is completely different. If we deliver new software and there are a lot of issues (bugs), we consider the software poorly developed. A successful data solution, though, will generate a lot of data issues, new ideas and new requests. If people actually use the data, it will offer lots of insights, and those insights generate new questions and new requests.
Note the difference: many issues after a software delivery are considered a sign of bad quality; many requests after a new data delivery are a sign of success.
Please don’t measure success the same way!
Testing
The purpose of testing software products is to prove that the software functions correctly and is fit for use. Once it is in production, you don’t need to test again unless something changes.
Testing of data products and data pipelines is primarily monitoring, and it needs to be done continuously. You don’t just test whether your data product has been developed correctly; you also need to keep monitoring whether it is still correct after changes in the source systems, after small changes in the business rules, or even after changes in operational processes. Reality teaches us that not all changes are communicated properly, so monitoring is needed to discover them.
Another big difference is that software testing is often done by separate, specialised teams, while for data products it is more effective when testing is closely integrated with the data engineering team.
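A minimal sketch of what such continuous checks might look like, in Python. It assumes each load arrives as a list of dictionaries with hypothetical `amount` and `channel` fields, and the thresholds are placeholders; `CheckResult` and `run_checks` are illustrative names, not part of any specific tool. The point is that the same checks run on every load, not once before go-live.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_checks(rows: list[dict], min_rows: int, max_null_rate: float,
               allowed_channels: set[str]) -> list[CheckResult]:
    """Run simple data checks on one load; meant to be scheduled on every run."""
    results = []

    # Volume: a sudden drop often signals an upstream change nobody announced.
    results.append(CheckResult("row_count", len(rows) >= min_rows,
                               f"{len(rows)} rows, expected at least {min_rows}"))

    # Completeness: share of rows without an amount (an empty load counts as fully null).
    nulls = sum(1 for r in rows if r.get("amount") is None)
    null_rate = nulls / len(rows) if rows else 1.0
    results.append(CheckResult("amount_null_rate", null_rate <= max_null_rate,
                               f"null rate {null_rate:.1%}, max {max_null_rate:.1%}"))

    # Validity: unknown category values (including a missing channel) can point to
    # changed business rules or operational processes.
    unknown = {r.get("channel") for r in rows} - allowed_channels
    results.append(CheckResult("channel_values", not unknown,
                               f"unexpected values: {sorted(map(str, unknown))}"))
    return results
```

In practice the failed results would be sent to whoever owns the pipeline; a new channel value such as `"app"` is exactly the kind of uncommunicated change these checks are meant to surface.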
Feedback
In software engineering, feedback often comes mainly from testers and product owners. In data projects, I see in practice much more direct interaction between data engineers on the one hand and business users and business analysts on the other.
Release cycle
Many organizations want releases of operational systems to be tightly controlled, so that everything is aligned: system, migration, documentation, end-user training. This typically leads to long release cycles (months), except for bug fixes.
Data products have fewer dependencies of this kind. In my experience, a successful (read: actively used) data product generates many small requests that can deliver immediate value. It would be a lost opportunity to make them wait for a long release cycle. So a typical release cycle for a successful data team would be two weeks. Of course, a bit more preparation is needed when you build data products for external customers.
A short overview of the differences:
When you would rather use a software engineer
Let me be clear: some of the work in a data engineering team is better done by software engineers. Data engineers are the best fit when a lot of data is involved and flexibility and data analysis skills matter. For more technically oriented tasks, software engineers might be a better choice. Examples are:
Conclusion
If you want to be successful in your data initiatives, make sure data engineering can play its role well. Hire the right people to do the work, and set up your processes and ways of working so that they support data engineering and data success. It is not a typical technical job!
Please let me know what you think.
Next article coming soon: A data engineer is a data detective!
The author declares that this article was not generated by AI.