Data Architect
It’s been about 20 years since I first did data architecture work, designing a data warehouse for a utility company. The work has been quite varied, ranging from drawing ERDs and designing databases to designing ingestion pipelines for data lakes and data warehouses, and from Azure Key Vault, Front Door, Databricks and Fabric to S3 and Snowflake on AWS. Today I’d like to reflect on that journey, with particular emphasis on how data architecture differs from data engineering.
As data architects, our bread and butter is data modelling. This means working with Entity Relationship Diagrams (ERDs), but it is much more than that: it also means understanding the business. Data modelling calls on two very different skills. On one side are the technical skills. On the other side is business knowledge. You need both of them to do data modelling.
On the technical side, for example, you need to understand the difference between the crow’s foot symbols used in an ERD: exactly one, zero or one, one or many, and zero or many.
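To make those cardinalities concrete, here is a minimal Python sketch. It assumes the usual reading of the four crow’s foot end markers as a minimum and a maximum number of related rows. The names `CrowsFoot`, `allows` and `violations`, and the customer/account data, are hypothetical and only for illustration.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, List


class CrowsFoot(Enum):
    """Crow's foot end markers as (minimum, maximum); None means no upper bound."""
    EXACTLY_ONE = (1, 1)      # bar + bar
    ZERO_OR_ONE = (0, 1)      # ring + bar
    ONE_OR_MANY = (1, None)   # bar + crow's foot
    ZERO_OR_MANY = (0, None)  # ring + crow's foot


def allows(marker: CrowsFoot, count: int) -> bool:
    """True if `count` related rows satisfy the marker's cardinality."""
    low, high = marker.value
    return count >= low and (high is None or count <= high)


def violations(parent_ids: Iterable[str], child_fks: Iterable[str],
               marker: CrowsFoot) -> List[str]:
    """Parents whose number of referencing child rows breaks the marker."""
    counts = Counter(child_fks)  # child rows per parent key (missing keys count as 0)
    return [p for p in parent_ids if not allows(marker, counts[p])]


# Hypothetical rule: every customer holds one or many accounts.
customers = ["C1", "C2", "C3"]
account_owner_fks = ["C1", "C1", "C3"]  # owner key on each account row
print(violations(customers, account_owner_fks, CrowsFoot.ONE_OR_MANY))  # ['C2']
```

Which end of the line a marker sits on matters. The marker next to an entity says how many rows of that entity can relate to one row at the other end. In this example, the "one or many" marker would sit at the Account end of the Customer–Account line.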
On the other hand, designing a data model also requires business knowledge of a specific industry. For example, if you are designing a data model for a retail bank, you need to understand payment services such as regular payments, direct debits, standing orders, transfers, CHAPS, SWIFT and SEPA. Some are global, such as SWIFT. Some apply only within a region, such as SEPA (Single Euro Payments Area). Some apply only within a country, such as CHAPS and Faster Payments in the UK. Without detailed knowledge of the payment system, you can’t define the entities and attributes in the ERD, let alone the relationships.
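As a hedged illustration of how that business knowledge turns into entities, attributes and rules, here is a simplified Python sketch. It assumes a hypothetical `PaymentScheme` entity with a scope attribute and a currency restriction. The restrictions are SEPA payments in euro, CHAPS and Faster Payments in sterling, and none modelled for SWIFT. It also assumes a hypothetical `Payment` entity that references a scheme. Real schemes carry many more rules than this.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import FrozenSet, Optional


class Scope(Enum):
    GLOBAL = "global"
    REGIONAL = "regional"
    NATIONAL = "national"


@dataclass(frozen=True)
class PaymentScheme:
    name: str
    scope: Scope
    currencies: Optional[FrozenSet[str]]  # None = no currency restriction modelled


# Hypothetical reference data for the PaymentScheme entity.
SCHEMES = {
    "SWIFT": PaymentScheme("SWIFT", Scope.GLOBAL, None),
    "SEPA": PaymentScheme("SEPA", Scope.REGIONAL, frozenset({"EUR"})),
    "CHAPS": PaymentScheme("CHAPS", Scope.NATIONAL, frozenset({"GBP"})),
    "FPS": PaymentScheme("Faster Payments", Scope.NATIONAL, frozenset({"GBP"})),
}


@dataclass(frozen=True)
class Payment:
    payment_id: str
    scheme_code: str  # foreign key to SCHEMES
    amount: Decimal
    currency: str


def is_valid(payment: Payment) -> bool:
    """Check referential integrity and the scheme's currency rule."""
    scheme = SCHEMES.get(payment.scheme_code)
    if scheme is None:  # dangling foreign key
        return False
    return scheme.currencies is None or payment.currency in scheme.currencies


print(is_valid(Payment("P1", "SEPA", Decimal("250.00"), "EUR")))   # True
print(is_valid(Payment("P2", "CHAPS", Decimal("1000.00"), "EUR")))  # False
```

Storing the currency rule as an attribute of the scheme, rather than hard-coding it per payment, is one possible design choice. It keeps the rule in one place if a scheme changes.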
So that’s one side of data architecture: data modelling. But there is another side, one where we have to deal with the big picture. It covers not just the database, but also the ingestion process and the reporting, and how everything hangs together, like this:
(Diagram credit: Michael Segner, MonteCarloData.com)
It is actually quite challenging to comprehend this many technologies. You need to deal with S3, Redshift, Lambda, Glue, Docker, EMR, Athena, Monte Carlo, Datadog, CloudWatch, Looker, Tableau, APIs and more. It’s overwhelming, to say the least. And when you work on Azure or Google Cloud, you face a different set of technologies again.
Ah, but that’s data engineering, no? Your job as a data architect is just to draw the conceptual boxes and lines, isn’t it? Not at all. As a data architect, you need to lead the data engineers, and you need to understand how it all hangs together. The big picture is in your hands, including pipelines, storage and security. That big picture diagram, my friends, is your responsibility as a data architect. It is not the network engineer’s, the data engineer’s or the enterprise architect’s. It’s yours, the data architect’s.
So there are two things that set data architecture apart from data engineering. First, data modelling. I agree it’s not exclusive to data architects, but data architects do far more data modelling than data engineers do. Second, the data architecture diagram, covering pipelines, storage, security and reporting. Yes, data engineers build these, but you are responsible for the big picture: making sure everything hangs together well and works.
So that is the world of data architecture. As usual, I welcome your comments, corrections and opinions.