Data Architect

It’s been about 20 years since I first did data architecture work, designing a data warehouse for a utility company. The work has been quite varied, ranging from drawing ERDs and designing databases to designing ingestion pipelines for data lakes and warehouses, and from dealing with Azure Key Vault, Front Door, Databricks and Fabric on Azure to S3 and Snowflake on AWS. Today I’d like to reflect on that journey, with a particular emphasis on how it differs from data engineering.

As data architects, our bread and butter is data modelling. Here we deal with Entity Relationship Diagrams (ERDs), but also with much more, such as understanding the business. Data modelling calls for two very different skills. On one side are the technical skills, and on the other is the business knowledge. You need both to do data modelling.

On the technical side, for example, you need to understand the difference between these crow’s foot symbols when drawing an ERD:

[Figure: crow’s foot notation symbols used in an ERD]
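To make the notation concrete, here is a minimal sketch of the four common crow’s foot line endings expressed as minimum and maximum cardinalities. The `Customer` and `Account` entities and all class names are illustrative assumptions, not taken from a real model.

```python
from dataclasses import dataclass
from enum import Enum


class Cardinality(Enum):
    """The four common crow's foot line endings as (minimum, maximum).

    A maximum of None stands for "many".
    """
    ZERO_OR_ONE = (0, 1)       # drawn as a circle and a bar
    EXACTLY_ONE = (1, 1)       # drawn as two bars
    ZERO_OR_MANY = (0, None)   # drawn as a circle and a crow's foot
    ONE_OR_MANY = (1, None)    # drawn as a bar and a crow's foot

    @property
    def label(self) -> str:
        return self.name.lower().replace("_", " ")

    def allows(self, count: int) -> bool:
        """Return True if `count` related rows satisfies this cardinality."""
        minimum, maximum = self.value
        return count >= minimum and (maximum is None or count <= maximum)


@dataclass(frozen=True)
class Relationship:
    """One line on the ERD, with a cardinality symbol at each end."""
    parent: str
    child: str
    parent_end: Cardinality
    child_end: Cardinality

    def describe(self) -> str:
        return (
            f"Each {self.parent} relates to {self.child_end.label} "
            f"{self.child} row(s); each {self.child} relates to "
            f"{self.parent_end.label} {self.parent} row(s)."
        )


# Hypothetical example: a customer may hold any number of accounts,
# and every account belongs to exactly one customer.
customer_account = Relationship(
    parent="Customer",
    child="Account",
    parent_end=Cardinality.EXACTLY_ONE,
    child_end=Cardinality.ZERO_OR_MANY,
)

if __name__ == "__main__":
    print(customer_account.describe())
```

Reading each symbol as a minimum and a maximum is often the quickest way to tell them apart: the mark nearer the middle of the line gives the minimum (circle for zero, bar for one), and the mark nearer the entity gives the maximum (bar for one, crow’s foot for many).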

On the other hand, designing a data model requires business knowledge of a specific industry. For example, if you are designing a data model for a retail bank, you need to understand payment services, such as regular payments, direct debits, standing orders, transfers, CHAPS, SWIFT and SEPA. Some of them are global, such as SWIFT; some apply only in a region, such as SEPA (Single Euro Payments Area); and some apply only in one country, such as CHAPS and Faster Payments in the UK. Without detailed knowledge of the payment system, you can’t write the entities and attributes in the ERD, let alone the relationships.
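As a rough illustration of how that business knowledge ends up in the model, the sketch below captures the geographic scope of each payment scheme as an attribute. The entity name `PaymentScheme`, its attributes and the helper function are assumptions made for this example, not a real bank’s data model.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    GLOBAL = "global"
    REGIONAL = "regional"
    NATIONAL = "national"


@dataclass(frozen=True)
class PaymentScheme:
    """Hypothetical reference entity: one row per payment scheme."""
    name: str
    scope: Scope
    coverage: str  # where the scheme applies


# Reference data taken from the schemes mentioned in the text.
SCHEMES = [
    PaymentScheme("SWIFT", Scope.GLOBAL, "Worldwide"),
    PaymentScheme("SEPA", Scope.REGIONAL, "Single Euro Payments Area"),
    PaymentScheme("CHAPS", Scope.NATIONAL, "United Kingdom"),
    PaymentScheme("Faster Payments", Scope.NATIONAL, "United Kingdom"),
]


def schemes_by_scope(scope: Scope) -> list[str]:
    """Return the names of the schemes with the given scope, in list order."""
    return [scheme.name for scheme in SCHEMES if scheme.scope is scope]


if __name__ == "__main__":
    for scope in Scope:
        print(scope.value, schemes_by_scope(scope))
```

A modeller who did not know that CHAPS is a UK scheme while SWIFT is global would have no reason to add a scope attribute at all, which is the point: the attribute comes from the business knowledge, not from the notation.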

So that’s one side of data architecture: data modelling. But there is another side, one where we have to deal with the big picture. Not just the database, but also the ingestion process and the reporting. How everything hangs together, like this:

[Figure: example data platform architecture diagram. Credit: Michael Segner, MonteCarloData.com]

It is actually quite challenging to get to grips with that many technologies. You need to deal with S3, Redshift, Lambda, Glue, Docker, EMR, Athena, Monte Carlo, DataDog, CloudWatch, Looker, Tableau, APIs, etc. It’s overwhelming, to say the least. And when you work on Azure or Google Cloud, you face a different set of technologies.

Ah, but that’s data engineering, no? Your job as a data architect is just to draw up the conceptual boxes and lines, no? Not at all. As a data architect, you need to lead the data engineers. You need to understand how it all hangs together. The big picture is in your hands, including pipelines, storage and security. That big picture diagram, my friends, is your responsibility as a data architect. Not the network engineer’s, not the data engineer’s, not the enterprise architect’s: it is yours, the data architect in the company.

So there are two differences from data engineering. One, data modelling. I agree it’s not specific to data architects, but data architects do much more data modelling than data engineers do. Two, the data architecture diagram, including pipelines, storage, security and reporting. Yes, the data engineers build them, but you are responsible for the big picture: making sure everything hangs together well, and works.

So that is the world of data architecture. As usual, I welcome your comments, corrections and opinions.
