From a data engineer to a data architect

A few days ago someone asked me: "Hi Vincent, I am working as a data engineer but I want to transition to a data architect role. I would love to get a few ideas and some assistance."

Here is my reply to her. She will remain anonymous, of course, but I thought I would share my reply here so that everyone can benefit from it.

There are a few ways to become a data architect, and in my view the best way is to become a data engineer first.

This is because data engineers know how things are built. They know whether an architecture will work or not, and what issues will arise when it is implemented.

What data engineers often lack is the data governance side, such as the approval process, the privacy policy, data quality, metadata and the data retention policy.

Another gap is information architecture, i.e. data modelling, metadata management, master data management, reference data management, data quality rules and observability. Business knowledge is often missing too.

Financials are often lacking too: how to cost a solution (both build costs and operating costs) and, most importantly, how to control those costs. Then there is audit, and how to get funding, which is very important for your team or project.

Then there is reference architecture. You need a library of different solution architectures, around 20 to 30 of them (no, you don't create them yourself; you copy them from the AWS website, the Azure website, the Snowflake website, etc.). Apart from reference architectures, you also need to understand all the capabilities in data and AI. There are about 100 different capabilities in data and AI, so try to list them all. For example: doing analytics, processing data, providing storage, ingesting data, transforming data, controlling data quality, creating visualisations, etc.

So to grow your abilities in data architecture, you need to plug those gaps.

Pathways to a Data Architect

Over the years, I’ve seen the different paths people have taken to become a data architect. Some came through data engineering, some through data modelling, some through data analysis, some through data management/governance, some through business intelligence, and some through data science/ML/AI.

There are many paths to becoming a data architect. You don’t even have to be able to code. It sounds strange, but it is true. If you are a data modeller, you can become a data architect without being able to code. You will need to learn a lot about data pipelines, but you don’t have to be able to code.

But ultimately, a data architect designs data infrastructure. That includes the storage, the pipelines and the compute; the databases, the data lake, the data models and the data flows; the data quality monitoring, the security and the metadata repository; and the data privacy, the data governance and the regulatory compliance.

So you need some technical proficiency, and you need to be able to think strategically too. I think data engineers make the best data architects, but that is just my opinion. You might think that data modellers, data scientists or data analysts make better data architects than data engineers, and you are perfectly entitled to your opinion. I could be wrong and you could be right. What do you think? Let me know in the comments.

Keep learning! My LinkedIn articles: link. My blog: link.

#DataArchitect #DataEngineer #DataScientist #DataAnalyst #DataWarehouse #Analytics #Career #Job

Credit for cover image: University of the Potomac, https://potomac.edu/data-architect-vs-data-engineer/

Great article! In my opinion architects who can code bring stronger value because they understand how pipelines behave in production and design solutions that are practical and easier for engineers to implement.

Awesome Article Vincent! 👏 I’d love to know more about the mindset and skills a Data Architect should have — and what courses or topics you’d recommend learning to move from Data Engineer to Data Architect. 🙏

Personally, I'd start as being a Solution Architect first 😊 Data is used by the business user to support and grow the business. Understanding how the business uses the data, their challenges and frustrations is key to being able to put in place the right architecture for the data journey.

Really interesting article. I love the Career Progression diagram you have shared with us. I want to transition into a Data Architect role at some point. I come from a relational database background. Would you advise transitioning into a Data Engineer role before applying for Data Architect roles in the future?
