Lambda Architecture in 2020

As I start to think about some of the upcoming projects we'll be working on over the next year and how we might go about building them, I wanted to consider where lambda architecture fits in our toolbox for building data services.

Having led a project a few years ago that tried to follow lambda architecture, I've got a decent feel for its strengths and weaknesses. In my opinion, lambda architecture is based on two key assumptions:

1. Streaming processes are in some way imperfect (e.g. lossy, maybe slow for all but simple processing, unable to handle late or replayed data, unable to do complex aggregations, etc)

2. Users are happy to have either missing or approximated data for a while until the batch process completes
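Assumption 2 comes from how the serving layer works: queries are answered by merging an exact batch view with an approximate real-time view that covers the gap since the last batch run. A minimal sketch in plain Python (the function and data here are illustrative, not from any particular lambda implementation):

```python
def merge_views(batch_view, speed_view):
    """Serving-layer merge: the batch view is authoritative for the
    period it covers; the speed view adds (possibly approximate)
    counts for events that arrived after the last batch run."""
    merged = dict(batch_view)
    for key, approx_count in speed_view.items():
        merged[key] = merged.get(key, 0) + approx_count
    return merged

# Batch view: exact page-view counts up to the last batch run.
batch_view = {"/home": 1000, "/pricing": 250}
# Speed view: counts since then, which may be lossy or duplicated.
speed_view = {"/home": 42, "/signup": 7}

print(merge_views(batch_view, speed_view))
# → {'/home': 1042, '/pricing': 250, '/signup': 7}
```

Until the next batch run replaces the speed layer's numbers, users are looking at the approximate part of this merge — which is exactly where the trust problem below comes from.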

A problem for many use cases is that users do not accept assumption 2, or do not understand why it should hold, and therefore start losing trust in the whole system.

I also don't think assumption 1 holds up as much now that we have frameworks like Beam and Flink, which are a lot better than what we had 3-5 years ago.
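A big part of why assumption 1 has weakened is event-time processing: these frameworks track a watermark and can still update a window when late data arrives within an allowed lateness, rather than silently dropping it. Here's a toy sketch of that idea in plain Python — it is not the Beam or Flink API, just an illustration of the mechanism, with all names invented for the example:

```python
from collections import defaultdict

def window_counts(events, window_size, watermark, allowed_lateness):
    """Toy event-time windowing: assign each event to a tumbling window
    by its event timestamp, accepting late events as long as the
    window's end is within allowed_lateness of the watermark."""
    counts = defaultdict(int)
    dropped = []
    for name, event_time in events:
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if watermark - window_end > allowed_lateness:
            dropped.append((name, event_time))  # window closed for good
        else:
            counts[window_start] += 1
    return dict(counts), dropped

# Events carry event timestamps and arrive out of order.
events = [("a", 3), ("b", 12), ("c", 7)]
counts, dropped = window_counts(events, window_size=10,
                                watermark=15, allowed_lateness=10)
# "c" arrives after the watermark has passed its window's end, but
# within the allowed lateness, so the [0, 10) window is updated
# rather than the event being lost.
print(counts)   # → {0: 2, 10: 1}
print(dropped)  # → []
```

With this kind of correctness built into the streaming path, the batch layer's role as the "fixer" of streaming mistakes largely disappears.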

The exception to this might be if you really need (imperfect) data very, very fast. But even then, you would probably be better off simplifying your requirements rather than adding the complexity of this architecture.

I think lambda architecture is now a product of its (really very brief) time, and shouldn't be seriously considered for building data services these days. Having said that, the problems it identifies and its approach to solving them are still interesting to learn from, so the book may still be worth a read if you are interested in the area.

---

Originally published at https://andrew-jones.com/blog/lambda-architecture-in-2020/

Cover image from Unsplash.
