Streaming Technologies Comparison

Paul Hernandez

Published Dec 17, 2019

Introduction

This is the first time I have decided to share my notes here on LinkedIn, which could be beneficial for me and maybe for other people.

Recently I read some posts about streaming technologies and some comparisons between them. I tried to summarize the information provided and create a comparison table for myself.

Technologies to be compared

I selected Apache Spark, Flink, Storm, Kafka Streams and Samza, since they are extensively used and have some popularity within the data engineering community. Some of them could be outdated and less promising than others.

Some inputs are based only on the references I provide at the end. I have gained a lot of experience using Kafka + Spark Streaming in recent years, but I've never used the other technologies discussed, so please be kind 😊 and correct me if I'm wrong.

Initially I read this comparison, where Apache Beam, Apache Apex and Akka Streams were also discussed. I left them out just for lack of time.

It would also be quite interesting to include cloud provider alternatives such as Azure Stream Analytics or Amazon Kinesis Data Analytics.

The comparison table

I chose the features to compare based on the information provided in my references. Some features don't apply to all technologies, since their computational models or architectures can be completely different.

BTW, I haven't found a pretty option to create a table in LinkedIn.

Conclusion

It seems that, depending on the complexity of the business requirements or the computational power needed, there are two main categories:

High-performance cluster computing and data processing frameworks: here Apache Spark and Apache Flink are very strong and they cover a wide range of use cases. Which one is the right choice is impossible to say in general; it depends on the specific requirements and use case.

Embeddable stream processing engines: here I would say Kafka Streams is a great option for building reactive and stateful streaming applications, microservices and event-driven systems. It is also suitable for many IoT scenarios.
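To make "stateful" a bit more concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API (which is a Java library); the function name running_totals and the sample sensor events are my own assumptions for illustration. It only shows the idea of processing events one at a time while keeping per-key state between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (key, value), e.g. (sensor id, reading).
Event = Tuple[str, float]


def running_totals(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Consume events one at a time and emit the updated total for each key."""
    # The per-key state. A real engine would keep this in a fault-tolerant
    # state store; here it is just an in-memory dictionary.
    state: Dict[str, float] = defaultdict(float)
    for key, value in events:
        state[key] += value
        # Emit the new aggregate as soon as the event has been processed.
        yield key, state[key]


if __name__ == "__main__":
    sample = [("sensor-a", 1.0), ("sensor-b", 2.5), ("sensor-a", 4.0)]
    for key, total in running_totals(sample):
        print(key, total)
```

Because the logic is just a function inside the application, it can live in a microservice next to the rest of the code, which is roughly what "embeddable" means here, as opposed to submitting a job to a separate cluster.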

I hope these notes help other people, and I encourage you to try these technologies yourself.

References

Pablo Endres 6y

Interesting. Thanks for sharing. The table is not Mobile friendly, but you already complained about that in the article
