Streaming Technologies Comparison
Introduction
This is the first time I have decided to share my notes here on LinkedIn, in the hope that they will be useful to me and maybe to other people too.
Recently I read several posts comparing streaming technologies. I tried to summarize the information they provide and build a comparison table for myself.
Technologies to be compared
I selected Apache Spark, Flink, Storm, Kafka Streams and Samza because they are widely used and popular within the data engineering community. Some of them may be outdated or less promising than others.
Some of the inputs are based only on the references listed at the end. I have gained a lot of experience using Kafka + Spark Streaming over the last few years, but I have never used the other technologies discussed, so please be kind 😊 and correct me if I'm wrong.
Initially I read a comparison that also discussed Apache Beam, Apache Apex and Akka Streams. I left them out purely for lack of time.
It would also be interesting to include cloud provider alternatives such as Azure Stream Analytics or Amazon Kinesis Data Analytics.
The comparison table
I chose the features to compare based on the information in my references. Some features don't apply to all technologies, since their computational models or architectures can be completely different.
BTW, I haven't found a nice way to create a table on LinkedIn.
Conclusion
Depending on the complexity of the business requirements or the computational power needed, the technologies seem to fall into two main categories:
High-performance cluster computing and data processing frameworks: Apache Spark and Apache Flink are very strong here and cover a wide range of use cases. Which one is the right choice is impossible to say in general; it depends on the specific requirements and use case.
Embeddable stream processing engines: here I would say Kafka Streams is a great option for building reactive and stateful streaming applications, microservices and event-driven systems, because it is a library embedded in your own application rather than a separate processing cluster. It is also suitable for many IoT scenarios.
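To make the "embeddable" idea a bit more concrete, here is a minimal Python sketch using only the standard library. It is not the Kafka Streams API (which is Java/Scala), and `KeyedCounter` and `process` are hypothetical names chosen for illustration. It only shows the model: the processing logic and its per-key state live inside the application process itself, and every incoming record updates the state and emits a new aggregate downstream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (key, value), e.g. (device_id, reading).
Event = Tuple[str, str]


class KeyedCounter:
    """Hypothetical in-process operator that counts events per key.

    The state is a plain dict held by this object, standing in for a
    local state store; there is no separate cluster to deploy.
    """

    def __init__(self) -> None:
        self.state: Dict[str, int] = defaultdict(int)

    def process(self, events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
        # For each incoming record, update the key's running count and
        # emit the new aggregate (similar in spirit to a changelog stream).
        for key, _value in events:
            self.state[key] += 1
            yield key, self.state[key]


if __name__ == "__main__":
    # Stand-in for records consumed from a Kafka topic.
    incoming = [("sensor-1", "21.5"), ("sensor-2", "19.0"), ("sensor-1", "22.1")]
    counter = KeyedCounter()
    for key, count in counter.process(incoming):
        print(key, count)
```

Kafka Streams offers this kind of operation (for example `groupByKey().count()`) backed by fault-tolerant local state stores; the sketch deliberately leaves out partitioning, persistence and fault tolerance.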
I hope these notes help others, and I encourage you to try these technologies yourself.
References
- https://dzone.com/articles/streaming-in-spark-flink-and-kafka-1
- https://medium.com/@chandanbaranwal/spark-streaming-vs-flink-vs-storm-vs-kafka-streams-vs-samza-choose-your-stream-processing-91ea3f04675b
- https://www.confluent.io/blog/apache-flink-apache-kafka-streams-comparison-guideline-users/
- https://blog.knoldus.com/flinkathon-what-makes-flink-better-than-kafka-streams/