Making the Right Choice for Streaming: Akka Framework or Apache Spark -- Series I - Kafka
There has been a lot of discussion about selecting the right streaming engine for distributed applications that process data on arrival, broadly termed 'real time' or 'near real time'. This is in contrast to the batch-oriented architectures of big data, where data is ingested in volume into distributed storage such as HDFS and processed later, which is too slow for such applications.
Adopting a streaming engine can be a deceptive process given the many options used in microservice architectures, such as Reactive Streams, Akka Streams, Kafka Streams, and Apache Flink, but data volume and latency remain the key parameters to weigh before endorsing one.
Why Kafka?
Producers and consumers in the Kafka ecosystem can be almost anything: producers can be sockets, REST endpoints, or log services, and consumers can likewise be any consuming service. Producers and consumers have a P * C relationship, where any of the P producer services can talk to any of the C consumer services. Imagine a scenario where this centralized Kafka broker is not in place: the services would call each other directly, and data would be lost if one of the services on the producer end dropped off.
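To put rough numbers on the P * C relationship, here is a minimal sketch that counts connections with and without a central broker. It assumes the worst case where every producer must reach every consumer; the function names are made up for illustration.

```python
def direct_links(producers: int, consumers: int) -> int:
    """Connections needed when every producer calls every consumer directly."""
    return producers * consumers


def brokered_links(producers: int, consumers: int) -> int:
    """Connections needed when every service talks only to the broker."""
    return producers + consumers


# Compare both topologies for a few (assumed) service counts.
for p, c in [(2, 3), (10, 10), (50, 20)]:
    print(f"P={p} C={c} direct={direct_links(p, c)} via_broker={brokered_links(p, c)}")
```

With 10 producers and 10 consumers, the direct topology needs 100 connections while the brokered one needs 20, and each direct connection is one more place where a dropped service can lose data.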
The Kafka broker provides a reliable channel for communication between producers and consumers: services at the consumer end only need to speak to the Kafka API instead of interpreting whatever sits at the producer end, whether that is a REST API or Flume agents. This prevents data loss and maintains the state of your distributed application efficiently.
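The decoupling idea can be sketched with a toy in-memory broker. This is not the Kafka client API: `ToyBroker`, `publish`, and `poll` are hypothetical names, and the only assumption is that the broker keeps an append-only log per topic plus a read offset per consumer group. The point is that a consumer which goes away for a while simply resumes from its last offset.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker (illustration only, not Kafka's API)."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> append-only list of records
        self._offsets = defaultdict(int)  # (topic, group) -> next offset to read

    def publish(self, topic, record):
        # Producers only append; they never need to know who will read.
        self._logs[topic].append(record)

    def poll(self, topic, group, max_records=10):
        # Each consumer group reads from its own offset and advances it.
        start = self._offsets[(topic, group)]
        batch = self._logs[topic][start:start + max_records]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


broker = ToyBroker()
broker.publish("events", "login:alice")
broker.publish("events", "login:bob")

first = broker.poll("events", group="billing")

# The "billing" consumer is offline here while the producer keeps publishing.
broker.publish("events", "logout:alice")

second = broker.poll("events", group="billing")  # resumes where it stopped
audit = broker.poll("events", group="audit")     # a new group sees everything

print(first, second, audit)
```

A real broker adds durability, partitioning, and replication on top of this, but the producer and consumer code stays equally unaware of each other.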