Apache Kafka
What?
Apache Kafka is a distributed streaming platform and data pipeline that enables real-time data ingestion and messaging. It runs on clusters to provide scalability, availability, throughput, and performance.
It lets you build data pipelines in which producers publish data to topics, which are split into partitions, and subscribers (consumers) read the data from the appropriate partitions.
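To make the topic and partition idea concrete, here is a minimal in-memory sketch in plain Python. It is not the Kafka client API: the `Topic` class, its method names, and the use of CRC32 to pick a partition are all assumptions made for illustration (real Kafka clients use their own partitioner). It only shows the general shape: a keyed record lands in one partition, is appended at the next offset, and a subscriber reads a partition from a chosen offset.

```python
import zlib


class Topic:
    """Toy stand-in for a topic with partitions (hypothetical, not a Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list of (key, value) records.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 as a stable hash, so the same key always
        # maps to the same partition. Real clients use a different hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value):
        """Append a record and return (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset=0):
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


topic = Topic("page-views", num_partitions=3)
p1, o1 = topic.publish("user-1", "view:/home")
p2, o2 = topic.publish("user-1", "search:kafka")
print(p1 == p2, o1, o2)   # same key, same partition, increasing offsets
print(topic.read(p1, offset=1))
```

Because records with the same key share a partition, their relative order is preserved for whoever reads that partition.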
Let us understand key terminologies
Why Kafka over Messaging?
Compared with traditional messaging systems, Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message and event processing applications.
Streaming vs. Message Platform
Stream
Message
Use cases/Scenarios
Messaging - Compared to existing enterprise messaging tools, Kafka can provide high throughput, low latency, and message persistence.
Click stream analysis - Website activities such as views and searches are published to topics, and subscribers consume the data.
Log aggregation - Kafka abstracts away the log files and provides logs as a stream of messages.
Event sourcing - Kafka is a very good backend for applications based on event sourcing.
Stream processing - Kafka Streams processes real-time data, aggregating and transforming it for further consumption.
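The click stream and stream processing scenarios above come down to aggregating events as they arrive. The sketch below shows one common form of that, counting actions per fixed (tumbling) time window. It is plain Python rather than the Kafka Streams API, and the function name `count_by_window` and the `(timestamp, action)` event shape are assumptions chosen for illustration.

```python
from collections import Counter


def count_by_window(events, window_seconds):
    """Count actions per tumbling window.

    events: iterable of (timestamp_in_seconds, action) pairs (hypothetical shape).
    Returns a dict mapping (window_start, action) to a count.
    """
    counts = Counter()
    for timestamp, action in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, action)] += 1
    return dict(counts)


clicks = [(0, "view"), (30, "view"), (59, "search"), (60, "view"), (125, "search")]
for (window_start, action), n in sorted(count_by_window(clicks, 60).items()):
    print(window_start, action, n)
```

In a real deployment the events would be consumed from a topic and the aggregated counts published to another topic for downstream subscribers.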
Key Benefits:
Challenges: