Apache Kafka Introduction
Apache Kafka is an event streaming platform.
It provides three key capabilities:
-To publish (write) and subscribe to (read) streams of events.
-To store streams of events durably and reliably for as long as you want.
-To process streams of events as they occur or retrospectively.
What is event streaming?
It is a way of capturing data in real time from event sources such as databases, sensors, mobile devices, and software applications, and then:
-storing these events
-processing these events
-routing these events
Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time.
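The four steps above (capture, store, process, route) can be pictured with a very small sketch. It is plain Python with no Kafka involved. The readings, the threshold of 70, and the names captured, stored, and routed are all made up for illustration.

```python
from collections import defaultdict

# Made-up readings standing in for events captured from event sources.
captured = [
    {"source": "sensor", "value": 71},
    {"source": "payment", "value": 100},
    {"source": "sensor", "value": 68},
]

stored = []                 # storing: keep every event so it can be read again later
routed = defaultdict(list)  # routing: one destination list per source type

for event in captured:
    stored.append(event)
    # processing: derive a flag from the raw value (the threshold is an assumption)
    processed = {**event, "is_high": event["value"] > 70}
    routed[processed["source"]].append(processed)

print(len(stored), "events stored;", len(routed["sensor"]), "routed to the sensor destination")
```

A real deployment would replace the lists with Kafka topics and the loop with producer and consumer applications.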
Typical use cases:
-To capture and analyze sensor data from IoT devices in factories.
-To capture payments.
-To capture financial transactions.
-To track and monitor cars, trucks, shipments, etc.
Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol.
Servers: Kafka runs as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer and are called brokers. Other servers run Kafka Connect to continuously import and export data.
Clients: They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures.
An event records the fact that something happened. It is also called a record or a message. An event has a key, a value, and a timestamp, for example:
Event key: “Prakash”
Event value: “Paid Rs.100 to Vijay”
Event timestamp: “May. 25, 2022 at 7:06 p.m.”
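As a rough picture of this structure, the example event can be written as a small Python data class. This is only a toy model. The class name Event and its field types are assumptions made for this sketch, not the types used by any Kafka client.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Toy model of an event (record): a key, a value, and a timestamp."""
    key: str
    value: str
    timestamp: datetime


# The example event from these notes: May 25, 2022 at 7:06 p.m.
payment = Event(
    key="Prakash",
    value="Paid Rs.100 to Vijay",
    timestamp=datetime(2022, 5, 25, 19, 6),
)

print(payment.key, "->", payment.value, "at", payment.timestamp.isoformat())
```

The class is frozen because an event describes something that has already happened, so the sketch does not allow it to be changed afterwards.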
Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events.
Events are organized and durably stored in topics. A topic is similar to a folder in a filesystem, and the events are the files in that folder.
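To make the roles of producers, consumers, and topics concrete, here is a minimal in-memory sketch. It is not the Kafka API. The Topic class, its publish and read_from methods, and the second payment are hypothetical and exist only for illustration.

```python
class Topic:
    """Hypothetical in-memory stand-in for a topic: an append-only list of events."""

    def __init__(self, name):
        self.name = name
        self._log = []

    def publish(self, event):
        """Producer side: append an event and return its position in the log."""
        self._log.append(event)
        return len(self._log) - 1

    def read_from(self, position):
        """Consumer side: return every stored event at or after the given position."""
        return list(self._log[position:])


payments = Topic("payments")
payments.publish(("Prakash", "Paid Rs.100 to Vijay"))
payments.publish(("Vijay", "Paid Rs.40 to Anita"))  # made-up second event

# Two independent readers; reading does not remove events from the topic.
first_reader = payments.read_from(0)
second_reader = payments.read_from(1)
print(len(first_reader), len(second_reader))
```

The point of the sketch is that the producer and the readers never talk to each other directly. They only share the topic, and the stored events stay available to any later reader.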
Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers.
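One common way to decide which bucket an event goes to is to hash its key, so that events with the same key always land in the same partition. The sketch below illustrates that idea under stated assumptions. CRC32 is a stand-in hash chosen for this example and is not the hash Kafka's clients use. The function choose_partition, the partition count of 3, and the extra payments are also made up.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index; the same key always gives the same index."""
    # CRC32 is a stand-in for this sketch only, not Kafka's own hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for the example
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("Prakash", "Paid Rs.100 to Vijay"),
    ("Vijay", "Paid Rs.40 to Anita"),    # made-up event
    ("Prakash", "Paid Rs.250 to Meena"),  # made-up event
]
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

for index, bucket in enumerate(partitions):
    print("partition", index, bucket)
```

Because both of Prakash's payments hash to the same index, they end up in one partition in the order they were written, while different keys can be spread across brokers.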
Kafka provides five core APIs:
-The Admin API to manage and inspect topics, brokers, and other Kafka objects.
-The Producer API to publish (write) a stream of events to one or more Kafka topics.
-The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them.
-The Kafka Streams API to implement stream processing applications and microservices.
-The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka.