Kafka using Java | Basic code examples of Features | Producer, Consumer, Offset, Group, Partition, Replication

In this article, we will go through a simple example of Apache Kafka, along with basic installation steps for the Windows operating system. We will use Java for the code examples.

Example in this article

  • Install Kafka on a Windows machine.
  • Create a producer that mimics a customer depositing a bank check.
  • The deposited check amount will be published to a Kafka topic.
  • A bank check processor consumer will pick amounts from the Kafka topic & process them.

We will use this example & execute it in different ways to understand Kafka features.

Installation

Kafka Setup on Windows OS | Basic Installation, Setup, Verification, Cluster Setup, Storage

Kafka client Java API

To connect to Kafka and publish or subscribe to messages, we need to use the kafka-clients Java library. Below are the references.
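For example, with Maven the client library can be pulled in like this (the version shown is only an example; pick a release compatible with your broker):

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <!-- example version; use the one matching your Kafka installation -->
    <version>3.7.0</version>
</dependency>
```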

It's not required to create the topic from the command line. By default, the “auto.create.topics.enable” broker property is true, so the topic is created automatically when a producer or consumer tries to access it through code.


Kafka Producer
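The original producer listing is not shown here, so below is a minimal sketch matching the description in this article: it publishes 10 check-deposit amounts to a topic and then closes. The topic name `deposited-checks`, the broker address, and the amounts are assumptions for illustration; a running broker on localhost:9092 is required to actually execute it.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CheckDepositProducer {

    // Producer configuration; broker address and serializers are assumptions.
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Requires a running Kafka broker on localhost:9092.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps())) {
            for (int i = 0; i < 10; i++) {
                int amount = 100 + i; // mimic a deposited check amount
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("deposited-checks", String.valueOf(amount));
                producer.send(record, (metadata, e) -> {
                    if (e == null) {
                        System.out.printf("Published amount=%d partition=%d offset=%d%n",
                                amount, metadata.partition(), metadata.offset());
                    }
                });
            }
        } // close() flushes pending records, then the JVM exits
    }
}
```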



Kafka Consumer
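The consumer listing is likewise not shown, so here is a minimal sketch consistent with the rest of the article: it polls the topic in an infinite loop, with offsets auto-committed via the `enable.auto.commit` property referenced later. Topic, group name, and broker address are assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CheckProcessorConsumer {

    // Consumer configuration; names and addresses are assumptions for this sketch.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "check-processor");     // consumer group
        props.setProperty("enable.auto.commit", "true");      // auto-commit offsets to Kafka
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Requires a running Kafka broker on localhost:9092.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps())) {
            consumer.subscribe(Collections.singletonList("deposited-checks"));
            while (true) { // infinite poll loop; the consumer JVM keeps running
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Processing check amount=%s offset=%d%n",
                            record.value(), record.offset());
                }
            }
        }
    }
}
```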



Execute program & observe Kafka features

We will run a few scenarios to observe how Kafka behaves in different cases.

New Consumer connects before Producer publishes

  • Scenario
    • Run the consumer first, which will keep polling the Kafka topic.
    • Then run the producer & publish messages to the Kafka topic.

This is the producer log, started after the consumer. As per the code, the producer sends 10 records & then closes, so the producer Java program exits after that.

Below is the consumer log. After the producer publishes records, the consumer receives & processes them. The consumer JVM keeps running, since it polls in an infinite loop.

Notice the offset value of the last record, which is ‘9’. It is committed to Kafka automatically because of the property we set: consumerProps.setProperty("enable.auto.commit", "true");. Kafka stores offsets in an internal topic called “__consumer_offsets”. You can verify the current offsets using the below CLI command in a command prompt.
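For example, the consumer-groups tool that ships with Kafka can describe a group's committed offsets (the group name `check-processor` here is just an example; substitute your own, and run from the Kafka installation directory):

```shell
:: Windows command prompt
bin\windows\kafka-consumer-groups.bat --bootstrap-server localhost:9092 --describe --group check-processor
```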

We will see how this offset is used in next example.


Existing Consumer re-connects AFTER producer published

  • Scenario
    • Stop the running consumer from the earlier section.
    • Start the producer & publish all the messages while the consumer is not running.
    • After the producer finishes publishing, start the consumer.

By default, the Kafka configuration comes with the below property. It means that a published message will remain in Kafka for 168 hours, regardless of whether a consumer reads it or not.
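This is the default entry in the broker's server.properties (168 hours = 7 days):

```properties
# Messages are retained for 7 days, read or unread
log.retention.hours=168
```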

Here is the producer log, which publishes the messages and finishes. Notice that the offset is “+1” from the last offset of the earlier test.

Now in this test, the consumer was not connected when the publisher published the messages. In traditional brokers such as ActiveMQ or RabbitMQ, a message published to a topic is typically lost if no consumer (or durable subscription) exists at publish time. In Kafka, thanks to the retention configuration above, a consumer can connect later (within 168 hours in our case) & still consume the messages.

Below is the consumer log, started a few minutes later. In the earlier example the offset was stored as ‘9’, so the consumer now starts from offset 10 onwards & reads all the messages.

Note that this is true only for consumers that had connected to the topic before, committed an offset to Kafka, & are reconnecting. If you connect a new consumer with a different consumer group, it won’t read past messages by default, because that group never committed an offset to Kafka. The next section covers this case.


New Consumer Group connects & consumes all messages from the beginning

  • Scenario
    • Connect a new consumer to an existing topic that already has published messages.
    • Consume all past messages from the beginning.

If a new consumer group wants to read messages from the past, use the property shown below in the consumer code.
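A minimal sketch of the relevant consumer properties; the group id shown is a made-up example for a brand-new group:

```java
import java.util.Properties;

public class NewGroupConfig {

    static Properties newGroupProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        // A new group id that has never committed any offset to Kafka.
        props.setProperty("group.id", "check-processor-v2");
        // With no committed offset, "earliest" starts reading from the
        // beginning of the topic; the default "latest" would skip all
        // past messages and deliver only new ones.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }
}
```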

You can see that this new consumer consumed all messages starting from offset 0 till 19, i.e. all the messages from the earlier tests.



Existing Consumer re-connects AFTER retention period is over.

  • Scenario
    • Run the producer first & publish all messages.
    • Wait for the retention period to pass.
    • Connect the consumer after the retention period & try to consume messages.
    • Run the producer again to publish new messages & see whether they get consumed.

To make this test practical, let's modify the properties so that the retention period is smaller & the test runs quickly.
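For example, in each broker's server.properties the retention can be cut to one hour (the exact values here are assumptions for this test; restart the brokers after changing them):

```properties
# Keep messages for only 1 hour instead of the default 168
log.retention.hours=1
# Check for expired log segments more frequently (default is 5 minutes)
log.retention.check.interval.ms=60000
# Roll segments at a small size so old messages become eligible for deletion sooner
log.segment.bytes=1048576
```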

Here is the producer log. Notice that the offset starts from where we left off in the earlier test.

Now wait for 1 hour ………..

After more than an hour, start the consumer from the earlier tests. Here is the consumer log.

No messages are consumed, even though the offset was committed, because all the messages have been deleted now that the retention period is over. Let's run the producer again & produce new messages, keeping the consumer running.

Here is the consumer log now. It starts consuming the newer messages from the newer offset, i.e. 30.



Partitioned topic & multiple consumers with same consumer group.

  • Scenario
    • Run 2 Kafka servers & form a Kafka cluster.
    • Create a partitioned & replicated topic.
    • Run the producer & publish messages to the partitioned & replicated topic.
    • Run 2 consumer instances & consume from all partitions.

Refer to the earlier article to set up a simple Kafka cluster with 2 Kafka brokers. Then create a partitioned & replicated topic & verify that the topic is created as expected.
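For example, using the topic tool from a Windows command prompt (the topic name and port are assumptions; older Kafka versions take --zookeeper instead of --bootstrap-server):

```shell
:: Create a topic with 2 partitions, each replicated on both brokers
bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --topic deposited-checks-partitioned --partitions 2 --replication-factor 2

:: Verify partitions, leaders & replicas
bin\windows\kafka-topics.bat --describe --bootstrap-server localhost:9092 --topic deposited-checks-partitioned
```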

As shown in the above command output, Kafka created 2 partitions for the topic & put each partition on a different Kafka server to make it scalable. Kafka also created a replica of each partition on the other Kafka server to make it highly available.

Now modify our code as shown below: change the topic name to the newly created topic & add logging for the partition.
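Since the modified listing is not shown, here is a sketch of what the updated consumer might look like; the cluster addresses and topic name are assumptions, and the key change is logging record.partition() for each record:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PartitionedCheckConsumer {

    // Format one record's details, including which partition it came from.
    static String describe(int partition, long offset, String value) {
        return String.format("partition=%d offset=%d amount=%s", partition, offset, value);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Both brokers of the 2-node cluster (addresses are assumptions)
        props.setProperty("bootstrap.servers", "localhost:9092,localhost:9093");
        props.setProperty("group.id", "check-processor");
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("deposited-checks-partitioned"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // record.partition() shows which partition this consumer was assigned
                    System.out.println(describe(record.partition(), record.offset(), record.value()));
                }
            }
        }
    }
}
```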

Now start 2 instances of the consumer first: simply run the consumer Java program twice. Then run the producer code once. Below are the logs.

You can see that the Kafka client distributed the published messages evenly among the partitions.

Kafka assigns one consumer per partition, so as you can see below in the consumer logs, each consumer from the same group consumed messages from a single partition.



Replicated topic & Consumer consuming after broker instance failure

  • Scenario
    • Stop all consumers from the past examples.
    • Run the producer first & publish all messages to the Kafka cluster.
    • Stop/kill one of the Kafka servers/brokers to imitate a broker failure.
    • Run the consumer & observe the messages being consumed.

Now you can see that the messages are published across the brokers' partitions. Stop one of the Kafka broker servers, then run the consumer. You can see that the consumer still consumes all messages from both partitions, because each partition is replicated on the surviving broker.


