In this article, we will walk through a simple, basic installation of Kafka on a Windows machine.
Basic Kafka Setup
Apache ZooKeeper
Kafka (as of the 2.4.0 release used here) needs ZooKeeper installed and running first, so let's start by installing ZooKeeper.
- Download – Download ZooKeeper from https://zookeeper.apache.org/releases.html. It is a gzipped tar archive such as “apache-zookeeper-3.5.7-bin.tar.gz”. Feel free to download the latest version available.
- Unzip – Gunzip and then untar the downloaded archive into a directory such as C:\...\apache-zookeeper-3.5.7-bin. Any archive utility that handles .tar.gz files, such as 7zip, will do.
- Config – ZooKeeper needs a config file in C:\...\apache-zookeeper-3.5.7-bin\conf. Copy the sample file that came with the installation, conf/zoo_sample.cfg, to conf/zoo.cfg.
  - If the config file is missing, you might see an error like this:

        org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing C:\...\apache-zookeeper-3.5.7-bin\apache-zookeeper-3.5.7-bin\bin\..\conf\zoo.cfg
        .
        Caused by: java.lang.IllegalArgumentException: C:\...\apache-zookeeper-3.5.7-bin\apache-zookeeper-3.5.7-bin\bin\..\conf\zoo.cfg file is missing
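- Data directory – Create a “data” folder under the ZooKeeper directory and change the dataDir entry in zoo.cfg to dataDir=C:\...\apache-zookeeper-3.5.7-bin\data. Because zoo.cfg is read as a Java properties file, backslashes act as escape characters there, so forward slashes or doubled backslashes are the safer way to write the path.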
- Start – Then start ZooKeeper as shown below.

    C:\...\apache-zookeeper-3.5.7-bin\bin>zkServer.cmd
    . . .
    2020-03-04 21:52:02,553 [myid:] - INFO [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
    2020-03-04 21:52:02,573 [myid:] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
    2020-03-04 21:52:02,582 [myid:] - INFO [main:FileTxnSnapLog@404] - Snapshotting: 0x0 to C:\MSTD\Tools\apache-zookeeper-3.5.7-bin\apache-zookeeper-3.5.7-bin\data\version-2\snapshot.0
    2020-03-04 21:52:02,586 [myid:] - INFO [main:FileTxnSnapLog@404] - Snapshotting: 0x0 to C:\MSTD\Tools\apache-zookeeper-3.5.7-bin\apache-zookeeper-3.5.7-bin\data\version-2\snapshot.0
    2020-03-04 21:52:02,613 [myid:] - INFO [main:ContainerManager@64] - Using checkIntervalMs=60000 maxPerMinute=10000
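The Config and Data directory steps above can also be scripted. The following is only a minimal sketch in Python, not part of the ZooKeeper distribution: the function name prepare_zoo_cfg is made up for illustration, and it assumes the sample file is at conf/zoo_sample.cfg under the directory you extracted. It writes dataDir with forward slashes to sidestep backslash escaping.

```python
from pathlib import Path


def prepare_zoo_cfg(zk_home: Path) -> Path:
    """Copy conf/zoo_sample.cfg to conf/zoo.cfg and point dataDir at <zk_home>/data."""
    conf_dir = zk_home / "conf"
    data_dir = zk_home / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Forward slashes avoid backslash-escape surprises in the properties-style file.
    data_dir_value = data_dir.resolve().as_posix()

    lines = []
    replaced = False
    for line in (conf_dir / "zoo_sample.cfg").read_text().splitlines():
        if line.strip().startswith("dataDir="):
            line = f"dataDir={data_dir_value}"
            replaced = True
        lines.append(line)
    if not replaced:
        # The sample file had no dataDir entry, so add one.
        lines.append(f"dataDir={data_dir_value}")

    target = conf_dir / "zoo.cfg"
    target.write_text("\n".join(lines) + "\n")
    return target
```

Call it with the directory you extracted ZooKeeper into; every other setting from the sample file is carried over unchanged.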
Apache Kafka
Now that Zookeeper is installed & running, we can download & install Kafka.
- Download – Download Kafka from https://kafka.apache.org/quickstart#quickstart_download, for example “kafka_2.12-2.4.0.tgz”.
- Unzip – Extract the downloaded Kafka archive to a location such as C:\...\kafka_2.12-2.4.0\
- Config – Kafka needs config/server.properties, which ships with the installation and contains default settings. We will use it as is.
- Start – Start Kafka as shown below. Note that the scripts for Windows are located in kafka_2.12-2.4.0\bin\windows.

    C:\MSTD\Tools\kafka_2.12-2.4.0\bin\windows>kafka-server-start.bat ../../config/server.properties
    [2020-03-04 21:58:56,568] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
    [2020-03-04 21:58:57,172] INFO starting (kafka.server.KafkaServer)
    [2020-03-04 21:58:57,173] INFO Connecting to zookeeper on localhost:2181 (kafka.server.KafkaServer)
    [2020-03-04 21:58:57,196] INFO [ZooKeeperClient Kafka server] Initializing a new session to localhost:2181. (kafka.zookeeper.ZooKeeperClient)
    [2020-03-04 21:58:57,217] INFO Client environment:zookeeper.version=3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT (org.apache.zookeeper.ZooKeeper)
    [2020-03-04 21:58:57,217] INFO Client environment:host.name=192.168.0.19 (org.apache.zookeeper.ZooKeeper)
    [2020-03-04 21:58:57,217] INFO Client environment:java.version=1.8.0_121 (org.apache.zookeeper.ZooKeeper)
    [2020-03-04 21:58:57,217] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
    [2020-03-04 21:58:57,218] INFO Client environment:java.home=C:\Program Files\Java\jdk1.8.0_121\jre (org.apache.zookeeper.ZooKeeper)
    . .
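If a server does not seem to start, a quick way to see whether anything is listening on the expected port is a plain TCP connection attempt. The sketch below is a generic Python check, not a Kafka or ZooKeeper tool. The helper name is_listening is hypothetical, 2181 is the port shown in the ZooKeeper log above, and 9092 is the port the console commands in this article connect to.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    for name, port in (("ZooKeeper", 2181), ("Kafka", 9092)):
        state = "listening" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {state}")
```

A successful connection only shows that some process accepted it on that port; it does not confirm that the process is a healthy broker.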
Verify Setup using Windows Command Prompt
Create a simple topic with this command:

    C:\..\kafka_2.12-2.4.0\bin\windows>kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Hello
    Created topic Hello
Verify the created topic by listing all topics:

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-topics.bat --list --zookeeper localhost:2181
    Hello
Publish/produce message from command prompt
Use this command to publish data to a specific topic. Kafka stores the messages on disk in its log directory; refer to the Kafka Logs / Data Storage section for more details.

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic Hello
    >a1
    >a2
    >Terminate batch job (Y/N)? y
Consume message from command prompt
Use --from-beginning to read all messages in the given topic, including those published before the consumer started. Provide the topic name using --topic.

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-console-consumer.bat --bootstrap-server localhost:9092 --from-beginning --topic Hello
    a1
    a2
    Processed a total of 2 messages
    Terminate batch job (Y/N)? y
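To see why --from-beginning matters, it helps to picture a partition as an append-only log in which each message gets an increasing offset. The toy model below is purely illustrative Python and uses no Kafka API; the class ToyPartitionLog is invented for this sketch. It contrasts a reader that starts at offset 0 with one that starts at the current end of the log, which is roughly how the console consumer behaves when the flag is omitted and no offsets have been committed.

```python
from typing import List


class ToyPartitionLog:
    """In-memory stand-in for one topic partition: an append-only list of messages."""

    def __init__(self) -> None:
        self._messages = []

    def append(self, message: str) -> int:
        """Store a message and return the offset it was written at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def end_offset(self) -> int:
        """Offset that the next appended message will receive."""
        return len(self._messages)

    def read_from(self, offset: int) -> List[str]:
        """Return every message at or after the given offset."""
        return self._messages[offset:]


log = ToyPartitionLog()
log.append("a1")
log.append("a2")

# A reader started "from the beginning" begins at offset 0.
from_beginning = log.read_from(0)

# A reader started at the end of the log only sees what arrives afterwards.
start = log.end_offset()
log.append("a3")
latest_only = log.read_from(start)

print(from_beginning)  # ['a1', 'a2']
print(latest_only)     # ['a3']
```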
Kafka Cluster Setup
Setup additional Kafka broker
We can use the same installation with a different config file to run multiple Kafka brokers on Windows. Running multiple brokers as a cluster makes Kafka scalable and highly available. A cluster is generally set up across multiple machines, but for simplicity we will set it up on a single machine.
Keep the earlier Kafka server running and follow the steps below to set up another one.
- Config – Copy config/server.properties and paste it under a different name, config/server-1.properties. In this file, modify or add the properties below. In effect we are creating a separate Kafka broker with a different id and log directory, listening on a different port. (In Kafka 2.4.0 the port property still works but is deprecated; listeners=PLAINTEXT://:9093 is the newer equivalent.)

    broker.id=1
    port=9093
    . .
    log.dirs=/tmp/kafka-logs-1
    . .
- Start – Now start the server with this new config file.

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-server-start.bat ../../config/server-1.properties
    [2020-03-09 21:24:17,241] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
    [2020-03-09 21:24:17,736] INFO starting (kafka.server.KafkaServer)
    [2020-03-09 21:24:17,737] INFO Connecting to zookeeper on localhost:2181 (kafka.server.KafkaServer)
    [2020-03-09 21:24:17,760] INFO [ZooKeeperClient Kafka server] Initializing a new session to localhost:2181. (kafka.zookeeper.ZooKeeperClient)
    [2020-03-09 21:24:17,784] INFO Client environment:zookeeper.version=3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT (org.apache.zookeeper.ZooKeeper)
    [2020-03-09 21:24:17,784] INFO Client environment:host.name=192.168.0.19 (org.apache.zookeeper.ZooKeeper)
    [2020-03-09 21:24:17,786] INFO Client environment:java.version=1.8.0_121 (org.apache.zookeeper.ZooKeeper)
    [2020-03-09 21:24:17,789] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
    . . .
    [2020-03-09 21:24:20,139] INFO Kafka version: 2.4.0 (org.apache.kafka.common.utils.AppInfoParser)
    [2020-03-09 21:24:20,146] INFO Kafka commitId: 77a89fcf8d7fa018 (org.apache.kafka.common.utils.AppInfoParser)
    [2020-03-09 21:24:20,151] INFO Kafka startTimeMs: 1583814260107 (org.apache.kafka.common.utils.AppInfoParser)
    [2020-03-09 21:24:20,162] INFO [KafkaServer id=1] started (kafka.server.KafkaServer)
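If you want more than one extra broker, editing copies of the config by hand gets repetitive. As a rough sketch (the function name derive_broker_config is made up here, and this is not a Kafka utility), a few lines of Python can copy server.properties and replace or append the three properties from the Config step:

```python
from pathlib import Path
from typing import Dict


def derive_broker_config(base: Path, target: Path, overrides: Dict[str, str]) -> None:
    """Write a copy of `base` to `target`, replacing or appending the given keys."""
    remaining = dict(overrides)
    out = []
    for line in base.read_text().splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        is_setting = bool(stripped) and not stripped.startswith("#") and "=" in stripped
        if is_setting and key in remaining:
            # Replace the existing value; commented-out lines are left untouched.
            line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Keys that were not active in the base file (for example port) are appended.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    target.write_text("\n".join(out) + "\n")
```

For the broker in this article, the overrides would be {"broker.id": "1", "port": "9093", "log.dirs": "/tmp/kafka-logs-1"}, with base pointing at config/server.properties and target at config/server-1.properties.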
Verify cluster
In a separate command prompt, you can verify that both brokers have registered with ZooKeeper. The ls /brokers/ids command should list the ids 0 and 1.

    C:\...\apache-zookeeper-3.5.7-bin\apache-zookeeper-3.5.7-bin\bin>zkCli.cmd -server localhost:2181
    Connecting to localhost:2181
    2020-03-09 21:30:19,346 [myid:] - INFO [main:Environment@109] - Client environment:zookeeper.version=3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
    2020-03-09 21:30:19,350 [myid:] - INFO [main:Environment@109] - Client environment:host.name=192.168.0.19
    2020-03-09 21:30:19,350 [myid:] - INFO [main:Environment@109] - Client environment:java.version=1.8.0_121
    2020-03-09 21:30:19,351 [myid:] - INFO [main:Environment@109] - Client environment:java.vendor=Oracle Corporation
    . . .
    2020-03-09 21:30:19,906 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1394] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x100052d1c1d0004, negotiated timeout = 30000

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null
    [zk: localhost:2181(CONNECTED) 0] ls /brokers/ids
    [0, 1]
    [zk: localhost:2181(CONNECTED) 1]
Create replicated topic
Create a topic with a replication factor of 2 and 2 partitions.

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 2 --partitions 2 --topic HelloCluster
    Created topic HelloCluster.
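A replication factor of 2 works here only because two brokers are running; Kafka rejects a replication factor larger than the number of available brokers. The Python sketch below is a deliberately simplified round-robin placement meant to build intuition. It is not Kafka's actual assignment logic, which is more involved (for example, it chooses the starting broker differently and can take racks into account), and the function name assign_replicas is an assumption of this sketch.

```python
from typing import Dict, List


def assign_replicas(broker_ids: List[int], partitions: int, replication_factor: int) -> Dict[int, List[int]]:
    """Simplified placement: partition p starts at broker index p and wraps around."""
    n = len(broker_ids)
    if replication_factor > n:
        raise ValueError(
            f"replication factor {replication_factor} exceeds {n} available broker(s)"
        )
    return {
        p: [broker_ids[(p + r) % n] for r in range(replication_factor)]
        for p in range(partitions)
    }


print(assign_replicas([0, 1], partitions=2, replication_factor=2))
# {0: [0, 1], 1: [1, 0]}
```

In this simplified model, the first broker in each list plays the role of the partition leader and the rest are followers.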
Publish record to replicated topic

    C:\MSTD\Tools\kafka_2.12-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic HelloCluster
    >C1
    >C2
    >Terminate batch job (Y/N)? y
Consume record from first Kafka broker (Port: 9092)

    C:\MSTD\Tools\kafka_2.12-2.4.0\bin\windows>kafka-console-consumer.bat --bootstrap-server localhost:9092 --from-beginning --topic HelloCluster
    C2
    C1
    Processed a total of 2 messages
    Terminate batch job (Y/N)? y
Consume record from second Kafka broker (Port: 9093)

    C:\MSTD\Tools\kafka_2.12-2.4.0\bin\windows>kafka-console-consumer.bat --bootstrap-server localhost:9093 --from-beginning --topic HelloCluster
    C2
    C1
    Processed a total of 2 messages
    Terminate batch job (Y/N)? y
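As you can see, the published messages are available whichever broker the consumer connects to. Either broker can serve as the bootstrap server: the consumer discovers the whole cluster from it and reads each partition from that partition's leader. With a replication factor of 2, each message is also stored on both brokers. The messages come back as C2, C1 rather than in publish order because the topic has two partitions, and Kafka guarantees ordering only within a partition.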
Verify topics & their replications
Use the command below to check topics and their partitions, replicas and so on. The output shows that the topic ‘HelloCluster’ has 2 replicas per partition, while the earlier topic ‘Hello’ has a single copy.

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-topics.bat --describe --topic HelloCluster --zookeeper localhost:2181
    Topic: HelloCluster PartitionCount: 2 ReplicationFactor: 2 Configs:
    Topic: HelloCluster Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0,1
    Topic: HelloCluster Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1,0

    C:\...\kafka_2.12-2.4.0\bin\windows>kafka-topics.bat --describe --topic Hello --zookeeper localhost:2181
    Topic: Hello PartitionCount: 1 ReplicationFactor: 1 Configs:
    Topic: Hello Partition: 0 Leader: 0 Replicas: 0 Isr: 0
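If you want to check this output in a script rather than by eye, the per-partition lines are regular enough to parse. The sketch below is a Python illustration based only on the output format shown above; the regular expression and the function name parse_describe are assumptions of this sketch, and the format may differ in other Kafka versions.

```python
import re
from typing import Dict, List

# Matches the per-partition lines; the summary line has "PartitionCount:" and is skipped.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(output: str) -> List[Dict[str, object]]:
    """Extract one record per partition from `kafka-topics --describe` text."""
    records = []
    for match in PARTITION_LINE.finditer(output):
        records.append({
            "topic": match["topic"],
            "partition": int(match["partition"]),
            "leader": int(match["leader"]),
            "replicas": [int(b) for b in match["replicas"].split(",")],
            "isr": [int(b) for b in match["isr"].split(",") if b],
        })
    return records


sample = """\
Topic: HelloCluster PartitionCount: 2 ReplicationFactor: 2 Configs:
Topic: HelloCluster Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: HelloCluster Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1,0
"""

for record in parse_describe(sample):
    in_sync = sorted(record["isr"]) == sorted(record["replicas"])
    print(record["topic"], record["partition"], "fully in sync" if in_sync else "under-replicated")
```

A partition whose Isr list is shorter than its Replicas list has at least one replica that is not currently caught up with the leader.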
Kafka Logs / Data Storage
In the server.properties file under C:\...\kafka_2.12-2.4.0\config, there is a property log.dirs=/tmp/kafka-logs. This property sets where Kafka stores its logs, meaning the actual data published to topics, not the server's own log statements.
On Windows, the default Kafka log (data storage) directory is C:\tmp\kafka-logs. Under this directory you will find one sub-directory per topic partition, named after the topic. Below are the files for the topic “Hello”.

    C:\tmp\kafka-logs>dir /B Hello*
    Hello-0
    HelloCluster-0
    HelloCluster-1

    C:\tmp\kafka-logs>cd Hello-0

    C:\tmp\kafka-logs\Hello-0>dir /B
    00000000000000000000.index
    00000000000000000000.log
    00000000000000000000.timeindex
    leader-epoch-checkpoint
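The names in this listing follow a pattern: each directory is the topic name, a hyphen and the partition number, and each segment file is named after the offset of the first message it contains, zero-padded to 20 digits. As a small illustration in Python (the helper names are invented for this sketch and are not part of Kafka), the names can be decoded like this:

```python
from typing import Tuple


def split_partition_dir(name: str) -> Tuple[str, int]:
    """Split a directory name such as 'HelloCluster-1' into (topic, partition)."""
    # rpartition splits on the last hyphen, so topic names containing hyphens still work.
    topic, _, partition = name.rpartition("-")
    return topic, int(partition)


def segment_base_offset(file_name: str) -> int:
    """Read the base offset encoded in a segment file name such as '00000000000000000000.log'."""
    return int(file_name.split(".", 1)[0])


for directory in ("Hello-0", "HelloCluster-0", "HelloCluster-1"):
    print(split_partition_dir(directory))

print(segment_base_offset("00000000000000000000.log"))
```

All three files in Hello-0 start with the same number because they belong to the same segment, which begins at offset 0.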