Apache Kafka for beginners

former_member638919 · 2019 Nov 07

This blog post gives a brief overview of the basic concepts in Apache Kafka.

Introduction

In simple terms, Apache Kafka is designed for distributed, high-throughput systems. It tends to work very well as a replacement for a more traditional message broker. Compared with other messaging systems, Kafka offers better throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message processing applications.

Next, I'll introduce some of its basic concepts.

Fundamentals

Broker

A Kafka server.

Message

Information that is sent from a producer to a consumer through Kafka.

Producer

Application that sends the messages.

Consumer

Application that receives the messages.

Kafka cluster

Consists of one or more servers (brokers).

Topic

A topic is a named category to which messages are published and in which they are stored. All Kafka messages are organized into topics. Producer applications write data to topics, and consumer applications read from topics.

Messages published to the cluster stay in the cluster until the configured retention period has passed, whether or not they have been consumed.
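The retention behaviour can be illustrated with a small in-memory sketch. This is a toy model, not Kafka's implementation: the class RetainedTopic, its methods, and the 60-second retention value are all hypothetical names chosen for illustration. It only shows that reading does not remove messages, while age does.

```python
class RetainedTopic:
    """Toy in-memory topic (hypothetical; not how Kafka stores data)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._messages = []  # list of (timestamp, value) pairs

    def publish(self, value, now):
        self._messages.append((now, value))

    def read_all(self, now):
        # Reading never deletes anything; only expiry by age does.
        self._expire(now)
        return [value for _, value in self._messages]

    def _expire(self, now):
        cutoff = now - self.retention_seconds
        self._messages = [(ts, v) for ts, v in self._messages if ts >= cutoff]


topic = RetainedTopic(retention_seconds=60)
topic.publish("a", now=0)
topic.publish("b", now=50)

first_read = topic.read_all(now=55)    # both messages are still retained
second_read = topic.read_all(now=55)   # reading again returns the same data
after_expiry = topic.read_all(now=70)  # "a" is now older than 60 seconds
```

Timestamps are passed in explicitly so the example is deterministic; a real system would use the clock.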

Topic partition

Topics are divided into partitions, which allow a topic's data to be split across multiple brokers.

Each partition is an ordered, immutable sequence of messages that is continually appended to. Each message in a partition is assigned a sequential ID number called the offset, which uniquely identifies the message within that partition.
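As a rough illustration of offsets, a partition can be modelled as an append-only list in which the offset is simply the position of a message. The PartitionLog class below is a hypothetical sketch for this post, not part of any Kafka client library.

```python
class PartitionLog:
    """Toy append-only log standing in for one partition (illustrative only)."""

    def __init__(self):
        self._records = []

    def append(self, message):
        # The offset is the position the message gets in the log.
        offset = len(self._records)
        self._records.append(message)
        return offset

    def read_from(self, offset):
        # Return (offset, message) pairs starting at the given offset.
        return list(enumerate(self._records[offset:], start=offset))


log = PartitionLog()
offsets = [log.append(m) for m in ["m0", "m1", "m2"]]
tail = log.read_from(1)
```

Here offsets is [0, 1, 2], and tail holds the messages at offsets 1 and 2. Existing entries are never modified, which mirrors the immutability described above.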

Producers publish data to the topics of their choice. There are three ways to decide which partition a published message belongs to:

1. Specify the partition ID explicitly

2. Use a semantic partition function (e.g. key % number of partitions)

3. Round-robin across partitions to balance load
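The three options above can be sketched as a single selection function. This is a minimal sketch under stated assumptions: ToyPartitioner and its choose method are hypothetical names, and zlib.crc32 is used only as a stable stand-in hash. It is not the hash function or the partitioner that Kafka clients actually use.

```python
import itertools
import zlib


class ToyPartitioner:
    """Illustrative partition chooser; not Kafka's real partitioner."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key=None, partition=None):
        # 1. An explicitly specified partition ID wins.
        if partition is not None:
            return partition
        # 2. Otherwise derive the partition from the key
        #    (crc32 is an assumed stand-in for the real hash).
        if key is not None:
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        # 3. With no key, rotate through the partitions to spread load.
        return next(self._round_robin)


partitioner = ToyPartitioner(num_partitions=3)
explicit = partitioner.choose(partition=2)
keyed_a = partitioner.choose(key="user-1")
keyed_b = partitioner.choose(key="user-1")
spread = [partitioner.choose() for _ in range(4)]
```

The point of the key-based option is that keyed_a and keyed_b are always equal, so all messages with the same key land in the same partition and keep their relative order.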

Consumer group

Consumers can join a group called a consumer group. A consumer group is the set of consumer processes that subscribe to a specific topic. Each consumer in the group is assigned a set of partitions to consume from, so each consumer receives messages from a different subset of the topic's partitions. Because a partition is assigned to only one consumer in the group at a time, each message is read by a single consumer in the group.

There are three possible scenarios for the relationship between the number of partitions and the number of consumers:

1. The number of consumers equals the number of topic partitions: each consumer is assigned exactly one partition.

2. The number of consumers is less than the number of topic partitions: some consumers in the group are assigned more than one partition.

3. The number of consumers is greater than the number of topic partitions: the extra consumers receive no partitions and sit idle (for example, ‘Consumer 3’ when Consumers 1 and 2 already hold all the partitions).
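The three scenarios can be reproduced with a simple round-robin assignment. This is only a sketch: assign_partitions is a hypothetical helper written for this post, and the consumer names and partition counts are made up. Kafka's real assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions out to consumers one at a time (illustrative only)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return assignment


# 1. Same number of consumers and partitions: one partition each.
equal = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])

# 2. Fewer consumers than partitions: consumers hold several partitions.
fewer = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# 3. More consumers than partitions: "c3" gets nothing and is idle.
more = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

In every case each partition appears in exactly one consumer's list, which is what keeps a message from being read by two consumers of the same group.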

Conclusion

That concludes this brief introduction to the basic concepts of Apache Kafka. I hope it helps you gain some understanding of it.

Thanks for reading.
