FusionInsight HD 6.5.0 Product Description 02


Basic Concept


Kafka is a distributed, partitioned, replicated message publishing and subscription system. It provides features similar to those of the Java Message Service (JMS), but its design is different. Kafka offers message persistence, high throughput, multi-client support, and real-time processing, and it supports both online and offline message consumption. It is well suited to Internet service data collection scenarios, such as conventional data collection, website activity tracking, data monitoring, and log collection.


Producers publish data to Topics, and Consumers subscribe to the Topics and consume messages. Each server in a Kafka cluster is a Broker. For each Topic, the Kafka cluster maintains Partitions for scalability, parallelism, and fault tolerance. Each Partition is an ordered, immutable sequence of messages to which new messages are continually appended, in the manner of a commit log. Each message in a Partition is assigned a sequential ID, which is called the Offset.
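The relationship between Topics, Partitions, and Offsets can be illustrated with a small in-memory model. This is only a sketch and not Kafka code: the class names `Partition` and `Topic`, the topic name `page-views`, and the sample messages are hypothetical, and replication and persistence are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    """Append-only log; a message's offset is its position in the log."""
    log: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        # Messages are only ever added at the end, never modified in place.
        self.log.append(message)
        return len(self.log) - 1  # offset of the message just written

    def read(self, offset: int, max_messages: int = 10) -> List[bytes]:
        # Read up to max_messages starting at the given offset.
        return self.log[offset:offset + max_messages]


@dataclass
class Topic:
    name: str
    partitions: List[Partition]

    @classmethod
    def create(cls, name: str, num_partitions: int) -> "Topic":
        return cls(name, [Partition() for _ in range(num_partitions)])


# Hypothetical topic with two partitions; offsets are assigned per partition.
topic = Topic.create("page-views", num_partitions=2)
first = topic.partitions[0].append(b"user-1 viewed /home")
second = topic.partitions[0].append(b"user-1 viewed /cart")
other = topic.partitions[1].append(b"user-2 viewed /home")
```

Offsets in this model start again from zero in every Partition, which is why an Offset identifies a message only within its own Partition.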

Figure 2-50 Kafka architecture
Table 2-13 Kafka modules

  • Broker: A Broker is a server in a Kafka cluster.

  • Topic: A Topic is a category or feed name to which messages are published. A Topic can be divided into multiple Partitions, each of which can act as a unit of parallelism.

  • Partition: A Partition is an ordered, immutable sequence of messages that is continually appended to a commit log. Each message in a Partition is assigned a sequential ID, which is called the Offset. The Offset uniquely identifies each message in the Partition.

  • Producer: Producers publish messages to a Kafka Topic.

  • Consumer: Consumers subscribe to Topics and process the feed of published messages.

Figure 2-51 shows the relationships between the modules.

Figure 2-51 Relationships between Kafka modules

Consumers label themselves with a Consumer group name, and each message published to a Topic is delivered to one Consumer instance within each subscribing Consumer group. If all the Consumer instances belong to the same Consumer group, the load is shared among them. For example, as shown in Figure 2-51, Consumer1 and Consumer2 work in load-sharing mode, as do Consumer3, Consumer4, Consumer5, and Consumer6. If every Consumer instance belongs to a different Consumer group, each message is broadcast to all Consumers. As shown in Figure 2-51, the messages in Topic 1 are delivered to both Consumer Group1 and Consumer Group2, and to one Consumer within each group.
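The delivery rule described above (load sharing inside a group, broadcast across groups) can be sketched as follows. The example assumes a Topic with four Partitions and a simple round-robin assignment of Partitions to Consumers. The functions `assign_partitions` and `deliver` and the sample messages are hypothetical, and the assignment strategy of a real Kafka client may differ.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def deliver(messages_by_partition: Dict[int, List[str]],
            groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return what each consumer receives (consumer names assumed unique)."""
    received: Dict[str, List[str]] = {}
    for consumers in groups.values():
        # Every group sees the whole topic; its members split the partitions.
        assignment = assign_partitions(sorted(messages_by_partition), consumers)
        for consumer, parts in assignment.items():
            received[consumer] = [m for p in parts for m in messages_by_partition[p]]
    return received


# Group layout taken from Figure 2-51; message contents are made up.
topic1 = {0: ["a"], 1: ["b"], 2: ["c"], 3: ["d"]}
groups = {
    "Consumer Group1": ["Consumer1", "Consumer2"],
    "Consumer Group2": ["Consumer3", "Consumer4", "Consumer5", "Consumer6"],
}
received = deliver(topic1, groups)
```

In this sketch every message appears exactly once per group: Consumer1 and Consumer2 split the four Partitions between them, while the four Consumers of the second group each read a single Partition.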

  • Message reliability

    When a Kafka Broker receives a message, it stores the message on a disk persistently. Each Partition of a Topic has multiple replicas stored on different Broker nodes. If one node is faulty, the replicas on other nodes can be used.

  • High throughput

    Kafka provides high throughput in the following ways:

    • Messages are appended to disk sequentially, which avoids slow random I/O, instead of being cached in application memory.
    • Zero-copy transfer reduces the number of data copies and context switches needed for I/O operations.
    • Data is sent in batches, improving network utilization.
    • Each Topic is divided into multiple Partitions, which increases concurrent processing. Multiple Producers and Consumers can read and write concurrently. Producers send each message to a specific Partition chosen by the partitioning algorithm in use.
  • Message subscribe-notify mechanism

    Consumers subscribe to the Topics they are interested in and consume data in pull mode. Consumers can choose the consumption mode and control the message pulling speed as their workload requires. Consumers must maintain their own consumption records.

  • Scalability

    When Broker nodes are added to expand the Kafka cluster capacity, the newly added Brokers register with ZooKeeper. After the registration succeeds, Producers and Consumers detect the change promptly and adjust accordingly.
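Two of the mechanisms in the list above, Partition selection by the Producer and pull-mode consumption with Consumer-maintained records, can be sketched in a few lines. This is an illustration under stated assumptions, not the Kafka client API: `choose_partition` and `PullConsumer` are hypothetical names, CRC32 is a stand-in for whatever hash the configured partitioner uses, and the logs are plain in-memory lists.

```python
import zlib
from typing import Dict, List


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in hash for this sketch."""
    return zlib.crc32(key) % num_partitions


class PullConsumer:
    """Pulls batches from partition logs and tracks its own position in each."""

    def __init__(self, logs: List[List[bytes]]) -> None:
        self.logs = logs
        self.offsets: Dict[int, int] = {p: 0 for p in range(len(logs))}

    def poll(self, partition: int, max_messages: int) -> List[bytes]:
        # The consumer decides when to pull and how many messages to take.
        start = self.offsets[partition]
        batch = self.logs[partition][start:start + max_messages]
        self.offsets[partition] = start + len(batch)  # consumer-side record
        return batch


# Producer side: messages with the same key land in the same partition.
logs: List[List[bytes]] = [[] for _ in range(3)]
for i in range(5):
    logs[choose_partition(b"sensor-7", len(logs))].append(f"reading-{i}".encode())

# Consumer side: pull a small batch first, then a larger one.
p = choose_partition(b"sensor-7", len(logs))
consumer = PullConsumer(logs)
first_batch = consumer.poll(p, max_messages=2)
second_batch = consumer.poll(p, max_messages=10)
```

Because the position is stored on the Consumer side, the Consumer can slow down, speed up, or re-read from an earlier Offset without any change on the Broker.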

Open-source Features
  • Reliability

    Message processing semantics such as At Least Once, At Most Once, and Exactly Once are provided. The message processing status is maintained by Consumers. To achieve Exactly Once processing, Kafka needs to work together with the application layer.

  • High Throughput

    High throughput is provided for message publishing and subscription.

  • Persistence

    Messages are persisted to disk and can be used by both batch consumers and real-time applications. Data persistence and replication prevent data loss.

  • Distribution

    As a distributed system, Kafka is easy to scale out. Producers, Brokers, and Consumers can all be deployed as multiple distributed instances. The system can be expanded without stopping the software or shutting down machines.
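The Reliability item above notes that Exactly Once processing needs cooperation from the application layer. One common way to do this is to accept At Least Once delivery and make processing idempotent by remembering which (Partition, Offset) pairs have already been applied. The sketch below assumes that approach. The class `IdempotentProcessor` and the sample records are hypothetical, and a real application would have to store the results and the set of seen records atomically in durable storage.

```python
from typing import List, Set, Tuple

Record = Tuple[int, int, str]  # (partition, offset, payload)


class IdempotentProcessor:
    """Applies each (partition, offset) at most once, even if redelivered."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[int, int]] = set()
        self.results: List[str] = []

    def handle(self, record: Record) -> bool:
        partition, offset, payload = record
        if (partition, offset) in self.seen:
            return False  # duplicate delivery: skip it
        # Stand-in for real processing; kept in memory for this sketch only.
        self.results.append(payload.upper())
        self.seen.add((partition, offset))
        return True


# At Least Once delivery may repeat a record, here offset 1 of partition 0.
deliveries: List[Record] = [(0, 0, "a"), (0, 1, "b"), (0, 1, "b"), (0, 2, "c")]
processor = IdempotentProcessor()
applied = [processor.handle(r) for r in deliveries]
```

The redelivered record is recognized by its Partition and Offset and skipped, so the visible effect is the same as if each message had been processed exactly once.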

Relationship with Other Components

As a message publishing and subscription system, Kafka provides high-speed data transmission between the subsystems of the FusionInsight HD platform. Kafka can receive external messages in real time and make them available to online and offline services for processing. The following figure shows the relationship between Kafka and other components.

Figure 2-52 Relationship with other components
Updated: 2019-05-17

Document ID: EDOC1100074548
