
FusionInsight HD V100R002C60SPC200 Product Description 06

Flume

Basic Features

Function

Flume is a distributed, highly reliable, and highly available (HA) system for aggregating massive logs. Flume supports customized data transmitters for collecting data, performs lightweight processing on the data, and writes the data to customizable data receivers. Flume has the following features:

  • Functions as a distributed framework for event stream data collection and aggregation.
  • Collects and logs streaming data.
  • Adopts an ad-hoc scheme (multi-hop, no central control node).
  • Supports declarative configuration and dynamic configuration update.
  • Provides the context routing function.
  • Supports load balancing and failover.
  • Supports complete extensibility.
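As a sketch of the declarative configuration mentioned above, a minimal single-Agent properties file might look as follows. The agent name `a1`, component names, and the Netcat port are illustrative assumptions, not values from this document:

```properties
# Name the components of Agent a1 (names are illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# A Netcat Source listening on a local port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# An in-memory Channel (non-persistent)
a1.channels.c1.type = memory

# A logger Sink that writes events to the Flume log
a1.sinks.k1.type = logger

# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Such a file is typically passed to the Agent at startup, for example with `flume-ng agent --conf-file example.conf --name a1`.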
Structure

Flume Agent consists of Source, Channel, and Sink, as shown in Figure 2-57. Table 2-15 describes these modules.

Figure 2-57 Flume structure diagram 1
Table 2-15 Module description

Name

Description

Source

A Source receives data, or generates data by a special mechanism, and places the data in batches into one or more Channels. Sources fall into two types: event-driven and pollable.

Typical Source types are as follows:

  • Sources that are integrated with the system: Syslog and Netcat
  • Sources that automatically generate events: Exec and SEQ
  • IPC Source that is used in the communication between Agents: Avro

    A Source must be associated with at least one Channel.
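As an illustration of associating a Source with its Channels, the sketch below fans one Exec Source out to two Channels (component names and the tailed file path are assumptions):

```properties
a1.sources = r1
a1.channels = c1 c2

# An Exec Source that tails a log file and feeds both Channels
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/messages
a1.sources.r1.channels = c1 c2
```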

Channel

A Channel sits between a Source and a Sink. A Channel buffers received events until a Sink has successfully delivered them to the next-hop Channel or the final destination, after which the events are removed from the Channel.

Different Channel types provide different persistence levels:

  • Memory Channel: Non-persistency
  • File Channel: Write-Ahead Logging (WAL)-based persistence
  • JDBC Channel: Database-embedded persistence

A Channel supports transactions, provides weak ordering guarantees, and can work with any number of Sources and Sinks.
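The persistence levels above map directly onto Channel configuration. A hedged sketch of a Memory Channel and a WAL-backed File Channel side by side (capacities and directory paths are placeholder assumptions):

```properties
# Memory Channel: fast but non-persistent (events are lost on restart)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# File Channel: Write-Ahead Logging (WAL)-based persistence across restarts
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
```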

Sink

A Sink sends data to the next hop or the final destination and removes the data from the Channel once the data has been sent successfully.

Typical Sink types are as follows:

  • Sinks that write data to final storage, such as HDFS and HBase
  • Sinks that consume data automatically, such as the Null Sink
  • IPC Sinks used for communication between Agents: Avro

A Sink must be associated with a specific Channel.
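For example, binding a Null Sink (which simply discards the events it takes) to a Channel might look like this sketch, with illustrative component names:

```properties
a1.sinks = k1

# A Null Sink discards every event it removes from the Channel
a1.sinks.k1.type = null
a1.sinks.k1.channel = c1
```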

A Flume Agent can be configured with multiple Sources, Channels, and Sinks, as shown in Figure 2-58.

Figure 2-58 Flume structure diagram 2

Flume also supports cascading of multiple Flume Agents, as shown in Figure 2-59.

Figure 2-59 Flume cascading
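Cascading is typically wired with the Avro IPC Sink and Source mentioned above: the Avro Sink of one Agent sends events to the Avro Source of the next. A sketch under assumed hostnames and ports:

```properties
# Agent 1: Avro Sink forwards events to the next-hop Agent
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = agent2.example.com
a1.sinks.k1.port = 4141
a1.sinks.k1.channel = c1

# Agent 2: Avro Source receives events from Agent 1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4141
a2.sources.r1.channels = c1
```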
Principle

Reliability between Agents

Figure 2-60 shows the data exchange between Agents.

Figure 2-60 Agent data transmission process

  1. Flume ensures reliable data transmission based on transactions. When data flows from one Agent to the next, two transactions take effect. The Sink of Agent 1 (the sending Agent) takes a message from its Channel and sends it to Agent 2 (the receiving Agent). Agent 1 commits its transaction only after Agent 2 has received and successfully processed the message, which guarantees reliable delivery.
  2. When Agent 2 receives the message sent by Agent 1, it starts a new transaction. After the data is processed successfully (written to its Channel), Agent 2 commits the transaction and returns a success response to Agent 1.
  3. If transmission fails before a commit, the transaction is rolled back and the data that failed to be transmitted is retransmitted. Because a committed transaction has been written to disk, the transaction can resume after the process fails and recovers.
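The transaction batches described above are bounded by the Channel's transaction capacity, and it is the File Channel's on-disk checkpoint that lets a transaction resume after a restart. A sketch of the relevant settings (values and paths are placeholder assumptions):

```properties
# Maximum number of events per transaction batch
a1.channels.c1.transactionCapacity = 1000

# File Channel: transactions are backed by a WAL and checkpoint on disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
```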

Relationship with Components

Relationship with HDFS

If HDFS is configured as the Flume Sink, HDFS functions as Flume's final data storage system. After Flume is installed and configured, it writes all transmitted data into HDFS.
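An HDFS Sink might be configured as in the following sketch; the NameNode address, path pattern, and roll thresholds are illustrative assumptions:

```properties
# HDFS Sink: write events to date-partitioned directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Roll files every 5 minutes or at 128 MB, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
```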

Relationship with HBase

If HBase is configured as the Flume Sink, HBase functions as Flume's final data storage system. After Flume is installed and configured, it writes all transmitted data into HBase.
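An HBase Sink might be configured as in the following sketch; the table name and column family are assumptions, and the serializer shown is the simple one shipped with Flume:

```properties
# HBase Sink: write events into an HBase table
a1.sinks.k2.type = hbase
a1.sinks.k2.channel = c1
a1.sinks.k2.table = flume_events
a1.sinks.k2.columnFamily = cf
a1.sinks.k2.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
```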

Updated: 2019-04-10

Document ID: EDOC1000104139