No relevant resource is found in the selected language.

This site uses cookies. By continuing to browse the site you are agreeing to our use of cookies. Read our privacy policy>Search


To have a better experience, please upgrade your IE browser.


FusionInsight HD 6.5.0 Product Description 02

Rate and give feedback :
Huawei uses machine translation combined with human proofreading to translate this document to different languages in order to help you better understand the content of this document. Note: Even the most advanced machine translation cannot match the quality of professional translators. Huawei shall not bear any responsibility for translation accuracy and it is recommended that you refer to the English document (a link for which has been provided).


Basic Concept


Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing. Flink features stream processing and is a top open-source stream processing engine in the industry.

Flink provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability, making it suitable for low-latency data processing.

Figure 2-12 shows the technology stack of Flink.

Figure 2-12 Technology stack of Flink

The following lists the key features of Flink in the current version:

  • DataStream
  • Checkpoint
  • Window
  • Job Pipeline
  • Configuration Table

For details about other Flink features, see


Figure 2-13 shows the architecture of Flink.

Figure 2-13 Flink architecture

As shown in Figure 2-13, the entire Flink system consists of three parts:

  • Client

    Flink client is used to submit jobs (streaming jobs) to Flink.

  • TaskManager

    TaskManager (also called worker) is a service execution node of Flink. It executes specific tasks. A Flink system could have multiple TaskManagers. These TaskManagers are equivalent to each other.

  • JobManager

    JobManager (also called master) is a management node of Flink. It manages all TaskManagers and schedules tasks submitted by users to specific TaskManagers. In high-availability (HA) mode, multiple JobManagers are deployed. Among these JobManagers, one of which is selected as the leader, and the others are standby.

Flink provides the following features:

  • Low latency

    Millisecond-level processing capability.

  • Exactly once

    Asynchronous snapshot mechanism, ensuring that all data is processed only once.

  • High availability

    Leader/Standby JobManagers, preventing single point of failure (SPOF).

  • Scale out

    Manual scale out supported by TaskManagers.

Working Principle
  • Stream, Transformation, and Operator

    A Flink program consists of two building blocks: stream and transformation.

    1. Conceptually, a stream is a (potentially never-ending) flow of data records, and a transformation is an operation that takes one or more streams as input, and produces one or more output streams as a result.
    2. When a Flink program is executed, it is mapped to a streaming dataflow. A streaming dataflow consists of a group of streams and transformation operators. Each dataflow starts with one or more source operators and ends in one or more sink operators. A dataflow resembles a directed acyclic graph (DAG).

      2 shows the streaming dataflow to which a Flink program is mapped.

      Figure 2-14 Streaming dataflow to which a Flink program is mapped

      As shown in 2, FlinkKafkaConsumer is a source operator; map, keyBy, timeWindow, and apply are transformation operators; RollingSink is a sink operator.

  • Pipeline Dataflow

    Applications in Flink can be executed in parallel or distributed modes. A stream can be divided into one or more stream partitions, and an operator can be divided into multiple operator subtasks.

    The executor of streams and operators are automatically optimized based on the density of upstream and downstream operators.

    • Operators with low density cannot be optimized. Each operator subtask is separately executed in different threads. The number of operator subtasks is the parallelism of that particular operator. The parallelism (the total number of partitions) of a stream is that of its producing operator. Different operators of the same program may have different levels of parallelism.
      Figure 2-15 Operator
    • Operators with high density can be optimized. Flink chains operator subtasks together into a task, that is, an operator chain. Each operator chain is executed by one thread on TaskManager.
      Figure 2-16 Operator chain
      • In the upper part of the preceding figure, the condensed Source and map operator is chained into an Operator Chain, that is, a larger operator. The Operator Chain, keyby, and Sink all represent an operator respectively and are connected with each other through streams. Each operator corresponds to one task during the running. Namely, there are three tasks in the preceding figure.
      • In the lower part of the preceding figure, each task, except Sink, is divided into two subtasks. The parallelism of the Sink operator is one.

HA Solution Description

A Flink cluster has only one JobManager. This has the risks of SPOF. There are three modes of Flink: Flink On YARN mode, Flink Standalone mode, and Flink Local mode. Flink On YARN and Flink Standalone modes are based on clusters and Flink Local mode is based on a single node. Flink On YARN and Flink Standalone provide an HA mechanism. With such a mechanism, you can recover the JobManager from failures and thereby eliminate SPOF risks. This section describes the HA mechanism of the Flink On YARN mode.

Flink supports the HA mode and job exception recovery. The HA mode and job exception recovery depend on ZooKeeper. If you want to enable these two functions, configure ZooKeeper in the flink-conf.yaml file in advance as follows:

high-availability: zookeeper
high-availability.zookeeper.quorum: localhost:2181 
high-availability.storageDir: hdfs:///flink/recovery 
Flink On YARN

Flink JobManager and the YARN ApplicationMaster YARN are in the same process. YARN ResourceManager monitors ApplicationMaster. If ApplicationMaster is abnormal, YARN restarts it and restores all JobManager metadata from HDFS. During the recovery, existing tasks cannot run and new tasks cannot be submitted. ZooKeeper stores JobManager metadata, such as information about jobs, to be used by the new JobManager. The Akka DeathWatch mechanism of the JobManager monitors TaskManagers. When a TaskManager fails, a container is requested again from YARN and a new TaskManager is created.

For more details about the HA mechanism of YARN mode, visit the following website:

Flink Standalone Mode

In Flink standalone mode, multiple JobManagers can be started and ZooKeeper elects one as the leader JobManager. In Flink standalone mode, there is a leader JobManager and multiple standby JobManagers. If the leader JobManager fails, a standby JobManager takes over the leadership. Figure 2-17 shows the process of a leader/standby JobManager switchover.

Figure 2-17 Process of a leader/standby JobManager switchover
Restoring TaskManager

The Akka DeathWatch mechanism of the JobManager monitors TaskManagers. When TaskManager fails, JobManager creates a TaskManager and migrant services to the created TaskManager.

Restoring JobManager

Flink JobManager and the YARN ApplicationMaster YARN are in the same process. YARN ResourceManager monitors ApplicationMaster. If ApplicationMaster is abnormal, YARN restarts it and restores all JobManager metadata from HDFS. During the recovery, existing tasks cannot run and new tasks cannot be submitted.

Restoring Jobs

If you want to restore jobs, ensure that the restart strategy is configured in Flink configuration files. Supported restart strategies are fixed-delay, failure-rate, and none. Jobs can be restored only when the strategy is configured to fixed-delay or failure-rate. If the restart strategy is configured to none and checkpoint is configured for jobs, the restart strategy is automatically configured to fixed-delay and the value of restart-strategy.fixed-delay.attempts (which specifies the number of retry times) is configured to Integer.MAX_VALUE.

For details about the strategies, see Following is an example of the restart strategy configuration:
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
Jobs will be restored in the following scenarios:
  • If the JobManager fails, all its jobs are stopped, and will be recovered after a new JobManager is up.
  • If a TaskManager fails, all tasks on the TaskManager are stopped, and will be started until there are available resources.
  • When a task of a job fails, the job is restarted.

Relationship with Other Components

Flink supports YARN-based cluster management mode. In this mode, Flink serves as an application of YARN and runs on YARN.

Figure 1 How Flink interacts with YARN shows how Flink interacts with YARN.

Figure 2-18 How Flink interacts with YARN
  1. The Flink YARN client first checks whether the requested resources for starting the YARN cluster are available. If yes, the client uploads the jar package and configuration file to the Hadoop distributed file system (HDFS).
  2. The Flink YARN client communicates with the YARN ResourceManager (RM) to request a container for starting the ApplicationMaster (AM). After all NodeManagers of the YARN cluster finish downloading the jar package and configuration file, the AM is started.
  3. During AM startup, the AM interacts with the YARN RM to request the container for starting the TaskManager. After the container is ready, the TaskManager process is started.
  4. In the Flink YARN cluster, the AM and Flink JobManager are running in the same container. The AM informs each TaskManager of the JobManager RPC address. After TaskManagers are started successfully, they register with the JobManager.
  5. After all TaskManagers successfully complete registration with the JobManager, Flink is started in the YARN cluster. Then the Flink YARN client can submit Flink jobs to the JobManager, and Flink can perform mapping, scheduling, and computing for the jobs.
Updated: 2019-05-17

Document ID: EDOC1100074548

Views: 3084

Downloads: 35

Average rating:
This Document Applies to these Products
Related Documents
Related Version
Previous Next