




FusionInsight HD V100R002C60SPC200 Product Description 06



Basic Concept


The Hadoop Distributed File System (HDFS) supports high-throughput data access and applies to processing of large data sets.


The HDFS consists of active and standby NameNodes and multiple DataNodes, as shown in Figure 2-16.

The HDFS works in master/slave mode. The NameNodes run on the master nodes, and the DataNodes run on the slave nodes. A ZKFC process runs along with each NameNode.

The NameNodes and DataNodes communicate with each other over the Transmission Control Protocol/Internet Protocol (TCP/IP). The NameNode, DataNode, ZKFC, and JournalNode processes can be deployed on Linux servers.

Figure 2-16 HA HDFS Architecture

Table 2-7 describes the functions of each module shown in Figure 2-16.

Table 2-7 HDFS modules




NameNode

Manages the namespace, directory structure, and metadata of file systems, and provides a backup mechanism.

  • Active NameNode: manages the namespace, directory structure, and metadata of file systems, and records the mapping relationships between data blocks and the files to which the data blocks belong.
  • Standby NameNode: synchronizes its data with the Active NameNode, and takes over services from the Active NameNode if the Active NameNode is faulty.

DataNode

Stores the data blocks of each file and periodically reports the stored data blocks to the NameNode.

JournalNode

Synchronizes metadata between the active and standby NameNodes in a High Availability (HA) cluster.

ZKFC

A ZKFC must be deployed for each NameNode. It monitors the NameNode status, writes status information to ZooKeeper, and participates in electing the active NameNode.

ZooKeeper Cluster

A coordination service that helps the ZKFC perform leader election.

  • HDFS principle

    Figure 2-17 shows how the HDFS works.

    Figure 2-17 HDFS principle

    In the HDFS, a file is divided into one or more data blocks, which are stored on DataNodes. The NameNodes store and manage all metadata of the HDFS. The client communicates with the active NameNode to perform namespace operations on the file system, such as opening, closing, and renaming files and folders, and to determine the mapping relationships between data blocks and DataNodes. Instructed by the active NameNode, the DataNodes create, delete, and replicate data blocks. The client communicates with the DataNodes to read and write block data.
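The block-splitting rule described above can be sketched as follows. The 128 MB block size is an assumption (a common HDFS default, configured by dfs.blocksize; the value in a given deployment may differ), and the function name is illustrative:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (common HDFS default)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs, one per data block of the file."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than the configured block size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MB file occupies three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
```

Each resulting block is then stored on (typically several) DataNodes, while the NameNode keeps only the block-to-file mapping.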

  • HDFS HA architecture

    HA is used to resolve the single point of failure (SPOF) problem of the NameNode. This feature provides a standby NameNode in hot standby mode for the active NameNode. When the active NameNode is faulty, services can be quickly switched to the standby NameNode so that services continue to be provided for external systems.

    In a typical HDFS HA scenario, there are usually two NameNodes. One is in the Active state, and the other is in Standby state.

    A shared storage system is required to hold the metadata shared by the Active and Standby NameNodes. This version provides the Quorum Journal Manager (QJM) HA solution, as shown in Figure 2-18. A group of JournalNodes is used to synchronize metadata between the active and standby NameNodes.

    Generally, an odd number (2N+1) of JournalNodes are configured. At least three JournalNodes are required. For one metadata update message, data writing is considered successful as long as data writing is successful on N+1 JournalNodes. In this case, data writing failure of a maximum of N JournalNodes is tolerable. For example, when there are three JournalNodes, data writing failure of one JournalNode is tolerable; when there are five JournalNodes, data writing failure of two JournalNodes is tolerable.
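The 2N+1 arithmetic above can be written out as a small sketch; the function name is illustrative:

```python
def journalnode_tolerance(total_jns):
    """For 2N+1 JournalNodes, a write succeeds once a majority (N+1)
    of them have accepted it, so up to N JournalNode failures are tolerable."""
    assert total_jns >= 3 and total_jns % 2 == 1, "use an odd number, at least 3"
    majority = total_jns // 2 + 1   # N+1
    return total_jns - majority     # N

# 3 JournalNodes tolerate 1 failure; 5 tolerate 2, as stated above.
```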

    The JournalNode is a lightweight daemon process that can share a host with other Hadoop services. You are advised to deploy the JournalNodes on management nodes to prevent write failures on the JournalNodes during transmission of massive volumes of data.

    Figure 2-18 HDFS architecture based on QJM

HA Overview


In the version earlier than Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. This affected the overall availability of the HDFS cluster in two major ways:

(1) In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.

(2) Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS High Availability (HA) feature addresses the preceding problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows fast failover to a new NameNode in the case that a machine crashes, or graceful administrator-initiated failover for the purpose of planned maintenance, ensuring the availability of the cluster during maintenance.

Figure 2-19 Typical HA deployment

In a typical HA cluster shown in Figure 2-19, two separate machines are configured as NameNodes. At any time point, only one of the NameNodes is in Active state, and the other is in Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining synchronization with the Active node to provide fast failover if necessary.

To keep their data synchronized, both nodes communicate with a group of separate daemons called JournalNodes (JNs). When any file system metadata modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. For example, if there are three JournalNodes, the log is saved on at least two of them. The Standby node is capable of reading the edits from the JNs, and is constantly watching them for changes to the edit log. As the Standby node detects the edits, it applies them to its own namespace. In the event of failover, the Standby node ensures that it has read all of the edit log from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before failover occurs.
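The majority-write rule described above can be modeled with a toy sketch; the class and function names are illustrative and do not reflect the actual QJM implementation:

```python
class JournalNode:
    """Toy JournalNode: stores (txid, edit) records; may be unhealthy."""
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.edits = []

    def append(self, txid, edit):
        if not self.healthy:
            return False          # simulated write failure
        self.edits.append((txid, edit))
        return True

def quorum_write(jns, txid, edit):
    """An edit is durable once a majority of JournalNodes acknowledge it."""
    acks = sum(1 for jn in jns if jn.append(txid, edit))
    return acks > len(jns) // 2

# With 3 JNs, the write succeeds even if one JN is down,
# matching the N-failure tolerance described earlier.
```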

To provide fast failover, the Standby node also needs to have up-to-date information regarding the location of blocks in the cluster. To this end, the DataNodes are configured with the location of both NameNodes, and send block location information and heartbeats to both.

It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. To ensure this property and prevent the so-called "split-brain scenario", the JournalNodes will only ever allow a single NameNode to be a writer at a time. During failover, the NameNode which is to become active will simply take over the role of writing to the JournalNodes, which will effectively prevent the other NameNode from continuing in the Active state, allowing the new Active node to safely proceed with failover.
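QJM enforces the single-writer property with an epoch number that increases at each failover; JournalNodes reject any writer whose epoch is lower than the highest they have promised. The sketch below is a simplified illustration of that idea, not the real JournalNode code:

```python
class JournalNode:
    """Toy JournalNode with epoch-based writer fencing."""
    def __init__(self):
        self.promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        # A NameNode becoming active claims a strictly higher epoch.
        if epoch > self.promised_epoch:
            self.promised_epoch = epoch
            return True
        return False

    def append(self, epoch, edit):
        # A writer with a stale epoch (the old Active) is fenced out.
        if epoch < self.promised_epoch:
            raise PermissionError("stale writer fenced out")
        self.edits.append(edit)
```

Once the new Active NameNode has claimed a higher epoch on a majority of JournalNodes, the old Active can no longer complete majority writes, which prevents the split-brain scenario.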


Relationship with Other Components

Relationship Between HDFS and HBase

HDFS is a subproject of Apache Hadoop. HBase uses the Hadoop Distributed File System (HDFS) as its file storage system. HBase is located in the structured storage layer, and the HDFS provides highly reliable support for the lower-layer storage of HBase. All the data files of HBase can be stored in the HDFS, except some log files generated by HBase.

Relationship Between MapReduce and HDFS
  • With high fault tolerance, the Hadoop Distributed File System (HDFS) can be deployed on inexpensive hardware. It provides high-throughput access for application programs with huge data sets.
  • MapReduce is a programming model used for parallel computation of large data sets (larger than 1 TB). Data computed by MapReduce can come from multiple data sources, such as the local file system, the HDFS, and databases. Most data computed by MapReduce comes from the HDFS. The high throughput of the HDFS can be used to read massive volumes of data. After being computed, data can be stored in the HDFS.
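The MapReduce model mentioned above can be illustrated with a minimal in-memory word count; this sketch only mimics the map, shuffle, and reduce phases and does not use the Hadoop APIs:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for each word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
```

In a real job, the input lines would be read in parallel from HDFS blocks, which is where the high throughput of the HDFS matters.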
Relationship Between Spark and HDFS

Data computed by Spark can come from multiple data sources, such as local files and the HDFS. Most data computed by Spark comes from the HDFS. The HDFS can read data on a large scale for parallel computation. After being computed, data can be stored in the HDFS.

Spark involves Driver and Executor. Driver schedules tasks and Executor runs tasks.

Figure 2-20 describes the file reading process.

Figure 2-20 File reading process
The file reading process is as follows:
  1. Driver interconnects with the HDFS to obtain the information of File A.
  2. The HDFS returns the detailed block information about this file.
  3. The Driver sets the degree of parallelism based on the block data volume, and creates multiple tasks to read the blocks of this file.
  4. Executor runs the tasks and reads the detailed blocks as part of the Resilient Distributed Dataset (RDD).
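The four steps above can be mimicked in a short sketch; all names and the stubbed block metadata are illustrative, not the Spark or HDFS APIs:

```python
def get_block_info(file_name):
    # Steps 1-2: the Driver asks the HDFS for the file's block metadata.
    # Stubbed here as (block_id, offset_mb, length_mb) tuples.
    return [("blk_0", 0, 128), ("blk_1", 128, 128), ("blk_2", 256, 44)]

def plan_tasks(blocks):
    # Step 3: one read task per block sets the degree of parallelism.
    return [lambda b=b: ("partition-of", b[0]) for b in blocks]

def run(tasks):
    # Step 4: Executors run the tasks; the results become the RDD's partitions.
    return [task() for task in tasks]

rdd_partitions = run(plan_tasks(get_block_info("File A")))
```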

Figure 2-21 describes the file writing process.

Figure 2-21 File writing process
The file writing process is as follows:
  1. Driver creates a directory where the file is to be written.
  2. Based on the RDD distribution status, the number of tasks related to data writing is computed, and these tasks are sent to Executor.
  3. Executor runs these tasks, and writes the RDD data to the directory created in step 1.
Relationship Between SmallFS and HDFS

As a file system built on top of HDFS, SmallFS merges small files on HDFS to reduce the impact of excessive numbers of small files on HDFS, and provides customers with a transparent interface for operating small files.

SmallFS takes the numerous small files on HDFS as input, merges them into large files, and writes the large files back to HDFS.
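The merging idea can be sketched as follows: small files are concatenated into one large blob while an index records each file's offset and length, which is what makes reads transparent afterwards. The names are illustrative, not the SmallFS API:

```python
def merge_small_files(files):
    """files: mapping of file name -> bytes content.
    Returns (merged blob, index of name -> (offset, length))."""
    index = {}
    blob = bytearray()
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index

def read_small_file(blob, index, name):
    """Transparent read: look up the offset/length and slice the blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

blob, index = merge_small_files({"a.txt": b"alpha", "b.txt": b"beta"})
```

Storing one large file plus a small index means the NameNode tracks a handful of blocks instead of one block per tiny file.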

Relationship Between ZooKeeper and HDFS

Figure 2-22 describes the relationship between ZooKeeper and HDFS.

Figure 2-22 Relationship between ZooKeeper and HDFS

As a client of the ZooKeeper cluster, ZKFailoverController (ZKFC) monitors the state information of the NameNode. The ZKFC process is deployed only on nodes where a NameNode is deployed, that is, on both the active and standby HDFS NameNode nodes.

  1. The ZKFC of each HDFS NameNode connects to ZooKeeper and saves information such as the host name to ZooKeeper under a znode directory, for example, /hadoop. The NameNode that creates the znode first is considered the active node, and the other is the standby node. The standby HDFS NameNode watches the active NameNode's information through a ZooKeeper watcher.
  2. When the active NameNode process terminates abnormally, the standby HDFS NameNode detects the change in the /hadoop znode directory through ZooKeeper, and then an active/standby switchover is performed.
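The election in step 1 and the takeover in step 2 can be modeled with a toy znode; this is a simplified illustration, not the ZooKeeper API (a real ZKFC uses an ephemeral znode that ZooKeeper deletes automatically when the owner's session dies):

```python
class ZNode:
    """Toy ephemeral znode: at most one owner at a time."""
    def __init__(self):
        self.owner = None

    def try_create(self, host):
        # The first ZKFC to create the znode wins the active role.
        if self.owner is None:
            self.owner = host
            return True
        return False

    def delete(self):
        # Models the znode vanishing when the active process dies.
        self.owner = None

znode = ZNode()
nn1_active = znode.try_create("nn1")        # nn1 becomes active
nn2_active = znode.try_create("nn2")        # nn2 stays standby, watching
znode.delete()                              # active process ends abnormally
takeover = znode.try_create("nn2")          # standby performs the switchover
```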
Updated: 2019-04-10

Document ID: EDOC1000104139
