FusionInsight HD V100R002C60SPC200 Product Description 06

SmallFS

Basic Concept

Overview

In HDFS, the NameNode manages the metadata of the whole file system. Because the NameNode keeps this metadata in memory, the many small files generated by product services rapidly consume NameNode memory and slow the NameNode down, which reduces cluster efficiency.

To address this scenario, Huawei big data introduces the SmallFS service for merging small files. SmallFS periodically merges, deletes, or cleans up massive numbers of small files, which reduces the file count and relieves the metadata management pressure on the NameNode.
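The effect of merging on NameNode memory can be illustrated with a rough back-of-the-envelope estimate. The sketch below assumes about 150 bytes of heap per namespace object (a commonly quoted rule of thumb for HDFS, not a figure from this document), one block per file, and a hypothetical workload of ten million small files merged at a ratio of 1,000 to 1. All names and numbers are illustrative.

```python
# Rough NameNode heap estimate. BYTES_PER_OBJECT is an assumed rule of thumb,
# not a value taken from this document.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(file_count, blocks_per_file=1):
    """Estimate heap used by the file and block objects in the namespace."""
    objects = file_count * (1 + blocks_per_file)  # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT


small_files = 10_000_000        # hypothetical: ten million small files
files_per_merged_file = 1_000   # hypothetical merge ratio

before = namenode_heap_bytes(small_files)
after = namenode_heap_bytes(small_files // files_per_merged_file)

print(f"before merge: {before / 1e9:.1f} GB")
print(f"after merge:  {after / 1e6:.1f} MB")
```

Under these assumptions the estimate drops from roughly 3 GB to roughly 3 MB. A real merged file may span several blocks, so the actual saving would be somewhat smaller.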

Architecture

The FGCService is the main process of the SmallFS service.

As shown in Figure 2-23, a cluster in HA mode has two FGCService processes. The active FGCService acts as the main process and schedules the Merge, Delete, and Cleanup tasks. The standby FGCService performs checkpoint operations on the metadata and saves the data in LevelDB. The SmallFS Client operates on small files by calling the FGCService through its RPC interface.

Tasks scheduled at the backend are described as follows:

  • Merge: FGCService periodically uses MapReduce to merge small files in HDFS into a large file, stores the large file in HDFS, saves the corresponding metadata to LevelDB, and deletes the original small files.
  • Delete: FGCService periodically uses MapReduce to read the metadata in LevelDB and clean up the large files, removing from each large file the contents of small files that have been deleted.
  • Cleanup: If FGCService stops or exits unexpectedly, the files left over from the failed tasks are merged or deleted.
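The Merge and Delete tasks above can be sketched with a toy in-memory model. This is a minimal illustration, not the SmallFS implementation: plain dictionaries stand in for HDFS and for the LevelDB metadata store, there is no MapReduce, and the function names and the (merged file, offset, length) index layout are assumptions made for the example.

```python
# Toy model: plain dicts stand in for HDFS and for the LevelDB metadata store.
hdfs = {}       # path -> bytes
metadata = {}   # small-file path -> (merged-file path, offset, length)


def merge(small_paths, merged_path):
    """Concatenate small files into one large file and index them."""
    buf = bytearray()
    for path in small_paths:
        data = hdfs[path]
        metadata[path] = (merged_path, len(buf), len(data))
        buf += data
    hdfs[merged_path] = bytes(buf)   # store the large file first
    for path in small_paths:         # then remove the original small files
        del hdfs[path]


def read(path):
    """Read a small file, whether or not it has been merged yet."""
    if path in metadata:
        merged_path, offset, length = metadata[path]
        return hdfs[merged_path][offset:offset + length]
    return hdfs[path]


def delete(path):
    """Mark a merged small file as deleted by dropping its index entry."""
    del metadata[path]


def compact(merged_path):
    """Rewrite a large file so it keeps only the still-indexed contents."""
    old = hdfs[merged_path]
    buf = bytearray()
    for path, (mpath, offset, length) in list(metadata.items()):
        if mpath == merged_path:
            metadata[path] = (merged_path, len(buf), length)
            buf += old[offset:offset + length]
    hdfs[merged_path] = bytes(buf)


hdfs["/in/a.txt"] = b"alpha"
hdfs["/in/b.txt"] = b"bravo!"
hdfs["/in/c.txt"] = b"charlie"
merge(["/in/a.txt", "/in/b.txt", "/in/c.txt"], "/merged/part-0")
delete("/in/b.txt")
compact("/merged/part-0")
print(read("/in/c.txt"))
```

After the merge, the toy file system holds a single large file instead of three small ones, and a reader still addresses each small file by its original path. The compaction step plays the role of the periodic Delete task: it reclaims the space of deleted entries and updates the offsets of the remaining ones.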
Figure 2-23 SmallFS Architecture
Table 2-8 Module description

Module Name | Description
SmallFS Client | SmallFS client.
FGCService | SmallFS main process. In HA mode, an active node and a standby node are configured. If a node fails, an active/standby switchover is performed immediately so that SmallFS services remain available.
LevelDB | Manages SmallFS metadata. The cluster keeps the data on the active and standby nodes consistent through the Editlog stored in HDFS.
Editlog | Records the operations that the active FGCService performs on metadata.

HA Overview

Background

In FGC HA mode, two redundant FGCServices run on two separate machines, one as the active node and one as the standby node. This allows a failover when a machine, network, or process fails.

Implementation

In FGC HA mode, two FGCServices run on two separate nodes. One FGC node is in the active state and the other is in the standby state. The active FGC node handles all client operations in the cluster, such as merge, delete, and cleanup, while the standby node maintains enough state to provide a quick failover if necessary.

When the active node performs a namespace modification, it logs a record of the modification to an edit log file stored in an HDFS directory. The standby node constantly watches this directory for edits and applies each edit it sees to its own namespace. In the event of a failover, the standby node ensures that it has read all of the edits from the HDFS storage before promoting itself to the active state. This ensures that the namespace state is fully synchronized before the failover completes.
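The edit log mechanism described above can be sketched as follows. This is a simplified in-memory model under stated assumptions: a Python list stands in for the edit log file on HDFS, and the class names, method names, and record format (`EditLog`, `FGCNode`, `modify`, `tail`, `promote`) are hypothetical, not the FGCService API.

```python
class EditLog:
    """Append-only log; a list stands in for the edit log file on HDFS."""

    def __init__(self):
        self.records = []

    def append(self, op, key, value=None):
        self.records.append((op, key, value))


class FGCNode:
    def __init__(self, log):
        self.log = log
        self.namespace = {}   # in-memory metadata state
        self.applied = 0      # number of log records already applied
        self.active = False

    def _apply(self, op, key, value):
        if op == "put":
            self.namespace[key] = value
        elif op == "delete":
            self.namespace.pop(key, None)

    def modify(self, op, key, value=None):
        """Active node only: log the change, then apply it locally."""
        if not self.active:
            raise RuntimeError("standby nodes do not accept modifications")
        self.log.append(op, key, value)
        self._apply(op, key, value)
        self.applied = len(self.log.records)

    def tail(self):
        """Standby node: apply any log records not yet seen."""
        for op, key, value in self.log.records[self.applied:]:
            self._apply(op, key, value)
        self.applied = len(self.log.records)

    def promote(self):
        """Catch up on every remaining edit before becoming active."""
        self.tail()
        self.active = True


log = EditLog()
active, standby = FGCNode(log), FGCNode(log)
active.active = True
active.modify("put", "/in/a.txt", ("/merged/part-0", 0, 5))
standby.tail()                       # standby picks up the first edit
active.modify("put", "/in/b.txt", ("/merged/part-0", 5, 6))
active.modify("delete", "/in/a.txt")
active.active = False                # simulated failure of the active node
standby.promote()                    # replays the two unseen edits first
print(standby.namespace)
```

Because `promote` always calls `tail` before setting the active flag, the new active node never serves requests from a namespace that lags behind the log.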

To provide failover support, FGCService uses the following Hadoop components:

1. ZooKeeper

2. Job History Server

Figure 2-24 FGC HA Service

Relationship with Other Components

Relationship Between SmallFS and HDFS

SmallFS is a file system built on top of HDFS. It merges small files on HDFS to reduce the impact of excessive small files on HDFS, and it provides customers with a transparent interface for operating on small files.

SmallFS takes the numerous small files on HDFS as input, merges them into large files, and writes the large files back to HDFS.

Relationship Between SmallFS and YARN

SmallFS periodically runs merge, delete, and cleanup tasks. Such tasks are MapReduce tasks that run on YARN to merge, delete, and clean up data in HDFS.

Relationship Between SmallFS and ZooKeeper

FGCService is deployed in high availability (HA) mode. The HA mode is used to ensure service availability when a single-node fault occurs. FGCService depends on ZooKeeper to support the HA mode.

Figure 2-25 Relationship between SmallFS and ZooKeeper
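This document does not describe how FGCService uses ZooKeeper internally. A common pattern for active/standby election is for each node to compete for a lock that is released automatically when the holder's session ends. The sketch below models only that idea with an in-memory stand-in. `ToyCoordinator`, `elect`, and the node IDs are hypothetical, and none of this is the ZooKeeper API.

```python
class ToyCoordinator:
    """In-memory stand-in for an ephemeral lock; not the ZooKeeper API."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node_id):
        """Grant the lock if it is free; return True if node_id holds it."""
        if self.holder is None:
            self.holder = node_id
        return self.holder == node_id

    def session_lost(self, node_id):
        """Release the lock when the holder's session ends."""
        if self.holder == node_id:
            self.holder = None


def elect(coordinator, node_ids):
    """Each node tries for the lock; exactly one of them becomes active."""
    return {n: ("active" if coordinator.try_acquire(n) else "standby")
            for n in node_ids}


zk = ToyCoordinator()
roles = elect(zk, ["fgc-1", "fgc-2"])
print(roles)

zk.session_lost("fgc-1")             # simulated single-node fault
roles_after = elect(zk, ["fgc-2"])
print(roles_after)
```

In this toy run, `fgc-1` wins the first election, and `fgc-2` takes over once the lock is released after the simulated fault.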
Updated: 2019-04-10

Document ID: EDOC1000104139
