OceanStor 9000 V300R005C00 File System Feature Guide 11

Overview

This section describes the background, definition, and benefits of the Hadoop Distributed File System (HDFS) feature.

Background

HDFS is a major component of Hadoop and is designed to store large-scale data sets. However, HDFS currently has the following problems:

  • Low capacity utilization and high cost

    HDFS stores data using replication, typically with three replicas. Take 10 PB of service data as an example. With three-way replication, the amount of data stored on disks grows to 30 PB, so capacity utilization drops to 33%. The extra storage costs in hardware procurement, equipment room space, and energy consumption are twice the effective storage costs, and they keep growing as the amount of data grows.

  • Low hardware error tolerance

    Three-way replication tolerates a maximum of two failed nodes, and open-source HDFS tolerates a maximum of one failed node.
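The replication arithmetic above can be checked with a short Python sketch. It assumes every replica of a block is stored on a different node; the function names are illustrative only and do not belong to any Hadoop API.

```python
def replication_footprint(effective_pb: float, replicas: int = 3) -> dict:
    """Raw capacity, extra capacity, and utilization for N-way replication."""
    raw_pb = effective_pb * replicas  # every block is written `replicas` times
    return {
        "raw_pb": raw_pb,
        "extra_pb": raw_pb - effective_pb,       # capacity spent on redundant copies
        "utilization": effective_pb / raw_pb,    # equals 1 / replicas
    }


def replication_fault_tolerance(replicas: int = 3) -> int:
    """A block stays readable while one replica survives (one replica per node assumed)."""
    return replicas - 1


fp = replication_footprint(10)
print(f"raw {fp['raw_pb']:g} PB, extra {fp['extra_pb']:g} PB, utilization {fp['utilization']:.0%}")
# raw 30 PB, extra 20 PB, utilization 33%
print(f"tolerates {replication_fault_tolerance()} failed nodes")
# tolerates 2 failed nodes
```

The 20 PB of redundant copies is twice the 10 PB of effective data, which is where the doubled cost comes from.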

HDFS does not support remote replication, so it cannot provide service-level fault tolerance or remote disaster recovery (DR). HDFS also lacks value-added services such as tiered storage, quota management, virus scanning, and NDMP backup, which makes massive amounts of data difficult to use and manage.

OceanStor 9000 introduces the HDFS feature to resolve the preceding problems.

Definition

The OceanStor 9000 HDFS feature is also called the HDFS interface feature. The Huawei HDFS Plugin, deployed on Hadoop nodes and the Hadoop client, converts HDFS-based access requests into NFS-based access requests. In this way, Hadoop service data can be stored directly on OceanStor 9000. Figure 14-1 shows the HDFS feature.
Figure 14-1  HDFS feature
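To make the request conversion concrete, here is a minimal sketch in Python, assuming the conversion maps each HDFS path onto the same relative path under an NFS mount of OceanStor 9000. The mount point `/mnt/oceanstor9000` and the function `hdfs_uri_to_nfs_path` are hypothetical; the actual Huawei HDFS Plugin performs its translation internally, and its configuration is not described in this section.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mount point of the OceanStor 9000 NFS share on a Hadoop node.
NFS_MOUNT_POINT = PurePosixPath("/mnt/oceanstor9000")


def hdfs_uri_to_nfs_path(uri: str, mount_point: PurePosixPath = NFS_MOUNT_POINT) -> PurePosixPath:
    """Map an hdfs:// URI to a path under the NFS mount point (illustrative only)."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {uri}")
    # The authority (NameNode host:port) is dropped; only the path is kept.
    relative = parsed.path.lstrip("/")
    # Refuse paths that would escape the mount point.
    if ".." in PurePosixPath(relative).parts:
        raise ValueError(f"path escapes the mount point: {uri}")
    return mount_point / relative


print(hdfs_uri_to_nfs_path("hdfs://namenode:8020/user/alice/logs/part-00000"))
# /mnt/oceanstor9000/user/alice/logs/part-00000
```

In this simplified model, ordinary file I/O on the returned path becomes NFS traffic to OceanStor 9000, while Hadoop applications keep using HDFS URIs.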

Benefits

Table 14-1 describes the benefits of the HDFS feature.

Table 14-1  Benefits of the HDFS feature

| Benefit | Description |
| --- | --- |
| Significantly higher capacity utilization and lower total cost of ownership (TCO) | Capacity utilization rises to over 60%, and up to 95%. |
| Higher tolerance of hardware faults | OceanStor 9000 does not distinguish data nodes from metadata nodes. Up to four nodes can fail when N+4 protection is used, with capacity utilization ranging from 60% to 80%. |
| Professional value-added storage services that maximize the value of Hadoop service data | Hadoop service data stored in OceanStor 9000 is also accessible to NFS, CIFS, and FTP clients, and can use the basic and value-added functions of OceanStor 9000, such as remote replication, snapshot, tiered storage, quota management, virus scanning, and NDMP backup. When analysis data is migrated to the Hadoop cluster, you can choose the NFS, CIFS, FTP, or HDFS interface as needed. |
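The capacity and fault-tolerance figures for N+4 follow from erasure-coding arithmetic. The Python sketch below assumes N+M means N data fragments plus M redundancy fragments per stripe, each stored on a different node. The values of N are example choices that span the 60% to 80% range quoted for N+4, not a list of configurations that OceanStor 9000 supports.

```python
from fractions import Fraction


def ec_utilization(n: int, m: int) -> Fraction:
    """Capacity utilization of an N+M scheme: N useful fragments out of N+M stored."""
    return Fraction(n, n + m)


def ec_fault_tolerance(m: int) -> int:
    """A stripe survives the loss of any M fragments (one fragment per node assumed)."""
    return m


for n in (6, 8, 12, 16):
    util = ec_utilization(n, 4)
    print(f"{n}+4: utilization {float(util):.0%}, tolerates {ec_fault_tolerance(4)} failed nodes")
# 6+4: utilization 60%, tolerates 4 failed nodes
# 8+4: utilization 67%, tolerates 4 failed nodes
# 12+4: utilization 75%, tolerates 4 failed nodes
# 16+4: utilization 80%, tolerates 4 failed nodes
```

Compared with three-way replication (33% utilization, two failed nodes), a wider stripe raises utilization while M alone sets how many node failures are tolerated.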

Updated: 2019-03-30

Document ID: EDOC1000101823
