OceanStor 9000 V300R006C00 File System Feature Guide
Working Principle

This section analyzes the conversion mechanism between the HDFS and NFS interfaces and describes the working principle of the HDFS feature.

Introduction to HDFS Plugin

HDFS Plugin is used to convert HDFS protocol-based access requests to NFS protocol-based access requests.

Positioning

Figure 14-2 shows the position of HDFS Plugin in the Hadoop software architecture after HDFS Plugin is interconnected with OceanStor 9000.

Figure 14-2  Hadoop architecture including HDFS Plugin
Function

As shown in Figure 14-3, HDFS Plugin inherits the FileSystem and AbstractFileSystem classes provided by open-source HDFS to provide the file access interfaces listed in Table 14-3 (a minimal code sketch follows the table).

NOTE:

For details about the FileSystem and AbstractFileSystem classes provided by open-source HDFS, visit http://hadoop.apache.org/.

Figure 14-3  HDFS Plugin principle
Table 14-3  Interfaces supported by HDFS Plugin

Class                Function Name          Description
FileSystem           initialize             Initializes HDFS Plugin.
                     getFileBlockLocations  Obtains the block locations (offsets and hosts) of a file.
                     append                 Appends data to a file.
                     create                 Creates a file.
                     delete                 Deletes a file or folder.
                     getFileStatus          Obtains file information.
                     listStatus             Lists information about the files in a directory.
                     open                   Opens a file.
                     rename                 Renames a file or folder.
                     mkdirs                 Creates a folder.
                     setOwner               Sets the owner or owning group of a file or folder.
                     setPermission          Sets permissions for a file or folder.
AbstractFileSystem   createSymlink          Creates a soft link.
                     getFileLinkStatus      Obtains information about the file that a soft link points to.
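To make the inheritance relationship concrete, the following is a minimal Java sketch, not the actual HDFS Plugin source code, of a FileSystem subclass that receives HDFS API calls and would forward them to an NFS-mounted file system. The class name NfsBackedFileSystem and all method bodies are illustrative assumptions; only the overridden method signatures come from the open-source Hadoop FileSystem class.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

// Hypothetical sketch in the style of HDFS Plugin: HDFS API calls arrive at a
// FileSystem subclass and are translated into operations on the NFS-mounted
// OceanStor 9000 file system. The NFS translation itself is omitted here.
public class NfsBackedFileSystem extends FileSystem {

    private URI uri;
    private Path workingDir = new Path("/");

    @Override
    public void initialize(URI name, Configuration conf) throws IOException {
        // Called once by Hadoop for the configured URI; a real plugin would
        // locate the NFS mount point of OceanStor 9000 here.
        super.initialize(name, conf);
        this.uri = name;
    }

    @Override public URI getUri() { return uri; }

    @Override
    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        // Translate an HDFS open/read into a read of the corresponding NFS file.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite,
            int bufferSize, short replication, long blockSize, Progressable progress)
            throws IOException {
        // Create the file on the NFS share. Replication and block size are
        // handled by OceanStor 9000 (N+M protection), not by the client.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
            throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public boolean rename(Path src, Path dst) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public boolean delete(Path f, boolean recursive) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public FileStatus[] listStatus(Path f) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public FileStatus getFileStatus(Path f) throws IOException {
        // Map NFS file attributes (size, mtime, owner, mode) to a FileStatus.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public boolean mkdirs(Path f, FsPermission permission) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override public void setWorkingDirectory(Path dir) { this.workingDir = dir; }

    @Override public Path getWorkingDirectory() { return workingDir; }
}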

File Read/Write Process

This section describes the file read/write process after HDFS Plugin is deployed.

Figure 14-4 shows the file read/write process.

Figure 14-4  File read/write process

Table 14-4 describes the process; a code-level sketch of the application side follows the table.

Table 14-4  File read/write process

No. 1
Process: An application on a Hadoop node sends a file read/write request, or an operator runs HDFS shell commands on the Hadoop client. The local HDFS Plugin receives the request and converts it into an NFS protocol-based request.
Remarks: -

No. 2
Process: HDFS Plugin sends the NFS protocol-based request over the front-end service network to a storage node of OceanStor 9000.
Remarks: OceanStor 9000 selects an appropriate node based on the load balancing policy of the InfoEqualizer feature.

No. 3
Process: The storage node selects a disk group in a storage node group based on the protection level and reads data from or writes data to the disk group.
Remarks: For details about the data protection levels and data read/write of OceanStor 9000, see Erasure Code (N+M Protection) in the OceanStor 9000 File System Administrator Guide.
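From the application's perspective nothing changes: files are read and written through the standard Hadoop FileSystem API, and HDFS Plugin converts each call into NFS requests as described above. The following minimal sketch assumes HDFS Plugin is configured as the default file system for the client; the file path is a placeholder.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadWriteExample {
    public static void main(String[] args) throws Exception {
        // Standard Hadoop FileSystem API. With HDFS Plugin deployed, each call
        // below is converted into an NFS request and sent to an OceanStor 9000
        // storage node selected by InfoEqualizer (steps 1 to 3 in Table 14-4).
        Configuration conf = new Configuration();
        Path file = new Path("/share/demo/hello.txt");  // placeholder path

        try (FileSystem fs = FileSystem.get(file.toUri(), conf)) {
            // Write: create the file and append one line of text.
            try (FSDataOutputStream out = fs.create(file, true);
                 OutputStreamWriter w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                w.write("hello OceanStor 9000\n");
            }

            // Read: open the file and print its content.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader r = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(r.readLine());
            }
        }
    }
}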

Comparison Between Data Storage and Task Scheduling Mechanisms

This section analyzes the differences between data storage and task scheduling mechanisms before and after Hadoop is interconnected with OceanStor 9000.

Before Hadoop is interconnected with OceanStor 9000, data is stored using the HDFS replica mechanism, as shown in Figure 14-5.

Figure 14-5  HDFS replica mechanism

After Hadoop is interconnected with OceanStor 9000, data is stored in N+M mode using the Erasure Code mechanism, as shown in Figure 14-6.

Figure 14-6  Erasure Code mechanism
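As an illustrative comparison of raw capacity consumption (the N+M values below are example choices, not values prescribed by this document): the three-replica mechanism writes every block three times, whereas a 4+2 erasure-code layout writes four data strips plus two parity strips for each stripe.

public class OverheadComparison {
    public static void main(String[] args) {
        long fileBytes = 1L << 30;  // 1 GiB of user data (example value)

        // HDFS three-replica mechanism: every block is written three times.
        long replicaRaw = fileBytes * 3;

        // Erasure code, illustrative N+M = 4+2: for every 4 data strips,
        // 2 parity strips are written. N and M are chosen per cluster.
        int n = 4, m = 2;
        long ecRaw = fileBytes * (n + m) / n;

        System.out.printf("3-replica raw usage: %d GiB (overhead 200%%)%n",
                replicaRaw >> 30);
        System.out.printf("%d+%d erasure code raw usage: %.1f GiB (overhead %d%%)%n",
                n, m, ecRaw / (double) (1L << 30), 100 * m / n);
    }
}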

Table 14-5 compares data storage and task scheduling mechanisms.

Table 14-5  Comparison between data storage and task scheduling mechanisms

Item: Data block storage mechanism
Before interconnecting with OceanStor 9000: Files are stored on multiple DataNodes according to the defined block size and the replica mechanism. By default, the data block size is 64 MB (values generally range from 64 MB to 128 MB) and three replicas are kept.
After interconnecting with OceanStor 9000: A file is divided into data strips, and parity strips are calculated from them using an erasure coding matrix. The strips are stored on multiple nodes. The strip size can be set to 512 KB, 256 KB, 128 KB, 32 KB, or 16 KB. The protection level can be set to N+1, N+2, N+3, N+4, N+2:1, or N+3:1. Recommended strip sizes (an illustrative helper implementing these recommendations follows the table):
  • If the average file size is 64 KB or smaller, set the strip size to 16 KB.
  • If the average file size is larger than 64 KB and at most 256 KB, set the strip size to 32 KB.
  • If the average file size is larger than 256 KB and at most 2048 KB, set the strip size to 128 KB.
  • If the average file size is larger than 2048 KB and at most 4096 KB, set the strip size to 256 KB.
  • If the average file size is larger than 4096 KB and at most 8192 KB, set the strip size to 512 KB.
  • If the average file size is larger than 8192 KB, set the strip size to 1024 KB.

Item: Task scheduling policy
Before interconnecting with OceanStor 9000: Data-locality and proximity principles are adopted: tasks are scheduled to the node closest to the data, minimizing data movement over the network.
After interconnecting with OceanStor 9000: Computing and storage are physically separated. Nodes in the storage cluster are deployed symmetrically, so the path overheads from computing nodes to storage nodes are the same.
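The strip size recommendations in Table 14-5 reduce to a simple threshold lookup on the average file size. The following helper is illustrative only and is not a product API; it merely mirrors the bullet list above.

public final class StripSizeAdvisor {

    // Returns the strip size (in KB) suggested by Table 14-5 for a given
    // average file size (in KB). The thresholds mirror the bullet list in
    // the table; this helper is a sketch, not part of OceanStor 9000.
    public static int recommendedStripSizeKb(long averageFileSizeKb) {
        if (averageFileSizeKb <= 64)   return 16;
        if (averageFileSizeKb <= 256)  return 32;
        if (averageFileSizeKb <= 2048) return 128;
        if (averageFileSizeKb <= 4096) return 256;
        if (averageFileSizeKb <= 8192) return 512;
        return 1024;
    }

    public static void main(String[] args) {
        // Example: a workload whose average file size is about 1.5 MB.
        System.out.println(recommendedStripSizeKb(1536) + " KB");  // prints "128 KB"
    }
}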
