
Replacing Faulty Data Disks in the Big Data HD Cluster in Batches

Publication Date: 2019-04-25
Issue Description

This solution applies when a batch of faulty disks needs to be replaced in a deployed big data cluster. It applies only to data nodes that host HBase, Hive, Spark, or Kafka service instances.

Handling Process

Hardware RAID can be used to replace disks in batches. If the disks are configured with RAID 5 (for example, for the Kafka service), remove a faulty disk directly from the disk group, insert the replacement, and wait for the RAID rebuild (synchronization) to finish. Repeat this operation for each faulty disk. No further action is required at the big data software layer.
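How you verify the rebuild depends entirely on the RAID controller and its management tool. Purely as an illustration, on a Broadcom/LSI controller managed with storcli the check looks roughly like the sketch below; the controller, enclosure, and slot IDs (/c0, /e252, /s3) are placeholders for your own hardware, and the exact syntax varies by vendor.

```
# Illustrative sketch only (Broadcom/LSI storcli); syntax differs per vendor.
# /c0 = controller 0; /e252/s3 = enclosure 252, slot 3 (placeholder IDs).

# List physical drives and their states to identify the faulty disk
storcli /c0 show

# After hot-swapping the disk, watch the rebuild progress on the new drive
storcli /c0/e252/s3 show rebuild
```

Wait until the rebuild reaches 100% before removing the next disk in the same disk group.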

If the disks are configured with RAID 0 or no RAID, first decommission the nodes whose disks need to be replaced, then replace the faulty disks in batches, and finally add the nodes back to the cluster.

If multiple service instances are installed on a node, decommission them in the following order. The example below uses an HBase node that hosts three service instances: RegionServer, NodeManager, and DataNode (a command-line sketch of these steps on open-source software follows the list).

1. Stop the upper-layer service instance first, for example, the RegionServer instance on the HBase node. If the service has not been stopped, stop its instances in batches.

2. Decommission the NodeManager instances. Check the YARN web UI for running tasks. If tasks are running, decommission the instances cabinet by cabinet in batches; if no tasks are running, decommission the instances in batches.

3. Decommission the DataNode instances. You are advised to decommission the role instances in batches, cabinet by cabinet, to ensure that no data replicas are lost.
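On this platform these decommission operations are normally performed from the cluster management UI. Purely for illustration, the rough open-source Hadoop/HBase equivalents of steps 1 to 3 are sketched below; the host name dn01.example.com and the exclude-file paths are placeholder assumptions (the actual locations are set by yarn.resourcemanager.nodes.exclude-path and dfs.hosts.exclude).

```
# Illustrative sketch on open-source Hadoop/HBase; on a managed cluster,
# use the cluster manager UI instead. Host name and paths are placeholders.

# Step 1: stop the RegionServer gracefully (moves regions off the node first)
bin/graceful_stop.sh dn01.example.com      # run from the HBase home directory

# Step 2: decommission the NodeManager
echo "dn01.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes

# Step 3: decommission the DataNode
echo "dn01.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Wait until the node reports "Decommissioned" before pulling its disks
hdfs dfsadmin -report | grep -A 2 "dn01.example.com"
```

After the disks are replaced, removing the host from the two exclude files, rerunning both -refreshNodes commands, and restarting the instances brings the node back into the cluster.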

Before decommissioning, cluster parameters can be adjusted to speed up the decommission operation. The affected service must be restarted for the modified parameters to take effect. The following table lists the parameters that may need to be modified.

| Component | Instance | Parameter Type | Parameter Name | Default Value | New Value | Meaning | Scenario | Whether the Default Value Can Be Adjusted |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HDFS | NameNode | Balance/decommission performance | dfs.datanode.balance.bandwidthPerSec | 20971520 | 209715200 | Maximum bandwidth (in bytes per second) that each DataNode can use for load balancing | Balance performance optimization | You are advised to retain the default value. |
| HDFS | DataNode | Balance/decommission performance | dfs.datanode.balance.max.concurrent.moves | 5 | 30 | Maximum number of threads for load balancing on a DataNode | | You are advised to modify the default value. |
| HDFS | NameNode | Balance/decommission performance | dfs.namenode.replication.max-streams | 10 | 64 | Maximum number of replication threads on a DataNode | | The default value has been changed to 64 in C70 and needs to be further adjusted. |
| HDFS | NameNode | Balance/decommission performance | dfs.namenode.replication.max-streams-hard-limit | 20 | 500 | Hard limit on the number of replication threads on a DataNode | | The default value has been changed to 128 in C70 and needs to be further adjusted. |
| HDFS | NameNode | Balance/decommission performance | dfs.namenode.replication.work.multiplier.per.iteration | 10 | 500 | Advanced attribute; exercise caution when modifying it. Total number of blocks concurrently transmitted for replication per DataNode when the NameNode sends a command list through the DataNode heartbeat | | |
| HDFS | NameNode | Running performance | dfs.namenode.handler.count | 64 | 192 | Number of NameNode processing threads | Large cluster, performance optimization | Can be adjusted, but memory usage will increase. |
| HDFS | DataNode | Running performance | dfs.datanode.handler.count | 8 | 24 | Number of DataNode processing threads | Large cluster, performance optimization | Can be adjusted, but memory usage will increase. |
| HDFS | NameNode | Running performance | ipc.server.read.threadpool.size | 1 | 10 | Size of the thread pool that reads RPC requests on the NameNode | Large cluster, performance optimization | Can be adjusted, but memory usage will increase. |
| HDFS | DataNode | Running performance | dfs.datanode.max.transfer.threads | 4096 | 8192 | Maximum number of threads used to transfer data on a DataNode | High-load cluster, performance optimization | The default value has been changed in C70. |
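On this platform these parameters are changed through the cluster configuration interface and take effect after a service restart, as noted above. Purely for orientation, on open-source Hadoop the dfs.* values would live in hdfs-site.xml; the snippet below is a sketch of two of the decommission-related entries, not a complete configuration.

```xml
<!-- Sketch: decommission-tuning entries as they would appear in hdfs-site.xml
     on an open-source deployment; a NameNode restart is required. -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>64</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>500</value>
</property>
```

The balancer bandwidth is the one value that can also be raised at run time without a restart: running hdfs dfsadmin -setBalancerBandwidth 209715200 applies the new limit (in bytes per second) to the running DataNodes until the next restart.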



                                 



