FusionInsight HD 6.5.0 Administrator Guide 02

Instance Maintenance


Starting, Stopping, or Restarting an Instance

Scenarios

You can perform the following maintenance operations on one or more instances on FusionInsight Manager:

  • Start a specified instance in the cluster. You can start a role instance that is in the Not Started, Stop Failed, or Startup Failed state so that it can be used.
  • Stop a specified instance in the cluster. You can stop a role instance that is no longer used or is abnormal.
  • Restart a specified instance in the cluster. You can restart an abnormal role instance to restore it.

You can perform such operations on the instance management page or an instance details page. The following procedure uses the instance management page as an example.

Procedure
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Select the specified instance and click Start, or select Stop or Restart from the More drop-down list to perform the corresponding operations.

Performing a Rolling Restart of an Instance

Scenarios

A rolling restart restarts instances in batches, without interrupting services, after a service role is patched or its configuration is modified in the cluster.

To restart multiple instances without service interruption after modifying the instance configuration, perform the rolling restart operation.

NOTE:
  • Some services do not support a rolling restart. You cannot perform a rolling restart on FusionInsight Manager for these services.
  • A rolling restart is not recommended for configurations that must take effect immediately, for example, the configuration of the monitoring port for a server. Perform a common restart instead.
Impact on the System

Unlike a common restart, a rolling restart does not interrupt services. However, it takes longer than a common restart and may affect the throughput and performance of the service being restarted.

Procedure
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Select the specified instances and choose More > Perform Rolling Restart.
  5. In the displayed dialog box, enter the password of the current login user and click OK.
  6. Set the parameters as required, as shown in Table 3-6.

    Table 3-6 Rolling restart parameters

    Parameter: Restart only expired instances
    Description: Specifies whether to restart only the instances in the cluster whose configuration has been modified.

    Parameter: Enable Rack Strategy
    Description: Specifies whether to enable the concurrent rolling restart under the rack strategy. This option takes effect only for roles that meet the rack strategy requirements for a rolling restart: the role supports rack awareness, and its instances belong to two or more racks.
    NOTE:
    This parameter can be set only when a rolling restart is performed on HDFS or YARN.

    Parameter: Data Nodes to Be Batch Restarted
    Description: Specifies the number of instances restarted in each batch when the batch rolling restart strategy is used. The default value is 1.
    NOTE:
    • This parameter is valid only when the batch rolling restart strategy is used and the instance is a DataNode.
    • When the rack strategy is enabled, this parameter is invalid. In this case, the cluster uses the default maximum number of instances (20) configured in the rack strategy as the maximum number of instances that are concurrently restarted in a rack.
    • This parameter can be set only when a rolling restart is performed on HDFS, YARN, Kafka, Storm, or Flume.
    • This parameter cannot be manually configured for the RegionServer of HBase. Instead, it is automatically adjusted based on the number of RegionServer nodes:
      If the number is less than 30, the parameter value is 1.
      If the number is greater than or equal to 30 and less than 300, the parameter value is 2.
      If the number is greater than or equal to 300, the parameter value is 1% of the number (rounded down).

    Parameter: Batch Interval
    Description: Specifies the interval between two consecutive batches of instances in a rolling restart. The default value is 0.

    Parameter: Decommissioning Timeout Interval
    Description: Specifies the decommissioning timeout interval for role instances during a rolling restart. The default value is 1800 seconds.
    Some roles (such as HiveServer and JDBCServer) stop providing services before the rolling restart. Stopped instances cannot establish new connections, and existing connections are given a period of time to complete. Setting the timeout properly minimizes the risk of service interruption.
    NOTE:
    This parameter can be set only when a rolling restart is performed on Hive, Spark, or Spark2x.

    Parameter: Batch Fault Tolerance Threshold
    Description: Specifies the number of batches that are allowed to fail during a rolling restart. The default value is 0, which indicates that the rolling restart task ends as soon as any batch of instances fails to be restarted.

    NOTE:

    Set advanced parameters, such as Data Nodes to Be Batch Restarted, Batch Interval, and Batch Fault Tolerance Threshold based on site requirements. Otherwise, services may be interrupted or the performance may be severely affected. Therefore, exercise caution when performing this operation.

    For example:

    • If Data Nodes to Be Batch Restarted is too large, a large number of instances are restarted at the same time. Because few instances remain in service, services may be interrupted or performance may be severely affected.
    • If Batch Fault Tolerance Threshold is too large, the rolling restart continues with a new batch even after earlier batches failed to restart, so services may be interrupted.

  7. Click OK and wait until the rolling restart is complete.
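
The automatic batch size that Table 3-6 describes for the HBase RegionServer can be sketched as a small shell function. This is only an illustration of the documented thresholds, not code from FusionInsight Manager. The function names regionserver_batch_size and batch_count are hypothetical, and batch_count assumes that every batch is filled to the configured size.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the RegionServer rule in Table 3-6.
# Argument: number of RegionServer nodes.
# Prints the number of instances restarted per batch.
regionserver_batch_size() {
  nodes=$1
  if [ "$nodes" -lt 30 ]; then
    echo 1
  elif [ "$nodes" -lt 300 ]; then
    echo 2
  else
    # 1% of the node count, rounded down (integer division).
    echo $(( nodes / 100 ))
  fi
}

# Hypothetical helper: number of batches needed to restart all instances,
# assuming each batch is filled to batch_size (ceiling division).
batch_count() {
  instances=$1
  batch_size=$2
  echo $(( (instances + batch_size - 1) / batch_size ))
}

regionserver_batch_size 29    # prints 1
regionserver_batch_size 450   # prints 4
batch_count 450 4             # prints 113
```

A larger batch size shortens the rolling restart but leaves fewer instances in service at any moment, which is the trade-off described in the NOTE of step 6.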

Synchronizing Instance Configuration

Scenarios

If the Configuration Status of a role instance is Configurations Expired, the role instance has not been restarted since its configuration was modified, and the new configuration is saved only on FusionInsight Manager. In this case, you need to synchronize the new configuration to the role instance.

Impact on the System

After synchronizing the role instance configuration, you need to restart the role instance that is in the expired state. The role instance is unavailable during restart.

Procedure
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Select the specified instance and choose More > Synchronize Configuration.
  5. In the displayed dialog box, click OK to restart the role instance.

Decommissioning and Recommissioning an Instance

Scenarios

Some role instances serve external requests in distributed, parallel mode, and each service independently records whether each of its instances can be used. Therefore, you need to use FusionInsight Manager to decommission or recommission these instances to change their running status. Decommission or recommission instances in the following scenarios:

  • Decommission a running instance if it is no longer used. To prevent data loss, decommission an instance before deleting it.
  • If a role instance is out of service and the node is not deleted, you must recommission the instance to start it before using it again.

Some instances do not support the recommissioning and decommissioning functions.

NOTE:

In FusionInsight HD, only the DataNode role of HDFS and the NodeManager role of YARN support the recommissioning and decommissioning functions.

  • If the number of DataNodes is less than or equal to the number of HDFS copies, decommissioning cannot be performed. For example, if the number of HDFS copies is three and there are fewer than four DataNodes in the system, decommissioning cannot be performed. In this case, FusionInsight Manager reports an error and forcibly exits the decommissioning 30 minutes after it attempts to perform the decommissioning.
  • During MapReduce task execution, files with 10 copies are generated. Therefore, if the number of DataNode instances is less than 10, decommissioning cannot be performed.
  • If the DataNodes belong to more than one rack before the decommissioning (the number of racks is determined by the rack configured for each DataNode) and the remaining DataNodes would all belong to a single rack afterwards, the decommissioning fails. Therefore, before decommissioning DataNode instances, evaluate the impact on the number of racks and adjust the DataNodes to be decommissioned accordingly.
  • If multiple DataNodes are decommissioned at the same time and each of them stores a large volume of data, the decommissioning may fail due to timeout. To avoid this problem, decommission one DataNode at a time and repeat the operation as needed.
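
The DataNode count and rack constraints above can be expressed as a quick pre-check before you start the procedure. The following is a minimal sketch, not a FusionInsight tool: the function name can_decommission and its four arguments are hypothetical, and you must supply the values yourself. It does not model the MapReduce 10-copy constraint or the timeout risk.

```shell
#!/bin/sh
# Hypothetical pre-check based on the constraints listed in the NOTE above.
# Usage: can_decommission DATANODES COPIES RACKS_BEFORE RACKS_AFTER
#   DATANODES     current number of DataNodes
#   COPIES        configured number of HDFS copies
#   RACKS_BEFORE  number of racks the DataNodes belong to now
#   RACKS_AFTER   number of racks the remaining DataNodes would belong to
# Prints the result and returns 0 if neither constraint is violated.
can_decommission() {
  datanodes=$1
  copies=$2
  racks_before=$3
  racks_after=$4

  # The number of DataNodes must be greater than the number of HDFS copies.
  if [ "$datanodes" -le "$copies" ]; then
    echo "blocked: $datanodes DataNodes is not more than $copies copies"
    return 1
  fi

  # A multi-rack deployment must not be reduced to a single rack.
  if [ "$racks_before" -gt 1 ] && [ "$racks_after" -eq 1 ]; then
    echo "blocked: remaining DataNodes would belong to a single rack"
    return 1
  fi

  echo "ok"
}

can_decommission 3 3 1 1    # prints a "blocked" message (3 DataNodes, 3 copies)
can_decommission 6 3 2 2    # prints "ok"
```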
Procedure
  1. Perform a health check on the DataNodes before decommissioning:

    1. Use PuTTY to log in to the client installation node as a client user and switch to the client installation directory.
    2. For a security cluster, use user hdfs for permission authentication.
      source bigdata_env               #Configure client environment variables.
      kinit hdfs                       #Configure kinit authentication.
      Password for hdfs@HADOOP.COM:    #Enter the login password of user hdfs.
    3. Run the hdfs fsck / -list-corruptfileblocks command, and check the returned result.
      NOTE:

      If HDFS federation has been configured, run the hdfs fsck hdfs://NameService name/directory -list-corruptfileblocks command to check the health status of the file corresponding to the specified NameService.

      For example, run the hdfs fsck hdfs://ns1/ -list-corruptfileblocks or hdfs fsck hdfs://ns1/tmp -list-corruptfileblocks command to check the root directory or the /tmp directory of the NameService ns1.

      • If "has 0 CORRUPT files" is displayed, go to Step 2.
      • If the result does not contain "has 0 CORRUPT files" and the name of the damaged file is returned, go to substep 4.
    4. Run the hdfs dfs -rm command, followed by the name of the damaged file, to delete the damaged file.

  2. Log in to FusionInsight Manager.
  3. Choose Cluster > Service.
  4. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  5. Select the specified DataNode or NodeManager role instance.
  6. Select Decommission or Recommission from the More drop-down list.

    In the displayed dialog box, enter the password of the current login user and click OK.

    Select "I confirm to decommission these instances and accept the consequence of service performance deterioration." and click OK to perform the corresponding operation.
    NOTE:

    If the service corresponding to the instance is restarted from another browser while the instance is being decommissioned, FusionInsight Manager displays a message indicating that the decommissioning has stopped, but the operating status of the instance is displayed as Started. In this case, the instance has already been decommissioned in the background. You need to decommission the instance again to synchronize the operating status.
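
The decision in the health check of step 1 (whether the fsck result contains "has 0 CORRUPT files") can be scripted. The following is a minimal sketch, assuming that the output of hdfs fsck with -list-corruptfileblocks is piped in on standard input. The function name fsck_is_clean is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   hdfs fsck / -list-corruptfileblocks
# on standard input. Returns 0 if the output contains
# "has 0 CORRUPT files", and 1 otherwise.
fsck_is_clean() {
  grep -q 'has 0 CORRUPT files'
}

# Example usage on a client node, after source bigdata_env and kinit hdfs:
#   if hdfs fsck / -list-corruptfileblocks | fsck_is_clean; then
#     echo "No corrupt files reported. Continue with step 2."
#   else
#     echo "Corrupt files reported. Review the fsck output first."
#   fi
```

The sketch only reports the result. Deleting a damaged file with hdfs dfs -rm is left as a manual step so that each file can be reviewed before it is removed.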

Reinstalling an Instance

Scenarios

If a role instance of a cluster service is abnormal, you can reinstall the instance on the node where the abnormal role instance is located.

Prerequisites

The default SSH port (22) must be used on every node in the cluster. Otherwise, the task in this section will fail.
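
One way to check this prerequisite on a node is to inspect its sshd configuration. The following is a minimal sketch, assuming that the node runs OpenSSH and that its configuration uses the whitespace-separated "Port <number>" form. The function name sshd_ports is hypothetical, and the path /etc/ssh/sshd_config in the usage comment is the common default rather than something this guide specifies.

```shell
#!/bin/sh
# Hypothetical helper: reads an sshd_config file on standard input and
# prints each configured port, one per line. If no Port directive is
# present, prints 22, which is the OpenSSH default.
# Commented-out lines such as "#Port 22" are ignored.
sshd_ports() {
  awk 'tolower($1) == "port" { print $2; found = 1 }
       END { if (!found) print 22 }'
}

# Example usage on a cluster node:
#   if sshd_ports < /etc/ssh/sshd_config | grep -qx 22; then
#     echo "sshd listens on port 22"
#   else
#     echo "sshd does not listen on port 22"
#   fi
```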

Impact on the System

The role instance that is being reinstalled cannot be used. As a result, the overall service status may be abnormal. You need to stop the client or upper-layer application from connecting to or using the role instance.

Procedure
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Select the specified instance and choose More > Reinstall.
  5. In the displayed dialog box, enter the password of the current login user and click OK.
  6. In the displayed dialog box, select "I want to delete the selected instances and accept that this may result service faults and data losses."
  7. Determine whether to retain data after the instance is reinstalled.

    • If yes, go to Step 8.
    • If no, select Clear Data and go to Step 8.

  8. Click OK and wait until the instance is reinstalled.

Managing Instance Configurations

Scenarios

You can modify configuration parameters for each role instance. In the scenario where instances are migrated to a new cluster or the corresponding service needs to be deployed again, you can import or export all configuration data of a service on FusionInsight Manager to quickly copy configuration results.

FusionInsight Manager can manage configuration parameters of a single role instance. Modifying configuration parameters and importing or exporting instance configurations do not affect other instances.

Impact on the System

After modifying the configuration of a role instance, you need to restart the instance. The role instance is unavailable during restart. If the instance is not restarted, the configuration status of the instance is Configurations Expired.

Modifying Instance Configuration
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Click the specified instance and select Instance Configuration.

    By default, Basic Configuration is displayed. To modify more parameters, select All Configurations. All parameter categories supported by the instance are displayed on the All Configurations tab page.

  5. In the navigation tree, select the specified parameter category and change the parameter values on the right.

    If you are not sure where a parameter is located, enter the parameter name in the search box in the upper right corner. The system searches for the parameter in real time and displays the result.

  6. Click Save. In the confirmation dialog box, click OK.

    Wait until the message "Operation succeeded." is displayed. Click Finish.

    The configuration is modified.

Exporting Instance Configuration
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Click the specified instance and select Instance Configuration.
  5. Click Export.

Importing Instance Configuration
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Click the specified instance and select Instance Configuration.
  5. Click Import.

    Select the configuration parameter file of the instance and import the file.

Deleting a Role Instance

Scenarios
If a role instance in a service is no longer used or needs to be migrated to another host, delete the instance.
NOTE:

To avoid data loss, decommission role instances before deleting them. For details, see Decommissioning and Recommissioning an Instance.

Impact on the System

Some service configurations may expire after a role instance is deleted. You need to restart the affected services for the configurations to take effect. These services are unavailable during the restart.

Procedure
  1. Log in to FusionInsight Manager.
  2. Choose Cluster > Service.
  3. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  4. Select the specified instance and choose More > Delete.

    Enter the administrator password to verify your identity, select "I want to delete the selected instances and accept that this may result in service faults." and click OK.

Updated: 2019-05-17

Document ID: EDOC1100074522
