FusionInsight HD 6.5.0 Product Description 02

Reliability Enhanced Features


Building on open-source Apache Hadoop, FusionInsight HD improves the reliability and performance of the main service components.

System reliability

  • High availability (HA) for management nodes of all components

    In open-source Hadoop, data and compute nodes are distributed by design, so a single point of failure (SPOF) on one of these nodes does not affect the whole system. Management nodes, however, operate in centralized mode, so an SPOF on a management node affects the reliability of the whole system.

    Huawei FusionInsight HD provides a dual-node mechanism for the management nodes of all service components, such as OMS server, HDFS NameNode, Hive Server, HBase HMaster, YARN ResourceManager, Kerberos Server, and LDAP Server. The management nodes work in active/standby or load-sharing mode, which prevents SPOFs from affecting system reliability.

  • Reliability guarantee in case of exceptions

    Based on reliability analysis, the following measures are provided to handle software and hardware exceptions and improve system reliability:

    • Whether a single node or the whole cluster loses power, services run properly once the power supply is restored, which ensures data reliability in case of unexpected power failures. Key data is not lost unless the hard disk is damaged.
    • Hard disk health checks and fault handling do not affect services.
    • File system faults are handled automatically, and affected services are restored automatically.
    • Process and node faults are handled automatically, and affected services are restored automatically.
    • Network faults are handled automatically, and affected services are restored automatically.
  • Remote disaster recovery (DR) for HBase clusters

    Real-time remote DR is adopted for HBase clusters to ensure the reliability of the HBase cluster system. The DR system for the HBase cluster is described as the first DR system in the industry that can be located more than 1000 km away from the active cluster. Health status monitoring and function switchover are performed between the production and DR systems. When one system stops working due to a disaster, such as fire, flood, earthquake, or malicious damage, the whole application system is switched to the other system so that functions keep running properly.

    The HBase cluster DR system also provides basic O&M methods, including maintaining and reestablishing DR relationships, verifying data, and viewing data synchronization progress.

  • Data backup and restoration

    FusionInsight HD provides full backup, incremental backup, and restoration functions based on service requirements, preventing the impact of data loss and damage on services and ensuring fast system restoration in case of exceptions.

    • Automatic backup

      FusionInsight HD automatically backs up data on Manager. Based on a customized backup policy, data on HBase, OMSServer, LDAP server, and DBService, as well as ESN codes, can be backed up automatically.

    • Manual backup

      You can also manually back up data on Manager before capacity expansion, upgrade, or patch installation so that system functions can be recovered if a fault occurs.

      To improve system reliability, data on OMS and HBase can be manually backed up to a third-party server.
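
The difference between full backup, incremental backup, and restoration can be illustrated with a minimal Python sketch. It is not how FusionInsight Manager implements backup: the function names (`full_backup`, `incremental_backup`, `restore`) and the use of file modification times to detect changed data are assumptions made only for illustration.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def _copy_matching(source: Path, target: Path,
                   wanted: Callable[[Path], bool]) -> List[str]:
    """Copy the files under `source` accepted by `wanted` into `target`.

    Returns the relative paths that were copied, in sorted order.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and wanted(path):
            rel = path.relative_to(source)
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)  # copy2 also preserves timestamps
            copied.append(rel.as_posix())
    return copied


def full_backup(source: Path, target: Path) -> List[str]:
    """Full backup: copy every file."""
    return _copy_matching(source, target, lambda p: True)


def incremental_backup(source: Path, target: Path, since: float) -> List[str]:
    """Incremental backup: copy only files modified after `since`.

    Using the modification time as the change test is an assumption.
    """
    return _copy_matching(source, target, lambda p: p.stat().st_mtime > since)


def restore(backups: List[Path], destination: Path) -> None:
    """Restore by applying the full backup first, then each increment in order."""
    for backup in backups:
        _copy_matching(backup, destination, lambda p: True)
```

In this sketch, restoration needs the most recent full backup plus every later incremental backup, applied oldest first. The sketch does not track deleted files, which a real backup tool would have to handle.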

Node reliability

  • OS health status monitoring

    FusionInsight HD provides the following monitoring measures for the OS:

    • Enabling the hardware watchdog function
    • Adjusting OS kernel parameters to restart the OS and restore services when a critical fault occurs in the OS, for example, memory exhaustion, invalid address access, kernel deadlock, or an invalid dispatcher
    • Periodically collecting OS running status data, including the processor status, memory status, hard disk status, and network status
  • Process health status monitoring

    NodeAgent is deployed on all nodes of FusionInsight HD to monitor service instance status and health status of service instance processes.

  • Automatic processing of hard disk faults

    FusionInsight HD is enhanced based on the community version. It monitors the status of hardware and file systems on all nodes. If a partition is faulty, it is isolated from the storage pool. If a whole hard disk is faulty and is replaced, the new hard disk is added to the storage pool. This simplifies maintenance, and faulty hard disks can be replaced online. In addition, users can set hot backup (hot spare) disks to reduce the faulty disk restoration time and improve system reliability.

  • RAID group configuration for nodes

    It is recommended that the hard disk resources of nodes be planned based on service requirements to improve FusionInsight HD's tolerance of hard disk faults.

    • It is recommended that the OS of each node be installed on a RAID 1 group formed by two hard disks to ensure system disk reliability.
    • If possible, RAID 1 is recommended for the hard disks used by key processes of management nodes (HDFS NameNode, database, and ZooKeeper) to ensure metadata reliability.
    • Do not configure RAID groups for data disks (HDFS DataNode, Kafka, Storm Supervisor, Redis, SolrServerAdmin, and SolrServerN). If RAID groups are required (for disk identification), configure RAID 0 groups with only one disk in each group.
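
The RAID recommendations above can be summarized as a small lookup, shown here as a Python sketch. The function name `recommend_raid`, the `raid_required` flag, and the returned strings are hypothetical and exist only to restate the list in executable form; they are not part of any FusionInsight HD tool.

```python
# Disk uses named in the recommendations above.
MANAGEMENT_DISK_USES = {"HDFS NameNode", "database", "ZooKeeper"}
DATA_DISK_USES = {
    "HDFS DataNode", "Kafka", "Storm Supervisor",
    "Redis", "SolrServerAdmin", "SolrServerN",
}


def recommend_raid(disk_use: str, raid_required: bool = False) -> str:
    """Return the recommended RAID layout for a disk use.

    `raid_required` models the case where a RAID group must exist
    (for disk identification); it only matters for data disks.
    """
    if disk_use == "OS":
        return "RAID 1 (two disks)"
    if disk_use in MANAGEMENT_DISK_USES:
        return "RAID 1"
    if disk_use in DATA_DISK_USES:
        return "RAID 0 (one disk per group)" if raid_required else "no RAID"
    raise ValueError(f"unknown disk use: {disk_use!r}")
```

For example, `recommend_raid("Kafka")` returns `"no RAID"`, while `recommend_raid("Kafka", raid_required=True)` returns the single-disk RAID 0 layout.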

Data reliability

FusionInsight HD monitors the hardware (especially hard disks), OS, and processes of nodes to discover exceptions promptly. This shortens the fault detection and restoration time and improves the data persistence rate of the whole system.
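
The hard disk fault handling described under Node reliability (a faulty partition leaves the storage pool, a replacement disk joins it) can be modeled with a short Python sketch. The `StoragePool` class, its method names, and the rule that a hot backup disk is promoted as soon as a fault is reported are assumptions for illustration, not the product's actual behavior or API.

```python
from typing import Iterable, Optional


class StoragePool:
    """Toy model of a node's storage pool (hypothetical, for illustration)."""

    def __init__(self, partitions: Iterable[str],
                 hot_spares: Iterable[str] = ()) -> None:
        self.active = set(partitions)      # partitions currently serving data
        self.isolated = set()              # faulty partitions taken out of use
        self.hot_spares = list(hot_spares)

    def report_fault(self, partition: str) -> Optional[str]:
        """Isolate a faulty partition; promote a hot spare if one exists.

        Returns the promoted spare, or None when no spare is configured.
        """
        if partition not in self.active:
            raise KeyError(partition)
        self.active.remove(partition)
        self.isolated.add(partition)
        if self.hot_spares:
            spare = self.hot_spares.pop(0)
            self.active.add(spare)
            return spare
        return None

    def replace_disk(self, faulty: str, new: str) -> None:
        """Record an online replacement: the new disk joins the pool."""
        self.isolated.discard(faulty)
        self.active.add(new)
```

In this model, a configured hot spare restores capacity immediately, and the later physical replacement only clears the isolated entry and adds the new disk.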

Updated: 2019-05-17

Document ID: EDOC1100074548
