FusionInsight HD 6.5.0 Administrator Guide 02




FusionInsight HD supports backup and recovery of user data and system data. Backup is provided on a per-component basis and covers the following data: Manager data (including OMS data and LdapServer data), HBase service data, HDFS service data, Hive service data, and ZooKeeper service data, as well as DBService component metadata, HBase metadata, Kafka metadata, HDFS NameNode metadata, Redis metadata, and Elasticsearch metadata.

FusionInsight HD supports backing up data to local disks, local HDFS, remote HDFS, CIFS or NAS. For details, see section Backing Up Data.

For components supporting multi-service, multiple instances of the same service can be backed up and restored. The backup and restoration operations are the same as those when there is one instance. For details about components that support multi-service, see section "Software Deployment Scheme" in Product Description.

The backup and recovery tasks are performed in the following scenarios:

  • Routine backup is performed to ensure the data security of the system and components.
  • When the system is faulty, the data backup can be used to recover the system.
  • When the active cluster fails completely, a mirror cluster identical to the active cluster must be created, and the backed-up data can be used to restore it.
Table 13-1 Backing up Manager configuration data based on service requirements

Backup Type | Backup Content
OMS | Back up database data (excluding alarm data) and configuration data in the cluster management system by default.
LdapServer | User information, including the username, password, key, password policy, and group information, is backed up.

Table 13-2 Backing up service data of specific components based on service requirements

Backup Type | Backup Content
HBase | Back up table-level user data. For clusters enabled with multi-service, the multiple HBase service instances can be backed up and restored. The backup and restoration operations are the same as those for the HBase service instance.
HDFS | Back up the directories or files that correspond to user services.
Hive | Back up table-level user data. For clusters enabled with multi-service, the multiple Hive service instances can be backed up and restored. The backup and restoration operations are the same as those for the Hive service instance.
Elasticsearch | Back up index data. For clusters enabled with multi-service, the backup and restoration function is supported for multiple Elasticsearch service instances and the backup and restoration operations are consistent with those of a single Elasticsearch service instance.
ZooKeeper | Back up service data stored in ZooKeeper.

Table 13-3 Backing up component metadata based on service requirements

Backup Type | Backup Content
DBService | Back up metadata of components (including Loader, Metadata, Hive, Spark, Oozie, Hue, and Redis) managed by DBService. After the multi-instance function is enabled, the metadata of multiple Hive and Spark service instances is backed up.
NameNode | Back up HDFS metadata. For clusters enabled with multi-service, backup and recovery are supported for multiple NameServices, and the operations are consistent with those of the default instance hacluster.
Kafka | Kafka metadata.
HBase | The tableinfo file and data files of HBase.
Elasticsearch | Elasticsearch metadata, that is, data related to Elasticsearch security features that is stored in ZooKeeper.
Redis | Redis metadata.

Note that some components do not provide the data backup and restoration functions:

  • Kafka supports replicas and allows the number of replicas to be specified when a topic is created.
  • SolrServerAdmin and SolrServerN of Solr back up data between different nodes by automatically creating replicas.
  • MapReduce and Yarn data is stored in the HDFS. Therefore, MapReduce and Yarn depend on the HDFS to provide the backup and restoration functions.
  • The underlying GraphBase data is stored in HBase tables, and the GraphBase index data is stored in Elasticsearch. Therefore, GraphBase depends on HBase and Elasticsearch for data backup and restoration.



Before backing up or recovering data, create a backup or recovery task and set its parameters, such as the task name, backup data source, and type of the save path for backup files. Data is backed up or recovered by executing the task. While Manager is recovering HDFS, HBase, Elasticsearch, Hive, or NameNode data, the cluster cannot be accessed.

Each backup task can back up data from different data sources and generates an independent backup file for each data source. All the backup files generated by one backup task form a backup file set, which can be used in recovery tasks. Backup data can be stored on Linux local disks, in the HDFS of the local cluster, or in the HDFS of the standby cluster. A backup task uses either the full backup policy or the incremental backup policy. HBase, Elasticsearch, HDFS, and Hive backup tasks support the incremental backup policy, while OMS, LdapServer, DBService, and NameNode backup tasks support only the full backup policy.
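The policy support described above can be written down as a small lookup. This is only an illustrative sketch: the function name and the set names are hypothetical and not part of any FusionInsight interface, and it assumes that data sources supporting incremental backup also support full backup.

```python
# Data sources grouped by the backup policies named in the text.
# Assumption: sources that support incremental backup also support full backup.
INCREMENTAL_CAPABLE = {"HBase", "Elasticsearch", "HDFS", "Hive"}
FULL_ONLY = {"OMS", "LdapServer", "DBService", "NameNode"}


def supported_policies(source):
    """Return the backup policies available for a data source (hypothetical helper)."""
    if source in INCREMENTAL_CAPABLE:
        return ["full", "incremental"]
    if source in FULL_ONLY:
        return ["full"]
    raise ValueError("unknown data source: " + source)


if __name__ == "__main__":
    for name in sorted(INCREMENTAL_CAPABLE | FULL_ONLY):
        print(name, supported_policies(name))
```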


Task execution rules:

  • While a task is being executed, it cannot be executed again, and no other task can be started.
  • The interval at which a periodic task is automatically executed must be greater than 120s; otherwise, the task is postponed and executed in the next period. Manual tasks can be executed at any interval.
  • When a periodic task is due to be automatically executed, the current time must not be more than 120s later than the task start time; otherwise, the task is postponed and executed in the next period.
  • When a periodic task is locked, it cannot be automatically executed and must be manually unlocked.
  • Before an OMS, LdapServer, DBService, Kafka, or NameNode backup task starts, ensure that the LocalBackup partition on the active management node has more than 20 GB of available space; otherwise, the backup task cannot be started.
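Some of the rules above can be expressed as a pre-start check. The following is a minimal sketch, not the Manager's actual scheduler: the class, function, and constant names are hypothetical, the minimum-interval rule is not modeled, and 20 GB is assumed to mean 20 × 1024³ bytes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Values taken from the rules above; the names themselves are illustrative.
MAX_START_DELAY = timedelta(seconds=120)
MIN_FREE_BYTES = 20 * 1024 ** 3  # assumption: 1 GB = 1024**3 bytes
LOCAL_BACKUP_SOURCES = {"OMS", "LdapServer", "DBService", "Kafka", "NameNode"}


@dataclass
class PeriodicTask:
    name: str
    sources: set
    scheduled_start: datetime
    locked: bool = False


def check_auto_start(task, now, task_running, free_bytes):
    """Return (allowed, reason) for automatically starting a periodic task."""
    if task_running:
        return False, "another task is running"
    if task.locked:
        return False, "task is locked and must be unlocked manually"
    if now - task.scheduled_start > MAX_START_DELAY:
        return False, "start window missed, postponed to the next period"
    if task.sources & LOCAL_BACKUP_SOURCES and free_bytes <= MIN_FREE_BYTES:
        return False, "LocalBackup partition does not have more than 20 GB free"
    return True, "ok"


if __name__ == "__main__":
    task = PeriodicTask("default", {"OMS", "NameNode"}, datetime(2019, 5, 17, 10, 0, 0))
    print(check_auto_start(task, datetime(2019, 5, 17, 10, 1, 0), False, 30 * 1024 ** 3))
    print(check_auto_start(task, datetime(2019, 5, 17, 10, 3, 0), False, 30 * 1024 ** 3))
```

The checks are ordered so that the cheapest conditions (a running task, a lock) are reported before the time window and disk space conditions.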

When planning backup and recovery tasks, select the data to be backed up or recovered strictly based on the service logic, data storage structure, and database or table associations. The system creates a periodic backup task named default, which runs every hour and performs a full backup of OMS, LdapServer, DBService, and NameNode data to the Linux local disk.


The system uses snapshot technology to back up data quickly. Snapshots include HBase snapshots, HDFS snapshots, and Elasticsearch snapshots.

  • HBase snapshot

    An HBase snapshot is a backup file of HBase tables at a specified time point. This backup file does not copy service data or affect the RegionServer. The HBase snapshot copies table metadata, including table descriptor, region info, and HFile reference information. The metadata can be used to recover data before the snapshot creation time.

  • HDFS snapshot

    An HDFS snapshot is a read-only backup copy of the HDFS file system at a specified time point. The snapshot is used in data backup, misoperation protection, and disaster recovery scenarios.

    The snapshot function can be enabled for any HDFS directory to create the related snapshot file. Before creating a snapshot for a directory, the system automatically enables the snapshot function for the directory. Creating a snapshot does not affect any HDFS operation. A maximum of 65536 snapshots can be created for each HDFS directory.

    While a snapshot is being created for an HDFS directory, the directory cannot be deleted or modified until the snapshot creation is complete. Snapshots cannot be created for the upper-layer directories or subdirectories of that directory.

  • Elasticsearch snapshot

    An Elasticsearch snapshot uses the snapshot API that Elasticsearch provides for backing up cluster index data. The status and data of the current cluster are backed up at a specified time and saved to the specified snapshot repository. The first snapshot copies all data. Subsequent snapshots save only the difference between the existing snapshots and the new data.
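The HDFS snapshot restrictions described above (no snapshots on the upper-layer directories or subdirectories of a directory that is being snapshotted, and at most 65536 snapshots per directory) can be illustrated with a small path check. This is a simplified model, not HDFS code; the function names are hypothetical.

```python
from pathlib import PurePosixPath

MAX_SNAPSHOTS_PER_DIR = 65536  # limit stated in the text


def is_ancestor_or_descendant(candidate, snapshot_dir):
    """True if one path lies strictly above or below the other."""
    a = PurePosixPath(candidate)
    b = PurePosixPath(snapshot_dir)
    return a != b and (a in b.parents or b in a.parents)


def can_create_snapshot(candidate, snapshot_dirs, existing_count):
    """Check a directory against the restrictions (illustrative only).

    snapshot_dirs: directories that already have a snapshot in progress.
    existing_count: number of snapshots the candidate directory already has.
    """
    if existing_count >= MAX_SNAPSHOTS_PER_DIR:
        return False
    return not any(is_ancestor_or_descendant(candidate, d) for d in snapshot_dirs)


if __name__ == "__main__":
    busy = ["/user/hive/warehouse"]
    print(can_create_snapshot("/user/hive", busy, 0))             # parent directory
    print(can_create_snapshot("/user/hive/warehouse/t1", busy, 0)) # subdirectory
    print(can_create_snapshot("/user/hbase", busy, 0))             # unrelated directory
```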


Distributed copy (DistCp) is a tool for copying large amounts of data within the HDFS of a cluster or between the HDFSs of different clusters. In an HBase, HDFS, Hive, or Elasticsearch metadata backup or recovery task, if the data is backed up to the HDFS of the standby cluster, the system invokes DistCp to perform the operation. The same version of FusionInsight HD must be installed on the active and standby clusters.

DistCp uses MapReduce to implement data distribution, error handling, recovery, and reporting. It expands the specified list of source files and directories into input for Map tasks, and each Map task copies a partition of the files in that list.
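The idea of splitting a source list into per-Map partitions can be sketched as follows. This is a simplified greedy balancing by file size, written only for illustration; it is not DistCp's actual partitioning strategy, and the function name and sample paths are hypothetical.

```python
def partition_copy_list(files, num_maps):
    """Split {path: size_in_bytes} into num_maps partitions of similar total size.

    Greedy sketch: take the largest remaining file and give it to the
    partition with the smallest total so far.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    partitions = [[] for _ in range(num_maps)]
    loads = [0] * num_maps
    for path, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))  # least-loaded partition
        partitions[target].append(path)
        loads[target] += size
    return partitions


if __name__ == "__main__":
    source_list = {"/data/a": 100, "/data/b": 60, "/data/c": 50}
    for i, part in enumerate(partition_copy_list(source_list, 2)):
        print("map", i, part)
```

Each partition would correspond to the work of one Map task, so a failed task only needs to recopy its own partition.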

To use DistCp to perform data replication between the HDFS of two clusters, configure the cross-cluster trust relationship and cross-cluster replication function for both clusters. When backing up the cluster data to HDFS in another cluster, you need to install the Yarn component. Otherwise, the backup fails.

Local rapid recovery

After DistCp is used to back up the HBase, HDFS, and Hive data of the local cluster to the HDFS of the standby cluster, the HDFS of the local cluster retains the backup data snapshots. Users can create local rapid recovery tasks to recover data by using the snapshot files in the HDFS of the local cluster.


Network Attached Storage (NAS) is a dedicated data storage server which includes the storage device and embedded system software. It provides the cross-platform file sharing function. By using NFS (supporting NFSv3 and NFSv4) and CIFS (supporting SMBv2 and SMBv3) protocols, users can connect the FusionInsight service plane with the NAS server to back up or restore data to or from the NAS.

  • Before data is backed up to the NAS, the system automatically mounts the NAS shared address to a local partition. After the backup is complete, the system unmounts the NAS shared partition.
  • To prevent backup and restoration failures, do not access the local mount point of the NAS shared address, for example, /srv/BigData/LocalBackup/nas, during data backup and restoration.
  • When service data is backed up to the NAS, DistCp is used.
  • On the EulerOS 2.1 OS, NFS cannot be used to back up data to NAS or restore data from NAS.


Table 13-4 Backup and recovery feature specifications



Maximum number of backup or recovery tasks


Number of concurrent running tasks


Maximum number of waiting tasks


Maximum size of backup files on a Linux local disk (GB)


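Table 13-5 Specifications of the default task

Item | OMS | LdapServer | DBService | NameNode
Backup period | 1 hour (all data sources)
Maximum number of copies | 168 (historical records of seven days) | 168 (historical records of seven days) | 168 (historical records of seven days) | 24 (historical records of one day)
Maximum size of a backup file | 10 MB | 20 MB | 100 MB | 20 GB
Maximum size of disk space used | 1.64 GB | 3.28 GB | 16.41 GB | 480 GB
Save path of backup data | Data path/LocalBackup/ on active and standby management nodes (all data sources)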

  • The administrator must regularly transfer the backup data of the default task to an external cluster based on the enterprise's O&M requirements.
  • The administrator can create a DistCp backup task to store data of OMS, LdapServer, DBService, and NameNode to an external cluster.
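The disk space figures in Table 13-5 follow from multiplying the maximum number of copies by the maximum size of a backup file. The short check below reproduces them, assuming that the four value columns correspond, in order, to OMS, LdapServer, DBService, and NameNode (the data sources of the default task) and that 1 GB = 1024 MB.

```python
# (maximum number of copies, maximum backup file size in MB) per data source.
# Assumption: column order OMS, LdapServer, DBService, NameNode; 1 GB = 1024 MB.
DEFAULT_TASK_SPECS = {
    "OMS": (168, 10),
    "LdapServer": (168, 20),
    "DBService": (168, 100),
    "NameNode": (24, 20 * 1024),
}


def max_disk_space_gb(copies, file_size_mb):
    """Worst-case disk usage in GB: every retained copy at maximum size."""
    return copies * file_size_mb / 1024


if __name__ == "__main__":
    for source, (copies, size_mb) in DEFAULT_TASK_SPECS.items():
        print(source, round(max_disk_space_gb(copies, size_mb), 2), "GB")
```

The 168 copies correspond to one backup per hour for seven days (24 × 7), and the 24 copies to one backup per hour for one day.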
Updated: 2019-05-17

Document ID: EDOC1100074522
