FusionInsight HD 6.5.0 Product Description 02




HBase is a distributed key-value storage database. The data in a table is sorted in lexicographic order by RowKey. If you query data by a specified RowKey or scan data within a specified RowKey range, HBase can quickly locate the data to be read, which makes such queries efficient.
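The effect of RowKey ordering can be sketched with a small in-memory model. This is an illustration only, not HBase code: the table is assumed to be a sorted Python list, and the row keys, values, and the `get` and `scan` helpers are made up for the example.

```python
import bisect

# Toy model: a table as a list of (rowkey, value) pairs kept sorted by rowkey bytes.
rows = sorted([
    (b"user#0003", "carol"),
    (b"user#0001", "alice"),
    (b"user#0010", "dave"),
    (b"user#0002", "bob"),
])
keys = [k for k, _ in rows]

def get(rowkey):
    """Point lookup: binary search instead of visiting every row."""
    i = bisect.bisect_left(keys, rowkey)
    if i < len(keys) and keys[i] == rowkey:
        return rows[i][1]
    return None

def scan(start, stop):
    """Range scan over [start, stop): only rows inside the range are touched."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return rows[lo:hi]
```

Because the keys are sorted, both helpers find their starting position by binary search, which is the property that makes RowKey-based access fast.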

However, in most real-world scenarios, you need to query data by column value, for example, all rows in which a given column has the value XXX. HBase provides the Filter feature for this purpose: data is scanned in RowKey order and each row is matched against the specified column value until the required data is found. To obtain the required data, the Filter feature therefore also scans data that is not needed. As a result, it cannot meet the requirements of frequent queries with high performance standards.

To address this issue, HBase HIndex was developed. HBase HIndex enables HBase to query data based on specific column values.
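The difference between a filter-style scan and an index-style lookup can be sketched as follows. This is a conceptual toy model under stated assumptions, not the HIndex implementation: the table contents, the column names, and the functions `filter_scan`, `build_index`, and `index_lookup` are all hypothetical.

```python
# Toy table: rowkey -> {column: value}. Names and data are made up.
table = {
    "row1": {"city": "Paris", "name": "alice"},
    "row2": {"city": "Oslo", "name": "bob"},
    "row3": {"city": "Paris", "name": "carol"},
}

def filter_scan(column, value):
    """Filter-style query: visit every row in RowKey order and test the column."""
    visited = 0
    hits = []
    for rowkey in sorted(table):
        visited += 1
        if table[rowkey].get(column) == value:
            hits.append(rowkey)
    return hits, visited

def build_index(column):
    """Index-style support: map each column value to the RowKeys that hold it."""
    index = {}
    for rowkey in sorted(table):
        cell = table[rowkey].get(column)
        if cell is not None:
            index.setdefault(cell, []).append(rowkey)
    return index

city_index = build_index("city")

def index_lookup(value):
    """Answer the query from the index without touching unrelated rows."""
    return city_index.get(value, [])
```

In this model `filter_scan("city", "Paris")` visits all three rows to return two, whereas `index_lookup("Paris")` goes straight to the matching RowKeys. The price is that the index must be kept consistent with the user data on every write.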

Figure 4-11 HIndex
  • Rolling upgrade is not supported for indexed data.
  • Composite index: All columns that participate in a composite index must be put or deleted together in a single mutation. Otherwise, inconsistencies will occur.
  • Do not explicitly configure a split policy for a table that has indexed data.
  • Other mutation operations, such as increment and append, are not supported.
  • Indexes on columns with maxVersions > 1 are not supported.
  • The value of a column on which an index is added must not exceed 32 KB.
  • When user data is deleted because the column family-level TTL expires, the corresponding index data is not deleted immediately. It is deleted during major compaction.
  • The TTL of the user column family must not be changed after an index is created.
    • If the column family TTL is changed to a higher value after an index is created, drop the index and create it again. Otherwise, some of the index data that has already been generated may be deleted earlier than the user data.
    • If the column family TTL is changed to a lower value after an index is created, index data may be deleted later than the user data.
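Several of the constraints above can be expressed as a client-side pre-check. The sketch below is purely illustrative: `check_mutation` is a hypothetical helper, not an HIndex API, and it assumes that 32 KB means 32 * 1024 bytes.

```python
MAX_INDEXED_VALUE_BYTES = 32 * 1024  # assumption: 32 KB means 32 * 1024 bytes

def check_mutation(op, columns, composite_index_columns):
    """Return a list of problems for a hypothetical mutation on an indexed table.

    op: mutation type, e.g. "put", "delete", "increment", "append"
    columns: dict of column name -> value bytes written by the mutation
    composite_index_columns: set of column names covered by one composite index
    """
    problems = []
    # Only put and delete are supported on indexed data.
    if op not in ("put", "delete"):
        problems.append("unsupported mutation: " + op)
    # All columns of a composite index must appear in the same mutation.
    touched = set(columns) & composite_index_columns
    if touched and touched != composite_index_columns:
        missing = sorted(composite_index_columns - touched)
        problems.append("composite index columns missing: " + ", ".join(missing))
    # Indexed values must not exceed the size limit.
    if op == "put":
        for name in sorted(touched):
            if len(columns[name]) > MAX_INDEXED_VALUE_BYTES:
                problems.append("indexed value too large: " + name)
    return problems
```

For example, `check_mutation("put", {"a": b"x"}, {"a", "b"})` reports that column `b` is missing, because both columns of the composite index must be written in one mutation.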

Multi-point Division

When users create tables that are pre-divided into regions in HBase, they may not know the data distribution trend, so the initial division may be inappropriate. After the system has run for a period, regions need to be divided again to achieve better performance. Only empty regions can be divided.

The region division function delivered with HBase divides a region only when it reaches the threshold. This is called single-point division.

To achieve better performance when regions are divided according to users' requirements, multi-point division, also called dynamic division, was developed. The multi-point division function pre-divides an empty region into multiple regions to prevent performance from deteriorating because of insufficient region space.

Figure 4-12 Multi-point division
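The idea of dividing one empty region at several points at once can be sketched as below. This is a toy model, not the product's implementation: `multi_point_split` and its parameters are hypothetical, and an empty string is assumed to stand for an open region boundary.

```python
def multi_point_split(start, end, split_points, row_count=0):
    """Divide one region [start, end) into several regions at the given split points.

    An empty string for start or end stands for an open boundary in this toy model.
    """
    # Mirrors the rule stated above: only empty regions can be divided.
    if row_count != 0:
        raise ValueError("only empty regions can be divided")
    points = sorted(set(split_points))
    for p in points:
        if (start and p <= start) or (end and p >= end):
            raise ValueError("split point outside region: " + p)
    bounds = [start] + points + [end]
    # Pair consecutive boundaries to form the new regions.
    return list(zip(bounds[:-1], bounds[1:]))
```

With three split points, a single region covering the whole key space becomes four regions in one operation, instead of waiting for each region to reach the split threshold in turn.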

Connection Limitation

Too many sessions mean that too many queries and MapReduce tasks are running on HBase, which degrades HBase performance or even causes service rejection. To protect HBase from overload, you can configure parameters that limit the maximum number of sessions that can be established between clients and the HBase server.
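The overload-protection idea can be sketched as a simple admission counter. This is a minimal illustration, not the HBase mechanism or its configuration parameters: the `SessionLimiter` class and its `max_sessions` argument are hypothetical.

```python
import threading

class SessionLimiter:
    """Rejects new sessions once a configured maximum is reached."""

    def __init__(self, max_sessions):
        self._max = max_sessions
        self._active = 0
        self._lock = threading.Lock()

    def open(self):
        """Return True if the session is admitted, False if it is rejected."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def close(self):
        """Release one session slot so that a later open() can succeed."""
        with self._lock:
            if self._active > 0:
                self._active -= 1
```

Rejecting the excess session immediately, rather than queuing it, keeps the load on the server bounded at the cost of pushing retries to the client.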

Disaster Recovery

The disaster recovery (DR) capabilities between the active and standby clusters enhance the high availability (HA) of HBase data. The active cluster provides data services and the standby cluster backs up data. If the active cluster becomes faulty, the standby cluster takes over data services. Compared with the open source Replication function, the following enhancements are provided:

  1. The standby cluster whitelist function applies only to pushing data to a specified cluster IP address.
  2. In the open source version, replication is based on the WAL, and data is backed up by replaying the WAL in the standby cluster. Because bulk loads do not generate WAL entries, bulk-loaded data is not replicated to the standby cluster. In the enhanced version, BulkLoad operations are recorded in the WAL and synchronized to the standby cluster. The standby cluster reads the BulkLoad operation records from the WAL and loads the HFiles from the active cluster to back up the data.
  3. In the open source version, HBase filters out ACLs, so ACL information is not synchronized to the standby cluster. With the added filter (org.apache.hadoop.hbase.replication.SystemTableWALEntryFilterAllowACL), ACL information can be synchronized to the standby cluster. Configure hbase.replication.filter.sytemWALEntryFilter to enable the filter and implement ACL synchronization.
  4. The standby cluster is read-only: only super users within the standby cluster can modify HBase data in the standby cluster. HBase clients outside the standby cluster can only read that data.
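The BulkLoad enhancement in item 2 can be sketched with a toy replication model. Everything here is an assumption made for illustration: the dictionary-based WAL entries, the `write_wal` and `replay` functions, and the HFile name do not correspond to real HBase structures or APIs.

```python
def write_wal(operations, record_bulkload):
    """Build the WAL on the active cluster for a list of operations."""
    wal = []
    for op in operations:
        if op["type"] == "put":
            wal.append(op)
        elif op["type"] == "bulkload" and record_bulkload:
            # Enhanced behavior: the bulk load leaves a marker in the WAL.
            wal.append(op)
    return wal

def replay(wal, active_hfiles):
    """Replay the WAL on the standby cluster and return the resulting data."""
    standby = {}
    for entry in wal:
        if entry["type"] == "put":
            standby[entry["row"]] = entry["value"]
        elif entry["type"] == "bulkload":
            # Load the HFile referenced by the marker from the active cluster.
            standby.update(active_hfiles[entry["hfile"]])
    return standby

operations = [
    {"type": "put", "row": "r1", "value": "v1"},
    {"type": "bulkload", "hfile": "hfile-1"},
]
active_hfiles = {"hfile-1": {"r2": "v2", "r3": "v3"}}
```

Replaying `write_wal(operations, record_bulkload=False)` leaves the standby with only `r1`, while `record_bulkload=True` also brings over the bulk-loaded rows.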


In real-world scenarios, data of various sizes must be stored, for example, images and documents. Data smaller than 10 MB can be stored in HBase, and HBase delivers its best read and write performance for data smaller than 100 KB. If the values stored in HBase are larger than 100 KB, or even as large as 10 MB, inserting the same number of records produces a much larger total data volume. This causes frequent compaction and splitting, high CPU consumption, heavy disk I/O, and low performance.

MOB data (ranging from 100 KB to 10 MB in size) is stored in a file system (for example, HDFS) in HFile format. The expiredMobFileCleaner and Sweeper tools manage these HFiles, and the address and size information of the HFiles is saved to the HBase store as values. This greatly reduces the compaction and split frequency in HBase and improves performance.

In Figure 4-13, MOB indicates the mobstore stored on an HRegion. The mobstore stores keys and values: a key is the corresponding key in HBase, and a value is the reference address and data offset in the file system. When reading data, the mobstore uses its own scanner to read KeyValue data objects and uses the address and data size information in the value to obtain the target data from the file system.

Figure 4-13 MOB data storage principle
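The storage principle can be sketched with a small in-memory model in which large values are written to a separate file and only a reference is kept in the store. This is a simplified illustration, not the HBase MOB implementation: the `ToyMobStore` class, the single `bytearray` standing in for an HFile, and the 100 KB threshold constant are assumptions.

```python
MOB_THRESHOLD = 100 * 1024  # assumption: values above 100 KB are treated as MOB

class ToyMobStore:
    def __init__(self):
        self.mob_file = bytearray()  # stands in for an HFile on the file system
        self.store = {}              # key -> ("inline", bytes) or ("mob", offset, size)

    def put(self, key, value):
        if len(value) > MOB_THRESHOLD:
            # Large value: append it to the file and keep only a reference.
            offset = len(self.mob_file)
            self.mob_file.extend(value)
            self.store[key] = ("mob", offset, len(value))
        else:
            # Small value: keep it directly in the store.
            self.store[key] = ("inline", value)

    def get(self, key):
        entry = self.store[key]
        if entry[0] == "inline":
            return entry[1]
        # Follow the reference (offset and size) into the file.
        _, offset, size = entry
        return bytes(self.mob_file[offset:offset + size])
```

Because the store holds only small references for large objects, the data that compaction and splitting must rewrite stays small even when the objects themselves are large.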


HBase FileStream (HFS) is an independent HBase file storage module. It encapsulates HBase and HDFS interfaces to provide FusionInsight HD upper-layer applications with functions such as file storage, reading, and deletion.

In the Hadoop ecosystem, HDFS and HBase both face difficult problems with mass file storage in some scenarios:

  • If massive numbers of small files are stored in HDFS, the NameNode comes under heavy pressure.
  • Some large files cannot be stored directly in HBase because of its interfaces and internal mechanisms.

HFS is designed for the mixed storage of massive numbers of small files and some large files in Hadoop. Simply put, it serves scenarios in which massive numbers of small files (smaller than 10 MB) and some large files (larger than 10 MB) need to be stored in HBase tables.

For such scenarios, HFS provides unified operation interfaces similar to HBase function interfaces.

Updated: 2019-05-17

Document ID: EDOC1100074548
