FusionInsight HD V100R002C60SPC200 Product Description 06

HBase

Basic Concept

Overview

HBase is a distributed, column-oriented storage system built on the Hadoop Distributed File System (HDFS). The column-based HBase features high reliability, performance, and scalability. HBase is suitable for storing big table data (a table containing billions of rows and millions of columns) and allows real-time data access.

  • Uses the HDFS as the file storage system to provide database systems with high reliability, high performance, column-based storage, scalability, and real-time read and write capabilities.
  • Allows Spark and Hadoop MapReduce to process massive amounts of data in real time.
  • Uses ZooKeeper as the coordination service.
Architecture

An HBase cluster consists of active and standby HMaster processes and multiple RegionServer processes, as shown in Figure 2-9.

Figure 2-9 HBase architecture
Table 2-4 HBase modules

Module

Function

Master

Also called HMaster. In High Availability (HA) mode, the Master consists of an active Master and a standby Master.

  • Active HMaster: Manages the RegionServers in HBase. It handles table creation, deletion, modification, and query requests; balances the load across RegionServers; adjusts the distribution of Regions; splits Regions and distributes the resulting Regions; and migrates Regions after a RegionServer expires.
  • Standby HMaster: Takes over services from the active HMaster if the active HMaster is faulty. After the fault is rectified, the original active HMaster serves as the standby HMaster.

Client

Client communicates with Master and RegionServer by using the Remote Procedure Call (RPC) mechanism of HBase. Client communicates with Master for management and with RegionServer for data operation.

RegionServer

Provides read/write services of table data as a data processing and computing unit in HBase.

The RegionServer is usually deployed with the DataNode of the HDFS cluster to store data.

ZooKeeper cluster

Provides distributed coordination services for processes in the HBase cluster. Each RegionServer registers itself with ZooKeeper so that the active HMaster can obtain the health status of each RegionServer.

HDFS cluster

Provides highly reliable file storage services for HBase. All HBase data is stored in the HDFS.

Principle
  • HBase Data Model

    Data is stored as tables in HBase. Figure 2-10 shows the HBase data model. Data in a table is divided into multiple Regions, which are allocated by HMaster to RegionServers for management.

    Each Region contains data within a Row Key range. At the beginning, an HBase data table contains only one Region. As data increases and reaches the upper limit of the Region capacity, the Region is split into two Regions.

    Figure 2-10 HBase data model
    Table 2-5 Columns in a data model

    Column

    Description

    Row Key

    Functions like the primary key of a relational table: it is the unique ID of the data in each row. A Row Key can be a string, an integer, or a binary string. All records are stored sorted by Row Key.

    Timestamp

    Indicates the time stamp of a data operation. Data can be specified with different versions by time stamp. Data of different versions in each cell is stored by time in descending order.

    Cell

    The smallest storage unit of HBase, which consists of a Key and a Value. The Key consists of 6 fields: row, column family, column qualifier, timestamp, type, and MVCC version. The Value is the binary data object that is stored.

    Column Family

    Indicates the column family. A table consists of one or more column families horizontally. A column family can contain any number of arbitrary columns. A column is a label under the column family and can be added as required when data is written. The column family supports dynamic expansion, so the number and type of columns do not need to be predefined. Columns of an HBase table are sparsely distributed: the number and type of columns can differ from row to row. Each column family has its own time to live (TTL). Locks can be applied at the row level only. Operations on a row are always atomic.

    Column

    Similar to traditional databases, HBase tables also use columns to store data of the same type.
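
The data model in Table 2-5 can be illustrated with a small in-memory sketch. This is not how HBase stores data internally; it only mirrors the properties described above: rows sorted by Row Key, columns created on write under a fixed column family, and multiple versions per cell kept newest first. The class name `SketchTable` and its methods are hypothetical and are not part of any HBase API.

```python
from collections import defaultdict


class SketchTable:
    """Toy table: row key -> (family, qualifier) -> {timestamp: value}."""

    def __init__(self, families):
        # Column families are declared when the table is created.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers (columns) are created on write; rows may differ in columns.
        self.rows[row][(family, qualifier)][timestamp] = value

    def get(self, row, family, qualifier, max_versions=1):
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        # Versions are returned newest first (descending timestamp).
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]

    def scan(self):
        # Rows come back in Row Key order, each with the columns it actually has.
        for row in sorted(self.rows):
            yield row, sorted(self.rows[row])
```

Writing the same cell twice with different timestamps and then calling `get(..., max_versions=2)` returns both versions, newest first, while a row that never wrote a given column simply has no entry for it.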

  • RegionServer Data Storage

    RegionServer manages Regions allocated by HMaster. Figure 2-11 shows the RegionServer data storage structure.

    Figure 2-11 RegionServer data storage structure

    Table 2-6 describes each component shown in Figure 2-11.

    Table 2-6 Region structure

    Component

    Description

    Store

    A Region consists of one or more Stores. Each Store maps to a Column Family in Figure 2-10.

    MemStore

    A Store contains one MemStore. The MemStore caches data inserted by the client to the Region. When the MemStore size reaches the upper limit, RegionServer flushes data in MemStore to the HDFS.

    StoreFile

    Data flushed to the HDFS is stored there as a StoreFile. As more data is inserted, multiple StoreFiles are generated in a Store. When the number of StoreFiles reaches the limit, RegionServer merges multiple StoreFiles into one large StoreFile.

    HFile

    HFile defines the storage format of StoreFiles in a file system. HFile is the underlying implementation of StoreFile.

    HLog

    HLogs prevent data loss when RegionServer is faulty. Multiple Regions in a RegionServer share the same HLog.
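
The interaction between HLog, MemStore, and StoreFiles described in Table 2-6 can be sketched as follows. This is a simplified illustration under stated assumptions: the limits are counted in entries rather than bytes, StoreFiles are plain sorted lists in memory instead of HFiles on HDFS, and the names `SketchStore`, `flush_size`, and `max_store_files` are hypothetical.

```python
class SketchStore:
    """Toy Store for one column family: one MemStore plus a list of StoreFiles."""

    def __init__(self, flush_size=3, max_store_files=3):
        self.flush_size = flush_size            # hypothetical MemStore limit (entries)
        self.max_store_files = max_store_files  # hypothetical merge trigger
        self.memstore = {}
        self.store_files = []  # immutable sorted lists, oldest first

    def put(self, hlog, key, value):
        # Log first so the write can be replayed if the RegionServer fails.
        hlog.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # The MemStore content becomes a new StoreFile.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.store_files) >= self.max_store_files:
            self.compact()

    def compact(self):
        merged = {}
        for store_file in self.store_files:  # oldest first, so newer values win
            merged.update(store_file)
        self.store_files = [sorted(merged.items())]

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # newest file first
            for k, v in store_file:
                if k == key:
                    return v
        return None
```

Passing the same `hlog` list to several `SketchStore` instances mirrors the statement that multiple Regions in a RegionServer share one HLog.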

  • Metadata Table

    The metadata table is a special HBase table that the client uses to locate a Region. The metadata table is the hbase:meta table. The hbase:meta table records the Region information of user tables, such as the Region location and the start and end Row Keys.

    Figure 2-12 shows the mapping relationships between metadata tables and user tables.

    Figure 2-12 Mapping relationships between metadata tables and user tables
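
Locating a Region from its start and end Row Keys, and updating that mapping when a Region splits, can be sketched with a sorted list of start keys. This is only an illustration of the idea: the class `SketchMeta` and its layout are hypothetical and do not reflect the actual schema of hbase:meta.

```python
import bisect


class SketchMeta:
    """Toy metadata for one user table: sorted Region start keys plus locations."""

    def __init__(self, first_server):
        # A new table starts as a single Region covering every Row Key.
        self.start_keys = [b""]
        self.servers = [first_server]

    def locate(self, row_key):
        # The owning Region is the last one whose start key is <= row_key.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        has_next = i + 1 < len(self.start_keys)
        end_key = self.start_keys[i + 1] if has_next else None  # None: open-ended
        return self.start_keys[i], end_key, self.servers[i]

    def split(self, split_key, new_server):
        # Split the Region holding split_key into [start, split_key) and
        # [split_key, end); the second half is placed on new_server.
        i = bisect.bisect_right(self.start_keys, split_key) - 1
        if self.start_keys[i] == split_key:
            raise ValueError("split key must fall strictly inside a Region")
        self.start_keys.insert(i + 1, split_key)
        self.servers.insert(i + 1, new_server)
```

Because Row Keys are stored in sorted order, a binary search over the start keys is enough to find the Region for any key.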
  • Data Processing Process
    Figure 2-13 shows how data is processed in HBase.
    Figure 2-13 Data processing process
    1. When an application adds, deletes, modifies, or queries HBase data, the HBase client first connects to ZooKeeper to obtain information about the RegionServer where the hbase:meta table is located. (Operations that modify the namespace, such as creating or deleting a table, need to access HMaster to update the meta information.)
    2. The HBase client connects to the RegionServer where the Region of the hbase:meta table is located and obtains the information about the RegionServer where the Region of the user table is located.
    3. The HBase client connects to the RegionServer where the Region of the user table is located and issues a data operation command to the RegionServer. The RegionServer executes the command.

    To improve data processing efficiency, the HBase client caches the Region information of the hbase:meta table and of user tables in memory. When an application initiates a data operation, the HBase client first looks up the Region information in this cache. If no match is found, the HBase client performs the preceding steps to obtain the Region information of the hbase:meta table and the user table.
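
The three lookup steps and the client-side cache can be sketched as follows. This is a minimal simulation, not the HBase client: ZooKeeper and the RegionServers are plain dictionaries, each user table is assumed to have a single Region, and the names `SketchClient`, `meta-location`, and `lookups` are hypothetical.

```python
class SketchClient:
    """Toy client that resolves a table's RegionServer and caches the result."""

    def __init__(self, zookeeper, servers):
        self.zookeeper = zookeeper  # dict standing in for ZooKeeper
        self.servers = servers      # server name -> dict standing in for a RegionServer
        self.cache = {}             # table name -> RegionServer name
        self.lookups = 0            # counts full (uncached) lookups

    def _locate(self, table):
        if table in self.cache:
            return self.cache[table]
        self.lookups += 1
        # Step 1: ask ZooKeeper which RegionServer hosts hbase:meta.
        meta_server = self.zookeeper["meta-location"]
        # Step 2: ask that RegionServer where the user table's Region lives.
        region_server = self.servers[meta_server]["hbase:meta"][table]
        self.cache[table] = region_server
        return region_server

    def get(self, table, row_key):
        # Step 3: send the data operation to the RegionServer that owns the Region.
        region_server = self._locate(table)
        return self.servers[region_server]["data"][table].get(row_key)
```

In this sketch a second `get` on the same table skips steps 1 and 2 entirely, which is the efficiency gain the cache is meant to provide.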

HA Overview

Background

The HMaster in HBase allocates Regions, maintains the meta information tables, and migrates the corresponding meta information from a failed RegionServer to another RegionServer. The HMaster HA feature prevents HBase functions from being affected by an HMaster single point of failure (SPOF).

Implementation
Figure 2-14 HMaster HA architecture

The HMaster HA architecture is implemented by creating an ephemeral znode in the ZooKeeper cluster.

Upon startup, the two HMaster nodes each try to create a master znode in the ZooKeeper cluster. The HMaster node that succeeds becomes the active HMaster. The other becomes the standby HMaster and sets a watch on the master znode.

If the active node fails, it disconnects from the ZooKeeper cluster. After its session expires, the master znode disappears. The standby node detects the disappearance through its watch and creates a new master znode, making itself the active HMaster. This completes the active/standby switchover. If the failed node finds that the master znode exists when it restarts, it enters the standby state and sets a watch on the master znode.

When a client accesses HBase, it first obtains the HMaster address from the master znode information in ZooKeeper and then establishes a connection to the active HMaster.
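
The election and switchover sequence can be simulated in a few lines. This sketch does not use a real ZooKeeper client: `SketchZooKeeper` is a hypothetical in-memory stand-in that models only one ephemeral master znode and one-shot watches, and `SketchHMaster` is likewise illustrative.

```python
class SketchZooKeeper:
    """Toy stand-in for ZooKeeper: one ephemeral master znode plus watchers."""

    def __init__(self):
        self.master = None  # name of the session that owns the master znode
        self.watchers = []

    def try_create_master(self, owner):
        # Only the first caller succeeds while the znode exists.
        if self.master is None:
            self.master = owner
            return True
        return False

    def watch_master(self, callback):
        self.watchers.append(callback)

    def expire_session(self, owner):
        # An ephemeral znode disappears when its owner's session expires.
        if self.master == owner:
            self.master = None
            watchers, self.watchers = self.watchers, []  # watches fire once
            for callback in watchers:
                callback()


class SketchHMaster:
    def __init__(self, name, zk):
        self.name, self.zk, self.state = name, zk, "stopped"

    def start(self):
        if self.zk.try_create_master(self.name):
            self.state = "active"
        else:
            self.state = "standby"
            # Retry the election when the master znode disappears.
            self.zk.watch_master(self.start)

    def fail(self):
        self.state = "stopped"
        self.zk.expire_session(self.name)
```

Starting two `SketchHMaster` instances against the same `SketchZooKeeper`, failing the active one, and restarting it reproduces the active/standby swap described above.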

Relationship with Other Components

Relationship Between HDFS and HBase

HDFS is a subproject of Apache Hadoop. HBase uses the Hadoop Distributed File System (HDFS) as its file storage system. HBase is located in the structured storage layer. The HDFS provides highly reliable lower-layer storage for HBase. All HBase data files can be stored in the HDFS, except for some log files generated by HBase.

Relationship Between ZooKeeper and HBase

Figure 2-15 describes the relationship between ZooKeeper and HBase.

Figure 2-15 Relationship between ZooKeeper and HBase
  1. HRegionServer registers itself with ZooKeeper as an ephemeral node. ZooKeeper stores HBase information, including the HBase metadata and the HBase address.
  2. HMaster monitors the health status of each HRegionServer through ZooKeeper.
  3. HBase can deploy multiple HMasters (similar to HDFS NameNodes). When the active HMaster node is faulty, the standby HMaster node obtains the state information of the entire cluster from ZooKeeper. In this way, ZooKeeper helps HBase avoid a single point of failure.
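
How HMaster might react when a RegionServer's ephemeral registration disappears can be sketched as a pure function. The placement policy here (move each orphaned Region to the least-loaded live server, breaking ties by name) is an assumption chosen for illustration and is not HBase's actual balancing algorithm; the function name `reassign_regions` is hypothetical.

```python
def reassign_regions(assignments, live_servers):
    """Return a new Region -> server map with Regions of dead servers moved.

    assignments: dict mapping Region name to RegionServer name.
    live_servers: names whose ephemeral registration is still present.
    """
    if not live_servers:
        raise RuntimeError("no live RegionServer to take over Regions")
    load = {server: 0 for server in live_servers}
    result = {}
    orphans = []
    for region, server in sorted(assignments.items()):
        if server in load:
            result[region] = server  # server is healthy, keep the Region there
            load[server] += 1
        else:
            orphans.append(region)   # server is gone, Region must move
    for region in orphans:
        # Hypothetical policy: least-loaded live server, ties broken by name.
        target = min(sorted(load), key=load.get)
        result[region] = target
        load[target] += 1
    return result
```

The input map is left unchanged, so the caller can compare the old and new assignments to see which Regions were migrated.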
Updated: 2019-04-10

Document ID: EDOC1000104139
