FusionInsight HD V100R002C60SPC200 Product Description 06

Hive

Basic Principle

Function

Hive is an open-source data warehouse built on Hadoop. It stores structured data and provides basic data analysis services through the Hive Query Language (HQL), which is similar to the Structured Query Language (SQL). Hive converts HQL statements into MapReduce or Spark tasks to query and analyze massive data stored in Hadoop clusters.

Hive provides the following functions:

  • Analyzes massive structured data and summarizes analysis results.
  • Allows complex MapReduce jobs to be written in an SQL-like language (HQL) and compiled by Hive.
  • Supports data storage formats, including JavaScript object notation (JSON), comma separated values (CSV), TextFile, RCFile, SequenceFile, and Optimized Row Columnar (ORC).
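
As an illustration of the storage formats listed above, the following minimal Python sketch generates HQL CREATE TABLE statements for each format. The helper name, the table name, and the column names are hypothetical. The clauses follow Apache Hive DDL syntax and are an assumption here; they should be checked against the Hive version shipped with the cluster.

```python
# Hypothetical helper: maps a storage format name to the HQL clause that
# selects it. The clauses follow Apache Hive DDL syntax (an assumption; the
# source document does not list them).
STORAGE_CLAUSES = {
    "TEXTFILE": "STORED AS TEXTFILE",
    "SEQUENCEFILE": "STORED AS SEQUENCEFILE",
    "RCFILE": "STORED AS RCFILE",
    "ORC": "STORED AS ORC",
    # CSV is treated here as plain text with a comma delimiter.
    "CSV": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE",
    # JSON needs a SerDe; this class ships with HCatalog in Apache Hive.
    "JSON": "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' "
            "STORED AS TEXTFILE",
}


def create_table_hql(table, columns, storage_format):
    """Return a CREATE TABLE statement for the given columns and format."""
    clause = STORAGE_CLAUSES[storage_format.upper()]
    column_list = ", ".join(f"{name} {hql_type}" for name, hql_type in columns)
    return f"CREATE TABLE {table} ({column_list}) {clause}"


# Hypothetical table and columns.
print(create_table_hql("page_views", [("user_id", "BIGINT"), ("url", "STRING")], "orc"))
# prints: CREATE TABLE page_views (user_id BIGINT, url STRING) STORED AS ORC
```

The generated statement would be submitted to HiveServer through a client such as Beeline; the sketch only builds the text.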
Structure

Hive is a single-instance service process that provides services by translating HQL statements into the corresponding MapReduce jobs or HDFS operations. Figure 2-51 shows the Hive structure.

Figure 2-51 Hive structure

Table 2-13 Module description

Concept

Description

HiveServer

Multiple HiveServers can be deployed in a cluster in load sharing mode. HiveServer provides Hive database services to external clients and translates HQL statements submitted by users into the corresponding Yarn tasks, Spark tasks, or HDFS operations to extract, convert, and analyze data.

MetaStore

  • Multiple MetaStores can be deployed in a cluster in load sharing mode. MetaStore provides Hive metadata services: it reads, writes, maintains, and modifies the structure and attributes of Hive tables.
  • MetaStore provides Thrift interfaces for HiveServer, Spark, WebHCat, and other MetaStore clients to access and operate metadata.
WebHCat

Multiple WebHCats can be deployed in a cluster in load sharing mode. WebHCat provides REST interfaces, through which Hive commands can be run to submit MapReduce tasks.

Hive client

Includes the human-machine command-line interface (CLI) Beeline, the JDBC driver provided for JDBC applications, the Python driver provided for Python applications, and the HCatalog JAR packages provided for MapReduce.

ZooKeeper cluster

ZooKeeper records the IP address of each HiveServer instance in a temporary node. The client driver connects to ZooKeeper to obtain the list of instances and selects a HiveServer instance according to the routing mechanism.
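
The discovery step described above can be sketched in Python as follows. This is a simplified illustration, not the driver's implementation: the URL parameters follow the Apache Hive JDBC convention for ZooKeeper service discovery, while the host names, the port 2181, the namespace value, and the random routing choice are all assumptions that may differ in FusionInsight HD.

```python
import random


def discovery_jdbc_url(zk_quorum, namespace="hiveserver2"):
    """Build a JDBC URL that asks the driver to find HiveServer via ZooKeeper.

    The parameter names follow the Apache Hive JDBC convention; the default
    namespace is an assumption.
    """
    hosts = ",".join(zk_quorum)
    return (f"jdbc:hive2://{hosts}/;serviceDiscoveryMode=zooKeeper;"
            f"zooKeeperNamespace={namespace}")


def pick_instance(instances, rng=random):
    """Pick one HiveServer address from the list read from ZooKeeper.

    Random choice is an assumed stand-in for the routing mechanism.
    """
    if not instances:
        raise RuntimeError("no HiveServer instance is registered in ZooKeeper")
    return rng.choice(instances)


# Hypothetical ZooKeeper quorum and HiveServer addresses.
print(discovery_jdbc_url(["zk1.example.com:2181", "zk2.example.com:2181"]))
print(pick_instance(["10.0.0.11:10000", "10.0.0.12:10000"]))
```

Because each instance is registered in a temporary node, an instance that stops disappears from the list, so a client that reads the list again only sees live HiveServers.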

HDFS/HBase cluster

Hive table data is stored in the HDFS cluster.

MapReduce/Yarn cluster

Provides distributed computing services. Most Hive data operations rely on MapReduce. The main function of HiveServer is to translate HQL statements into MapReduce jobs to process massive data.

Spark cluster

HiveServer converts HQL statements into Spark tasks and runs the Spark tasks in the Spark cluster.

HCatalog is a table and storage management layer on Hadoop. With HCatalog, users of different data processing tools (such as MapReduce) can read and write data in the cluster more easily. As shown in Figure 2-52, developers' applications use HTTP requests to access Hadoop MapReduce (Yarn), Pig, Hive, and HCatalog DDL. If the request is an HCatalog DDL command, it is executed directly. If the request is a MapReduce, Pig, or Hive task, it is placed in the queue of the WebHCat (Templeton) server, where its progress can be monitored and the task can be stopped. Developers specify the HDFS paths in which the processing results of MapReduce, Pig, and Hive tasks are stored.
Figure 2-52 WebHCat logical architecture
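
The two kinds of requests described above can be sketched with the Python standard library. The sketch only builds the requests and does not send them. The resource paths (ddl/database, hive) and the execute, statusdir, and user.name parameters follow the Apache WebHCat REST API; the host name is hypothetical, port 50111 is the Apache default and may differ in FusionInsight HD, and a security-enabled cluster would additionally require authentication.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; 50111 is the Apache WebHCat default port (assumption).
BASE = "http://webhcat.example.com:50111/templeton/v1"


def ddl_request(user):
    """GET request that lists databases; HCatalog DDL calls run directly."""
    query = urlencode({"user.name": user})
    return Request(f"{BASE}/ddl/database?{query}", method="GET")


def hive_job_request(user, hql, status_dir):
    """POST request that queues a Hive job on the WebHCat server.

    status_dir is the HDFS directory in which the job status and output
    are written.
    """
    query = urlencode({"user.name": user})
    body = urlencode({"execute": hql, "statusdir": status_dir}).encode()
    return Request(f"{BASE}/hive?{query}", data=body, method="POST")


# Hypothetical user, statement, and output directory.
req = hive_job_request("alice", "SELECT COUNT(*) FROM page_views", "/tmp/hive_out")
print(req.get_method(), req.full_url)
```

Passing either request to urllib.request.urlopen would send it; the queued job could then be polled through the jobs resource of the same API.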
Principle

Hive functions as a data warehouse based on the HDFS and MapReduce architecture and translates HQL statements into MapReduce jobs or HDFS operations.

Figure 2-53 shows the Hive structure.

Metastore – reads, writes, and updates metadata such as tables, columns, and partitions. Its lower layer is a relational database.

Driver – manages the life cycle of HQL execution and participates in the entire Hive job execution.

Compiler – translates HQL statements into a series of interdependent MapReduce jobs.

Optimizer – consists of a logical optimizer and a physical optimizer, which optimize HQL execution plans and MapReduce jobs, respectively.

Executor – executes Map and Reduce jobs based on job dependencies.

ThriftServer – provides Thrift interfaces, serves as the server for JDBC and ODBC, and integrates Hive with other applications.

Clients – include the Web UI and JDBC/ODBC interfaces, and provide interfaces for user access.
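
The Executor's handling of job dependencies, described in the list above, can be illustrated with a topological sort. This is a minimal sketch under stated assumptions, not Hive's scheduler: the job names and the dependency plan are hypothetical, and graphlib (Python 3.9 or later) stands in for the Executor's own ordering logic.

```python
from graphlib import TopologicalSorter

# Hypothetical plan produced by the Compiler: each job maps to the set of
# jobs whose output it needs.
plan = {
    "scan_orders": set(),
    "scan_users": set(),
    "join_orders_users": {"scan_orders", "scan_users"},
    "aggregate_by_region": {"join_orders_users"},
}


def execution_order(dependencies):
    """Return the jobs in an order where each job follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(plan)
print(order)
```

Jobs with no dependency on each other, such as the two scans here, can run in either order or in parallel; only the join and the aggregation are constrained.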

Figure 2-53 Hive effect

Relationship with Components

Relationship Between Hive and HDFS

Hive is a subproject of Apache Hadoop. Hive uses the Hadoop Distributed File System (HDFS) as its file storage system. Hive parses and processes structured data, and HDFS provides highly reliable lower-layer storage for Hive. All data files in Hive databases are stored in HDFS, and all data operations in Hive are performed using HDFS APIs.

Relationship Between Hive and MapReduce

Hive data computing depends on MapReduce. MapReduce is a subproject of the Apache Hadoop project. It is a parallel computing framework based on HDFS. During data analysis, Hive translates HQL statements submitted by users into MapReduce jobs and submits the jobs for MapReduce to execute.
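
To make the translation concrete, the following in-memory Python sketch mimics the map, shuffle, and reduce phases that an aggregation such as SELECT region, SUM(amount) FROM sales GROUP BY region roughly corresponds to. The table, the rows, and the function names are hypothetical, and the sketch runs in a single process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical (region, amount) rows of a "sales" table.
rows = [("cn", 3), ("us", 5), ("cn", 4)]


def map_phase(rows):
    """Emit one (key, value) pair per input row; the key is the GROUP BY column."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    """Group the values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate function (SUM) to the values of each key."""
    return {key: sum(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(rows)))
print(result)  # prints: {'cn': 7, 'us': 5}
```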

Relationship Between Hive and DBService

The MetaStore (metadata service) of Hive processes the structure and attribute information of Hive databases, tables, and partitions. This information must be stored in a relational database and is maintained and processed by MetaStore. In FusionInsight HD, the relational database is maintained by the DBService component.
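
The idea of keeping database, table, and partition metadata in a relational database can be sketched as follows. The schema below is hypothetical and heavily simplified; it is not the actual MetaStore or DBService schema, and the in-memory SQLite database only stands in for the relational database so that the example is runnable.

```python
import sqlite3

# In-memory SQLite stands in for the relational database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbs (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY,
                   db_id INTEGER REFERENCES dbs(db_id),
                   tbl_name TEXT);
CREATE TABLE partitions (part_id INTEGER PRIMARY KEY,
                         tbl_id INTEGER REFERENCES tbls(tbl_id),
                         part_name TEXT);
""")

# Hypothetical metadata: one database, one table, two partitions.
conn.execute("INSERT INTO dbs VALUES (1, 'default')")
conn.execute("INSERT INTO tbls VALUES (1, 1, 'page_views')")
conn.executemany("INSERT INTO partitions VALUES (?, 1, ?)",
                 [(1, "dt=2019-04-09"), (2, "dt=2019-04-10")])


def list_partitions(conn, db, table):
    """Return the partition names of a table, as a metadata lookup would."""
    cur = conn.execute(
        """SELECT p.part_name FROM partitions p
           JOIN tbls t ON p.tbl_id = t.tbl_id
           JOIN dbs d ON t.db_id = d.db_id
           WHERE d.name = ? AND t.tbl_name = ?
           ORDER BY p.part_name""",
        (db, table))
    return [row[0] for row in cur]


print(list_partitions(conn, "default", "page_views"))
# prints: ['dt=2019-04-09', 'dt=2019-04-10']
```

Only the metadata lives in the relational database; the table data itself remains in HDFS.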

Updated: 2019-04-10

Document ID: EDOC1000104139