FusionInsight HD 6.5.0 Product Description 02

GraphBase


Basic Principles

Introduction

With the rapid development of network technologies, enterprises in the Internet era face massive volumes of data. As data sets grow, the query performance of traditional relational databases deteriorates, especially in service scenarios that involve complex relationships between entities. GraphBase was developed to handle such complex relationship queries.

GraphBase is a distributed graph database based on FusionInsight HD. Built on the distributed storage mechanism of HBase, it supports graphs with tens of billions of nodes and hundreds of billions of relationships, and provides Spark-based data import and an Elasticsearch-based index mechanism. FusionInsight GraphBase is widely used in recommendation, relationship analysis, and financial anti-fraud scenarios. It has the following features:

  • Distributed architecture that integrates seamlessly with the Hadoop ecosystem.
  • Queries over hundreds of billions of relationships and tens of billions of nodes return within seconds.
  • Easy-to-use REST APIs for data query and analysis.
  • Gremlin graph traversal for implementing complex service logic.
  • Offline batch import, real-time stream import, and import performance optimization.

Architecture

GraphBase contains the GraphServer and LoadBalancer roles.

  • GraphServer: includes the GremlinServer and StandardServer services. GremlinServer handles graph queries written in Gremlin, and StandardServer provides the REST service. When the system starts, the meta graph is started first; it stores multi-graph metadata and asynchronous tasks. ZooKeeper monitors live service instances and provides distributed locks.
  • LoadBalancer: provides the load sharing capability for graph services.
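
The load-sharing role described above can be illustrated with a minimal sketch. It assumes a simple round-robin policy over the instances currently marked live; the source does not state which policy LoadBalancer uses, and the class name, method names, and instance names here are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Hypothetical round-robin balancer over live GraphServer instances."""

    def __init__(self, instances):
        self._instances = list(instances)   # fixed order used for rotation
        self._live = set(self._instances)   # instances currently considered alive
        self._counter = count()             # monotonically increasing request counter

    def mark_down(self, instance):
        """Stop routing requests to an instance (for example, after a failed health check)."""
        self._live.discard(instance)

    def mark_up(self, instance):
        """Resume routing requests to a known instance."""
        if instance in self._instances:
            self._live.add(instance)

    def pick(self):
        """Return the next live instance in rotation."""
        live = [i for i in self._instances if i in self._live]
        if not live:
            raise RuntimeError("no live GraphServer instance")
        return live[next(self._counter) % len(live)]


balancer = RoundRobinBalancer(["graphserver-1", "graphserver-2", "graphserver-3"])
first_three = [balancer.pick() for _ in range(3)]
```

In a real deployment the liveness information would come from the cluster (the text says ZooKeeper monitors live instances); here it is set by hand.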

Figure 2-25 shows the GraphBase architecture.

Figure 2-25 GraphBase architecture
  • Access layer
    • Gremlin API: the standard open-source interface for interactive graph queries, provided by the Apache TinkerPop Gremlin component.
    • REST APIs: include APIs for graph query, modification, and management, as well as graph algorithms for Huawei-enhanced online analysis.
    • Load Balancer: provides load sharing across multiple GraphServer instances.
  • Computing layer
    • Provides a core engine of data management and metadata management for GraphBase.
    • Provides interface adaptation for backend storage and index.
  • Storage layer
    • Distributed KV storage that provides massive graph data storage capabilities.
    • Provides a search engine with secondary index, full-text search, and fuzzy search capabilities.
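
The split between the computing layer and the storage layer can be pictured as an engine that talks to storage and index through narrow adapter interfaces. The following is a minimal sketch under that assumption: all class and method names are hypothetical, and the in-memory classes only stand in for the distributed KV store and the search engine.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical adapter interface for the KV store that holds graph data."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class IndexBackend(ABC):
    """Hypothetical adapter interface for the search engine."""

    @abstractmethod
    def index(self, key, text): ...

    @abstractmethod
    def search(self, term): ...


class InMemoryStorage(StorageBackend):
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = dict(value)

    def get(self, key):
        return self._rows.get(key)


class InMemoryIndex(IndexBackend):
    def __init__(self):
        self._texts = {}

    def index(self, key, text):
        self._texts[key] = text.lower()

    def search(self, term):
        # Case-insensitive substring match as a crude stand-in for fuzzy search.
        term = term.lower()
        return sorted(k for k, t in self._texts.items() if term in t)


class GraphEngine:
    """Engine that depends only on the two adapter interfaces."""

    def __init__(self, storage, index):
        self._storage = storage
        self._index = index

    def add_vertex(self, vertex_id, properties, indexed_property):
        self._storage.put(vertex_id, properties)
        self._index.index(vertex_id, str(properties[indexed_property]))

    def find(self, term):
        return [(k, self._storage.get(k)) for k in self._index.search(term)]
```

Because the engine sees only the interfaces, swapping the in-memory classes for real backends would not change the engine code.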

Typical application scenarios:

  • Financial anti-fraud
  • Knowledge graph
  • Relationship analysis

Key Features

Multi-graph

Scenarios

  • Different service departments can share one graph database and import separate graphs for application development.
  • Different applications use different data. Because the data of one graph is not associated with another, services remain isolated.

Design of multi-graph solution

  • GraphServer: includes the GremlinServer and StandardServer services. GremlinServer handles graph queries written in Gremlin, and StandardServer provides the REST service. When the system starts, the meta graph is started first; it stores multi-graph metadata and asynchronous tasks. ZooKeeper monitors live service instances and provides distributed locks.
  • LoadBalancer: provides the load sharing capability for graph services.
  • GraphWriter: the module for batch data import.
  • GraphStreaming: the module for real-time data import.
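
The role of the meta graph in the multi-graph design can be sketched as a registry that records metadata for every named graph and keeps each graph's data separate. This is only an illustration: the `MetaGraph` class, its methods, and the metadata fields are hypothetical and do not reflect the actual GraphBase data model.

```python
class MetaGraph:
    """Hypothetical registry that holds metadata for every named graph."""

    def __init__(self):
        self._metadata = {}   # graph name -> metadata
        self._graphs = {}     # graph name -> isolated vertex and edge containers

    def create_graph(self, name, owner):
        if name in self._metadata:
            raise ValueError(f"graph {name!r} already exists")
        self._metadata[name] = {"owner": owner}
        self._graphs[name] = {"vertices": {}, "edges": []}

    def graph(self, name):
        """Return the data containers of one graph; other graphs are not reachable from it."""
        return self._graphs[name]

    def list_graphs(self):
        return sorted(self._metadata)


meta = MetaGraph()
meta.create_graph("risk_control", owner="anti-fraud team")
meta.create_graph("recommendation", owner="marketing team")

# Data written to one graph does not appear in the other.
meta.graph("risk_control")["vertices"]["acc-1"] = {"type": "account"}
```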

Importing Data

Batch import and real-time import

FusionInsight GraphBase supports both batch and real-time data import. Batch import uses Spark to load historical data stored in HDFS into GraphBase. Real-time import uses Kafka and Spark Streaming to load data into GraphBase as it arrives.

Flexible data mapping rules are provided to map original data to graph models.
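
To make the idea of a mapping rule concrete, the sketch below turns one raw record into vertices and edges. The rule format, the field names, and the `apply_mapping` function are assumptions made for illustration; they are not the GraphBase mapping syntax.

```python
def apply_mapping(record, rule):
    """Turn one raw record into (vertices, edges) according to a mapping rule."""
    vertices = [
        {
            "label": v["label"],
            "id": record[v["id_field"]],
            "properties": {
                name: record[field] for name, field in v.get("properties", {}).items()
            },
        }
        for v in rule["vertices"]
    ]
    edges = [
        {
            "label": e["label"],
            "from": record[e["from_field"]],
            "to": record[e["to_field"]],
            "properties": {
                name: record[field] for name, field in e.get("properties", {}).items()
            },
        }
        for e in rule["edges"]
    ]
    return vertices, edges


# Hypothetical rule: each transfer record yields two account vertices and one edge.
TRANSFER_RULE = {
    "vertices": [
        {"label": "account", "id_field": "payer"},
        {"label": "account", "id_field": "payee"},
    ],
    "edges": [
        {
            "label": "transfer",
            "from_field": "payer",
            "to_field": "payee",
            "properties": {"amount": "amount"},
        },
    ],
}

vertices, edges = apply_mapping(
    {"payer": "acc-1", "payee": "acc-2", "amount": 250}, TRANSFER_RULE
)
```

Keeping the rule as data rather than code means the same import job can serve different source layouts by changing only the rule.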

BulkLoad supported in batch data import

Batch import also supports BulkLoad mode, which imports data faster.

During import, the graph HFiles and the internal secondary index HFiles can be generated in a single MapReduce job.
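
The idea of producing graph data and index data in one pass can be sketched as follows. HFiles store key-value pairs in sorted key order, so both outputs are sorted by row key before they would be written. The row-key layout (`v:` and `i:` prefixes) and the `build_rows` function are hypothetical; the sketch runs in memory and does not use MapReduce or HBase.

```python
def build_rows(vertices, indexed_property):
    """Produce graph rows and secondary-index rows from one pass over the input.

    vertices: iterable of (vertex_id, properties) pairs.
    Returns two lists of (row_key, value) pairs, each sorted by row key.
    """
    graph_rows, index_rows = [], []
    for vertex_id, props in vertices:
        # Graph row: keyed by vertex ID (hypothetical layout).
        graph_rows.append((f"v:{vertex_id}", props))
        # Index row: keyed by property value so lookups by value become key scans.
        if indexed_property in props:
            value = props[indexed_property]
            index_rows.append((f"i:{indexed_property}:{value}:{vertex_id}", vertex_id))
    graph_rows.sort(key=lambda row: row[0])
    index_rows.sort(key=lambda row: row[0])
    return graph_rows, index_rows


graph_rows, index_rows = build_rows(
    [("v2", {"city": "Shenzhen"}), ("v1", {"city": "Beijing"}), ("v3", {})],
    indexed_property="city",
)
```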

Relationship with Other Components

GraphBase stores service data and metadata in HBase to handle massive data. External index data is stored in Elasticsearch to implement query capabilities, such as full-text search and fuzzy match. Spark is used to import data in batches and in real time. MapReduce is used to rebuild indexes and delete indexes in batches. ZooKeeper is used to implement distributed coordination of multiple instances of the computing engine.

Figure 2-26 shows the relationship between GraphBase and other components.

Figure 2-26 Relationship between GraphBase and other components

Updated: 2019-05-17

Document ID: EDOC1100074548
