FusionInsight HD 6.5.0 Administrator Guide 02

Technical Principles

Multi-Tenant Management

Unified Multi-Tenant Management

FusionInsight Manager is the unified multi-tenant management platform of FusionInsight HD. It integrates functions such as tenant lifecycle management, tenant resource configuration, and tenant resource usage statistics, delivering a mature multi-tenant management model and enabling centralized management of tenants and services.

Graphical User Interface

FusionInsight Manager provides a graphical multi-tenant management interface and manages multiple levels of tenants through a tree structure. It also consolidates the basic information and resource quota of the current tenant on a single page to facilitate O&M and management, as shown in Figure 9-1.

Figure 9-1 Tenant management page of FusionInsight Manager

Level-based Tenant Management

FusionInsight Manager supports a level-based tenant management model in which you can add sub-tenants to an existing tenant to re-allocate its resources. For example, in Figure 9-1, tenant_t_1 is a sub-tenant of tenant_I_1: tenant_I_1 is a level-1 tenant and tenant_t_1 is a level-2 tenant, and so on. FusionInsight Manager provides enterprises with a field-tested multi-tenant management model, enabling centralized tenant and service management.

Simplified Rights Management

FusionInsight Manager shields common users from the internal details of rights management and simplifies rights management operations for administrators, improving the usability of rights management and the user experience.

  • FusionInsight Manager adopts role-based access control (RBAC) to assign rights to users as required during multi-tenant management.
  • Tenant administrator: the administrator of a tenant has the rights to manage that tenant, including viewing the resources and services of the current tenant, adding or deleting its sub-tenants, and managing the resource rights of its sub-tenants. An administrator can be designated for each tenant, so the management of a tenant can be delegated to a user other than the system administrator.
  • Tenant role: the role corresponding to a tenant has all rights on the tenant's computing and storage resources. When a tenant is created, the system automatically creates the corresponding role. You can add a user and bind the user to the tenant role so that the user can use the tenant's resources.
Clear Resource Management
  • Self-Service Resource Configuration

    In FusionInsight Manager, you can configure a tenant's computing and storage resources when creating the tenant, and add, modify, or delete those resources afterwards.

    When you modify a tenant's computing or storage resources, the rights of the role corresponding to the tenant are updated automatically.

  • Resource Usage Statistics

    Resource usage statistics are critical for administrators to make O&M decisions based on the status of cluster applications and services, improving cluster O&M efficiency. FusionInsight Manager displays real-time tenant resource statistics in Resource Quota, including the tenant's dynamic computing resources (VCores and Memory) and the usage of its HDFS storage resources (Space). A programmatic sketch of reading the HDFS figures follows Table 9-1.

    NOTE:

    When the tenant administrator is bound to a tenant role, the tenant administrator has the rights to manage the tenant and use all resources of the tenant.

  • Graphical Resource Monitoring

    Graphical resource monitoring displays the monitoring items listed in Table 9-1 in graphical form, as shown in Figure 9-2.

    Figure 9-2 Refined monitoring

    Real-time monitoring data is displayed by default. You can also customize the time range; the preset ranges are 1 hour, 2 hours, 6 hours, 12 hours, 1 day, 1 week, and 1 month. Choose Customize to select the monitoring items to be displayed.

    Table 9-1 Monitoring items

    Service: HDFS

    Metric: HDFS Tenant Quota Information
      • File and Dir Consumed
      • Quota

    Metric: HDFS Tenant Space Information
      • Tenant Space Quota
      • Space Consumed

    Description: HDFS monitoring can target a specified storage directory. The monitored directory is the same as the directory added by the current tenant in Resource.

    Service: YARN

    Metric: YARN Allocated Cores
      • Maximum Number of CPU Cores in an AM
      • Allocated Core
      • Number of Used CPU Cores in an AM

    Metric: YARN Allocated Memory
      • Allocated Maximum AM Memory
      • Allocated Memory
      • Used AM Memory

    Description: Monitoring information of the current tenant is displayed. If no subitem is configured for a tenant, this information is not displayed. The monitoring data is obtained from Scheduler > Application Queues > Queue: tenant name on the native web UI of YARN.
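    As mentioned under Resource Usage Statistics, the HDFS figures above (file and directory counts, quotas, and space consumed) can also be read programmatically. The following Java sketch uses the standard Apache Hadoop client API; the tenant directory /tenant/t1 is a hypothetical example, not a path mandated by FusionInsight.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TenantDirStats {
      public static void main(String[] args) throws Exception {
        // Loads core-site.xml and hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical tenant storage directory; substitute the directory
        // added for your tenant in Resource.
        Path tenantDir = new Path("/tenant/t1");

        // ContentSummary carries the same figures shown in Table 9-1.
        ContentSummary cs = fs.getContentSummary(tenantDir);
        System.out.println("Files and dirs consumed: "
            + (cs.getFileCount() + cs.getDirectoryCount()));
        System.out.println("Name quota:              " + cs.getQuota());
        System.out.println("Space consumed (bytes):  " + cs.getSpaceConsumed());
        System.out.println("Space quota (bytes):     " + cs.getSpaceQuota());
      }
    }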

Models Related to Multi-Tenant

Figure 9-3 shows the models related to multi-tenant.

Figure 9-3 Models related to multi-tenant

Table 9-2 describes the concepts involved in Figure 9-3.

Table 9-2 Concepts involved

User

A natural person who has a username and password and uses the big data platform.

Figure 9-3 shows three users: user A, user B, and user C.

Role

A role is a carrier of one or more rights. Rights are assigned to specific objects, for example, the access right for the /tenant directory in HDFS.

Figure 9-3 shows four roles: role t1, role t2, role t3, and role Manager_tenant.

  • Roles t1, t2, and t3 are generated automatically when tenants are created, and each role name is identical to its tenant name; that is, roles t1, t2, and t3 map to tenants t1, t2, and t3. A tenant role is always used together with the tenant of the same name.
  • Role Manager_tenant is a cluster-level role and cannot be used separately.

Tenant

A tenant is a resource set divided from a big data cluster. Multi-tenant refers to multiple tenants. The resource sets further divided within a tenant are called sub-tenants.

Figure 9-3 shows three tenants: tenant t1, tenant t2, and tenant t3.

Resource

  • Computing resources include CPUs and memory.

    The computing resources of a tenant are divided from the total computing resources of the cluster. One tenant cannot occupy the computing resources of another tenant.

    In Figure 9-3, computing resources 1, 2, and 3 are divided from the cluster's computing resources for tenants t1, t2, and t3.

  • Storage resources include disks and third-party storage systems.

    The storage resources of a tenant are divided from the total storage resources of the cluster. One tenant cannot occupy the storage resources of another tenant.

    In Figure 9-3, storage resources 1, 2, and 3 are divided from the cluster's storage resources for tenants t1, t2, and t3.

If a user wants to use a tenant's resources or add or delete sub-tenants of a tenant, the user must be bound to both the tenant role and role Manager_tenant. Table 9-3 shows the roles bound to each user in Figure 9-3.

Table 9-3 Roles bound to each user

User A

Roles:
  • Role t1
  • Role t2
  • Role Manager_tenant

Rights:
  • Uses the resources of tenants t1 and t2.
  • Adds or deletes sub-tenants for tenants t1 and t2.

User B

Roles:
  • Role t3
  • Role Manager_tenant

Rights:
  • Uses the resources of tenant t3.
  • Adds or deletes sub-tenants for tenant t3.

User C

Roles:
  • Role t1
  • Role Manager_tenant

Rights:
  • Uses the resources of tenant t1.
  • Adds or deletes sub-tenants for tenant t1.

One user can be bound to multiple roles, and one role can be bound to multiple users. Because users are associated with tenants through these role bindings, tenants and users have a many-to-many relationship: one user can use the resources of multiple tenants, and multiple users can use the resources of one tenant. In Figure 9-3, user A uses the resources of tenants t1 and t2, and users A and C use the resources of tenant t1.
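The binding model above can be pictured as a small lookup table from users to role sets. The following Java fragment is only an illustrative sketch of that many-to-many relationship, using the user, role, and tenant names from Figure 9-3; it is not FusionInsight code.

import java.util.Map;
import java.util.Set;

public class RoleBindingModel {
  public static void main(String[] args) {
    // Each user is bound to one or more roles (many-to-many).
    Map<String, Set<String>> userRoles = Map.of(
        "userA", Set.of("t1", "t2", "Manager_tenant"),
        "userB", Set.of("t3", "Manager_tenant"),
        "userC", Set.of("t1", "Manager_tenant"));

    // A user can manage and use tenant t1 only when bound to both
    // the tenant role (t1) and role Manager_tenant.
    String tenant = "t1";
    userRoles.forEach((user, roles) -> {
      boolean canUse = roles.contains(tenant) && roles.contains("Manager_tenant");
      System.out.printf("%s can use tenant %s: %b%n", user, tenant, canUse);
    });
  }
}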

Multi-Tenant Platform

Tenant is a core concept of the FusionInsight big data platform. It transforms the big data platform from a user-centered platform into a multi-tenant-centered one, better matching the multi-tenant application environments of modern enterprises.

Figure 9-4 User-centered platform and multi-tenant-centered platform

On the user-centered big data platform, users can directly access and use all resources and services.

  • However, some cluster resources may not be used, lowering resource utilization.
  • The data of different users may be stored together, decreasing data security.

On the multi-tenant-centered big data platform, users use required resources and services by accessing the tenants.

  • Resources are allocated and scheduled based on application requirements and used based on tenants, increasing resource utilization.
  • Users can access the resources of tenants only after being assigned roles, enhancing access security.
  • The data of tenants is isolated, ensuring data security.

Resource Overview

The resources of the FusionInsight HD big data platform are divided into computing resources and storage resources. Multi-tenant enables resource isolation:

  • Computing Resource

    Computing resources include CPUs and memory. One tenant cannot occupy the computing resources of another tenant.

  • Storage Resource

    Storage resources include disks and third-party storage systems. One tenant cannot access the data of another tenant.

Computing Resource

Computing resources are divided into static service resources and dynamic resources.

  • Static service resources

    Static service resources are computing resources allocated to each service. The total volume of computing resources allocated to each service is fixed. Such services include FTP Server, Flume, HBase, HDFS, Solr, and Yarn.

  • Dynamic resources

    Dynamic resources are computing resources dynamically scheduled to task queues by Yarn, the distributed resource management service. Yarn dynamically schedules resources for the task queues of MapReduce, Spark, Flink, and Hive.

NOTE:

The resources allocated to Yarn in a big data cluster are static service resources and can be dynamically allocated to task queues by Yarn.

Storage Resource

Storage resources are data storage resources allocated by HDFS, the distributed file storage service. Directories are the basic unit of HDFS storage resource allocation. Tenants obtain storage resources by specifying directories in the HDFS file system.

Dynamic Resources

Overview

Yarn provides the distributed resource management function for a big data cluster. The total volume of resources allocated to Yarn can be configured. Then Yarn allocates and schedules computing resources for task queues. The computing resources of MapReduce, Spark, and Hive task queues are allocated and scheduled by Yarn.

Yarn queues are basic units of computing resource allocation.

For tenants, the resources obtained through Yarn task queues are dynamic resources. Users can dynamically create and modify task queue quotas and view the status and statistics of task queues.
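As a concrete illustration, a queue's status can be inspected through the standard Apache Hadoop YarnClient API. The sketch below is a minimal example that assumes a queue named tenant_t_1 exists; it is generic Hadoop code, not FusionInsight-specific.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class QueueStatus {
  public static void main(String[] args) throws Exception {
    // Reads yarn-site.xml from the classpath to locate the ResourceManager.
    Configuration conf = new Configuration();
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(conf);
    yarn.start();
    try {
      // "tenant_t_1" is a hypothetical tenant queue name.
      QueueInfo q = yarn.getQueueInfo("tenant_t_1");
      System.out.println("Queue:                " + q.getQueueName());
      System.out.println("Configured capacity:  " + q.getCapacity());
      System.out.println("Current capacity:     " + q.getCurrentCapacity());
      System.out.println("Running applications: " + q.getApplications().size());
    } finally {
      yarn.stop();
    }
  }
}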

Resource Pool

Enterprise IT systems face complex cluster environments and diverse upper-layer requirements. For example:

  • Heterogeneous cluster: The computing speed, storage capacity, and network performance of each node in the cluster are different. All the tasks of complex applications need to be properly allocated to each compute node in the cluster based on service requirements.
  • Computing isolation: Data must be shared among multiple departments, but computing resources must be distributed onto different compute nodes.

To meet such requirements, compute nodes must be partitioned.

Resource pools are used to specify the configuration of dynamic resources. Yarn task queues are associated with resource pools for resource allocation and scheduling.

Only one default resource pool can be set for a tenant. A user bound to the tenant's role can use the resources in that tenant's resource pool. To use resources in multiple resource pools, a user can be bound to multiple tenant roles.

Scheduling Mechanism

Yarn dynamic resources support label-based scheduling. This policy creates labels for the compute nodes (Yarn NodeManager nodes) of a Yarn cluster and adds compute nodes with the same label to the same resource pool. Yarn then dynamically associates task queues with resource pools based on the resource requirements of the task queues.

For example, suppose a cluster has more than 40 nodes. Labels Normal, HighCPU, HighMEM, and HighIO are created based on the hardware and network configurations of the nodes, and the labeled nodes are added to four resource pools. Table 9-4 describes the performance of the nodes in each resource pool.

Table 9-4 Performance of each node in a resource pool

Normal
  • Number of nodes: 10
  • Hardware and network configuration: general
  • Added to: resource pool A
  • Associated with: common task queue

HighCPU
  • Number of nodes: 10
  • Hardware and network configuration: high-performance CPU
  • Added to: resource pool B
  • Associated with: computing-intensive task queue

HighMEM
  • Number of nodes: 10
  • Hardware and network configuration: large memory
  • Added to: resource pool C
  • Associated with: memory-intensive task queue

HighIO
  • Number of nodes: 10
  • Hardware and network configuration: high-performance network
  • Added to: resource pool D
  • Associated with: I/O-intensive task queue

Task queues can use the compute nodes in the associated resource pools only.

  • Common task queues are associated with resource pool A and use nodes with hardware and network configurations labeled with Normal.
  • Computing-intensive task queues are associated with resource pool B and use nodes with CPUs labeled with HighCPU.
  • Memory-intensive task queues are associated with resource pool C and use nodes with memory labeled with HighMEM.
  • I/O-intensive task queues are associated with resource pool D and use nodes with the network labeled with HighIO.

In FusionInsight HD, Yarn task queues are associated with specified resource pools to efficiently utilize resources in resource pools and ensure node performance.

FusionInsight Manager supports a maximum of 50 resource pools. The system provides a resource pool named Default by default.
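In open source Yarn, the analogous mechanism is node labels: a job is submitted to a queue, and a node-label expression confines its containers to nodes carrying a given label. The following Java fragment is a hedged sketch using standard Hadoop MapReduce configuration keys; the queue name compute_intensive and the label HighCPU are hypothetical values for illustration.

import org.apache.hadoop.conf.Configuration;

public class LabelScheduledJobConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Submit to the queue associated with the desired resource pool
    // ("compute_intensive" is a hypothetical queue name).
    conf.set("mapreduce.job.queuename", "compute_intensive");

    // Restrict the job's containers to nodes labeled HighCPU
    // (open source Yarn node-label expression).
    conf.set("mapreduce.job.node-label-expression", "HighCPU");

    // Place the ApplicationMaster on HighCPU nodes as well.
    conf.set("yarn.app.mapreduce.am.node-label-expression", "HighCPU");

    // This Configuration would then be passed to Job.getInstance(conf)
    // when building the MapReduce job.
    System.out.println("Queue: " + conf.get("mapreduce.job.queuename"));
  }
}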

Introduction to Schedulers

Schedulers are divided into the open source Capacity scheduler and the Huawei proprietary Superior scheduler.

The Capacity scheduler is the open source capacity-based scheduler.

The Superior scheduler is an enhanced scheduler, named after Lake Superior to indicate that it can manage a vast amount of data.

Huawei developed the Superior scheduler to meet enterprise requirements and tackle the scheduling challenges facing the Yarn community. In addition to inheriting the advantages of the Capacity scheduler and the Fair scheduler, it is enhanced in the following aspects:

  • Enhanced resource sharing policy

    The Superior scheduler supports a queue hierarchy. It integrates the functions of the open source schedulers and shares resources based on configurable policies. For example, administrators can use the Superior scheduler to configure an absolute-value or percentage policy for queue resources. The resource sharing policy of the Superior scheduler enhances the Yarn label scheduling policy into a resource pool feature: nodes in a Yarn cluster can be grouped by capacity or service type so that queues utilize resources more efficiently.

  • Tenant-based resource reservation policy

    Resources that tenants need to run critical tasks must be guaranteed. The Superior scheduler provides a resource reservation mechanism, so that reserved resources are allocated to the tasks of tenant queues in a timely manner, ensuring proper task execution.

  • Fair sharing among tenants and resource pool users

    The Superior scheduler allows shared resources to be configured for users in a queue. Each tenant may have users with different weights. Heavily weighted users may require more shared resources.

  • Ensured scheduling performance in a big cluster

    The Superior scheduler receives heartbeats from each NodeManager and saves resource information in memory, which enables the scheduler to control cluster resource usage globally. The Superior scheduler uses the push scheduling model, which makes the scheduling more precise and efficient and remarkably improves cluster resource utilization. Additionally, the Superior scheduler delivers excellent performance when the interval between NodeManager heartbeats is long and prevents heartbeat storms in big clusters.

  • Priority policy

    If the minimum resource requirement of a service cannot be met after the service obtains all available resources, a preemption occurs. The preemption function is disabled by default.

Storage Resource

Overview

As a distributed file storage service in a big data cluster, HDFS stores all the user data of the upper-layer applications in the big data cluster, including the data written to HBase tables or Hive tables.

Directories are used as the basic unit of HDFS storage resource allocation. HDFS supports the conventional hierarchical file structure. Users can create directories and create, delete, move, or rename files in directories. Tenants can obtain storage resources by specifying directories in the HDFS file system.

Scheduling Mechanism

In FusionInsight HD, HDFS directories can be stored on nodes with specified labels or disks of specified hardware types. For example:

  • When both real-time query and data analysis tasks are running in one cluster, the real-time query tasks are deployed on some nodes; therefore, the queried data must be stored on these nodes.
  • Based on actual service requirements, key data needs to be stored on nodes with high reliability.

Administrators can flexibly configure HDFS data storage policies based on actual service requirements and data features to store data on specified nodes.
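FusionInsight's label- and hardware-aware placement is configured through FusionInsight Manager. In open source HDFS, the closest analogue is a storage policy, which pins a directory's blocks to specific storage media. The Java sketch below is a minimal, hedged example using the standard DistributedFileSystem API; the path /tenant/t1/hot and the built-in ALL_SSD policy are illustrative choices.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetDirStoragePolicy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Storage policies are an HDFS-specific feature, so cast to
    // DistributedFileSystem.
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Hypothetical directory for latency-sensitive data; ALL_SSD keeps
    // all block replicas on disks registered with the SSD storage type.
    Path dir = new Path("/tenant/t1/hot");
    dfs.mkdirs(dir);
    dfs.setStoragePolicy(dir, "ALL_SSD");

    System.out.println("Storage policy applied to " + dir);
  }
}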

For tenants, storage resources are the HDFS resources they occupy. Storage resource scheduling is implemented by storing the data of specified directories in the storage paths configured for each tenant, ensuring data isolation between tenants.

Users can add or delete HDFS storage directories of tenants and set the file quantity quota and storage capacity quota of directories to manage storage resources.
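In open source HDFS terms, the two quotas above correspond to the name quota (maximum number of files and directories) and the space quota (maximum raw bytes, replication included) on a directory. A minimal Java sketch follows, assuming the standard DistributedFileSystem API and a hypothetical tenant directory /tenant/t1:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetTenantQuota {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Hypothetical tenant storage directory.
    Path dir = new Path("/tenant/t1");

    // Name quota: at most 100000 files and directories under the path.
    long nameQuota = 100000L;
    // Space quota: at most 1 TB of raw storage, counted after replication.
    long spaceQuota = 1024L * 1024 * 1024 * 1024;
    dfs.setQuota(dir, nameQuota, spaceQuota);

    System.out.println("Quotas set on " + dir);
  }
}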
