FusionInsight HD 6.5.0 Product Description 02



Task Priority Scheduling

In the native YARN resource scheduling mechanism, if the cluster resources are fully occupied by MapReduce jobs that were submitted earlier, jobs submitted later are kept in the pending state until the running jobs complete and release their resources.

Huawei provides a priority-based task scheduling mechanism. With this feature, you can assign different priorities to jobs. A high-priority job can preempt the resources of low-priority jobs even if it is submitted later; the low-priority jobs are suspended and do not resume until the high-priority jobs complete and release their resources.

This feature enables services to flexibly control their own computing tasks, achieving optimal utilization of cluster resources.


The Container Reuse feature conflicts with Task Priority Scheduling: if Container Reuse is enabled, containers are not released, so Task Priority Scheduling does not take effect.

Figure 4-38 Task priority scheduling
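The preemption flow above can be sketched as a small simulation. The job structure and the `preempt` helper are our own illustration, not FusionInsight code:

```python
# Minimal sketch (not FusionInsight code) of priority preemption:
# a newly submitted high-priority job suspends the lowest-priority
# running jobs until enough resources are free; the suspended jobs
# resume only after the high-priority job finishes.

def preempt(running, new_job, total):
    """Suspend lowest-priority running jobs until new_job fits."""
    running.sort(key=lambda j: j["priority"])  # lowest priority first
    suspended = []
    used = sum(j["res"] for j in running)
    while running and total - used < new_job["res"]:
        victim = running.pop(0)        # suspend the lowest-priority job
        used -= victim["res"]
        suspended.append(victim["name"])
    running.append(new_job)
    return suspended

cluster = [{"name": "low1", "priority": 1, "res": 6},
           {"name": "low2", "priority": 2, "res": 4}]
# A high-priority job needs 8 units on a 12-unit cluster:
# suspending low1 frees enough resources.
print(preempt(cluster, {"name": "high", "priority": 9, "res": 8}, total=12))
```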

The Timeout Parameter Can Be Set When a User Submits a MapReduce Job

Open-source behavior: if a MapReduce job runs for a long time, it is suspended; users can only wait and cannot determine the cause.

Therefore, a timeout parameter, application.timeout.interval, is added to set a MapReduce job execution timeout, in seconds. If a job runs longer than the configured timeout, it is stopped.

yarn jar <App_Jar_Name> [Main_Class] -Dapplication.timeout.interval=<timeout>


The value must be an integer. If the parameter is not configured (null), the timeout function is disabled. If the configured value is invalid (not an integer), the timeout defaults to 5 minutes.
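The documented parsing rules (integer, unset, invalid) can be sketched as follows; `resolve_timeout` is a hypothetical helper for illustration, not part of YARN:

```python
# Sketch (not FusionInsight source) of the documented rules for
# -Dapplication.timeout.interval: an integer is used as the timeout in
# seconds, an unset value disables the timeout, and an invalid value
# falls back to the 5-minute default.
DEFAULT_TIMEOUT_SECONDS = 5 * 60  # documented fallback

def resolve_timeout(raw):
    """Return the effective timeout in seconds, or None if disabled."""
    if raw is None:          # parameter not configured: timeout disabled
        return None
    try:
        return int(raw)      # valid integer: use as-is (seconds)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS  # invalid value: default 5 minutes

print(resolve_timeout("600"))   # explicit 10-minute timeout -> 600
print(resolve_timeout(None))    # not configured -> None (disabled)
print(resolve_timeout("abc"))   # invalid -> 300
```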

Permission Control

The permission control mechanism of Hadoop YARN is implemented with access control lists (ACLs), which grant different permissions to different users. ACLs fall into the following categories:

  • Admin ACL

    Specifies the O&M managers of a YARN cluster. The O&M manager ACL is set by the yarn.admin.acl parameter. O&M managers can access the ResourceManager WebUI and operate on NodeManager nodes, queues, and node labels, but cannot submit tasks.

  • Queue ACL

    Specifies the users/user groups assigned to each queue and the permissions they are granted on that queue, such as submitting applications and managing applications (for example, killing an application).

Permission control of the open-source YARN:

Users in YARN are categorized into the following roles:

  • Cluster O&M manager
  • Queue administrator
  • Common user

However, the APIs (such as the Web UI, REST API, and Java API) provided by open-source YARN do not support role-specific permission control. As a result, all users can access application and cluster information, which does not meet the isolation requirements of multi-tenant scenarios.

Enhanced permission control of the YARN provided by Huawei:

In security mode, the APIs of Huawei-provided YARN provide enhanced, role-specific permission control.

The permissions of each role are as follows:

  • Cluster O&M manager: performs O&M in the YARN cluster, such as accessing the ResourceManager WebUI, refreshing queues, setting node labels, and performing active/standby switchover.
  • Queue administrator: views and modifies the queues they administer in YARN clusters.
  • Common user: views and modifies the applications that they themselves submitted in YARN clusters.
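A minimal sketch of the role-based checks described above; the role names and operation strings are our own labels, not YARN API constants:

```python
# Illustrative role-to-permission mapping (labels are ours, not YARN
# constants) for the three roles described above. Note that the cluster
# O&M manager performs O&M but cannot submit tasks.

ROLE_PERMISSIONS = {
    "cluster_admin": {"view_webui", "refresh_queues",
                      "set_node_labels", "failover"},
    "queue_admin":   {"view_queue", "modify_queue"},
    "common_user":   {"view_own_apps", "modify_own_apps", "submit_app"},
}

def is_allowed(role, operation):
    """Return True if the role is granted the operation."""
    return operation in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("cluster_admin", "refresh_queues"))  # True
print(is_allowed("cluster_admin", "submit_app"))      # False
```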

Superior Scheduler

Superior Scheduler is a high-performance scheduling engine designed for the Hadoop YARN distributed resource management system. It addresses the multi-tenant service requirements of enterprise customers on a converged resource pool.

Superior Scheduler provides all the functions of the open-source schedulers Fair Scheduler and Capacity Scheduler. Compared with them, it is enhanced in enterprise multi-tenant resource scheduling policies, resource isolation and sharing among multiple users within a tenant, scheduling performance, system resource usage, and cluster scalability. Superior Scheduler is designed to replace the open-source schedulers.

Like the open-source Fair Scheduler and Capacity Scheduler, Superior Scheduler implements the YARN scheduler plugin interface to interact with the YARN ResourceManager and provide resource scheduling. Figure 4-39 shows the components of Superior Scheduler.

Figure 4-39 Components of Superior Scheduler

The main components of Superior Scheduler are as follows:

  • Superior Scheduler Engine: a high-performance scheduling engine with rich scheduling policies.
  • Superior Scheduler Plugin: establishes the connection between the YARN ResourceManager and the Superior Scheduler Engine and handles their interaction.

    The scheduling principle of the open-source schedulers is to match resources to jobs based on the heartbeats of compute nodes. Specifically, each compute node periodically sends a heartbeat message to the YARN ResourceManager to report its status and trigger the scheduler to assign jobs to that node. In this mechanism the scheduling period depends on the heartbeat interval, so as the cluster grows, scalability and scheduling performance can become bottlenecks. In addition, because resources are matched to jobs, scheduling accuracy is limited: data affinity is largely random, load-based scheduling policies are not supported, and the scheduler may not make the best choice because it lacks a global resource view when selecting jobs.

    Superior Scheduler adopts a different scheduling mechanism. Dedicated scheduling threads decouple heartbeat processing from scheduling, preventing heartbeat storms. In addition, Superior Scheduler matches jobs to resources, giving each scheduled job a global resource view and increasing scheduling accuracy. Compared with the open-source schedulers, Superior Scheduler excels in system throughput, resource usage, and data affinity.

Figure 4-40 Comparison of Superior Scheduler with open source schedulers
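The two matching directions can be contrasted with a toy sketch; the node/job structures and both helpers are illustrative assumptions, not scheduler code:

```python
# Illustrative contrast (not actual scheduler code) between the two
# matching directions described above, using toy jobs and nodes.

nodes = [
    {"name": "node1", "free": 4, "data": {"jobA"}},
    {"name": "node2", "free": 8, "data": {"jobB"}},
]
jobs = [
    {"name": "jobA", "need": 4},
    {"name": "jobB", "need": 4},
]

def node_to_job(node, pending):
    """Open-source style: a heartbeating node picks the first job that
    fits; it has no global view, so data locality is hit or miss."""
    for job in pending:
        if job["need"] <= node["free"]:
            return job
    return None

def job_to_node(job, nodes):
    """Superior style: the scheduler sees all nodes and prefers one
    that already holds the job's data (better locality)."""
    candidates = [n for n in nodes if n["free"] >= job["need"]]
    local = [n for n in candidates if job["name"] in n["data"]]
    return (local or candidates or [None])[0]

# node1 heartbeats first and grabs jobA only by luck of ordering...
print(node_to_job(nodes[0], jobs)["name"])   # jobA
# ...while job-to-node placement deliberately sends jobB to node2.
print(job_to_node(jobs[1], nodes)["name"])   # node2
```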

Apart from the enhanced system throughput and utilization, Superior Scheduler provides the following major scheduling features:

  • Multiple Resource Pools

    Resource pools logically divide cluster resources and share them among multiple tenants/queues. Resource pools support heterogeneous resources and can be divided exactly according to application resource isolation requirements. Different policies can additionally be configured for the queues in a pool.

  • Multi-Tenant Scheduling (reserve, min, share, max) per Resource Pool

    Superior Scheduler provides flexible hierarchical multi-tenant scheduling policies. Different policies can be configured for different tenants or queues that can access different resource pools. Table 4-4 lists supported policies.

    Table 4-4 Policy description

    • reserve: Resource reserved for a tenant. Even if the tenant has no workload, other tenants cannot use the reserved resource. The value can be a percentage or an absolute value; if both are configured, the percentage is converted into an absolute value and the greater value is used. The default value is 0. Compared with specifying a dedicated resource pool and dedicated machines, the reserve policy provides a flexible, floating reservation. In addition, because no specific hosts are pinned, data affinity is improved and the impact of faulty hosts is avoided.

    • min: Minimum guaranteed resource, with preemption support. Other tenants can use this portion of the resource, subject to preemption: the owning tenant has priority access to its minimum guaranteed resource. The value can be a percentage or an absolute value; if both are configured, the percentage is converted into an absolute value and the greater value is used. The default value is 0.

    • share: Shared resource, without preemption. A tenant must wait for other tenants to finish their workloads and release resources. The value can be a percentage or an absolute value.

    • max: Maximum allowed resource. A tenant cannot obtain more than this amount. The value can be a percentage or an absolute value; if both are configured, the percentage is converted into an absolute value and the greater value is used. By default, there is no restriction.

    Figure 4-41 shows the principle of resource scheduling.

    Figure 4-41 Principle of resource scheduling

    total in the figure indicates the total resources.

    Compared with the open-source schedulers, Superior Scheduler supports both percentages and absolute values for allocating resources to tenants, flexibly addressing enterprise-level resource scheduling requirements. For example, resources can be allocated to level-1 tenants by absolute value, avoiding the impact of changes in cluster scale, while resources for lower-layer sub-tenants are allocated by percentage, improving resource usage.

  • Heterogeneous and Multi-dimensional Resource Scheduling

    Superior Scheduler supports the following functions:

    • Node labels can identify multi-dimensional attributes of a node, such as GPU_ENABLED or SSD_ENABLED, and scheduling can be performed based on these labels.
    • Pools can group resources of the same kind and be assigned to specific tenants/queues.
  • User-based Fair Share Scheduling

    In a leaf tenant, multiple users can use the same queue to submit jobs. Compared with the open-source schedulers, Superior Scheduler supports flexible resource-sharing policies among users within the same tenant. For example, a higher resource-access weight can be configured for VIP users.

  • Data Aware Scheduling

    Superior Scheduler adopts workload-to-node placement: a given workload is scheduled across the available nodes so that the selected nodes suit it best. This gives the scheduler a holistic view of the cluster and its data, so whenever there is an opportunity to place a task close to its data, locality is guaranteed. The open-source YARN schedulers instead perform node-to-workload placement, in which a given node is matched with a suitable workload.

  • Container Scheduling Dynamic Resource Reservation

    In a heterogeneous, diversified computing environment, some containers need more resources, or several kinds of resources; for example, a Spark job may require a large amount of memory. When such containers compete with containers that require fewer resources, they may not obtain sufficient resources within a reasonable period. Open-source schedulers match resources to jobs, which can cause unreasonable resource reservation and waste overall system resources. Superior Scheduler differs from the open-source schedulers in the following aspects:

    • Demand-based matchmaking: Because Superior Scheduler performs workload-to-node placement, it can select a suitable node to reserve, reducing the startup time of such containers and avoiding waste.
    • Rebalancing among tenants: The open-source schedulers do not honor the system's sharing policies when reservation logic is enabled. Superior Scheduler takes a different approach: in every scheduling cycle it traverses all tenants and tries to rebalance according to the multi-tenant policies (reserve, min, share, and so on), so that reservations can be released and the freed resources can flow to deserving containers under other tenants.
  • Dynamic Queue Status Control (Open/Closed/Active/InActive)

    Supports multiple queue states, which helps administrators operate and maintain multiple tenants.

    • Open state (Open/Closed): A queue in the Open state (default) accepts application submissions; a Closed queue accepts none.
    • Active state (Active/Inactive): Applications in an Active (default) queue can be scheduled and allocated resources; applications in an Inactive queue cannot be scheduled.
  • Application Pending Reason

    If an application has not yet been launched, the reason why it is pending is provided.
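The rule shared by the reserve, min, and max policies, taking the greater of the configured absolute value and the percentage converted to an absolute value, can be sketched as follows; `effective_value` is our own illustrative helper, not Superior Scheduler code:

```python
# Sketch of the documented resolution rule for reserve/min/max: when
# both a percentage and an absolute value are configured, the
# percentage is converted to an absolute value and the greater of the
# two is used; with neither configured, the default applies.

def effective_value(total, percent=None, absolute=None, default=0):
    """Resolve a tenant policy value against the pool's total resources."""
    candidates = []
    if percent is not None:
        candidates.append(total * percent / 100)  # percentage -> absolute
    if absolute is not None:
        candidates.append(absolute)
    return max(candidates) if candidates else default

# Pool of 1000 vcores: 10% = 100 < 150, so the absolute value wins.
print(effective_value(1000, percent=10, absolute=150))  # 150
# With nothing configured, reserve/min default to 0.
print(effective_value(1000))                            # 0
```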

Table 4-5 compares Superior Scheduler with the YARN open-source schedulers.

Table 4-5 Competitive analysis

  • Multi-tenant scheduling

    YARN open source schedulers: In homogeneous clusters, only one of Capacity Scheduler or Fair Scheduler can be selected, and FusionInsight clusters do not support Fair Scheduler. Capacity Scheduler allows configuring only percentages, and Fair Scheduler allows configuring only absolute values.

    Superior Scheduler: Supports heterogeneous clusters and multiple resource pools, and supports reserve to guarantee immediate resource access.

  • Optimized placement

    YARN open source schedulers: The node-to-job scheduling direction reduces the success rate of data-localized scheduling and can degrade application performance.

    Superior Scheduler: Workload-to-node placement produces better results; data-locality scheduling has more accurate data awareness and a higher success rate.

  • Balancing based on host load

    YARN open source schedulers: Not available.

    Superior Scheduler: Balancing based on host load and resource allocation is supported.

  • User-based fair share

    YARN open source schedulers: Not available.

    Superior Scheduler: Supports flat user-based fair share (default and others).

  • Pending reason

    YARN open source schedulers: Not available.

    Superior Scheduler: Pending-reason information shows why a workload is pending.

In conclusion, Superior Scheduler is a high-performance scheduler with rich scheduling policies and is better than Capacity Scheduler in terms of functionality, performance, resource usage, and scalability.

Supporting Strict CPU Isolation Enforcement in YARN

YARN cannot strictly control each container's CPU usage: with the CPU subsystem, containers may use more resources than allocated. Therefore, the cpuset subsystem, rather than the cpu subsystem, is used to control resource allocation.

To solve this problem, CPUs are allocated to containers strictly based on the ratio of vcores to physical cores. If a container requires an entire physical core, it is allocated one. If containers need only part of a physical core, they may share one. Figure 4-42 shows an example of CPU allocation with a vcore-to-physical-core ratio of 2:1.

Figure 4-42 CPU allocation
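The 2:1 ratio in the figure can be expressed as simple arithmetic; this is an illustrative sketch, not FusionInsight code:

```python
# Arithmetic sketch of the example above, assuming a vcore:physical-core
# ratio of 2:1 as in Figure 4-42.

RATIO = 2  # vcores per physical core

def physical_cores(vcores, ratio=RATIO):
    """Physical cores consumed by a container's vcore request."""
    return vcores / ratio

# A 4-vcore container gets 2 whole physical cores to itself...
print(physical_cores(4))                       # 2.0
# ...while two 1-vcore containers each need half a core and share one.
print(physical_cores(1) + physical_cores(1))   # 1.0
```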

ResourceManager Restart Performance Optimizations

During recovery, ResourceManager obtains the list of running applications.

If the ResourceManager state store contains a large number of completed applications, recovery can take a long time, making ResourceManager startup, HA switchover, and restart slow. To solve this problem, ResourceManager obtains the list of unfinished applications before starting and recovers the completed applications in a separate asynchronous background thread.

Figure 4-43 shows the Resource Manager startup recovery flow.

Figure 4-43 ResourceManager startup recovery flow
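The recovery flow above can be sketched with ordinary threads; this is a simplified illustration in Python, not the ResourceManager implementation:

```python
# Sketch (ours, not Hadoop code) of the optimization described above:
# running applications are recovered synchronously before startup
# completes, while completed applications are recovered afterwards in
# a background thread so they do not block startup.
import threading

def recover_state(running_apps, completed_apps, recover_one):
    # Recover running apps first: startup blocks only on these.
    for app in running_apps:
        recover_one(app)
    # Completed apps are recovered asynchronously in the background.
    worker = threading.Thread(
        target=lambda: [recover_one(a) for a in completed_apps],
        daemon=True)
    worker.start()
    return worker  # caller may join() if it needs full recovery

recovered = []
t = recover_state(["app_1", "app_2"], ["app_old_1", "app_old_2"],
                  recovered.append)
t.join()  # wait for the background recovery in this demo
print(recovered)
```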

Updated: 2019-05-17

Document ID: EDOC1100074548
