FusionInsight HD 6.5.0 Administrator Guide 02

Configuring the Threshold

Scenarios

You can configure monitoring indicator thresholds to monitor the health status of indicators on FusionInsight Manager. If abnormal data occurs and the preset conditions are met, the system triggers an alarm and displays the alarm information on the alarm page.

Procedure

  1. Log in to FusionInsight Manager.
  2. Choose O&M > Alarm > Threshold Configuration.
  3. In the Threshold Configuration area, switch to the Host or Service tab page.
  4. Select the monitoring indicators that you want to configure from the monitoring categories.

    Figure 7-3 Configuring indicator thresholds

    For example, after you select Host Memory Usage, the information about this indicator threshold is displayed.
    • If the alarm sending switch is turned on, an alarm is triggered when the alarm threshold is reached.
    • The alarm ID and alarm name identify the alarm that is triggered by the threshold.
    • FusionInsight Manager checks whether the value of each monitored indicator reaches the threshold. If the threshold condition is violated in a number of consecutive checks equal to the value of Trigger Count, the system sends an alarm. The value of Trigger Count can be customized. Check Period (s) indicates the interval, in seconds, at which the system checks monitoring indicators.
    • The rules for triggering an alarm are also displayed.

  5. Click Create Rule to add rules used for monitoring indicators.

    Table 7-3 Monitoring indicator rule parameters

    | Parameter | Value | Description |
    | --- | --- | --- |
    | Rule Name | CPU_MAX (example value) | Name of the rule. |
    | Severity | Critical, Major, Minor, Warning | Alarm severity. |
    | Threshold Type | Max value, Min value | Whether the threshold is the maximum or minimum value of the indicator. If this parameter is set to Max value, the system generates an alarm when the actual value of the indicator is greater than the threshold. If this parameter is set to Min value, the system generates an alarm when the actual value of the indicator is less than the threshold. |
    | Date | Daily, Weekly, Others | Date on which the rule takes effect. |
    | Add Date | 09-30 | Available only when Date is set to Others. Sets the date on which the rule takes effect. Multiple dates can be set. |
    | Threshold Configuration | Start and End Time: 00:00 to 08:30 | Time range in which the rule takes effect. |
    | Threshold Configuration | Threshold: 10 | Threshold of the indicator monitored by the rule. |

    NOTE:

    For the last parameter in the table, you can click the add or delete icon to add or delete start and end time ranges or alarm indicator thresholds.

  6. Click OK to save the rules.
  7. Locate the row that contains an added rule, and click Apply in the Operation column. The value of Effective for this rule changes to Yes.
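
The interaction between Threshold Type, the start and end time range, and Trigger Count can be sketched in a few lines of Python. This is only an illustrative model, not the FusionInsight Manager implementation: the names `Window`, `Rule`, `threshold_at`, and `check` are hypothetical, and the sketch assumes that an alarm is raised on the check at which the number of consecutive violations reaches Trigger Count.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional


@dataclass
class Window:
    """One start/end time range with its threshold (hypothetical model)."""
    start: time
    end: time
    threshold: float


@dataclass
class Rule:
    name: str
    threshold_type: str      # "max" or "min", mirroring Max value / Min value
    windows: List[Window]
    trigger_count: int = 1   # consecutive violating checks needed for an alarm
    _streak: int = field(default=0, init=False)

    def threshold_at(self, now: time) -> Optional[float]:
        """Return the threshold whose time range covers `now`, if any."""
        for w in self.windows:
            if w.start <= now <= w.end:
                return w.threshold
        return None

    def violates(self, value: float, now: time) -> bool:
        limit = self.threshold_at(now)
        if limit is None:
            return False     # the rule is not in effect at this time
        if self.threshold_type == "max":
            return value > limit
        return value < limit

    def check(self, value: float, now: time) -> bool:
        """Record one periodic check; True when the streak reaches trigger_count."""
        if self.violates(value, now):
            self._streak += 1
        else:
            self._streak = 0
        return self._streak == self.trigger_count


# Example values taken from Table 7-3: CPU_MAX, 00:00 to 08:30, threshold 10.
rule = Rule("CPU_MAX", "max", [Window(time(0, 0), time(8, 30), 10.0)], trigger_count=3)
readings = [12.0, 15.0, 9.0, 11.0, 13.0, 14.0]
results = [rule.check(v, time(1, 0)) for v in readings]
print(results)  # [False, False, False, False, False, True]
```

In this model the reading of 9.0 resets the streak, so the alarm is raised only after three further consecutive readings above the threshold. Each call to `check` corresponds to one Check Period.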

Monitoring Indicator Reference

FusionInsight Manager alarm monitoring indicators are categorized into node information indicators and cluster service indicators. Table 7-4 describes the indicators whose thresholds can be configured on nodes, and Table 7-5 describes the cluster service indicators. For the complete list of monitoring indicators, see the Monitoring Indicator Description.

Table 7-4 Monitoring indicators on each node

| Monitoring Indicator Group Name | Indicator Name | Description | Default Threshold |
| --- | --- | --- | --- |
| CPU | Host CPU Usage | This indicator reflects the computing and control capabilities of the current cluster in a measurement period. By observing the indicator value, you can better understand the overall resource usage of the cluster. | 80% |
| Disk | Disk Usage | Indicates the disk usage of a host. | 90% |
| Disk | Disk Inode Usage | Indicates the disk inode usage in a measurement period. | 80% |
| Memory | Host Memory Usage | Indicates the average memory usage at the current time. | 90% |
| Host Status | Host File Handle Usage | Indicates the usage of file handles of the host in a measurement period. | 80% |
| Host Status | Host PID Usage | Indicates the PID usage of a host. | 90% |
| Network Status | TCP Ephemeral Port Usage | Indicates the usage of temporary TCP ports of the host in a measurement period. | 80% |
| Network Reading | Read Packet Error Rate | Indicates the read packet error rate of the network interface on the host in a measurement period. | 0.5% |
| Network Reading | Read Packet Dropped Rate | Indicates the read packet dropped rate of the network interface on the host in a measurement period. | 0.5% |
| Network Reading | Read Throughput Rate | Indicates the average read throughput (at MAC layer) of the network interface in a measurement period. | 80% |
| Network Writing | Write Packet Error Rate | Indicates the write packet error rate of the network interface on the host in a measurement period. | 0.5% |
| Network Writing | Write Packet Dropped Rate | Indicates the write packet dropped rate of the network interface on the host in a measurement period. | 0.5% |
| Network Writing | Write Throughput Rate | Indicates the average write throughput (at MAC layer) of the network interface in a measurement period. | 80% |
| Process | Uninterruptible Sleep Process | Indicates the number of D state processes on the host in a measurement period. | 0 |
| Process | omm Process Usage | Indicates the usage of the omm process within a measurement period. | 90% |
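
As a quick way to reason about the defaults in Table 7-4, the following Python sketch compares sample readings against them. The threshold values are copied from the table; the function name `over_default` and the sample readings are hypothetical, and the sketch assumes that every default acts as a Max value threshold (an alarm condition exists when the reading is greater than the threshold).

```python
# Default thresholds copied from Table 7-4.
# Values are percentages, except "Uninterruptible Sleep Process", which is a count.
HOST_DEFAULTS = {
    "Host CPU Usage": 80.0,
    "Disk Usage": 90.0,
    "Disk Inode Usage": 80.0,
    "Host Memory Usage": 90.0,
    "Host File Handle Usage": 80.0,
    "Host PID Usage": 90.0,
    "TCP Ephemeral Port Usage": 80.0,
    "Read Packet Error Rate": 0.5,
    "Read Packet Dropped Rate": 0.5,
    "Read Throughput Rate": 80.0,
    "Write Packet Error Rate": 0.5,
    "Write Packet Dropped Rate": 0.5,
    "Write Throughput Rate": 80.0,
    "Uninterruptible Sleep Process": 0.0,
    "omm Process Usage": 90.0,
}


def over_default(samples):
    """Return the names of indicators whose reading exceeds the default threshold.

    Assumption: all defaults are treated as Max value thresholds.
    Indicators that are not listed in HOST_DEFAULTS are ignored.
    """
    return sorted(
        name
        for name, value in samples.items()
        if name in HOST_DEFAULTS and value > HOST_DEFAULTS[name]
    )


# Hypothetical readings for one host.
samples = {
    "Host CPU Usage": 85.2,
    "Disk Usage": 71.0,
    "Read Packet Error Rate": 0.7,
    "Host Memory Usage": 90.0,
}
print(over_default(samples))  # ['Host CPU Usage', 'Read Packet Error Rate']
```

Host Memory Usage is not reported here because a reading equal to the threshold is not greater than it.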

Table 7-5 Cluster service indicators

| Service | Monitoring Indicator Group Name | Indicator Name | Description | Default Threshold |
| --- | --- | --- | --- | --- |
| DBService | Database | Usage of the Number of Database Connections | Indicates the usage of the number of database connections. | 90% |
| Elasticsearch | Garbage Collection | GC Time | Indicates the garbage collection time of the Elasticsearch instance process. | 30000 ms |
| Elasticsearch | Memory | Heap Memory Usage | Indicates the Elasticsearch heap memory usage. | 95% |
| Elasticsearch | Replica Quantity Statistics | Bad Primary Shard Number | Indicates the number of down primary shards in the Elasticsearch instance. | 0 |
| Elasticsearch | Replica Quantity Statistics | Bad Replica Shard Number | Indicates the number of down replica shards in the Elasticsearch instance. | 1 |
| Flume | Agent | Heap Memory Usage Calculate | Indicates the Flume heap memory usage. | 95% |
| Flume | Agent | Flume Direct Memory Usage Statistics | Indicates the Flume direct memory usage. | 80% |
| Flume | Agent | Flume Non-heap Memory Usage | Indicates the Flume non-heap memory usage. | 80% |
| Flume | Agent | Total GC duration of Flume process | Indicates the Flume total GC time. | 12000 ms |
| FTP-Server | Process | Heap Memory Usage Calculate | Indicates the FTP-Server heap memory usage. | 95% |
| FTP-Server | Process | FTP-Server Direct Buffer Usage Statistics | Indicates the FTP-Server direct memory usage. | 80% |
| FTP-Server | Process | Non Heap Memory Usage Calculate | Indicates the FTP-Server non-heap memory usage. | 80% |
| FTP-Server | Process | Total GC duration of FTP-Server process | Indicates the total GC time of FTP-Server. | 12000 ms |
| GraphBase | Request | Real time request | Indicates the total number of real-time GraphServer requests. | 2000 |
| GraphBase | Request | Real time request | Indicates the number of real-time requests of a single GraphServer node. | 500 |
| HBase | GC | GC time for old generation | Indicates the total GC time of RegionServer. | 5000 ms |
| HBase | GC | GC time for old generation | Indicates the total GC time of HMaster. | 5000 ms |
| HBase | CPU and Memory | RegionServer Direct Memory Usage Statistics | Indicates the RegionServer direct memory usage. | 90% |
| HBase | CPU and Memory | RegionServer Heap Memory Usage Statistics | Indicates the RegionServer heap memory usage. | 90% |
| HBase | CPU and Memory | HMaster Direct Memory Usage | Indicates the HMaster direct memory usage. | 90% |
| HBase | CPU and Memory | HMaster Heap Memory Usage Statistics | Indicates the HMaster heap memory usage. | 90% |
| HBase | Service | Regions | Indicates the number of regions of a RegionServer. | 2000 |
| HBase | Replication | Replication sync failed times | Indicates the number of times that DR data fails to be synchronized. | 1 |
| HDFS | File and Block | Lost Blocks | Indicates the number of blocks with missing copies in HDFS. | 0 |
| HDFS | RPC | Average Time of Active NameNode RPC Processing | Indicates the average RPC processing time. | 100 ms |
| HDFS | RPC | Average Time of Active NameNode RPC Queuing | Indicates the average RPC queuing time. | 200 ms |
| HDFS | Disk | Disk Usage | Indicates the HDFS disk usage. | 80% |
| HDFS | Disk | Percentage of DataNode Capacity | Indicates the disk usage of DataNodes in HDFS. | 80% |
| HDFS | Disk | Faulty Volumes | Indicates the number of abnormal disks on a DataNode. | 0 |
| HDFS | Disk | Percentage of Reserved Space for Replicas of Unused Space | Indicates the percentage of the reserved disk space of all the copies to the total unused disk space of DataNodes. | 90% |
| HDFS | Resource | Faulty DataNodes | Indicates the number of faulty DataNodes. | 3 |
| HDFS | Resource | NameNode Non Heap Memory Usage Statistics | Indicates the percentage of NameNode non-heap memory usage. | 90% |
| HDFS | Resource | NameNode Direct Memory Usage Statistics | Indicates the percentage of direct memory used by NameNodes. | 90% |
| HDFS | Resource | NameNode Heap Memory Usage Statistics | Indicates the percentage of NameNode heap memory usage. | 95% |
| HDFS | Resource | DataNode Non Heap Memory Usage Statistics | Indicates the percentage of DataNode non-heap memory usage. | 90% |
| HDFS | Resource | DataNode Direct Memory Usage Statistics | Indicates the percentage of direct memory used by DataNodes. | 90% |
| HDFS | Resource | DataNode Heap Memory Usage Statistics | Indicates the percentage of DataNode heap memory usage. | 95% |
| HDFS | Garbage Collection | GC Time | Indicates the garbage collection (GC) duration of NameNodes per minute. | 12000 ms |
| HDFS | Garbage Collection | GC Time | Indicates the GC duration of DataNodes per minute. | 12000 ms |
| Hive | HQL | Percentage of HQL Statements That Are Executed Successfully by Hive | Indicates the percentage of HQL statements that are executed successfully by Hive. | 90% |
| Hive | Background | Background Thread Usage | Indicates the percentage of background thread usage. | 90% |
| Hive | GC | Total GC Time in Milliseconds | Indicates the total GC time of MetaStore. | 12000 ms |
| Hive | GC | Total GC Time in Milliseconds | Indicates the total GC time of HiveServer. | 12000 ms |
| Hive | Capacity | Percentage of HDFS Space Used by Hive to the Available Space | Indicates the percentage of HDFS space used by Hive to the available space. | 85% |
| Hive | CPU and Memory | MetaStore Direct Memory Usage Statistics | Indicates the MetaStore direct memory usage. | 95% |
| Hive | CPU and Memory | MetaStore Non-Heap Memory Usage Statistics | Indicates the MetaStore non-heap memory usage. | 95% |
| Hive | CPU and Memory | MetaStore Heap Memory Usage Statistics | Indicates the MetaStore heap memory usage. | 95% |
| Hive | CPU and Memory | HiveServer Direct Memory Usage Statistics | Indicates the HiveServer direct memory usage. | 95% |
| Hive | CPU and Memory | HiveServer Non-Heap Memory Usage Statistics | Indicates the HiveServer non-heap memory usage. | 95% |
| Hive | CPU and Memory | HiveServer Heap Memory Usage Statistics | Indicates the HiveServer heap memory usage. | 95% |
| Hive | Session | Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer | Indicates the percentage of the number of sessions connected to the HiveServer to the maximum number of sessions allowed by the HiveServer. | 90% |
| Kafka | Partition | Percentage of Partitions That Are Not Completely Synchronized | Indicates the percentage of partitions that are not completely synchronized to total partitions. | 50% |
| Kafka | Other | Unavailable Partition Percentage | Indicates the percentage of unavailable partitions to total partitions. | 40% |
| Kafka | Disk | Broker Disk Usage | Indicates the disk usage of the disk where the Broker data directory is located. | 80% |
| Kafka | Process | Broker GC Duration per Minute | Indicates the GC duration of the Broker process per minute. | 12000 ms |
| Kafka | Process | Heap Memory Usage of Kafka | Indicates the Kafka heap memory usage. | 95% |
| Kafka | Process | Kafka Direct Memory Usage | Indicates the Kafka direct memory usage. | 95% |
| Loader | Memory | Heap Memory Usage Calculate | Indicates the Loader heap memory usage. | 95% |
| Loader | Memory | Loader Direct Memory Usage Statistics | Indicates the Loader direct memory usage. | 80% |
| Loader | Memory | Non heap Memory Usage Calculate | Indicates the Loader non-heap memory usage. | 80% |
| Loader | GC | Total GC time in milliseconds | Indicates the total GC time of Loader. | 12000 ms |
| MapReduce | Garbage Collection | GC Time | Indicates the GC time. | 12000 ms |
| MapReduce | Resource | JobHistoryServer Direct Memory Usage Statistics | Indicates the JobHistoryServer direct memory usage. | 90% |
| MapReduce | Resource | JobHistoryServer Non Heap Memory Usage Statistics | Indicates the JobHistoryServer non-heap memory usage. | 90% |
| MapReduce | Resource | JobHistoryServer Heap Memory Usage Statistics | Indicates the JobHistoryServer heap memory usage. | 95% |
| Metadata | Other | Heap Memory Usage Calculate | Indicates the Metadata heap memory usage. | 95% |
| Metadata | Other | Metadata Direct Memory Usage Statistics | Indicates the Metadata direct memory usage. | 80% |
| Metadata | Other | Non Heap Memory Usage Calculate | Indicates the Metadata non-heap memory usage. | 80% |
| Metadata | Other | Total GC time in milliseconds | Indicates the Metadata total GC time. | 12000 ms |
| Oozie | Memory | Heap Memory Usage Calculate | Indicates the Oozie heap memory usage. | 95% |
| Oozie | Memory | Oozie Direct Buffer Resource Percentage | Indicates the Oozie direct memory usage. | 80% |
| Oozie | Memory | Non Heap Memory Usage Calculate | Indicates the Oozie non-heap memory usage. | 80% |
| Oozie | GC | Total GC duration of Oozie process | Indicates the Oozie total GC time. | 12000 ms |
| Solr | Replica Quantity Statistics | Bad Replica Number | Indicates the number of bad replicas. | 1 |
| Solr | Garbage Collection | GC Time | Indicates the GC time of Solr instances. | 12000 ms |
| Solr | Memory | Heap Memory Usage | Indicates the heap memory usage. | 95% |
| Spark | Memory | JDBCServer Heap Memory Usage Statistics | Indicates the JDBCServer heap memory usage. | 95% |
| Spark | Memory | JDBCServer Direct Memory Usage Statistics | Indicates the JDBCServer direct memory usage. | 95% |
| Spark | Memory | JDBCServer Non-Heap Memory Usage Statistics | Indicates the JDBCServer non-heap memory usage. | 95% |
| Spark | Memory | JobHistory Direct Memory Usage Statistics | Indicates the JobHistory direct memory usage. | 95% |
| Spark | Memory | JobHistory Non-Heap Memory Usage Statistics | Indicates the JobHistory non-heap memory usage. | 95% |
| Spark | Memory | JobHistory Heap Memory Usage Statistics | Indicates the JobHistory heap memory usage. | 95% |
| Spark | GC number | Full GC Number of JDBCServer | Indicates the total GC number of JDBCServer. | 12 |
| Spark | GC number | Full GC Number of JobHistory | Indicates the total GC number of JobHistory. | 12 |
| Spark | GC Time | Total GC time in milliseconds | Indicates the total GC time of JDBCServer. | 12000 ms |
| Spark | GC Time | Total GC time in milliseconds | Indicates the total GC time of JobHistory. | 12000 ms |
| Spark2x | Memory | JDBCServer2x Heap Memory Usage Statistics | Indicates the JDBCServer2x heap memory usage. | 95% |
| Spark2x | Memory | JDBCServer2x Direct Memory Usage Statistics | Indicates the JDBCServer2x direct memory usage. | 95% |
| Spark2x | Memory | JDBCServer2x Non-Heap Memory Usage Statistics | Indicates the JDBCServer2x non-heap memory usage. | 95% |
| Spark2x | Memory | JobHistory2x Direct Memory Usage Statistics | Indicates the JobHistory2x direct memory usage. | 95% |
| Spark2x | Memory | JobHistory2x Non-Heap Memory Usage Statistics | Indicates the JobHistory2x non-heap memory usage. | 95% |
| Spark2x | Memory | JobHistory2x Heap Memory Usage Statistics | Indicates the JobHistory2x heap memory usage. | 95% |
| Spark2x | GC number | Full GC Number of JDBCServer2x | Indicates the total GC number of JDBCServer2x. | 12 |
| Spark2x | GC number | Full GC Number of JobHistory2x | Indicates the total GC number of JobHistory2x. | 12 |
| Spark2x | GC Time | Total GC time in milliseconds | Indicates the total GC time of JDBCServer2x. | 12000 ms |
| Spark2x | GC Time | Total GC time in milliseconds | Indicates the total GC time of JobHistory2x. | 12000 ms |
| Storm | Cluster | Number of Available Supervisors | Indicates the number of available Supervisor processes in the cluster in a measurement period. | 1 |
| Storm | Cluster | Slot Usage | Indicates the slot usage in the cluster in a measurement period. | 80% |
| Storm | Nimbus | Heap Memory Usage Calculate | Indicates the Nimbus heap memory usage. | 80% |
| Yarn | Application | Timed out Applications | Indicates the number of timed-out applications. | 5 |
| Yarn | Resources | NodeManager Direct Memory Usage Statistics | Indicates the percentage of direct memory used by NodeManagers. | 90% |
| Yarn | Resources | NodeManager Heap Memory Usage Statistics | Indicates the percentage of NodeManager heap memory usage. | 95% |
| Yarn | Resources | NodeManager Non Heap Memory Usage Statistics | Indicates the percentage of NodeManager non-heap memory usage. | 90% |
| Yarn | Resources | ResourceManager Direct Memory Usage Statistics | Indicates the ResourceManager direct memory usage. | 90% |
| Yarn | Resources | ResourceManager Heap Memory Usage Statistics | Indicates the ResourceManager heap memory usage. | 95% |
| Yarn | Resources | ResourceManager Non Heap Memory Usage Statistics | Indicates the ResourceManager non-heap memory usage. | 90% |
| Yarn | Garbage collection | GC Time | Indicates the GC duration of NodeManager per minute. | 12000 ms |
| Yarn | Garbage collection | GC Time | Indicates the GC duration of ResourceManager per minute. | 12000 ms |
| ZooKeeper | Connection | ZooKeeper Connections Usage | Indicates the percentage of the used connections to the total connections of ZooKeeper. | 80% |
| ZooKeeper | Memory | Heap Memory Usage Calculate | Indicates the ZooKeeper heap memory usage. | 95% |
| ZooKeeper | Memory | Directmemory Usage Calculate | Indicates the ZooKeeper direct memory usage. | 80% |
| ZooKeeper | GC | ZooKeeper GC Duration per Minute | Indicates the GC time of ZooKeeper every minute. | 12000 ms |

Updated: 2019-05-17

Document ID: EDOC1100074522
