Principles
Alarm and Event Management
An alarm is generated when iMaster NCE-Campus itself, a device, service, or system that it manages, or one of its connections to a peripheral system fails or is at risk of failing. The alarm management function lets you monitor alarms in real time and troubleshoot faults promptly based on alarm information and handling suggestions, which helps keep services running normally.
iMaster NCE-Campus lets you set device alarm thresholds, such as the CPU and memory usage thresholds. When a threshold is reached, an alarm is reported to iMaster NCE-Campus.
Alarm and Event
If the system or managed objects (MOs) detect an exception or a significant status change, an alarm or event will be displayed on the alarm management page. Table 4-1 describes the definitions of alarms and events.
Type | Description | Differences | Similarities |
---|---|---|---|
Alarm | A notification generated when the system or an MO is faulty. | | Alarms and events are presented as notifications. |
Event | A notification of a status change, generated when the system or an MO is running properly. | | Alarms and events are presented as notifications. |
Alarm Category
Alarms managed by iMaster NCE-Campus are classified into controller alarms and device alarms based on the alarm source. Table 4-2 describes the alarm categories by alarm sources.
Alarm Category | Description | Administrator |
---|---|---|
Controller alarm | Alarms generated by iMaster NCE-Campus. | System administrator |
Device alarm | Alarms and events reported by devices to iMaster NCE-Campus. | Tenant administrator |
Alarms managed by iMaster NCE-Campus are classified into current alarms, historical alarms, masked alarms, and events by processing status, as described in Table 4-3.
Alarm Category | Description |
---|---|
Current alarm | Includes uncleared and unacknowledged alarms, acknowledged but uncleared alarms, and unacknowledged but cleared alarms. NOTE: Some alarms cannot be automatically cleared and must be manually cleared on the alarm page. |
Historical alarm | Includes alarms that have been both cleared and acknowledged. |
Masked alarm | Alarms that do not need to be handled can be masked. Masked alarms are displayed in the Masked alarms list. If such alarms are generated again later, they are not displayed in the Current alarm list. |
Event | An event has the lowest severity and indicates that an event has occurred. Events do not need to be processed. |
Figure 4-1 shows the relationship between the alarm processing status and processing operation.
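The classification by processing status can be illustrated with a short Python sketch. This is only an illustration of the rules in Table 4-3, not the product's implementation; the `Alarm` class, its fields, and `processing_category` are hypothetical names, and events are left out because they are not processed.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """Hypothetical alarm record used only for this illustration."""
    name: str
    acknowledged: bool = False
    cleared: bool = False
    masked: bool = False  # True if the alarm matches a masking rule


def processing_category(alarm: Alarm) -> str:
    """Return the list in which the alarm would be shown."""
    if alarm.masked:
        return "masked"
    # Only alarms that are both cleared and acknowledged are historical.
    if alarm.acknowledged and alarm.cleared:
        return "historical"
    # Every other combination of the two statuses is a current alarm.
    return "current"
```

For example, an alarm that has been cleared but not yet acknowledged stays in the current alarm list until someone acknowledges it.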
Alarm Severity
The alarm severity indicates the importance and urgency of a fault. It helps O&M personnel quickly judge how important an alarm is and apply the corresponding handling policy. You can also change the severity of an alarm as required.
Table 4-4 lists the alarm severities.
Alarm Severity | Color | Description | Handling Policy |
---|---|---|---|
Critical | | Services are affected. Corrective measures must be taken immediately. | Rectify the fault immediately. Otherwise, services may be interrupted or the system may break down. |
Major | | Services are affected. If the fault is not rectified in a timely manner, serious consequences may occur. | Rectify the fault in a timely manner. Otherwise, key services will be affected. |
Minor | | There is a minor impact on services. Problems of this severity may result in serious faults, and therefore corrective actions are required. | Find out the cause of the alarm and rectify the fault. |
Warning | | A potential or imminent fault that affects services is detected, but services are not affected currently. | Handle warning alarms based on the network and NE running status. |
Alarm Status
Table 4-5 lists the alarm statuses.
Status Name | Alarm Status | Description |
---|---|---|
Acknowledgement status | Acknowledged and unacknowledged | The initial acknowledgment status is Unacknowledged. A user who views an unacknowledged alarm and plans to handle it can acknowledge the alarm. When an alarm is acknowledged, its status changes to Acknowledged. Acknowledged alarms can be unacknowledged. When an alarm is unacknowledged, its status is restored to Unacknowledged. You can also configure auto acknowledgment rules to automatically acknowledge alarms. |
Clearance status | Cleared and uncleared | The initial clearance status is Uncleared. When a fault that causes an alarm is rectified, a clearance notification is automatically reported to Alarm Management and the clearance status changes to Cleared. For some alarms, clearance notifications cannot be automatically reported. You need to manually clear these alarms after corresponding faults are rectified. The background color of cleared alarms is green. |
Alarm and Event Types
Alarm and event types help users query, analyze, and process alarms and events. You can select types when filtering alarms and events.
Table 4-6 describes the types of alarms and events.
Type | Description |
---|---|
Communications alarm | Alarms caused by communication failures within an NE, between NEs, between an NE and a management system, or between management systems. Example: device communication interruption alarm. |
Service quality alarm | Alarms caused by service quality deterioration. Example: device congestion alarm. |
Processing error alarm | Alarms caused by software or processing errors. Example: version mismatch alarm. |
Equipment alarm | Alarms caused by physical resource faults. Example: board fault alarm. |
Environmental alarm | Alarms generated when the environment where the device resides is faulty. Example: temperature alarm generated when the hardware temperature is too high. |
Integrity alarm | Alarms generated when requested operations are denied. Example: alarms caused by unauthorized modification, addition, and deletion of user information. |
Operation alarm | Alarms generated when the required services cannot run properly due to problems such as service unavailability, faults, or incorrect invocation. Example: alarms caused by service rejection, service exit, and procedural errors. |
Physical resource alarm | Alarms generated when physical resources are damaged. Example: alarms caused by cable damage and intrusion into an equipment room. |
Security alarm | Alarms generated when security issues are detected by a security service or mechanism. Example: alarms caused by authentication failures and unauthorized access. |
Time domain alarm | Alarms generated when an event occurs at an improper time. Example: alarms caused by information delay, an invalid key, or resource access at an unauthorized time. |
Property change | Events generated when MO attributes change. Example: events caused by the addition, removal, or modification of attributes. |
Object creation | Events generated when an MO instance is created. |
Object delete | Events generated when an MO instance is deleted. |
Relationship change | Events generated when MO relationship attributes change. |
State change | Events generated when MO status attributes change. |
Route change | Events generated when routes change. |
Protection switching | Alarms or events caused by a switchover. |
Over limit | Alarms or events reported when a performance metric reaches the threshold. |
File transfer status | Alarms or events reported when a file transfer succeeds or fails. |
Backup status | Events generated when the MO backup status changes. |
Heart beat | Events generated when heartbeat notifications are sent. |
Current Alarms and Historical Alarms
Table 4-7 describes current alarms and historical alarms.
Name | Description |
---|---|
Current alarms | Current alarms include uncleared and unacknowledged alarms, acknowledged but uncleared alarms, and unacknowledged but cleared alarms. By monitoring current alarms, you can identify faults promptly, handle them accordingly, and notify O&M personnel of these faults. |
Historical alarms | Acknowledged and cleared alarms are historical alarms. You can analyze historical alarms to optimize system performance. |
Internal Alarm Handling Process
The internal alarm handling process of alarm management involves operations such as alarm masking, correlation analysis, and severity redefinition.
Figure 4-2 shows the internal alarm handling process.
Table 4-8 describes the internal alarm handling process.
Operation | Description |
---|---|
Name redefinition | After receiving alarms, alarm management renames the alarms that meet the name redefinition rules. |
Alarm masking | Alarm management either discards the alarms that meet the masking rules (the alarms are not archived to the database) or records them in the masked alarm data table. |
Intermittent or toggling (pre-processing) | Alarm management records the alarms that meet the intermittent/toggling handling rules in the intermittent or toggling data table. |
Alarm update | Alarm management updates the information of current alarms, for example by clearing alarms or changing their severities, based on the reported alarm changes. |
Severity and type redefinition | Alarm management redefines the alarms that meet the severity and type redefinition rules. |
Correlation analysis | Alarm management marks the alarms that meet the correlation rules as root alarms, and handles the root alarms or correlative alarms based on the actions in the rules. |
Automatic acknowledgement | Alarm management automatically acknowledges the alarms that meet the auto acknowledgment rules. The alarms that are automatically acknowledged are recorded in the historical alarm data table. |
Archiving alarms to the database | Alarm management archives the remaining alarms to the database. Post-processing is not performed on the alarms that are masked or moved to historical alarms during alarm pre-processing. The information on the alarms is updated in real time. |
Intermittent or toggling (post-processing) | Alarm management analyzes the alarms in the intermittent/toggling data table and handles the alarms that meet the intermittent or toggling policies. |
Alarm merging | Alarm management merges the alarms that meet the merging conditions. |
Real-time notification | Alarm management updates the alarm information on the alarm interface in real time. |
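The idea of passing each alarm through an ordered chain of rule-based operations can be sketched in Python as follows. This is a simplified illustration that covers only four of the operations (name redefinition, masking, severity redefinition, and automatic acknowledgment) in the order they appear in the table. The rule tables, alarm names, and dictionary fields are all hypothetical and do not reflect the product's internal data model.

```python
from typing import Callable, List, Optional

Alarm = dict  # hypothetical alarm record: {"name": ..., "severity": ...}
Stage = Callable[[Alarm], Optional[Alarm]]

# Hypothetical rule tables, invented for this example.
NAME_RULES = {"LinkDown": "Link interruption"}
MASKED_NAMES = {"FanSpeedNotice"}
SEVERITY_RULES = {"Link interruption": "Critical"}
AUTO_ACK_SEVERITIES = {"Warning"}


def redefine_name(alarm: Alarm) -> Optional[Alarm]:
    alarm["name"] = NAME_RULES.get(alarm["name"], alarm["name"])
    return alarm


def mask(alarm: Alarm) -> Optional[Alarm]:
    # Returning None models discarding an alarm that meets a masking rule.
    return None if alarm["name"] in MASKED_NAMES else alarm


def redefine_severity(alarm: Alarm) -> Optional[Alarm]:
    alarm["severity"] = SEVERITY_RULES.get(alarm["name"], alarm["severity"])
    return alarm


def auto_acknowledge(alarm: Alarm) -> Optional[Alarm]:
    alarm["acknowledged"] = alarm["severity"] in AUTO_ACK_SEVERITIES
    return alarm


PIPELINE: List[Stage] = [redefine_name, mask, redefine_severity, auto_acknowledge]


def process(alarm: Alarm) -> Optional[Alarm]:
    """Run a copy of the alarm through every stage; stop if a stage discards it."""
    current: Optional[Alarm] = dict(alarm)
    for stage in PIPELINE:
        current = stage(current)
        if current is None:
            return None
    return current
```

Because later stages see the output of earlier ones, the severity rule in this sketch is keyed on the redefined alarm name rather than the reported one.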
Alarm Dump
To prevent excessive alarm data in the alarm database, iMaster NCE-Campus dumps historical alarms and events according to the configured conditions. If the remote SFTP function is enabled, iMaster NCE-Campus dumps data to the remote SFTP server.
- iMaster NCE-Campus performs dump detection every 4 hours.
- Data dump is triggered when the data storage duration exceeds the configured time.
- If the remote SFTP function is enabled, data is packed, compressed, and then uploaded to the remote SFTP server when the data size reaches 5 MB.
- A data dump is terminated when all expired data has been dumped.
- Only historical alarms and events are dumped.
Alarm Dump File Format
- Historical alarm dump file: 192.168.3.4_AcHistoryAlarmEntity_2017_11_10_11_26_20.zip
- Historical event dump file: 192.168.3.4_AcEventEntity_2017_11_10_11_26_20.zip
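A script that collects dump files from the SFTP server might need to split such file names into their parts. The Python sketch below assumes, by analogy with the log dump file names, that the leading field is the IP address of the node that produced the dump and that the trailing fields are the dump time in year, month, day, hour, minute, second order. The function name and the returned keys are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: <IP>_<entity>_<YYYY>_<MM>_<DD>_<hh>_<mm>_<ss>.zip
DUMP_NAME = re.compile(
    r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3})_"
    r"(?P<entity>AcHistoryAlarmEntity|AcEventEntity)_"
    r"(?P<ts>\d{4}(?:_\d{2}){5})\.zip$"
)


def parse_alarm_dump_name(file_name: str) -> dict:
    """Split an alarm or event dump file name into node IP, kind, and dump time."""
    match = DUMP_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not an alarm or event dump file name: {file_name}")
    kind = "alarm" if match.group("entity") == "AcHistoryAlarmEntity" else "event"
    dumped_at = datetime.strptime(match.group("ts"), "%Y_%m_%d_%H_%M_%S")
    return {"ip": match.group("ip"), "kind": kind, "dumped_at": dumped_at}
```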
Log Management
While running, iMaster NCE-Campus records system management operation logs and run logs, which facilitate auditing and fault locating.
Log Types
Table 4-9, Table 4-10, and Table 4-11 list the types of logs recorded by iMaster NCE-Campus.
Log Type | Description |
---|---|
Operation log | Records all add, delete, and modify operations triggered by users or iMaster NCE-Campus to facilitate audit. |
Security log | Records operations related to user accounts, such as login to and logout of iMaster NCE-Campus as well as password change, to facilitate audit. |
Run log | Records status information generated during the running of iMaster NCE-Campus and execution of tasks. Run logs can be used to diagnose iMaster NCE-Campus faults. |
Registration log | Records the registration logs of devices that are not managed by iMaster NCE-Campus, including the first registration time, last registration time, and number of registration times. |

Log Type | Description |
---|---|
Operation log | Records all add, delete, and modify operations triggered by users or iMaster NCE-Campus to facilitate audit. |
Security log | Records operations related to user accounts, such as login to and logout of iMaster NCE-Campus as well as password change, to facilitate audit. |

Log Type | Description |
---|---|
Operation log | Records all add, delete, and modify operations triggered by users or iMaster NCE-Campus to facilitate audit. |
Security log | Records operations related to user accounts, such as login to and logout of iMaster NCE-Campus as well as password change, to facilitate audit. |
Portal Online and Offline log | Records portal user online and offline information to help tenant administrators manage and maintain devices. |
Device login and logout log | Records device login and logout information to help tenant administrators manage and maintain devices. |
RADIUS Online and Offline log | Records RADIUS user online and offline information to help tenant administrators manage and maintain devices. |
HWTACACS Online and Offline log | Records HWTACACS user online and offline information to help tenant administrators manage and maintain devices. |
Log Levels
A log level identifies the severity of a log message. Table 4-12 lists the log levels.
Level | Definition | Description |
---|---|---|
0 | Emergency | Emergency condition. |
1 | Alert | Errors needing immediate actions. |
2 | Critical | Critical condition. |
3 | Error | Errors needing attention but not critical. |
4 | Warning | Warning condition, meaning errors are likely to occur. |
5 | Notice | Information requiring your attention. |
6 | Informational | Generic prompt information. |
7 | Debugging | Debugging information. |
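When filtering exported logs in a script, it can help to treat the levels as ordered integers, where a lower number means a more severe message. The Python sketch below mirrors the level table; the `LogLevel` enum and the `at_least` helper are hypothetical names, not part of any iMaster NCE-Campus interface.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Log levels from the table above; lower values are more severe."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUGGING = 7


def at_least(level: LogLevel, threshold: LogLevel) -> bool:
    """Return True if `level` is at least as severe as `threshold`."""
    return level <= threshold
```

For example, `at_least(LogLevel.ERROR, LogLevel.WARNING)` holds, so a filter set to Warning would also keep Error messages.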
Log Dump
Dumping of Security, Operation, and Run Logs
All logs of iMaster NCE-Campus are stored in the database. To prevent excessive logs from degrading database performance, the system checks the number of historical logs every 4 hours. If the number of logs or the log storage duration exceeds the corresponding threshold, the system temporarily saves the earliest data records to the local disk on the primary iMaster NCE-Campus node, so that the amount of data remaining in the database is lower than the threshold and none of the remaining logs has expired.
If the size of dump files on the local disk reaches the local storage capacity or the maximum storage duration is exceeded, iMaster NCE-Campus automatically deletes the earliest dump files. To keep full log files available, you are advised to enable data overflow dump. After this function is enabled, temporary dump files that are newly transferred to the local disk are automatically uploaded to the remote SFTP server, where users can view the full log files.
Example of log dump file names:
- Security log: 10.170.209.91_SecurityLog_Store_2019_06_08_01_07.zip
- Operation log: 10.170.209.91_OperationLog_Store_2019_06_08_01_07.zip
- Run log: 10.170.209.91_SystemLog_Store_2019_06_08_01_07.zip
- In the file name, 10.170.209.91 indicates the IP address of the cluster node where the log is dumped, and 2019_06_08_01_07 indicates the time when the log file is dumped.
- If an exception occurs during the dump, iMaster NCE-Campus generates an alarm. Rectify the fault based on handling suggestions in the alarm information.
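The threshold check described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: logs are modeled as `(timestamp, message)` tuples, `max_count` and `max_age` stand in for the configured thresholds, and the database keeps at most `max_count` unexpired records. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

LogEntry = Tuple[datetime, str]  # hypothetical record: (timestamp, message)


def select_logs_to_dump(
    logs: List[LogEntry], now: datetime, max_count: int, max_age: timedelta
) -> Tuple[List[LogEntry], List[LogEntry]]:
    """Return (to_dump, to_keep), dumping the earliest records first."""
    ordered = sorted(logs, key=lambda entry: entry[0])
    cutoff = now - max_age
    # Records older than the storage duration are always dumped.
    expired = [entry for entry in ordered if entry[0] < cutoff]
    remaining = [entry for entry in ordered if entry[0] >= cutoff]
    # If too many records remain, also dump the earliest of them.
    overflow = max(0, len(remaining) - max_count)
    return expired + remaining[:overflow], remaining[overflow:]
```

Running such a check on a fixed schedule corresponds to the 4-hour check interval mentioned above.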
Dump Process of Security, Operation, and Run Logs
Figure 4-3 shows the log dump process.
User Management
After obtaining an account, users can log in to iMaster NCE-Campus and perform operations. User management involves user account management and user rights management.
User Type
iMaster NCE-Campus involves three types of users: system administrator, MSP administrator, and tenant administrator.
User Type | Account | Description | Application Scenarios |
---|---|---|---|
System administrator | admin | The admin user is the default system administrator and has the highest rights. | The admin user has the highest rights. |
System administrator | Sub-account created by the system administrator | The system administrator can create multiple sub-accounts and assign different rights to each sub-account based on the account role. This implements rights-based management. | Rights-based management |
System administrator | Workgroup administrator account created by the system administrator | A workgroup administrator account created by the system administrator functions similarly to a sub-account created by the system administrator. The system administrator assigns different rights to each sub-account based on the account role. | Rights-based management |
MSP administrator | Root MSP administrator created by the system administrator | An MSP administrator created by a system administrator is the root MSP administrator and has the highest MSP rights. | The root MSP administrator has the highest MSP rights. |
MSP administrator | Sub-account created by the MSP administrator | The root MSP administrator can create multiple sub-accounts and assign different rights to each sub-account based on the account role. This implements rights-based management. | Rights-based management |
MSP administrator | Workgroup administrator account created by the MSP administrator | An MSP workgroup administrator can maintain services for tenant workgroups. Tenant administrators can create workgroups, assign different rights to workgroup administrators, and assign the permission of managing tenant workgroups to MSP workgroup administrators. When MSPs are authorized to manage tenant services, MSP workgroup administrators only have the permission to manage services of tenant workgroups authorized by tenant administrators. | Tenant administrators need to assign specified rights to MSP administrators so that MSP workgroup administrators have only the rights to manage services of specified tenant workgroups. |
Tenant administrator | Tenant administrator account created by the MSP administrator | A tenant administrator created by an MSP administrator is the root tenant administrator and has the highest tenant rights. | The root tenant administrator has the highest tenant rights. |
Tenant administrator | Sub-account created by a tenant administrator | A root tenant administrator can create multiple sub-accounts and assign different rights to each sub-account based on the account role. This implements rights-based management. In addition, each sub-account can be configured to manage specific sites. This implements domain-based management. | Rights- and domain-based management |
Tenant administrator | Workgroup administrator account created by the tenant administrator | Tenant administrators can create workgroups, assign different rights to workgroup administrators, and assign the permission of managing tenant workgroups to MSP workgroup administrators. When MSPs are authorized to manage tenant services, MSP workgroup administrators only have the permission to manage services of tenant workgroups authorized by tenant administrators. | Tenant administrators can authorize MSP workgroup administrators to manage services of specified tenant workgroups. |
You are advised to select Modify password first login for a newly created account. When you log in to iMaster NCE-Campus for the first time, change the password as prompted. Additionally, periodically change the password based on the password policy.
User Role
iMaster NCE-Campus defines different user roles to facilitate user rights control. When creating an account, you need to specify a user role. Some default roles are preset on iMaster NCE-Campus, as listed in Table 4-14.
If the default roles do not meet requirements, an administrator can create roles and assign rights to the roles.
User Type | Default Role | Description |
---|---|---|
System administrator | System Administrator | Manages iMaster NCE-Campus servers, monitors clusters, manages cluster alarms, and configures system services and other functions. |
System administrator | Operator | Manages system service running. |
System administrator | Open Api Operator | Performs authentication when a northbound third-party system invokes northbound APIs of iMaster NCE-Campus. |
MSP administrator | MSP Administrator | Operates tenant services and performs related configurations. |
MSP administrator | Operator | Manages system service running. |
MSP administrator | Open Api Operator | Operates open API services and performs related configurations. |
Tenant administrator | Monitor | Views tenant services and related configurations. |
Tenant administrator | Open Api Operator | Operates open API services and performs related configurations. |
Tenant administrator | Tenant Administrator | Operates tenant services and performs related configurations. |
Tenant administrator | Operator | Manages system service running. NOTE: The Operator role is unavailable for tenant administrators created on iMaster NCE-Campus V300R019C00. |
Service Quality Monitoring
iMaster NCE-Campus can monitor and collect statistics on intra-site links and applications, inter-site links and applications, and network-wide applications in terms of health status, communication quality, traffic, and application.
Health Status
iMaster NCE-Campus monitors the health status of each site based on comprehensive information about devices and links of the site.
Communication Quality
Communication quality indicators are as follows:
- Link quality measurement (LQM): calculated based on the delay, jitter, and packet loss rate of a link.
- Application quality measurement (AQM): calculated based on the delay and packet loss rate of an application.
The delay is calculated according to the following formula:
Delay (in ms) = WAN delay + Server delay
The communication quality data is updated every minute.
Traffic
Traffic indicators are as follows:
- Traffic (in MB): sum of uplink traffic and downlink traffic in a site.
- Capacity (in Mbit/s): sum of uplink capacity or downlink capacity of all physical links in a site. This parameter is used to calculate the uplink or downlink bandwidth usage but not the transmission capability of a site.
- Throughput (bps/pps/Bps): average traffic rate within a certain period. The throughput is calculated according to the following formula: Throughput = Sum of uplink and downlink traffic/Time period.
- Bandwidth usage: Bandwidth usage = Throughput/Capacity
If multiple links are bound to the same physical interface, the bandwidth capacity of only one link is calculated. For active and standby links, the bandwidth capacity of only the active link is calculated.
The traffic data is updated every five minutes.
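The traffic formulas above can be worked through with a short Python sketch. It is an illustration only: the `Link` class, the function names, and the interface names in the usage note are hypothetical, throughput is expressed in Mbit/s, and the conversion assumes 1 MB of traffic equals 8 Mbit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Link:
    """Hypothetical link record used only for this illustration."""
    physical_interface: str
    capacity_mbps: float
    active: bool = True  # False for a standby link


def site_capacity_mbps(links: Iterable[Link]) -> float:
    """Sum link capacity, counting each physical interface once and skipping standby links."""
    per_interface: Dict[str, float] = {}
    for link in links:
        if link.active:
            per_interface.setdefault(link.physical_interface, link.capacity_mbps)
    return sum(per_interface.values())


def throughput_mbps(uplink_mb: float, downlink_mb: float, period_s: float) -> float:
    """Throughput = sum of uplink and downlink traffic / time period."""
    return (uplink_mb + downlink_mb) * 8 / period_s  # MB -> Mbit (assumed conversion)


def bandwidth_usage(throughput: float, capacity: float) -> float:
    """Bandwidth usage = throughput / capacity, as a fraction."""
    return throughput / capacity
```

For instance, two links bound to one 100 Mbit/s interface plus a standby 50 Mbit/s link give a capacity of 100 Mbit/s. With 150 MB of uplink and 150 MB of downlink traffic over 300 seconds, the throughput is 8 Mbit/s and the bandwidth usage is 0.08 (8%).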
Application
Applications are programs to which data packets belong, such as Facebook. The network communication quality can be monitored by application.