The UMGs Attached to the AR on the IP Bearer Network of a Province Are Out of Service

Publication Date: 2012-07-27
Issue Description

  Phenomenon: UMG1 and UMG3, which are attached to HZAR1 and HZAR2, are out of service. Communication between these UMGs and their SSs is interrupted, so the UMGs fail to register with the SSs.
Version: HZAR1, HZAR2: NE80E V300R003C02B697
For the networking, see the attachment. The following is a general description of the networking:
UMG1 accesses SS4; UMG2 accesses SS8; UMG3 accesses SS12.
The traffic model is as follows:
UMG1->HZAR1->HZAR2->SZBR2->GZCR2->GZBR2->GZAR4->SS4
UMG2->HZAR1->HZAR2->SZBR2->GZCR2->GZBR2->GZAR4->SS8
UMG3->HZAR2->HZAR1->SZBR1->GZCR1->GZBR1->GZAR3->SS12
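As a rough illustration of how the traffic model narrows down the fault, the sketch below intersects the links used by the two affected UMGs. The path lists are copied from the traffic model above; the helper names (links, shared_links) are hypothetical and exist only for this example.

```python
# Hop-by-hop paths copied from the traffic model above.
PATHS = {
    "UMG1": ["UMG1", "HZAR1", "HZAR2", "SZBR2", "GZCR2", "GZBR2", "GZAR4", "SS4"],
    "UMG2": ["UMG2", "HZAR1", "HZAR2", "SZBR2", "GZCR2", "GZBR2", "GZAR4", "SS8"],
    "UMG3": ["UMG3", "HZAR2", "HZAR1", "SZBR1", "GZCR1", "GZBR1", "GZAR3", "SS12"],
}


def links(path):
    """Return the set of undirected links (node pairs) along a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def shared_links(names):
    """Return the links traversed by every one of the named flows."""
    common = set.intersection(*(links(PATHS[name]) for name in names))
    return sorted(tuple(sorted(link)) for link in common)


# The only link common to both affected UMGs is the one between the two ARs.
print(shared_links(["UMG1", "UMG3"]))  # [('HZAR1', 'HZAR2')]
```

UMG2 also crosses the HZAR1-HZAR2 link but stays in service, which suggests that only part of that connection is affected rather than all of it.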
The connection between HZAR1 and HZAR2 is as follows:
POS 5/0/0, POS 11/0/0, and POS 12/0/0 form an IP-Trunk. The system uses a hash algorithm to determine which physical member interface transmits a given flow.
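The following is a minimal sketch of hash-based member selection on a trunk. It is not the algorithm the NE80E uses, which this case does not describe: zlib.crc32 is a stand-in hash, and the flow keys and the pick_member helper are assumptions made for illustration.

```python
import zlib

# Member interfaces of the IP-Trunk, as listed above.
MEMBERS = ["POS 5/0/0", "POS 11/0/0", "POS 12/0/0"]


def pick_member(flow_key, members):
    """Map a flow to one trunk member with a deterministic hash.

    zlib.crc32 is only a stand-in; the router's real hash is an unknown here.
    """
    if not members:
        raise ValueError("trunk has no member interfaces")
    return members[zlib.crc32(flow_key.encode()) % len(members)]


# Hypothetical flow keys: a given flow always lands on the same member,
# while different flows may land on different members.
for key in ("isis-hello", "ldp-session", "umg1-signaling"):
    print(key, "->", pick_member(key, MEMBERS))
```

The point of the sketch is that each flow is pinned to one physical link, so a single degraded member can affect some flows while leaving others untouched.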
 

      
Alarm Information

  1. SS4 reports an alarm indicating that the attached MGW is out of service.
2. SS12 reports an alarm indicating that the attached MGW is out of service.
3. UMG1 and UMG3 report an alarm indicating that the virtual media gateway is not in the service state. 

 

Handling Process

   1. Based on the traffic model, only the signaling interaction between UMG1 and UMG3 and their corresponding SSs is interrupted. It is inferred that the link between HZAR1 and HZAR2 is faulty.
2. At 9:37 a.m., before UMG1 and UMG3 go out of service, an alarm indicating low optical power is generated on POS 5/0/0:
Slot5 PIC0 port0 ESFP RxPower is too low, maybe fiber not plugged.
3. At 9:38:33 a.m., before UMG1 and UMG3 go out of service, the customer removes POS 11/0/0 from the IP-Trunk by running the undo ip-trunk 2 command, because the link connected to POS 11/0/0 is of poor quality. The link is then checked to locate the fault.
4. At 9:38:33 a.m., the customer removes POS 5/0/0 from the IP-Trunk because an alarm indicating low optical power is generated on POS 5/0/0. Now, the IP-Trunk has only one member interface, POS 12/0/0. Since the least active-linknumber 2 command has been configured on IP-Trunk 2, this IP-Trunk becomes Down.
5. At 9:38:52 a.m., the LDP session is Down because the IP-Trunk is Down.
6. At 9:38:55 a.m., POS 5/0/0 becomes Up. Since POS 12/0/0 is also Up, the IP-Trunk becomes Up again, and the IS-IS peer relationship between HZAR1 and HZAR2 becomes Up as well.
Owing to the alarm message(s), Pos5/0/0 went Up.
The status of the trunk member turns Up.
The neighbor of ISIS was changed.
(IsisProcessId=1, Neighbor=×××××××, InterfaceName=Ip-Trunk2, CurrentState=up, ChangeType=3_WAY_UP)
7. Although the IS-IS peer relationship is Up, the LDP session is still Down.
The UMGs are out of service since the signaling packets between the UMGs and the SSs are encapsulated into LDP packets.
8. At 10:12 a.m., the link connected to POS 5/0/0 goes Down, and the IP-Trunk and IS-IS go Down with it.
SLOT=5;The status of trunk member turns down
The line protocol on the interface ip-trunk has entered DOWN state
isis down
9. At 10:13 a.m., POS 11/0/0 is added to the IP-Trunk. The IP-Trunk, IS-IS, and LDP are all Up and the services are restored.
The line protocol on the interface ip-trunk has entered UP state
isis UP
LDP SESSION UP 
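The effect of the least active-linknumber 2 setting in steps 4 and 6 can be sketched as a simple threshold check. This is a simplified model, not device code: the trunk_is_up function and its arguments are hypothetical names, and only the threshold of 2 and the interface names come from the case.

```python
def trunk_is_up(member_up, least_active_links):
    """Return True if enough member links are Up for the trunk to be Up.

    member_up maps each member interface still in the trunk to True when
    its link is Up. least_active_links models the configured threshold.
    """
    active = sum(1 for up in member_up.values() if up)
    return active >= least_active_links


# After steps 3 and 4, only POS 12/0/0 remains in the trunk.
print(trunk_is_up({"POS 12/0/0": True}, least_active_links=2))  # False

# At step 6, POS 5/0/0 and POS 12/0/0 are both Up.
print(trunk_is_up({"POS 5/0/0": True, "POS 12/0/0": True}, least_active_links=2))  # True
```

The check counts only whether a member is Up, not how good its link is, so a poor-quality link that is still Up satisfies the threshold.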

 

Root Cause

 IS-IS packets are guided to POS 12/0/0 based on the hash algorithm, and IS-IS neighbor relationships can be normally set up between HZAR1 and HZAR2.
The signaling packets between the UMGs and SSs are encapsulated into LDP packets and guided to the problematic link of POS 5/0/0, which leads to the interruption of the LDP session. As a result, the signaling interaction between the UMGs and the SSs is interrupted, and the UMGs are out of service. 
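The root cause can be illustrated with a small model in which each protocol is pinned to one trunk member. The assignment below mirrors what the case reports (IS-IS on POS 12/0/0, LDP on POS 5/0/0); treating link quality as a simple healthy/unhealthy flag is an assumption made for the sketch, and session_states is a hypothetical helper.

```python
def session_states(member_healthy, assignment):
    """Derive each session's state from the member link that carries it.

    member_healthy maps a member interface to True if it forwards reliably.
    assignment maps a protocol name to the member its packets are hashed to.
    """
    return {
        protocol: ("Up" if member_healthy[member] else "Down")
        for protocol, member in assignment.items()
    }


# POS 5/0/0 is Up but degraded (low optical power); POS 12/0/0 is healthy.
member_healthy = {"POS 5/0/0": False, "POS 12/0/0": True}
assignment = {"IS-IS": "POS 12/0/0", "LDP": "POS 5/0/0"}

print(session_states(member_healthy, assignment))
# {'IS-IS': 'Up', 'LDP': 'Down'}
```

Under this model the routing adjacency looks healthy while the label-switched traffic fails, which matches the symptom seen in step 7.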

 
 

Suggestions

 When locating a link fault, shut down the faulty interface first and only then remove it from the IP-Trunk. This ensures that the faulty link is not used for traffic forwarding.

 
