IGP, LDP, BGP, and VRRP Flapping Occurs on PE Devices of a Site Due to Loops at the Access Side or Attacks

Publication Date: 2013-09-03
Issue Description
For the topology, see the attachment. The device version is V3R3C02B608+SPH031.
Fault symptom: On one day, all 2G and 3G services at this site were disconnected from 14:00 to 16:00, and the devices on the core network could not be managed remotely. The log buffers on the Huawei PE devices show that the OSPF, LDP, and VRRP states changed frequently. After 16:00, all services recovered automatically.

Apr 15 2010 14:01:17 WGL-NE40E-4-PE-A %%01OSPF/3/NBR_CHG_DOWN(l): Neighbor event:neighbor state changed to Down. (ProcessId=2, NeighborAddress=, NeighborEvent=InactivityTimer, NeighborPreviousState=Full, NeighborCurrentState=Down)
Apr 15 2010 14:01:17 WGL-NE40E-4-PE-A %%01OSPF/6/NBR_DOWN_REASON(l): Neighbor state leaves full or changed to Down. (ProcessId=2, NeighborRouterId=, NeighborAreaId=0, NeighborInterface=Eth-Trunk0.1,NeighborDownImmediate reason=Neighbor Down Due to Inactivity, NeighborDownPrimeReason=Hello Not Seen, NeighborChangeTime=[2010/04/15] 14:01:17)
Apr 15 2010 14:02:33 WGL-NE40E-4-PE-A %%01OSPF/3/NBR_CHG_DOWN(l): Neighbor event:neighbor state changed to Down. (ProcessId=1, NeighborAddress=, NeighborEvent=InactivityTimer, NeighborPreviousState=Full, NeighborCurrentState=Down)
Apr 15 2010 14:02:33 WGL-NE40E-4-PE-A %%01OSPF/6/NBR_DOWN_REASON(l): Neighbor state leaves full or changed to Down. (ProcessId=1, NeighborRouterId=, NeighborAreaId=0, NeighborInterface=Eth-Trunk5,NeighborDownImmediate reason=Neighbor Down Due to Inactivity, NeighborDownPrimeReason=Hello Not Seen, NeighborChangeTime=[2010/04/15] 14:02:33)
#Apr 15 14:00:55 2010 WGL-NE40E-4-PE-A LDP/4/SessionDown: Session( public Instance)'s state change to Down
#Apr 15 14:00:55 2010 WGL-NE40E-4-PE-A LDP/4/SessionDown:The session went Down.
#Apr 15 14:00:55 2010 WGL-NE40E-4-PE-A LDP/4/SessionDownReason:The session went Down.
#Apr 15 14:00:55 2010 WGL-NE40E-4-PE-A LSPM/4/TRAP: LSP 73683 went Down.
Apr 15 2010 14:00:55 WGL-NE40E-4-PE-A %%01LDP/6/PRONOTI(l): The session was deleted and the notification sent by the peer was handled.(Notification=HOLD_TIMER_EXPIRED, PeerId= )
Apr 15 2010 14:00:55 WGL-NE40E-4-PE-A %%01RM/3/LDP_SESSION_STATE(l): RM received the status DOWN of the LDP session on the Eth-Trunk0.2.
Apr 15 2010 14:00:55 WGL-NE40E-4-PE-A %%01OSPF/6/LDP_SYNC_EVENT(l): Interface Eth-Trunk0.2 received LDP Session Down from RM, LDP synchronization state Sync-Achieved change to HoldMaxCost.
Apr 15 2010 14:00:55 WGL-NE40E-4-PE-A %%01RM/3/NOTIFY_OSPF_MSG(l): RM notified OSPF of the status DOWN of the LDP session on the Eth-Trunk0.2.
#Apr 15 14:00:59 2010 WGL-NE40E-4-PE-A LDP/4/SessionDown: Session( public Instance)'s state change to Down
#Apr 15 14:00:59 2010 WGL-NE40E-4-PE-A LDP/4/SessionDown:The session went Down.
#Apr 15 14:00:59 2010 WGL-NE40E-4-PE-A LDP/4/SessionDownReason:The session went Down.
Apr 15 2010 14:00:59 WGL-NE40E-4-PE-A %%01LDP/4/HOLDTMREXP(l): Sessions were deleted because the hello hold timer expired. (PeerId=
Apr 15 2010 14:00:59 WGL-NE40E-4-PE-A %%01LDP/4/DELSSNSENDNOTI(l): The session was deleted and the notification HOLD_TIMER_EXPIRED was sent to the peer
Apr 15 2010 14:00:59 WGL-NE40E-4-PE-A %%01LDP/6/PEERRESTART(l): The peer LSR was restarting.
Apr 15 14:07:10 2010 WGL-NE40E-4-PE-A VRRP/3/VRRPCHANGETOMASTER:OID Became to be new master!
Apr 15 2010 14:07:10 WGL-NE40E-4-PE-A %%01VRRP/4/STATEWARNING(l): Virtual Router state BACKUP changed to MASTER, because of priority calculation. (Interface=Vlanif50, VrId=1)
#Apr 15 14:07:10 2010 WGL-NE40E-4-PE-A VRRP/3/VRRPCHANGETOMASTER:OID Became to be new master!
Apr 15 2010 14:07:10 WGL-NE40E-4-PE-A %%01VRRP/4/STATEWARNING(l): Virtual Router state BACKUP changed to MASTER, because of priority calculation. (Interface=Vlanif60, VrId=11)
Handling Process

The possible causes are as follows:
1. Protocol packets are modified before they are sent to the CPU.
2. The tunnel for sending protocol packets to the CPU is congested, so protocol packets are discarded.

1. Experience suggests that faults like this are usually caused by loops at the access side. When a loop occurs, a large number of multicast packets are sent to the CPU and congest the tunnel used for sending packets to the CPU, so normal protocol packets are discarded. The frontline engineers were asked to check whether a loop had occurred at the access side. After checking with the customer, they reported that no operations had been performed during this period.
2. R&D engineers then collected lower-layer information. The IP, TCP, and UDP statistics show no checksum errors in the historical records, and the faulty device could be accessed through Telnet from card 1. This suggests that packets are unlikely to have been modified, although it cannot rule out modification that affects only one packet type or only certain bits. In addition, packet modification rarely clears by itself, whereas these services recovered automatically. Cause 1 is therefore ruled out.
3. The next step was to check whether protocol packets were discarded on their way to the CPU. The analysis showed that a large number of reserved multicast packets had been discarded, as shown in the following output. Because the fault had already cleared by the time the frontline engineers reported it, the source addresses of these multicast packets could not be queried. Other lower-layer information was normal, and the drop statistics are cumulative, so it cannot be confirmed that the drops occurred during the fault period. The fault may have been caused by a loop at the access side or by a virus attack.
[WGL-NE40E-4-PE-A]dis cpu-defend slot 1 car index 51
slot : 1
(51)Special IPV4 multicast packet: information:
  passed : 14199976 packets
  dropped : 1193577803 packets
  cir : 1500kbps
  cbs : 90000bytes
  priority : high
  min-packet-length : 128bytes
[WGL-NE40E-4-PE-B]dis cpu-defend slot 1 car index 51
slot : 1
(51)Special IPV4 multicast packet: information:
  passed : 25623880 packets
  dropped : 592814929 packets
  cir : 1500kbps
  cbs : 90000bytes
  priority : high
  min-packet-length : 128bytes
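As a rough check of how much traffic this CAR entry discarded, the counters above can be turned into drop ratios. The sketch below is a minimal Python calculation; it also converts cir into a packet rate under the assumption, not confirmed by this case, that every packet is charged at least min-packet-length bytes against the CIR.

```python
# Counters copied from the "dis cpu-defend slot 1 car index 51" output above.
COUNTERS = {
    "PE-A": {"passed": 14_199_976, "dropped": 1_193_577_803},
    "PE-B": {"passed": 25_623_880, "dropped": 592_814_929},
}
CIR_BPS = 1500 * 1000      # cir : 1500kbps
MIN_PACKET_BYTES = 128     # min-packet-length : 128bytes


def drop_ratio(passed: int, dropped: int) -> float:
    """Fraction of the packets matched by this CAR entry that were discarded."""
    total = passed + dropped
    return dropped / total if total else 0.0


def max_packets_per_second(cir_bps: int, packet_bytes: int) -> float:
    """Packet-rate ceiling if every packet is charged packet_bytes (assumption)."""
    return cir_bps / (packet_bytes * 8)


if __name__ == "__main__":
    for device, c in COUNTERS.items():
        print(f"{device}: {drop_ratio(c['passed'], c['dropped']):.2%} dropped")
    print(f"ceiling: {max_packets_per_second(CIR_BPS, MIN_PACKET_BYTES):.0f} packets/s")
```

With these counters, the sketch reports about 98.82% dropped on PE-A and 95.86% on PE-B, and a ceiling of about 1465 packets/s for this entry. Because the counters are cumulative, these ratios cover the whole counting period, not only the 14:00 to 16:00 window.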
Root Cause
The tunnel for sending multicast packets to the CPU was congested, most likely by an attack or a loop at the access side, so protocols that use multicast packets flapped. Check whether the downstream devices are under attack or have loops.
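Why a multicast flood takes down several protocols at once can be illustrated with a toy model. OSPF hellos (224.0.0.5), LDP basic discovery hellos (224.0.0.2), and VRRP advertisements (224.0.0.18) all use reserved multicast addresses, so this sketch assumes they share CAR index 51 with the flood. It further assumes that the entry behaves like a single-rate token bucket with the cir and cbs values shown above, that all packets are 128 bytes, and that flood packets arriving in the same 1 ms step are queued ahead of the hello. The class and function names are hypothetical; this is not the NE40E implementation.

```python
TICKS_PER_SECOND = 1000  # 1 ms simulation step (a modeling choice)


class TokenBucket:
    """Single-rate token bucket: CIR in bits/s, CBS in bytes; starts full."""

    def __init__(self, cir_bps: int, cbs_bytes: int) -> None:
        self.cbs = cbs_bytes
        self.tokens = float(cbs_bytes)
        self.refill_per_tick = cir_bps / 8 / TICKS_PER_SECOND  # bytes per tick

    def tick(self) -> None:
        # Add one step's worth of tokens without exceeding the burst size.
        self.tokens = min(self.cbs, self.tokens + self.refill_per_tick)

    def allow(self, size_bytes: int) -> bool:
        # Admit the packet if enough tokens remain; otherwise it is dropped.
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False


def hellos_passed(flood_per_tick: int, seconds: int, hello_every_s: int = 1,
                  size_bytes: int = 128) -> tuple[int, int]:
    """Return (hellos admitted, hellos sent) through one shared CAR bucket."""
    bucket = TokenBucket(cir_bps=1_500_000, cbs_bytes=90_000)
    passed = sent = 0
    for t in range(seconds * TICKS_PER_SECOND):
        bucket.tick()
        # Flood packets arriving in this step are queued ahead of the hello.
        for _ in range(flood_per_tick):
            bucket.allow(size_bytes)
        if t % (hello_every_s * TICKS_PER_SECOND) == 0:
            sent += 1
            if bucket.allow(size_bytes):
                passed += 1
    return passed, sent


if __name__ == "__main__":
    for flood in (0, 1, 50):
        print(flood * TICKS_PER_SECOND, "flood packets/s ->", hellos_passed(flood, 10))
```

In this model, hellos_passed(0, 10) and hellos_passed(1, 10) both return (10, 10) because the offered load stays below roughly 1465 packets/s, whereas hellos_passed(50, 10), a flood of about 50,000 packets/s, returns (1, 10): only the hello sent before the burst allowance was used up gets through. Losing consecutive hellos in this way matches the OSPF inactivity timer and LDP hello hold timer expiries in the logs above.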
If the fault occurs again, perform the following steps to locate it:
1. Run the display interface interface-num command in the user view to check whether the numbers of multicast and broadcast packets increase sharply on the interface where the neighbor relationship was interrupted.
2. Check whether large numbers of broadcast packets and packets with reserved multicast addresses are discarded by CAR.
Run the dis cpu-defend slot slot-num car index 51 command in the hidden view to check whether the counters for packets with reserved multicast addresses increase.
Run the dis cpu-defend slot slot-num car index 8 command in the hidden view to check whether the counters for broadcast ARP packets increase.
3. Check whether the CPU usage of the LPU and MPU rises, and which tasks occupy the CPU.
Run the display health command in the user view to check the CPU usage.
Run the display cpu-usage command in the user view to check the CPU usage of each task.
Run the display cpu-usage slot slot-num command in the user view to check the CPU usage of each task on a specific board, such as the MPU or an LPU.
4. Capture packets and locate the attack source.
[PE-A-hidecmd]display pe-probe 5 0 epe-dmadata
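Because the CAR counters in this case are cumulative, a single display cannot show whether drops are happening now. A minimal approach, sketched below in Python, is to capture the step 2 output twice, a known number of seconds apart, and subtract the counters. The regular expression assumes the output format shown earlier in this case and may need adjusting for other versions; the function names are hypothetical.

```python
import re

# Matches counter lines such as "  dropped : 1193577803 packets".
_COUNTER_RE = re.compile(r"^\s*(passed|dropped)\s*:\s*(\d+)\s+packets", re.MULTILINE)


def parse_car_counters(output: str) -> dict[str, int]:
    """Extract the passed/dropped counters from one CAR entry's display output."""
    return {name: int(value) for name, value in _COUNTER_RE.findall(output)}


def counter_deltas(before: str, after: str) -> dict[str, int]:
    """Per-counter increase between two snapshots of the same CAR entry."""
    b, a = parse_car_counters(before), parse_car_counters(after)
    return {name: a[name] - b[name] for name in b if name in a}


def drops_per_second(before: str, after: str, interval_s: float) -> float:
    """Average drop rate between snapshots taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_deltas(before, after).get("dropped", 0) / interval_s
```

A dropped counter that keeps rising between two snapshots taken while the protocols are flapping indicates that packets are being discarded during the fault rather than at some earlier time. The same subtraction applies to the multicast and broadcast counters from step 1.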

Assign the highest priority to protocol packets such as OSPF, LDP, VRRP, and BGP packets so that they are sent to the CPU first when a fault occurs. Configure multicast and broadcast suppression on access-side interfaces.
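The effect of giving protocol packets the highest priority can be sketched as a strict-priority queue: when the CPU can accept only a few packets per interval, protocol packets are handed over first, regardless of how many flood packets arrived before them. The priority map, the budget parameter, and the "flood" label below are hypothetical, and the sketch does not describe how the device implements its protection.

```python
import heapq
from itertools import count

# Hypothetical priority map: a lower number is served first.
PROTOCOL_PRIORITY = {"OSPF": 0, "LDP": 0, "VRRP": 0, "BGP": 0}
DEFAULT_PRIORITY = 1  # anything else, e.g. unknown multicast or broadcast floods


def serve(packets: list[str], budget: int) -> list[str]:
    """Return the packets delivered to the CPU when only `budget` fit in one interval."""
    seq = count()  # tie-breaker keeps arrival order within a priority level
    queue = [(PROTOCOL_PRIORITY.get(p, DEFAULT_PRIORITY), next(seq), p) for p in packets]
    heapq.heapify(queue)
    return [heapq.heappop(queue)[2] for _ in range(min(budget, len(queue)))]


if __name__ == "__main__":
    arrivals = ["flood"] * 1000 + ["OSPF", "VRRP"]
    print(serve(arrivals, 3))
```

With 1,000 flood packets queued ahead of one OSPF and one VRRP packet and room for three packets, serve returns the two protocol packets first, whereas a plain first-in-first-out queue would have delivered three flood packets.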