
Malformed DNS Traffic in 3rd-Party Equipment Caused ME60-16 Internet Service Interruption

Publication Date:  2012-07-27
Issue Description

T-Operator in Malaysia uses the ME60-16 as a BRAS. Triple-play services such as high-speed Internet, VoIP, and VoD run on the platform. T-Operator reported that around 100 users had logged complaints that they could not browse the Internet. Intermittent connectivity was detected on the ME60 links to PE1 and PE2. Between the BRAS and the PEs, there were PTS devices.


Software version: V6R2C02SPC500

Topology: refer to the topology diagram (image).






Alarm Information
Null



Handling Process
On checking the PTS boxes, the PTSD module was detected as down in several of them.

T-Operator requested that the PTS be bypassed.

We placed boxes 2, 3, and 4 in shunt mode, but the link was still intermittent, as confirmed by the Network Control Center (NCC) and the Huawei BRAS team (see the ping-loss sketch below).

We then put all the boxes in shunt mode.

The NCC and the Huawei BRAS team confirmed that the link was stable and users could browse the Internet after all the PTS boxes were bypassed (placed in shunt mode).
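
A minimal sketch of the kind of repeated ping test used to judge whether the link was intermittent (plain Python driving a Linux-style ping binary; the target address and count are illustrative, not from the case):

```python
import re
import subprocess

def ping_loss(target: str, count: int = 100) -> float:
    """Run a burst of pings and return the reported packet-loss percentage.

    Assumes a Linux-style `ping` binary on a test host behind the BRAS.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True, text=True, check=False,
    )
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", result.stdout)
    return float(match.group(1)) if match else 100.0

# Compare loss with the PTS cluster in shunt mode vs. inline, e.g.:
# print(ping_loss("203.0.113.1"))   # placeholder Internet-side address
```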




Root Cause

PTS Normal Working Mode:
1. Under normal circumstances, every module in the PTS shows an "up" status, meaning the box is inspecting all kinds of packets (DNS, BGP, IGP, etc.), which is its normal behaviour.
2. In the T-Operator deployment, all PTS boxes work in cluster mode (refer to the diagram).



Cluster Mode Diagram.

The 3rd-party vendor equipment uses Multiple Spanning Tree Protocol (MSTP) on the cluster links:
• Each PTS uses its own VLAN, i.e. a per-VLAN spanning tree within the cluster.
• Each PTS is the root bridge of its own VLAN.
• This allows all 4 cluster links to be fully utilized (see the sketch below).
• The green arrow shows the cluster-link utilization from one PTS's point of view.
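
As a toy illustration of the per-VLAN root-bridge idea (the VLAN numbers and bridge priorities are invented, not the vendor's actual MSTP configuration):

```python
# Each PTS advertises the lowest priority on its own VLAN, so it wins the
# root election for that VLAN; lowest (priority, bridge-id) tuple wins.
PTS_BOXES = ["PTS-1", "PTS-2", "PTS-3", "PTS-4"]
OWN_VLAN = {"PTS-1": 101, "PTS-2": 102, "PTS-3": 103, "PTS-4": 104}

def root_bridge(vlan: int) -> str:
    candidates = []
    for pts in PTS_BOXES:
        priority = 0 if OWN_VLAN[pts] == vlan else 32768
        candidates.append((priority, pts))
    return min(candidates)[1]

for vlan in sorted(OWN_VLAN.values()):
    print(f"VLAN {vlan}: root bridge = {root_bridge(vlan)}")
# Each VLAN roots at a different PTS, so every cluster link is in the
# forwarding state for at least one VLAN instead of being blocked by a
# single spanning tree.
```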

Analysis:
During the investigation, it was confirmed that some ICMP packets traversing PTS-2 had a different ICMP checksum on ingress of the 14K than on egress.
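
The ICMP checksum is the standard RFC 1071 Internet checksum over the whole ICMP message, so a mismatch between ingress and egress can be verified offline from the captures. A minimal sketch in plain Python, not tied to any particular capture tool:

```python
import struct

def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum, as used in the ICMP header."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a 16-bit boundary
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def icmp_checksum_ok(icmp_message: bytes) -> bool:
    """A valid ICMP message (header + payload, stored checksum included)
    sums to zero under the Internet checksum."""
    return inet_checksum(icmp_message) == 0
```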

Chronology of Events

Chronological order of activities during the maintenance window:

  1. Ping tests were performed for verification while PTS cluster was in shunt mode. No packet loss was observed.
  2. PTS-1 was taken out of shunt and the same ping test was performed.
  3. The ping test from the BRAS to the Internet failed.
  4. Confirmed that the ICMP packets were being processed by PTS-2 - PPU-8.
  5. The ICMP packets were captured on the MSE and on PTS-2 simultaneously for comparison (see the capture-comparison sketch after this list).
  6. All the packet captures revealed that the ICMP checksums were incorrect.
  7. PTS-1 was placed back into shunt mode and the ping test to the Internet was successful.
  8. PTS-1 was taken out of shunt and the same ping test failed.
  9. The cluster links on PTS-2 were disabled and the same ICMP test was performed. This time, the ICMP packets were rebalanced to PPU-8 on PTS-3.
  10. The ICMP test to Internet was successful. (captures were taken on both PTS-3 and the MSE)
  11. PTS-2 was taken out of shunt and the same ping test was performed, with ICMP packets transmitted through the data interface of PTS-2. This time the packets were inspected on PTS-2 - PPU-4. The ping test failed.
  12. PTS-2 was rebooted and the same PING test was performed. The same packet issue was observed.
  13. The remaining PTS elements were rebooted one at a time. All PTS elements were put back in shunt after reboot.
  14. Verification tests were performed and all ping tests were successful.
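
A minimal sketch of the comparison made in steps 5-6, assuming Scapy is available (the pcap file names are placeholders for the two simultaneous captures): match ICMP packets across the two captures by IP ID and ICMP sequence number, then flag any checksum that changed in transit.

```python
from scapy.all import rdpcap
from scapy.layers.inet import IP, ICMP

def icmp_checksums(pcap_path: str) -> dict:
    """Map (IP ID, ICMP seq) -> stored ICMP checksum for one capture."""
    sums = {}
    for pkt in rdpcap(pcap_path):
        if IP in pkt and ICMP in pkt:
            sums[(pkt[IP].id, pkt[ICMP].seq)] = pkt[ICMP].chksum
    return sums

# Placeholder file names for the MSE-side and PTS-2-side captures.
mse_side = icmp_checksums("mse_side.pcap")
pts_side = icmp_checksums("pts2_side.pcap")
for key in mse_side.keys() & pts_side.keys():
    if mse_side[key] != pts_side[key]:
        print(f"checksum changed in transit for packet {key}: "
              f"{mse_side[key]:#06x} -> {pts_side[key]:#06x}")
```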


Symptoms

ICMP packets traversing PTS-2 get corrupted. When the PTS cluster is in shunt mode, the problem does not occur.


Root cause:
The issue was caused by malformed DNS traffic travelling across the PTS platform cluster (a software issue). When the PTS analyzed this malformed DNS traffic, its processing modules failed and generated diagnostic core files (see the DNS sanity-check sketch below).
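
As an illustration of what "malformed" can mean for DNS traffic, a minimal sketch of an RFC 1035 header plausibility check (the heuristic and function name are invented for illustration; this is not the PTS vendor's actual parser):

```python
import struct

def dns_header_sane(payload: bytes) -> bool:
    """Rough plausibility check of a DNS message (illustrative only)."""
    if len(payload) < 12:
        return False                              # fixed header is 12 bytes
    _id, _flags, qd, an, ns, ar = struct.unpack("!6H", payload[:12])
    # A minimal question is >= 5 bytes (1-byte root name + type + class);
    # a minimal resource record is >= 11 bytes. A message whose section
    # counts imply more bytes than it contains is malformed.
    min_len = 12 + qd * 5 + (an + ns + ar) * 11
    return len(payload) >= min_len
```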

Suggestions

Upgrade the PTS boxes to the latest software release.

END