PTS Normal Working Mode:
1. Under normal circumstances, all modules in the PTS shall show "up" status, which means the PTS is inspecting all kinds of packets (DNS, BGP, IGP, etc.) as part of its normal behaviour.
2. In the T-Operator deployment, all PTS elements work in cluster mode. (Refer to the diagram.)
Cluster Mode Diagram.
3rd Party Vendor Equipment uses Multiple Spanning Tree Protocol (MSTP) on the cluster links:
• Each PTS uses its own VLAN, with Per VLAN Spanning Tree in the cluster.
• Each PTS acts as the root bridge for its own VLAN.
• This allows 4 cluster links to be fully utilized.
• The green arrows show how the cluster links are utilized from the point of view of one PTS.
During the investigation, it was confirmed that some of the ICMP packets traversing PTS-2 had a different ICMP checksum on ingress to the 14K than on egress.
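The checksum comparison described above can be illustrated with a minimal sketch of the standard Internet checksum (RFC 1071) that ICMP uses. It is not the tooling used during the investigation. The function names, the sample echo request, and the simulated corruption are all hypothetical and serve only to show how a packet altered in transit fails verification.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    # Fold any carries back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def checksum_is_valid(icmp_message: bytes) -> bool:
    """A message carrying a correct checksum re-checksums to zero."""
    return internet_checksum(icmp_message) == 0


# Hypothetical "ingress" packet, as it might be captured before the PTS
ingress = build_echo_request(ident=0x1234, seq=1, payload=b"ping-test")
# Simulated corruption (assumption): one payload byte flipped, checksum untouched
egress = ingress[:-1] + bytes([ingress[-1] ^ 0xFF])

print("ingress checksum valid:", checksum_is_valid(ingress))
print("egress checksum valid:", checksum_is_valid(egress))
```

A receiver that validates the checksum discards a packet like the simulated egress one, which is consistent with ping tests failing even though the packets were forwarded.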
Chronology of Events
Chronological order of activities during the maintenance window:
Ping tests were performed for verification while the PTS cluster was in shunt mode. No packet loss was observed.
PTS-1 was taken out of shunt and the same ping test was performed.
The ping test from the BRAS to the Internet failed.
Confirmed that the ICMP packets were being processed by PTS-2 - PPU-8.
The ICMP packets were captured on the MSE and PTS-2 simultaneously for comparison.
All the packet captures revealed that ICMP checksums were incorrect.
PTS-1 was placed back into shunt mode and the ping test to the Internet was successful.
PTS-1 was taken out of shunt again and the same ping test failed.
The cluster links on PTS-2 were disabled and the same ICMP test was performed. This time, the ICMP packets were rebalanced to PPU-8 on PTS-3.
The ICMP test to the Internet was successful. (Captures were taken on both PTS-3 and the MSE.)
PTS-2 was taken out of shunt and the same ping test was performed with ICMP packets transmitted through the data interface of PTS-2. This time the packets were inspected on PTS-2 - PPU-4. The ping test failed.
PTS-2 was rebooted and the same ping test was performed. The same packet issue was observed.
The remaining PTS elements were rebooted one at a time. All PTS elements were put back in shunt after reboot.
Verification tests were performed and all ping tests were successful.
ICMP packets traversing through PTS-2 get corrupted. If the PTS cluster is in shunt, the problem doesn’t occur.
The issue was caused by malformed DNS traffic travelling across the PTS platform cluster (a software issue). When this malformed DNS traffic was analyzed, it caused problems for the processing modules and created diagnostic core files.