The NE40E version is V600R001C00SPC800. The topology is: STP1 (CE1; IP: 10.10.10.2/30) --- NE40E (PE1; IP: 10.10.10.1/30; QoS: AF4) --- NE80E --- NE40E (PE2; IP: 10.10.10.5/30; QoS: AF4) --- MSC2 (CE2; IP: 10.10.10.6/30). The customer's core network team complains that communication between MSC1 and MSC2 does not work properly.
There is an SCTP packet loss alarm on the MSC.
Ping 10.10.10.1 from the CE1 site: no packets are dropped. This shows that communication between CE1 and PE1 is good.
Ping 10.10.10.5 from the CE2 site: no packets are dropped. This shows that communication between CE2 and PE2 is good.
Ping 10.10.10.6 from the CE1 site: 20% of the packets are lost. This shows that communication between CE1 and CE2 has a problem.
However, running ping -vpn VPNA -a 10.10.10.1 10.10.10.5 on PE1 shows no packet loss, which suggests that communication between PE1 and PE2 is good.
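The results of the CE1-to-CE2 ping and the PE1-to-PE2 ping conflict: the end-to-end path loses packets, yet every individual segment (CE1 to PE1, PE1 to PE2, PE2 to CE2) appears to be good.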
After checking with R&D, we learned that from version V6R1 onward, all ICMP packets sent by the MPU are marked CS6. We therefore ran ping -vpn VPNA -tos 128 -a 10.10.10.1 10.10.10.5 on PE1 so that the probes were carried as AF4 traffic, and the same packet loss appeared. From this we concluded that packets were being dropped by QoS in the AF4 queue somewhere in the network. Checking the QoS drops hop by hop, we found that the EF traffic from the P router to PE2 was too high, which caused AF4 packets to be dropped.
After the EF traffic limit was lowered, the problem was solved.
From version V6R1 onward, all ICMP packets sent by the MPU are marked CS6. A normal ping from PE1 to PE2 therefore shows no packet loss as long as the physical link is healthy, because the CS6 probes do not share the AF4 queue. If ping -tos 128 (an AF4 ping) is used to change the marking of the packets, the drops become visible. This means there is a QoS drop somewhere in the network rather than a physical link problem.
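The relationship between the -tos value and the packet marking can be checked with a little arithmetic. The sketch below is only an illustration and is not part of the original case: the function names are hypothetical, and it assumes the standard layout in which the DSCP occupies the upper six bits of the TOS byte. Under that assumption, TOS 128 is DSCP 32 (CS4 in class-selector terms), which this case treats as AF4 traffic. Which queue a given DSCP lands in depends on the DiffServ mapping configured on the device.

```python
def tos_to_dscp(tos: int) -> int:
    """Return the DSCP carried in a TOS byte (upper six bits)."""
    if not 0 <= tos <= 255:
        raise ValueError("TOS must fit in one byte (0-255)")
    return tos >> 2


def dscp_to_tos(dscp: int) -> int:
    """Return the TOS byte for a DSCP, with the two low-order bits set to 0."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits (0-63)")
    return dscp << 2


def class_selector_name(dscp: int):
    """Return 'CSn' if the DSCP is a class-selector codepoint (n * 8), else None."""
    if 0 <= dscp <= 56 and dscp % 8 == 0:
        return f"CS{dscp // 8}"
    return None


if __name__ == "__main__":
    # The value used with ping -tos in this case.
    print(tos_to_dscp(128), class_selector_name(tos_to_dscp(128)))
    # The TOS byte that corresponds to CS6 (DSCP 48), the MPU default marking.
    print(dscp_to_tos(48))
```

The same conversion can be used to pick a -tos value for any other class that needs to be probed.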
Because ICMP packets sent from a CE and from a PE carry different markings, we can use this difference to identify whether a problem is related to the physical link or not, and to locate the QoS drop.
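The rule above can be written down as a small decision helper. This is a rough sketch under stated assumptions, not a tool from the original case: the function name, the threshold parameter, and the result strings are all hypothetical, and the loss percentages would have to be read manually from the two ping outputs on the PE (default CS6 marking versus the -tos 128 marking).

```python
def diagnose(cs6_loss_pct: float, af4_loss_pct: float, threshold: float = 0.0) -> str:
    """Classify a PE-to-PE problem from two ping results.

    cs6_loss_pct: loss seen with the default ping (MPU marks ICMP as CS6).
    af4_loss_pct: loss seen with ping -tos 128 (probes carried as AF4).
    threshold:    loss percentage above which a result counts as lossy
                  (hypothetical parameter, default 0).
    """
    cs6_lossy = cs6_loss_pct > threshold
    af4_lossy = af4_loss_pct > threshold

    if cs6_lossy:
        # Even the CS6 probes are lost, so the link itself is suspect.
        return "check physical link"
    if af4_lossy:
        # Only the AF4-marked probes are lost: a QoS drop, not the link.
        return "suspect QoS drop in AF4 queue"
    return "no loss observed between PEs"


if __name__ == "__main__":
    # Pattern seen in this case: default ping clean, AF4-marked ping lossy.
    print(diagnose(0.0, 20.0))
```

When the result points to a QoS drop, the next step is still the hop-by-hop check of the queue drops described in the case.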