
Mobile Service Impacted Because of Bad Network Design (Shared Pairs of PEs and Same OSPF Instance)

Publication Date: 2012-07-27
Issue Description
Integrating a new MGW site into the MPLS network (a new MGW plus two NE40 CEs) disrupted mobile service for thousands of subscribers in other regions attached to other MGWs, two other MGWs in particular. Subscribers complained that the service was disturbed and suffered intermittent cut-offs.

The topology and technology used for the new site are shown below:

Alarm Information

Handling Process
When traffic between the two MGWs was cut, the following process was used to determine the root cause:

1) Using the traceroute command, notice that the traffic follows an abnormal path: traffic sent from MGW02 toward MGW01 does not pass through the IP backbone but through the two CEs of another, new site that is still under construction.

2) Log in to the new site, display the OSPF configuration, and notice that the OSPF instance in use is the same as at the two old sites.
Also notice that the new site shares the same PEs with the two old sites.
Also, by checking display logbuffer, notice that the physical interfaces between the CE and PE of the new site are flapping between up and down.

3) Confirm with the on-site engineers that this is a new site still under construction, which is why the links are flapping between up and down.

4) Change the OSPF instance of the new site from 1017 to a different one. The problem is then solved.

5) Advise the customer to change the design of the new site (see the Suggestion and Summary part).
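
The path check in step 1 can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and all addresses are documentation ranges, not the real addressing of the network in this case. Given the hop addresses reported by a traceroute, it flags every hop that is a CE of another site.

```python
import ipaddress


def hops_through_foreign_ces(hops, foreign_ce_addresses):
    """Return the traceroute hops that are CE addresses of other sites.

    Both arguments are iterables of IP address strings. The helper is
    hypothetical and only illustrates the check done by hand in step 1.
    """
    foreign = {ipaddress.ip_address(a) for a in foreign_ce_addresses}
    return [h for h in hops if ipaddress.ip_address(h) in foreign]


# Hypothetical hop list for MGW02 -> MGW01 and hypothetical new-site CE addresses.
observed_hops = ["192.0.2.1", "198.51.100.11", "198.51.100.12", "203.0.113.1"]
new_site_ces = ["198.51.100.11", "198.51.100.12"]

suspicious = hops_through_foreign_ces(observed_hops, new_site_ces)
if suspicious:
    print("Traffic transits CEs of another site:", ", ".join(suspicious))
```

An empty result means that no known CE of another site appears on the path. It does not prove that the path crosses the IP backbone.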

Root Cause
Analysis of the network showed that IP reachability between two other MGW sites was going down. Further checking showed that pings between those two MGWs failed intermittently:

A closer look at the network showed that traffic between the impacted MGWs followed an abnormal path, through the new site under construction:

The traffic between the two MGWs went through the new site. Because that site was still under construction, the on-site engineers repeatedly shut down and re-enabled the links between its two CEs. This caused many short interruptions to live traffic between the other MGWs, without the engineers knowing it.

The reason is a bad network design. Because the engineers at the new site used the same OSPF instance, 1017, MGW 02 received the route to MGW 01 twice: through OSPF instance 1017 and through MP-BGP. A route imported from MP-BGP into OSPF becomes a Type 5 LSA, whereas the route learned through OSPF 1017 via the new site is a Type 3 LSA. CE-02 of MGW 02 therefore preferred the path through the new site, because OSPF prefers an inter-area route (Type 3 LSA) over an external route (Type 5 LSA). This means that the CE of MGW 02 sent traffic to PE1, and PE1 did not send that traffic to the IP backbone but to the CEs of the new site:
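
The route selection described above can be illustrated with a minimal Python sketch. It models only the standard OSPF rule that path type is compared before cost (intra-area, then inter-area, then external). The class, function, next-hop labels and cost values are assumptions made for illustration, and the comparison between external routes is simplified.

```python
from dataclasses import dataclass

# OSPF compares path type before cost; a lower rank is preferred.
PATH_TYPE_RANK = {
    "intra-area": 0,
    "inter-area": 1,   # learned from Type 3 LSAs
    "external-1": 2,   # learned from Type 5 LSAs, metric type 1
    "external-2": 3,   # learned from Type 5 LSAs, metric type 2
}


@dataclass(frozen=True)
class OspfRoute:
    prefix: str
    path_type: str
    cost: int
    via: str


def best_route(candidates):
    """Pick the preferred route: best path type first, then lowest cost.

    Simplified: real OSPF has extra tie-breaking rules for external routes.
    """
    return min(candidates, key=lambda r: (PATH_TYPE_RANK[r.path_type], r.cost))


# Hypothetical costs: the path through the new site is far more expensive,
# yet it still wins because it is inter-area rather than external.
routes_to_mgw01 = [
    OspfRoute("MGW01", "external-2", 1, "IP backbone (imported from MP-BGP)"),
    OspfRoute("MGW01", "inter-area", 500, "CEs of the new site"),
]

chosen = best_route(routes_to_mgw01)
print(f"Preferred path to {chosen.prefix}: {chosen.via} ({chosen.path_type})")
```

Because the path type is compared first, raising the cost of the links through the new site would not have moved the traffic back to the backbone in this model.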

As a result, each time the engineers at the new site blocked the link between the two new CEs, or between the new CEs and the PE, traffic was cut for a short time.

Suggestion and Summary
For mobile service, traffic between two MGWs should not pass through CE links outside the backbone (that is, through other sites). The network should therefore be well designed: the CEs of each MGW should be parallel to the CEs of all other MGWs, and each site should use a different OSPF instance.

Here is an example of a network that should be rectified:

(The sites in red are MGW sites that incorrectly share PEs instead of using parallel pairs of PEs. In that case, if the OSPF instances are not different, traffic takes an incorrect path, as in this case.)
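
The design rule above can be checked against a site inventory. The following Python sketch flags every pair of sites that share at least one PE and use the same OSPF instance. The inventory, the site and PE names and the function name are hypothetical. Only the instance number 1017 comes from this case.

```python
from itertools import combinations


def find_conflicts(sites):
    """Return (site_a, site_b, shared_pes) for every risky pair of sites.

    A pair is risky when the two sites share at least one PE and also
    use the same OSPF instance number.
    """
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(sorted(sites.items()), 2):
        shared = sorted(set(a["pes"]) & set(b["pes"]))
        if shared and a["ospf"] == b["ospf"]:
            conflicts.append((name_a, name_b, shared))
    return conflicts


# Hypothetical inventory of MGW sites, their PEs and their OSPF instances.
sites = {
    "MGW01": {"pes": ["PE1", "PE2"], "ospf": 1017},
    "MGW02": {"pes": ["PE1", "PE2"], "ospf": 1017},
    "NEW-SITE": {"pes": ["PE1", "PE2"], "ospf": 1017},
    "MGW03": {"pes": ["PE3", "PE4"], "ospf": 1017},
}

for site_a, site_b, shared in find_conflicts(sites):
    print(f"{site_a} and {site_b} share {shared} and the same OSPF instance")
```

In this inventory MGW03 is not reported, because it has its own pair of PEs. Giving each site on PE1 and PE2 its own OSPF instance would clear the remaining conflicts.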