The partner raised a ticket with Huawei TAC reporting a problem with an E9000: the end customer
could not connect to the Fusion Compute portal from the Management-PC, so the VMs could not be managed.
Because the E9000 logs could not be collected without access to Fusion Compute, Huawei TAC requested
information about the network infrastructure to find out whether any modification had been made.
The following topology was confirmed:
A remote session was scheduled to check the issue.
During this session, Huawei TAC engineers found the following issues:
1. The browsers used for the access attempts were Internet Explorer 11 and Firefox 40. These do not match the
browsers recommended by Huawei, so the VRM portal content may not be displayed properly.
The browsers need to have the following characteristics:
2. The IP address that the Management-PC uses to log in to Fusion Compute, 192.168.50.12, belongs to the
192.168.50.x segment, while the Management-PC belongs to the 192.168.40.x segment, as shown in the picture below:
The Management-PC, as confirmed by the partner (innerconsulting), is configured with IP 192.168.40.2 and
gateway 192.168.40.1. Huawei TAC engineers tried to ping the gateway without success, and the Fusion Compute
management floating IP 192.168.50.12 was not reachable by ping either.
These IPs belong to different segments, which can be verified by checking the configured subnet mask.
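The segment check described above can be reproduced with a short script. This is a minimal sketch: the report does not state the configured mask, so a /24 mask (255.255.255.0) is assumed here, and the helper name `same_segment` is only illustrative.

```python
import ipaddress

# Assumption: a /24 mask (255.255.255.0). The report does not state the mask.
ASSUMED_PREFIX = 24


def same_segment(ip_a: str, ip_b: str, prefix: int = ASSUMED_PREFIX) -> bool:
    """Return True when both IPv4 addresses fall in the same network."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix}").network
    return net_a == net_b


management_pc = "192.168.40.2"   # Management-PC address from the report
pc_gateway = "192.168.40.1"      # gateway configured on the Management-PC
floating_ip = "192.168.50.12"    # Fusion Compute floating IP

print(same_segment(management_pc, pc_gateway))   # True
print(same_segment(management_pc, floating_ip))  # False
```

With the assumed mask, the Management-PC shares a segment with its own gateway but not with the floating IP, so traffic to 192.168.50.12 has to be routed through a device that knows both segments.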
The server used as the Management-PC has 4 NICs, 2 of which are disabled.
The first NIC and its behavior were described above. The configuration of the second NIC is the following:
In this case TAC pinged the gateway and got a positive response.
Therefore Huawei TAC asked the customer's engineers to investigate the issue as a network connectivity problem.
TAC scheduled another remote session to fix the issue. Huawei TAC engineers found that the firewall was
misconfigured, which was the reason the E9000 was unreachable from the Management-PC.
After fixing this, we were able to reach the following IPs:
192.168.50.10 --- VRM01
However, the IP used to log in to the VRM was still unreachable:
192.168.50.12 floating IP (unreachable)
After that, the customer had trouble retrieving the user IDs and passwords of the VRMs.
R&D helped the Huawei TAC engineers find the passwords:
To log in to the VRM:
User: “gandalf” password: “Pwd8800_magic$”
After logging in to the VRM as the gandalf user, switch to the root user as follows:
User: “root” password: “Galax@8800”
Once logged in to the VRMs, Huawei TAC engineers found that the gateway configured on the VRMs and on the
blade servers is 192.168.50.1. The servers and the VRMs belong to the same segment, yet the gateway could
not be pinged successfully. This was one of the reasons there was no traffic in or out.
Huawei local engineers connected remotely and configured the VRM gateway IP 192.168.50.1 on the FW.
After disabling the ping reply restrictions on the corresponding FW interfaces, the heartbeat connection between
VRM 1 and VRM 2 was re-established, and VRM 1 became the “Primary” server and took the floating IP
192.168.50.12 used for Fusion Compute access. After this, the connection to Fusion Compute succeeded.
Customer ITC confirmed that access was working normally and that they were able to restart two VMs that were down.
The Huawei GTAC engineer noticed that the VRM was flapping, which affected the connection between the VRM and
the web portal. The root cause was found as follows:
Pinging the gateway IP 192.168.50.1 from the VRM IPs 192.168.50.10/192.168.50.11 showed a time
delay of up to 28672 ms.
When the Fusion Compute system detects a delay of more than 1 s more than 9 times, the Primary
VRM changes its status to Secondary. In the meantime the floating IP is not reachable,
so it is not possible to log in to the FC portal:
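The demotion rule described above can be sketched as a simple counter over ping delays. This is only an illustration of the rule as stated in this report, not the actual Fusion Compute implementation. The function name `should_demote` is hypothetical, and it is an assumption that the slow pings must be consecutive and that a lost ping counts as a slow one.

```python
DELAY_LIMIT_MS = 1000  # "more than 1 s", as stated in the report
MAX_SLOW_PINGS = 9     # "over 9 times", as stated in the report


def should_demote(delays_ms, limit_ms=DELAY_LIMIT_MS, max_slow=MAX_SLOW_PINGS):
    """Return True once more than max_slow slow pings occur in a row.

    delays_ms holds gateway ping delays in milliseconds; None means the
    ping was lost. Assumption: slow pings must be consecutive, and a fast
    reply resets the counter.
    """
    slow_run = 0
    for delay in delays_ms:
        if delay is None or delay > limit_ms:
            slow_run += 1
            if slow_run > max_slow:
                return True
        else:
            slow_run = 0
    return False


# Ten slow replies in a row, using the worst delay observed in this case.
print(should_demote([28672] * 10))  # True
# Normal replies never trigger a demotion.
print(should_demote([5] * 20))      # False
```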
Root cause: heavy traffic appeared in the management plane and drove CPU usage on the FW up to 99%, which
delayed the FW's responses.
The suggestion given to the customer to avoid this behavior is to remove the gateway IP 192.168.50.1 from the
FW and configure it on the core switch.
There are 2 VRMs. They do exactly the same work and exist for redundancy, one as backup of the other. Each one
has an IP for management by console (one of them is configured as primary and the other as secondary), and there
is one additional IP, called the floating IP, that is used to connect via the GUI. While the graphical application
is working, the floating IP is bound to one VRM (the primary). When that VRM goes down, the IP is bound to the
other VRM (the secondary becomes primary), so management is never interrupted.
The problem occurred when the heartbeat was interrupted because the gateway IP could not be pinged from
VRM01 and VRM02. Each VRM then went to secondary status and could not become primary, so the user could not
log in to the Fusion Compute portal (the floating IP never found a primary).
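The failure mode above can be illustrated with a toy model of the role assignment. This is a simplified sketch based only on the behavior described in this report, not the real Fusion Compute election logic. The names `elect_roles` and `floating_ip_owner` are hypothetical, and it is an assumption that a VRM may become Primary only while it can ping the gateway.

```python
from typing import Dict, Optional

FLOATING_IP = "192.168.50.12"  # floating IP from the report


def elect_roles(gateway_reachable: Dict[str, bool]) -> Dict[str, str]:
    """Assign a role to each VRM.

    Assumption: the first VRM (in dict order) that can ping the gateway
    becomes Primary; every other VRM is Secondary.
    """
    roles = {}
    primary_chosen = False
    for name, reachable in gateway_reachable.items():
        if reachable and not primary_chosen:
            roles[name] = "Primary"
            primary_chosen = True
        else:
            roles[name] = "Secondary"
    return roles


def floating_ip_owner(roles: Dict[str, str]) -> Optional[str]:
    """Return the VRM holding the floating IP, or None if there is no Primary."""
    for name, role in roles.items():
        if role == "Primary":
            return name
    return None


# Normal operation: VRM01 is Primary and holds the floating IP.
print(floating_ip_owner(elect_roles({"VRM01": True, "VRM02": True})))    # VRM01
# Failover: VRM01 loses the gateway, VRM02 takes over.
print(floating_ip_owner(elect_roles({"VRM01": False, "VRM02": True})))   # VRM02
# The case in this report: neither VRM reaches the gateway, nobody holds the IP.
print(floating_ip_owner(elect_roles({"VRM01": False, "VRM02": False})))  # None
```

In the last case both VRMs stay Secondary, so `FLOATING_IP` is bound to neither of them and the portal is unreachable, which matches the symptom the customer reported.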
The gateway IP was not configured on any NE.
Please see the picture below:
Configure the gateway IP on an external LAN network device that has interfaces in the same IP segment, and ensure
that this IP is never removed. Otherwise communication between the VRMs will fail and access to the Fusion Compute
portal will not be possible.
To avoid high CPU usage on the FW, it is suggested to configure this gateway IP, 192.168.50.1, on the core switch.