ALM-1200067 DHCP Service Is Unavailable
Description
This alarm is generated when the DHCP component on the host is unable to provide DHCP service for all or some of the networks.
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
1200067 | Major | Yes |
Parameters
Name | Meaning |
---|---|
Fault Location Info | service: specifies the name of the service for which the alarm is generated. The default value is VPC. MicroService: specifies the name of the microservice for which the alarm is generated. The default value is DHCP. host_id: specifies the ID of the host for which the alarm is generated. |
Additional Info | Service: specifies the name of the service for which the alarm is generated. The default value is VPC. MicroService: specifies the name of the microservice for which the alarm is generated. The default value is DHCP. hostname: specifies the name of the host for which the alarm is generated. error_info: specifies the cause of the alarm. |
Impact on the System
If the DHCP service is unavailable, new VMs cannot communicate after they are created. Existing VMs cannot renew their IP addresses, and if an IP address lease expires before it can be renewed, communication of the affected VM is interrupted.
Possible Causes
- The network over which the DHCP service is provided is disconnected.
- The port in the DHCP namespace is down.
- The dnsmasq process that provides the DHCP service is unavailable.
- The DHCP agent is not working properly.
Procedure
1. Check whether the alarm is automatically cleared.
- If yes, no further action is required.
- If no, go to 2.
2. Log in to the host for which this alarm is generated.
- Use PuTTY to log in to a FusionSphere OpenStack controller node.
Ensure that the reverse proxy IP address and username fsp are used to establish the connection.
- Run the following command and enter the password of user root to switch to user root:
su - root
- Run the following command to disable user logout upon system timeout:
TMOUT=0
- Import CPS environment variables and use CPS authentication. For details, see Importing Environment Variables.
- Run the cps host-list command to query the management IP address of the host identified in Fault Location Info.
- Run the following commands to log in to the host for which the alarm is generated:
su - fsp
ssh fsp@HOST_MANAGE_IP
Enter the system private key password as prompted. The default password is Huawei@CLOUD8!. If newly generated public and private key files have replaced the old ones, enter the password of the new private key. Alternatively, press Enter and then enter the password of user fsp.
Run the su - root command to switch to user root.
- Import OpenStack environment variables and use Keystone V3 authentication. For details, see Importing Environment Variables.
3. Check the additional alarm information.
- If error_info is "Network network_id failed to ping IP from the DHCP namespace", go to 4.
- If error_info is "The dnsmasq process(es) of network(s) network_id does(do) not exist or unavailable", go to 5.
- If error_info is "The state of the DHCP agent is abnormal", go to 6.
- If error_info is "The namespace's port state of network(s) network_id is(are) abnormal", go to 7.
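The branching above can be sketched as a small shell helper. This is only an illustration: the function name next_step_for_error_info is hypothetical, the patterns are taken from the four messages listed above, and sending unrecognized text to step 10 (technical support) is an assumption rather than something this procedure states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the error_info text of this alarm to the
# number of the step to perform next.
next_step_for_error_info() {
    local error_info="$1"
    case "$error_info" in
        *"failed to ping IP from the DHCP namespace"*)   echo 4 ;;
        *"dnsmasq process"*"not exist or unavailable"*)  echo 5 ;;
        *"The state of the DHCP agent is abnormal"*)     echo 6 ;;
        *"namespace's port state"*"abnormal"*)           echo 7 ;;
        # Assumption: any other text is escalated to technical support.
        *)                                               echo 10 ;;
    esac
}
```

For example, passing the full text "The state of the DHCP agent is abnormal" prints 6.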
4. Based on network_id in error_info obtained in 3, check whether the packet with the VLAN is allowed to pass through related ports of the switch.
- Run the following command to query the network information:
neutron net-show network_id
B4B7CB29-D21D-B211-8F3E-0018E1C5D866:~ # neutron net-show b5e7b481-b07e-444c-a4aa-ddc5a0316252
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2018-07-24T09:48:10                  |
| description               |                                      |
| id                        | b5e7b481-b07e-444c-a4aa-ddc5a0316252 |
| ipv4_address_scope        |                                      |
| ipv6_address_scope        |                                      |
| mtu                       | 1500                                 |
| name                      | network2198                          |
| port_security_enabled     | True                                 |
| provider:network_type     | vlan                                 |
| provider:physical_network | physnet1                             |
| provider:segmentation_id  | 2198                                 |
| qos_policy_id             |                                      |
| router:external           | False                                |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   |                                      |
| tags                      |                                      |
| tenant_id                 | 8ed209105d2547fba0e74b9dfbcaa859     |
| updated_at                | 2018-07-24T09:48:10                  |
+---------------------------+--------------------------------------+
- According to the result of 4.a, obtain the ID of the VLAN that needs to be allowed.
- If provider:network_type is vlan, provider:segmentation_id is the VLAN ID.
In this example, the VLAN ID is 2198, as shown in the command output in 4.a.
- If provider:network_type is vxlan and OpenStack is interconnected with SDN, SDN determines whether to allow packets from the VLAN on the switch to pass through. In this case, go to 10.
- If provider:network_type is vxlan and OpenStack is not interconnected with SDN, run the following command to obtain the VLAN ID:
cps network-list | grep tunnel_bearing
In this example, the VLAN ID is 4003, as shown below.
B4B7CB29-D21D-B211-8F3E-0018E1C5D866:~ # cps network-list | grep tunnel_bearing
| tunnel_bearing | 1 | internal_auto | 172.28.48.0-172.28.63.255 | 172.28.48.0/20 | | 4003 | | | tx_limit: | | System vm tunnel network. |
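For the vlan case, reading the VLAN ID out of the neutron net-show table can be sketched as follows. The function names net_show_field and vlan_id_from_net_show are hypothetical, and the sketch assumes the two-column Field/Value table layout shown in the example output; it only parses text from standard input and does not call neutron itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one field from a
# "| Field | Value |" table read on stdin.
net_show_field() {
    local field="$1"
    awk -F'|' -v f="$field" '
        NF >= 3 {
            key = $2; val = $3
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the field name
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the value
            if (key == f) { print val; exit }
        }'
}

# Hypothetical helper: print the VLAN ID when provider:network_type is
# vlan; otherwise report the type on stderr and return 1, because the
# vxlan branches obtain the VLAN ID differently.
vlan_id_from_net_show() {
    local output net_type
    output="$(cat)"
    net_type="$(printf '%s\n' "$output" | net_show_field 'provider:network_type')"
    if [ "$net_type" = "vlan" ]; then
        printf '%s\n' "$output" | net_show_field 'provider:segmentation_id'
    else
        echo "provider:network_type is '$net_type', not vlan" >&2
        return 1
    fi
}
```

A possible use is to pipe the command output into the helper: neutron net-show network_id | vlan_id_from_net_show.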
- According to the result of 4.a, query the NIC ports connecting the network.
Log in to the FusionSphere OpenStack web client.
For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).
- If provider:network_type is vlan, choose Configuration > Network > Configure NIC Mapping, locate the host group that contains the host of the faulty DHCP service, and check the NIC ports corresponding to provider:physical_network.
As shown in the figure below, the NIC ports corresponding to physnet1 are nic0 and nic1.
- If provider:network_type is vxlan, choose Configuration > OpenStack > Neutron, and click Configure Tunnel Bearing Network. Locate the host group that contains the host of the faulty DHCP service, and check the NIC ports corresponding to tunnel_bearing.
As shown in the figure below, the NIC ports corresponding to tunnel_bearing are nic0 and nic1.
- Run the following command to query the eth ports corresponding to the NIC ports:
cat /usr/bin/ports_info | python -mjson.tool
B4B7CB29-D21D-B211-8F3E-0018E1C5D866:~ # cat /usr/bin/ports_info | python -mjson.tool
{
    "Logic-phyMapInfo": {
        "nic0": "eth0",
        "nic1": "eth1",
        "nic2": "eth2",
        "nic3": "eth3",
        "nic4": "eth4",
        "nic5": "eth5"
    },
    ......
}
- Check whether the VLAN is allowed to pass through the switch ports connected to these eth ports. If not, configure these switch ports to allow the VLAN to pass through.
- Go to 8.
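Looking up the eth port for a given NIC port in the ports_info output can be sketched with a plain text match. The function name eth_for_nic is hypothetical, and the sketch assumes the "nicN": "ethN" pair format shown in the example output; it is a text-based lookup, not a full JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ports_info JSON text on stdin and print the
# eth port mapped to the given NIC port name (for example, nic0).
eth_for_nic() {
    local nic="$1"
    # Match "<nic>": "<value>" and print only <value>; first match wins.
    sed -n 's/.*"'"$nic"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

A possible use is: cat /usr/bin/ports_info | python -mjson.tool | eth_for_nic nic0.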
5. Based on network_id in error_info obtained in 3, check whether the dnsmasq process exists.
ps aux | grep -w dnsmasq | grep -v grep | grep -w network_id
- If the following information is displayed and the process status is T (the process is stopped), run kill -18 process ID to resume the process. In this example, the process ID is 7605. Then, go to 8.
B4B7CB29-D21D-B211-8F3E-0018E1C5D866:~ # ps aux | grep -w dnsmasq | grep -v grep | grep -w 82262c70-e8bf-491c-b289-730f60462741
opensta+ 7605 0.0 0.0 13964 940 ? T Jul24 0:00 dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo --pid-file=/var/lib/neutron/dhcp/82262c70-e8bf-491c-b289-730f60462741/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/82262c70-e8bf-491c-b289-730f60462741/host --addn-hosts=/var/lib/neutron/dhcp/82262c70-e8bf-491c-b289-730f60462741/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/82262c70-e8bf-491c-b289-730f60462741/opts --dhcp-leasefile=/var/lib/neutron/dhcp/82262c70-e8bf-491c-b289-730f60462741/leases --dhcp-match=set:ipxe,175 --bind-interfaces --interface=tapc5f8ef1a-54 --dhcp-range=set:tag0,10.7.8.0,static,2592000s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq.conf --domain=openstacklocal
- Otherwise, go to 10.
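Picking out the stopped dnsmasq process from the ps aux output can be sketched as below. The function name stopped_pids is hypothetical, and the sketch assumes the standard ps aux column order, in which the process ID is column 2 and the state (STAT) is column 8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ps aux" lines on stdin and print the PID of
# every process whose STAT column begins with T (stopped).
stopped_pids() {
    awk '$8 ~ /^T/ { print $2 }'
}
```

A possible use is: ps aux | grep -w dnsmasq | grep -v grep | grep -w network_id | stopped_pids. Each printed PID can then be passed to kill -18 as described above. On Linux x86 and ARM systems, signal 18 is SIGCONT, so kill -CONT is an equivalent spelling there.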
6. Restore the DHCP agent. For details, see neutron-dhcp-agent Component Troubleshooting. Then, go to 8.
7. Run the following command to query the down port in the namespace based on the value of network_id in error_info obtained in 3:
ip netns exec qdhcp-network_id ifconfig -a
- In the following example output, the flags field of the tapc0fa43d7-98 port does not contain UP, so the port is in the DOWN state. For a port that is up, UP is displayed in the flags field.
70033389-3304-74BA-E811-93EBFEF7D209:~ # ip netns exec qdhcp-38f49e94-96f8-4faf-a728-580ce221fe80 ifconfig -a
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 2  bytes 688 (688.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 688 (688.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
tapc0fa43d7-98: flags=2<BROADCAST>  mtu 1500
        inet 192.168.1.254  netmask 255.255.255.0  broadcast 192.168.1.255
        ether fa:16:3e:50:32:7c  txqueuelen 1000  (Ethernet)
        RX packets 11055  bytes 765844 (747.8 KiB)
        RX errors 0  dropped 4  overruns 0  frame 0
        TX packets 10874  bytes 830188 (810.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
- Run the following command to bring up the down port found in the namespace:
ip netns exec qdhcp-network_id ifconfig port up
For example:
70033389-3304-74BA-E811-93EBFEF7D209:~ # ip netns exec qdhcp-38f49e94-96f8-4faf-a728-580ce221fe80 ifconfig tapc0fa43d7-98 up
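Finding the down port in the ifconfig -a output can be sketched as below. The function name down_ports is hypothetical, and the sketch assumes the output layout shown in the example above, where each interface starts at the beginning of a line as name: flags=N<FLAG,...>. The loopback interface is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ifconfig -a" output on stdin and print each
# non-loopback interface whose flags list does not contain UP.
down_ports() {
    awk '
        /^[^ \t].*flags=/ {
            name = $1
            sub(/:$/, "", name)            # drop the trailing colon
            flags = $0
            sub(/^[^<]*</, "", flags)      # keep only the text inside <...>
            sub(/>.*$/, "", flags)
            if (name != "lo" && ("," flags ",") !~ /,UP,/) print name
        }'
}
```

A possible use is: ip netns exec qdhcp-network_id ifconfig -a | down_ports. Each printed port name can then be used as port in the ifconfig port up command above.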
8. After 5 to 6 minutes, check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 9.
9. Check whether error_info of the alarm is updated.
10. Contact technical support for assistance.
Related Information
None