
HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

FusionSphere OpenStack Operations

How Can I Handle a Failure in Forcible Time Synchronization?

Symptom

When a user attempted to forcibly synchronize system time on the FusionSphere OpenStack web client, the task failed, causing a system exception.

Possible Causes

During the time synchronization process, an exception occurred on the external clock source.

Troubleshooting Guideline

Restore the system and then check whether the external clock source is functional.

Prerequisites

You have logged in to the FusionSphere OpenStack web client.

Procedure

  1. Use PuTTY to log in to the first host.

    Ensure that the host management IP address and username fsp are used to establish the connection.

    The default password of user fsp is Huawei@CLOUD8.

    The system supports login authentication using a password or a public-private key pair. If you use a key pair to authenticate the login, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

  2. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.

    Enter 2 to enable CPS authentication.

  5. Run the following command to view the time synchronization result:

    ntp result-show

    Check whether any command output is displayed.

    • If yes, go to 6.
    • If no, go to 9.

  6. Run the following commands to restore the system:

    ntp start-hosts --host all

    After 5 minutes, check whether the system is successfully restored.

    • If yes, go to 7.
    • If no, go to 9.

  7. Run the following command to check whether the clock source is functional:

    ntp time-delta --host all

    • If the time difference between the NTP server and the external clock source is Fail, go to 8.
    • Otherwise, go to 9.

  8. Restore the NTP server and the external clock source, and then manually synchronize the system time again. For details, see ALM-6010 Time Difference Between the NTP-Server and the External Clock Source Exceeds Threshold Value.
  9. If the message Max retries exceeded with url: /cps/v1/ntp/sync is displayed, restart the three controller nodes and then perform the handling procedure again. If any other error message is displayed, contact technical support for assistance.
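The branching in steps 5 to 9 can be sketched as a small helper. This is an illustrative sketch only: the real checks run the ntp CLI on the first host, and the exact field format of its output is an assumption here; the "Fail" marker and the retry error text are taken from steps 7 and 9.

```python
# Illustrative sketch only: the real checks run the `ntp` CLI on the host.
# The "Fail" marker and the retry error text come from steps 7 and 9; the
# delta-field format is an assumption about the CLI output.
RETRY_ERROR = "Max retries exceeded with url: /cps/v1/ntp/sync"

def next_action(delta_field: str, error_message: str = "") -> str:
    """Map an `ntp time-delta` field (plus any error text) to the next step."""
    if RETRY_ERROR in error_message:
        return "restart the three controller nodes, then retry the procedure"
    if delta_field.strip().lower() == "fail":
        return "restore the NTP server and external clock source (step 8)"
    return "clock source is functional; no further action"

print(next_action("Fail"))
print(next_action("0.003s"))
```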

Configuring Alarm Thresholds

Scenarios

Configure alarm thresholds on the Service OM web client. When the value of the monitored entity reaches the threshold, the system automatically generates an alarm.

Prerequisites

You have logged in to the Service OM web client.

Procedure

  1. On the Service OM web client, choose Service OM > Centralized O&M > Alarm > Alarm Settings.

    The Alarm Thresholds page is displayed.

  2. Click the entity for which the alarm threshold is to be configured.

    The alarm objects are displayed.

  3. Locate the row that contains the target alarm object and click Modify.

    A dialog box is displayed.

  4. Select alarm severities.

    If no alarm severity is selected, no alarm will be generated for this entity.

  5. Enter the alarm thresholds in the text boxes to the right of the alarm severities.

    For example, to enable a major alarm to be generated when the CPU usage is greater than 70% but less than 80% and a critical alarm to be generated when the CPU usage is greater than 80%, set the thresholds as follows:

    CPU usage:

    • Critical: > 80%
    • Major: > 70%

  6. Configure the alarm threshold offset.

    Offset: specifies the allowable threshold offset used when an alarm is cleared. This parameter applies only to alarm clearance, not to alarm generation.

    For example, if the CPU usage alarm threshold is set to 80% and the offset is set to 10%, an alarm is reported when the CPU usage reaches 80%, and it is cleared when the CPU usage lowers to 72%, which is the result of the formula 80% x (100% – 10%).

    NOTE:

    If a critical CPU usage alarm has already been reported and the CPU usage remains above the clearance value calculated from the offset, the alarm stays critical even when the CPU usage drops below the major alarm threshold. Therefore, you are advised to set an offset such that the result of "higher-level alarm threshold x (100% - offset)" is greater than the lower-level alarm threshold.

  7. Click Save.
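The clearance arithmetic from steps 5 and 6 can be checked with a short sketch. The numbers are the example values from the text (80% critical threshold, 70% major threshold, 10% offset):

```python
def clearance_value(threshold_pct: float, offset_pct: float) -> float:
    """Clearance point from the text: threshold x (100% - offset)."""
    return threshold_pct * (100.0 - offset_pct) / 100.0

# Example from step 6: an 80% threshold with a 10% offset clears at 72%.
print(clearance_value(80.0, 10.0))  # 72.0

def offset_is_safe(high_threshold: float, offset: float, low_threshold: float) -> bool:
    """Recommended check from the NOTE: the higher-level clearance point
    should stay above the lower-level alarm threshold."""
    return clearance_value(high_threshold, offset) > low_threshold

print(offset_is_safe(80.0, 10.0, 70.0))  # True
```

With a 15% offset instead, the critical alarm would clear at 68%, below the 70% major threshold, which is exactly the situation the NOTE warns against.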

nova-compute Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 8.
  3. Run the following command to check whether the status of the nova-compute service is normal:

    nova service-list | grep nova-compute

    Check whether the service status is down.

    • If no, go to 8.
    • If yes, check whether the host service is disabled.

      Check whether the status of the nova-compute service in the displayed result is disabled.

      • If no, go to 4.
      • If yes, run the following command to manually restore the service:

        nova service-enable hostId nova-compute

        NOTE:

        Replace hostId with the actual host ID, for example, A8983960-F114-E611-824B-08C0210791BD.

  4. Run the following command to check whether the status of the neutron-openvswitch-agent component is normal:

    cps template-instance-list --service neutron neutron-openvswitch-agent

    Check whether the service status is active.

  5. Run the following command to check whether the status of the nova-conductor service is normal:

    nova service-list | grep nova-conductor

    Check whether the service status is down.

  6. Run the following command to check whether storage multipathing is enabled:

    cps template-params-show --service nova nova-compute | grep libvirt_iscsi_use_ultrapath

    Query the value of the libvirt_iscsi_use_ultrapath configuration item.

    • true indicates that multipathing is enabled. Verify that the multipathing service is actually running.

      Run the ps -ef | grep upservice command to check whether the service is running.

      • If no, the multipathing service is not installed. In this case, install the service.
      • If yes, the multipathing service is installed. In this case, go to 7.
    • false indicates that the multipathing service is disabled. In this case, enable the multipathing function or install the multipathing software package.
    NOTE:
    • Perform the following operations to manually enable or disable storage multipathing.
      • Run the following command to enable storage multipathing: cps template-params-update --service nova nova-compute --parameter libvirt_iscsi_use_ultrapath=true; cps commit
      • Run the following command to disable storage multipathing: cps template-params-update --service nova nova-compute --parameter libvirt_iscsi_use_ultrapath=false; cps commit
  7. Run the cps host-template-instance-operate --service nova nova-compute --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova nova-compute command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 8.
  8. Contact technical support for assistance.
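The checks in steps 2 and 3 above can be sketched as parsing logic. This is a sketch only: the pipe-separated table layout assumed here (Id, Binary, Host, Zone, Status, State columns) follows the standard OpenStack `nova service-list` output and may differ in your deployment.

```python
# Illustrative sketch only: the real check runs `nova service-list` on the
# controller. The column layout (Id, Binary, Host, Zone, Status, State)
# is an assumption based on the standard OpenStack CLI table.
def parse_service_row(row: str) -> dict:
    """Split one table row into the fields the procedure inspects."""
    cols = [c.strip() for c in row.strip().strip("|").split("|")]
    return {"binary": cols[1], "host": cols[2], "status": cols[4], "state": cols[5]}

def next_step(svc: dict) -> str:
    """Map service status/state to the next troubleshooting action."""
    if svc["state"] != "down":
        return "service is up; no restart needed"
    if svc["status"] == "disabled":
        # Step 3: manually restore a disabled service.
        return f"nova service-enable {svc['host']} {svc['binary']}"
    # Service is down but not disabled: continue with step 4 onwards.
    return "continue with component checks (step 4)"

row = "| 5 | nova-compute | A8983960-F114-E611-824B-08C0210791BD | nova | disabled | down | - |"
print(next_step(parse_service_row(row)))
```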

nova-api Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 4.
  3. Run the cps host-template-instance-operate --service nova nova-api --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova nova-api command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 4.
  4. Contact technical support for assistance.

nova-scheduler Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 5.

  3. Run the following command to check whether the status of the nova-scheduler service is normal:

    nova service-list | grep nova-scheduler

    Check whether the service status is down.

    • If no, go to 5.
    • If yes, check whether the host service is disabled.
      Check whether the status of the nova-scheduler service in the displayed result is disabled.
      • If no, go to 4.
      • If yes, run the following command to manually restore the service:

        nova service-enable hostId nova-scheduler

        NOTE:

        Replace hostId with the actual host ID, for example, A8983960-F114-E611-824B-08C0210791BD.

  4. Run the cps host-template-instance-operate --service nova nova-scheduler --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova nova-scheduler command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 5.

  5. Contact technical support for assistance.

nova-conductor Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 5.

  3. Run the following command to check whether the status of the nova-conductor service is normal:

    nova service-list | grep nova-conductor

    Check whether the service status is down.

    • If no, go to 5.
    • If yes, check whether the host service is disabled.

      Check whether the status of the nova-conductor service in the displayed result is disabled.

      • If no, go to 4.
      • If yes, run the following command to manually restore the service:

        nova service-enable hostId nova-conductor

        NOTE:

        Replace hostId with the actual host ID, for example, A8983960-F114-E611-824B-08C0210791BD.

  4. Run the cps host-template-instance-operate --service nova nova-conductor --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova nova-conductor command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 5.

  5. Contact technical support for assistance.

nova-proxy Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 8.

  3. Run the following command to check whether the status of the nova-proxy service is normal:

    cps template-instance-list --service nova nova-proxy00X
    NOTE:

    The process name of nova-proxy is nova-proxy00X. X indicates a digit (such as 1, 2, or 3) which is generated in sequence when the cascaded FusionSphere OpenStack system is connected. For example, the process name is nova-proxy001 when the first cascaded system is connected, and the process name is nova-proxy002 when the second cascaded system is connected. Other process names can be deduced in the same way.

    Check whether the service status is active.

    • If yes, go to 8.
    • If no, check whether the host service is disabled.

      nova service-list | grep nova-compute

      Check whether the status of the nova-compute service in the displayed result is disabled.

      • If no, go to 4.
      • If yes, run the following command to manually restore the service:
        nova service-enable hostId nova-compute
        NOTE:

        Replace hostId with the actual host ID, for example, A8983960-F114-E611-824B-08C0210791BD.

  4. Run the following command to check whether the status of the neutron-openvswitch-agent component is normal:

    cps template-instance-list --service neutron neutron-openvswitch-agent

    Check whether the service status is active.

  5. Run the following command to check whether the status of the nova-conductor service is normal:

    nova service-list | grep nova-conductor

    Check whether the service status is down.

  6. Run the following command to check whether storage multipathing is enabled:

    cps template-params-show --service nova nova-compute | grep libvirt_iscsi_use_ultrapath

    Query the value of the libvirt_iscsi_use_ultrapath configuration item.

    • true indicates that multipathing is enabled. Verify that the multipathing service is actually running.

      Run the ps -ef | grep upservice command to check whether the service is running.

      • If no, the multipathing service is not installed. In this case, install the service.
      • If yes, the multipathing service is installed. In this case, go to 7.
    • false indicates that the multipathing service is disabled. In this case, enable the multipathing function or install the multipathing software package.
    NOTE:
    • Perform the following operations to manually enable or disable storage multipathing.
      • Run the following command to enable storage multipathing: cps template-params-update --service nova nova-compute --parameter libvirt_iscsi_use_ultrapath=true; cps commit
      • Run the following command to disable storage multipathing: cps template-params-update --service nova nova-compute --parameter libvirt_iscsi_use_ultrapath=false; cps commit

  7. Run the cps host-template-instance-operate --service nova nova-proxy00X --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova nova-proxy00X command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 8.

  8. Contact technical support for assistance.
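The multipathing toggle from the NOTE in step 6 can be sketched as a command builder. This is a dry-run sketch only: it assembles and prints the cps invocation given in the NOTE, rather than executing anything, so the string can be reviewed before it is run on a host.

```python
# Dry-run sketch: builds the cps invocation from the NOTE in step 6
# instead of executing it, so the command can be reviewed first.
def multipath_toggle_cmd(enable: bool) -> str:
    value = "true" if enable else "false"
    return ("cps template-params-update --service nova nova-compute "
            f"--parameter libvirt_iscsi_use_ultrapath={value}; cps commit")

print(multipath_toggle_cmd(True))
```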

fc-nova-compute Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 5.
  3. Run the following command to check whether the status of the fc-nova-compute00X service is normal:

    nova service-list | grep fc-nova-compute00X

    In the command, X indicates a digit, such as 1, 2, or 3, which is the same as the name of the faulty component.

    Check whether the service status is down.

    • If no, go to 5.
    • If yes, check whether the host service is disabled.

      Check whether the status of the fc-nova-compute00X service in the displayed result is disabled.

      • If no, go to 4.
      • If yes, run the following command to manually restore the service:
        nova service-enable hostId fc-nova-compute00X
        NOTE:

        Replace hostId with the actual host ID, for example, A8983960-F114-E611-824B-08C0210791BD.

  4. Run the cps host-template-instance-operate --service nova fc-nova-compute00X --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova fc-nova-compute00X command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 5.
  5. Contact technical support for assistance.

vmware-nova-compute Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If no, go to 3.
    • If yes, go to 5.
  3. Run the following command to check whether the status of the vmware-nova-compute00X service is normal:

    nova service-list | grep vmware-nova-compute00X

    X indicates 1, 2, or 3.

    Check whether the service status is down.

    • If no, go to 5.
    • If yes, check whether the host service is disabled.

      Check whether the status of the vmware-nova-compute00X service in the displayed result is disabled.

      • If no, go to 4.
      • If yes, run the following command to manually restore the service:
        nova service-enable hostId vmware-nova-compute00X
        NOTE:

        Replace hostId with the actual host ID, for example, A8983960-F114-E611-824B-08C0210791BD.

  4. Run the cps host-template-instance-operate --service nova vmware-nova-compute00X --action start command.

    Wait for 1 minute and run the cps template-instance-list --service nova vmware-nova-compute00X command to check whether the faulty component is in the active state.

    • If yes, no further action is required.
    • If no, go to 5.
  5. Contact technical support for assistance.

cinder-volume Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. On the FusionSphere OpenStack web client, check whether the backend storage configuration items are correct.

    1. Log in to the FusionSphere OpenStack web client and choose Configuration > Manage Resource Pool > KVM > Configure Storage Cluster.
    2. Check whether the REST URL, storage resource pool name, storage IP address, username, and password are correctly configured.

  3. If the Huawei disk array is connected, check whether the Huawei multipathing function is enabled and the multipathing software package is correctly installed.

    1. Log in to the node with cinder-volume deployed, run the grep ultra /etc/cinder/cinder-volume.conf command, and check whether the multipathing function is enabled.
      • true indicates that the multipathing function is enabled.
      • false indicates that the multipathing function is not enabled.

        Log in to the FusionSphere OpenStack web client, choose Configuration > OpenStack > Cinder > Configure Storage Multipathing, select ultrapath, and click Submit.

        If multiple NEs have been configured, choose Configuration > VM Diversification, select ultrapath, and click Submit.

    2. Run the ps -ef | grep upservice command to check whether the multipathing service is started properly.
      • If yes, the multipathing service is started properly.
      • If no, the multipathing service is not started. Run the rpm -qa | grep UltraPath command. If no information is displayed, the multipathing service package is not installed. Install and configure the multipathing service package.
      NOTE:

      To obtain and install the multipathing service software package, perform the following steps:

      1. Obtain and decompress the FusionSphere SIA 6.5.0_Drivers.zip driver package.
      2. Obtain and decompress the ULTRAPATH_XXX_XXX_FSOV1R6C30.tar.gz package in the 06 Driver for FusionSphere 6.5 > for FusionSphere OpenStack > 01 UltraPath path.
      3. In the decompressed ULTRAPATH_XXX_XXX_FSOV1R6C30.tar.gz folder, open the os subfolder, obtain the UltraPath-xxx-xxx.rpm file, and run the following command to install the multipathing service software package:

        rpm -ivh UltraPath-xxx-xxx.rpm

      4. Run the following command to start the multipathing service:

        service nxup start

  4. Check whether the node can communicate with the storage and management networks of the storage device.

    Log in to the FusionSphere OpenStack web client, choose O&M > System Check, execute the Checking the Storage Driver Status test case, and check whether error information contains the IP addresses which cannot be pinged.

  5. Contact technical support for assistance.
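The package check in step 3 above can be sketched against captured `rpm -qa` output, so the logic can be exercised without an rpm database. The sample package names are hypothetical.

```python
def ultrapath_installed(rpm_qa_output: str) -> bool:
    """True if any line of captured `rpm -qa` output names an UltraPath package."""
    return any("UltraPath" in line for line in rpm_qa_output.splitlines())

# Hypothetical `rpm -qa` capture; package names are examples only.
sample = "kernel-3.10.0\nUltraPath-21.6.0\nopenssl-1.0.2"
print(ultrapath_installed(sample))  # True
```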

cinder-proxy Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. On the FusionSphere OpenStack web client, check whether the backend storage configuration items are correct.
    1. Log in to the FusionSphere OpenStack web client and choose Configuration > Manage Resource Pool > KVM > Configure Storage Cluster.
    2. Check whether the REST URL, storage resource pool name, storage IP address, username, and password are correctly configured.
  2. If the Huawei disk array is connected, check whether the Huawei multipathing function is enabled and the multipathing software package is correctly installed.
    • Log in to the node where cinder-proxy is located, run the grep ultra /etc/cinder/cinder-proxy00X.conf command (X indicates 1, 2, or 3), and check whether the multipathing function is enabled.
      NOTE:
      • The process name of cinder-proxy is cinder-proxy00X. X indicates a digit (such as 1, 2, or 3) which is generated in sequence when the cascaded FusionSphere OpenStack system is connected. For example, the process name is cinder-proxy001 when the first cascaded system is connected, and the process name is cinder-proxy002 when the second cascaded system is connected. Other process names can be deduced in the same way.
      • The following uses cinder-proxy001 as an example to describe how to query the host where the component is located: Run the cps template-instance-list --service cinder cinder-proxy001 command. If the command output contains the host IP address, you can log in to the host using the IP address.
      • true indicates that the multipathing function is enabled.
      • false indicates that the multipathing function is disabled. In this case, log in to the FusionSphere OpenStack web client, choose Configuration > OpenStack > Cinder > Configure Storage Multipathing, select ultrapath, and click Submit.
    • Run the ps -ef | grep upservice command to check whether the multipathing service is normally enabled.
      • If yes, the multipathing service is normally enabled.
      • If no, the multipathing service is not enabled. In this case, run the rpm -qa | grep UltraPath command. If no information is displayed, the Huawei multipathing package is not installed; search for "Use storage multipathing" in the product documentation and install and configure multipathing accordingly. Otherwise, contact technical support for assistance.
  3. Check whether the node can communicate with the storage and management networks of the storage device.
    • Log in to the FusionSphere OpenStack web client, choose O&M > System Check, execute the Checking the Storage Driver Status test case, and check whether error information contains the IP addresses which cannot be pinged.
  4. Contact technical support for assistance.
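The 00X naming rule described in the NOTE of step 2 can be sketched as a zero-padded name generator:

```python
def proxy_name(base: str, index: int) -> str:
    """Process name for the Nth connected cascaded system, zero-padded to 00X."""
    return f"{base}{index:03d}"

print([proxy_name("cinder-proxy", i) for i in (1, 2, 3)])
# ['cinder-proxy001', 'cinder-proxy002', 'cinder-proxy003']
```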

nova-compute-ironic Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. Run the following command to check whether the status of the nova-compute-ironic component is normal:

    cps template-instance-list --service ironic nova-compute-ironic

    Check whether the service status is active.

    • If yes, go to 4.
    • If no, go to 3.

  3. Check the status of the host.

    cps host-list | grep fault

    If the host status is fault, go to 4.

  4. Contact technical support for assistance.

Glance Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. Run the following command to check whether the Glance service is normal:

    cps template-instance-list --service glance glance

    Check whether the service status is active.

    • If yes, go to 4.
    • If no, go to 3.

  3. Check the status of the host.

    cps host-list | grep fault

    • If the host status is fault, go to 4.

  4. Contact technical support for assistance.

GaussDB Component Troubleshooting

Possible Causes

  • The component is running improperly.
  • The system environment is damaged.
  • A network exception occurs.
  • The host disk space is used up.
  • The active database is faulty, and the standby database fails to switch to the active one due to abnormal data synchronization.

Troubleshooting Guideline

  1. Check whether the node status is normal.
  2. Check whether the node is powered off.
  3. Check whether the node network is reachable.
  4. Check whether packet loss occurs during data synchronization.
  5. Check whether the disk space is insufficient and the database is faulty.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command to check the GaussDB status:

    cps template-instance-list --service gaussdb gaussdb

    NOTE:

    In separated deployment scenarios, replace gaussdb with the corresponding name, for example, gaussdb_keystone.

    Check whether the GaussDB statuses in the command output contain fault.

    • If yes, run the cps host-list | grep $hostid command to check whether the node status is normal.
      • If yes, go to 7.
      • If no, the node is faulty, and you need to identify the fault cause. In this case, go to 3.
    • If no, go to 7.
  3. Check whether the node is powered off.
    • If yes, power on the node.
    • If no, go to 4.
  4. Check the network status.

    Run the cps host-list | grep $hostid command to query the management IP address of the faulty node. Then, run the ping $manageip command to check whether the management IP address can be pinged.

    • If yes, go to 5.
    • If no, the faulty node is unreachable, and you need to handle the network fault.
  5. Check whether the status of one GaussDB component is active and that of the other is fault based on the results in 2.
    • If yes, go to 6.
    • If no, go to 7.
  6. Check whether packet loss occurs during data synchronization.

    Log in to the active GaussDB node and run the following command to obtain the IP address for GaussDB data synchronization:

    ip addr show brcps | grep -E "gaussdb0|gaussdb1"

    Log in to the faulty GaussDB node and ping the obtained IP address for three minutes. Check whether any packet is lost.

    • If yes, handle the packet loss fault.
    • If no, go to 7.
  7. Identify and rectify the fault by performing operations provided in section "Rectifying Database Faults" in the product documentation.
  8. Contact technical support for assistance.
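The three-minute ping check in 6 comes down to reading the loss percentage from the ping summary line. A minimal sketch, assuming the standard iputils summary format; the IP address and ping count in the commented command are placeholders:

```shell
# Extract the packet-loss percentage from a ping summary line.
loss_pct() {
  # $1: ping output; prints the integer before "% packet loss"
  printf '%s\n' "$1" | sed -n 's/.*[ ,]\([0-9][0-9]*\)% packet loss.*/\1/p'
}

# On the faulty node (placeholder IP; ~180 one-second pings is about three minutes):
# loss_pct "$(ping -c 180 10.10.10.10 2>&1)"

sample='180 packets transmitted, 176 received, 2% packet loss, time 179000ms'
loss_pct "$sample"   # prints: 2
```

A non-zero result corresponds to the "packet is lost" branch of step 6.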

Keystone Component Troubleshooting

Possible Causes

  • The component is running improperly.
  • The system environment is damaged.
  • The network is abnormal.

Troubleshooting Guideline

  1. Check whether the node status is normal.
  2. Check whether the node is powered off.
  3. Check whether the node network is reachable.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command to check the Keystone status:

    cps template-instance-list --service keystone keystone

    Check whether the Keystone statuses in the command output contain fault.

    • If yes, run the cps host-list | grep $hostid command to check whether the status of the node is normal.
      • If yes, go to 6.
      • If no, the node is faulty, and you need to identify the fault cause. In this case, go to 3.
    • If no, go to 6.
  3. Check whether the node is powered off.
    • If yes, power on the node.
    • If no, go to 4.
  4. Check the network status.

    Run the cps host-list | grep $hostid command to query the management IP address of the faulty node. Then, run the ping $manageip command to check whether the management IP address can be pinged.

    • If yes, go to 5.
    • If no, the faulty node is unreachable, and you need to handle the network fault.
  5. Run the following command to check the GaussDB status:

    cps template-instance-list --service gaussdb gaussdb

    Check whether the GaussDB statuses in the command output contain fault.

    • If yes, rectify the database fault by referring to GaussDB Component Troubleshooting, and then check whether Keystone recovers.
    • If no, go to 6.

  6. Contact technical support for assistance.

RabbitMQ Component Troubleshooting

Possible Causes

  • The component is running improperly.
  • The system environment is damaged.
  • The network is abnormal.

Troubleshooting Guideline

  1. Check whether the node status is normal.
  2. Check whether the node is powered off.
  3. Check whether the node network is reachable.
  4. Check whether the IP address defined in rabbitmq_extend_ip is used by other nodes. If it is, an IP address conflict may occur.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command to check the RabbitMQ status:

    cps template-instance-list --service rabbitmq rabbitmq

    Check whether the RabbitMQ statuses in the command output contain fault.

    • If yes, run the cps host-list | grep $hostid command to check whether the status of the node is normal.
      • If yes, go to 11.
      • If no, the node is faulty, and you need to identify the fault cause. In this case, go to 3.
    • If no, go to 11.
  3. Check whether the node is powered off.
    • If yes, power on the node.
    • If no, go to 4.
  4. Check the network status.

    Run the cps host-list | grep $hostid command to query the management IP address (manageip) of the faulty node. Then, run the ping $manageip command to check whether the management IP address can be pinged.

    • If yes, go to 5.
    • If no, the faulty node is unreachable, and you need to handle the network fault.
  5. Run the following command to obtain the IP address of the active RabbitMQ node:

    cps template-instance-list --service rabbitmq rabbitmq

    +------------+---------------+---------+--------------------------------------+------------+
    | instanceid | componenttype | status  | runsonhost                           | omip       |
    +------------+---------------+---------+--------------------------------------+------------+
    | 0          | rabbitmq      | active  | 6F4AB81F-B799-8348-A691-E29D5B25EC64 | 10.12.53.23 |
    | 1          | rabbitmq      | standby | CD7D28B5-8958-A048-9B61-0E4F2B08A15A | 10.12.53.86 |
    +------------+---------------+---------+--------------------------------------+------------+
  6. Use PuTTY to log in to the RabbitMQ node using the IP address of the active node.
  7. Import environment variables. For details, see Importing Environment Variables.
  8. Run the cps template-params-show --service rabbitmq rabbitmq command to check whether the IP address and systeminterface in rabbitmq_extend_ip can be obtained.
    CD7D28B5-8958-A048-9B61-0E4F2B08A15A:/home/fsp # cps template-params-show --service rabbitmq rabbitmq
    +------------------------+----------------------------------------------------+
    | Property               | Value                                              |
    +------------------------+----------------------------------------------------+
    | heartbeat_timeout      | 6                                                  |
    | memory_high_watermark  |                                                    |
    | om_monitor_ip          | {}                                                 |
    | rabbit_password        | 1#4Q1osznCm5jJawCBI/cllelcDyYGd6ZSZxdqxbxsquCPv/9l |
    |                        | mWN2FCSeluDgkvbc                                   |
    | rabbit_use_ssl         | true                                               |
    | rabbitmq_extend_ip     | [{"ip": "10.10.10.10", "systeminterface": "externa |
    |                        | l_om", "mask": "24", "gateway": "10.10.10.1"}]      |
    | socket_connection_mode | CHAP                                               |
    | ssl_version            | TLSv1_1,TLSv1_2                                    |
    +------------------------+----------------------------------------------------+
    • If yes, go to 9.
    • If no, go to 11.
  9. Run the following command to check whether the MAC address corresponding to rabbitmq_extend_ip can be obtained:

    arping -D -I $systeminterface $rabbitmq_extend_ip

    6F4AB81F-B799-8348-A691-E29D5B25EC64:~ # arping -D -I external_om 10.10.10.10
    ARPING 10.10.10.10 from 0.0.0.0 external_om
    Unicast reply from 10.10.10.10 [FA:16:3E:1A:9C:A1]  1.085ms
    Sent 1 probes (1 broadcast(s))
    Received 1 response(s)
    • If yes, go to 10.
    • If no, go to 11.
  10. Run the following command to check whether the MAC address obtained in 9 is used by the local node (replace the MAC address with the one in the arping output):

    ifconfig | grep -i "FA:16:3E:1A:9C:A1"

    • If yes, the IP address defined in rabbitmq_extend_ip is used by the local node, and no IP address conflict occurs. In this case, go to 11.
    • If no, an IP address conflict occurs. Check whether the IP address plan is correct.
  11. Contact technical support for assistance.
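Steps 8 and 9 read the ip and systeminterface fields out of rabbitmq_extend_ip by eye and retype them into arping. A hedged sketch of doing this mechanically; the JSON value is the sample from the output in 8, and the sed-based extraction assumes the simple one-entry format shown there:

```shell
# Pull a string field out of the rabbitmq_extend_ip JSON value.
field() {
  # $1: JSON text, $2: key name; prints the first value for the key
  printf '%s\n' "$1" | sed -n "s/.*\"$2\": \"\([^\"]*\)\".*/\1/p" | head -n 1
}

json='[{"ip": "10.10.10.10", "systeminterface": "external_om", "mask": "24", "gateway": "10.10.10.1"}]'
iface=$(field "$json" systeminterface)
ip=$(field "$json" ip)
echo "arping -D -I $iface $ip"   # prints: arping -D -I external_om 10.10.10.10
```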

neutron-server Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    cps template-instance-list --service neutron neutron-server

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component is located):

    cps host-list | grep C4D9BE13-D21D-B211-82D2-000000821800

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-server --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Check whether the configuration items of the neutron-server component are normal. The common configuration error cases are as follows:
    • Check whether qos_extension and ac_qos_plugin are configured simultaneously. Run the cat /etc/neutron/neutron.conf | grep service_plugins command to check whether both qos_extension and ac_qos_plugin exist in service_plugins. If yes, run the cps command to delete one of the two configuration items.
    • Check whether the value of the neutron-server cfg file is null. Run the cat /etc/huawei/fusionsphere/neutron.neutron-server/cfg/neutron.neutron-server.cfg | grep null command to check whether any value is null. If yes, contact technical support for assistance.
    • Check whether the AC-DCN plug-in is correctly configured. In the scenario where AC-DCN is connected, if a value in /etc/neutron/huawei_driver_config.ini is missing or set to null, neutron-server becomes faulty. The most common issue is that rpc_server_ip or host is not configured. You are advised to check the configurations in /etc/neutron/huawei_driver_config.ini on each controller node against the network plan, and to delete any configuration items that do not need to be configured. Note that deleting the pound sign (#) also removes the default value, so you must ensure that the value you enter is correct. For example, # ac_auth_password = indicates that the default password is used. If the pound sign is deleted (ac_auth_password =), the password is left blank and the neutron-server component becomes faulty.
    • Check whether service_plugins matches service_provider. If they do not match (for example, because one is misspelled or not configured), the neutron-server service becomes abnormal. If load balancing (for example, with F5) or port mirroring (neutron-taas-agent) is used in the environment, you are advised to recheck the configurations against the commissioning documents.

    If any configuration is incorrect, rectify the fault and run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-server --action start

    • Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.
    • If the fault persists, go to the next step.
  4. Invoke the health check tool to collect logs, including logs of the node where the faulty component is located and logs of the faulty component. Then contact technical support for assistance.
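The first check in 3 (qos_extension and ac_qos_plugin configured together) can be sketched as a small test on the service_plugins line. The configuration line below is a constructed example, not captured from a live node:

```shell
# Succeed if both qos_extension and ac_qos_plugin appear in the line.
has_conflict() {
  case "$1" in
    *qos_extension*ac_qos_plugin*|*ac_qos_plugin*qos_extension*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a node: line=$(grep '^service_plugins' /etc/neutron/neutron.conf)
line='service_plugins = router,qos_extension,ac_qos_plugin'
if has_conflict "$line"; then echo conflict; else echo ok; fi   # prints: conflict
```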

neutron-l3-service-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component configuration is abnormal.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the component configuration is correct.

Procedure

  1. Log in to the controller node in the cascading FusionSphere OpenStack system, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    If a host in the fault state exists, an exception has occurred in the environment. In this case, identify the cause of the node fault.

  3. Run the following command to query the configuration of the neutron-l3-service-agent component:

    cps template-params-show --service neutron neutron-l3-service-agent01

    Check in the command output that fip_mappings, hm_ip_mappings, and vrouter_mappings have been configured and that their formats are correct.

    fip_mappings and vrouter_mappings are lists in the format of Node ID:seq value. The seq value is an integer and ranges from 1 to 16.

    hm_ip_mappings is a list in the format of Node ID:IP address/mask.

    If the configuration format is incorrect, run the following commands to reconfigure it. Replace XXX=YYY with the actual configuration item and value.

    cps template-params-update --service neutron neutron-l3-service-agent01 --parameter XXX=YYY

    cps commit

  4. Wait for 1 minute and check whether the component is restored. If no, contact technical support for assistance.
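The format rules in 3 can be checked mechanically. A hedged sketch of the two entry formats, with made-up node IDs: entries in fip_mappings and vrouter_mappings must be Node ID:seq with seq in 1 to 16, and hm_ip_mappings entries must be Node ID:IP/mask.

```shell
# Validate one "Node ID:seq" entry (seq must be 1..16).
valid_seq_entry() {
  printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f-]+:([1-9]|1[0-6])$'
}

# Validate one "Node ID:IP/mask" entry for hm_ip_mappings.
valid_hm_entry() {
  printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f-]+:([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
}

valid_seq_entry '7A7C7D29-D21D-B211-AB16-0018E1C5D866:1' && echo ok     # prints: ok
valid_seq_entry '7A7C7D29-D21D-B211-AB16-0018E1C5D866:17' || echo bad   # prints: bad
valid_hm_entry '7A7C7D29-D21D-B211-AB16-0018E1C5D866:192.168.1.5/24' && echo ok   # prints: ok
```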

neutron-vrouter Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component configuration is abnormal.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the component configuration is correct.

Procedure

  1. Log in to the controller node in the cascading FusionSphere OpenStack system, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check whether the status of the host is normal.

    cps host-list | grep fault

    Check whether the host status is fault.

    If a host in the fault state exists, an exception has occurred in the environment. In this case, identify the cause of the node fault.

  3. Run the following command to check whether the neutron-l3-service-agent component is normal:

    cps template-instance-list --service neutron neutron-l3-service-agent01

    Confirm that the neutron-l3-service-agent component has been deployed and that its status is normal. If neutron-l3-service-agent is faulty, rectify that fault first.

  4. Run the following command to query the ID of the node where the neutron-vrouter component is deployed:

    cps template-instance-list --service vrouter neutron-vrouter01

  5. Run the following command to query the neutron-l3-service-agent configuration:

    cps template-params-show --service neutron neutron-l3-service-agent01

  6. Check whether the IDs in the vrouter_mappings configuration item obtained in 5 are the same as the IDs of the nodes where the neutron-vrouter component is deployed in 4, and whether the seq values range from 1 to 16 and are unique. If the configuration is incorrect, run the following command to modify the neutron-l3-service-agent configuration:

    cps template-params-update --service neutron neutron-l3-service-agent01 --parameter vrouter_mappings="Node 1 ID:seq1,Node 2 ID:seq2"

    For example: cps template-params-update --service neutron neutron-l3-service-agent01 --parameter vrouter_mappings="7A7C7D29-D21D-B211-AB16-0018E1C5D866:1,9236322A-D21D-B211-A515-0018E1C5D866:2"

    Run the cps commit command to submit the configuration.

  7. Wait for 1 minute and check whether the neutron-vrouter component is restored. If no, contact technical support for assistance.
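Step 6 requires the seq values in vrouter_mappings to be unique. A small sketch of that uniqueness check; the first mapping string is the example from the command above, the second is a deliberately broken one:

```shell
# Print any seq value that appears more than once in an "ID:seq" list.
dup_seqs() {
  printf '%s\n' "$1" | tr ',' '\n' | cut -d: -f2 | sort | uniq -d
}

m='7A7C7D29-D21D-B211-AB16-0018E1C5D866:1,9236322A-D21D-B211-A515-0018E1C5D866:2'
dup_seqs "$m"            # prints nothing: all seq values are unique
dup_seqs 'A:1,B:1,C:2'   # prints: 1
```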

neutron-ipv6-vrouter Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component configuration is abnormal.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the component configuration is correct.

Procedure

  1. Log in to the controller node in the cascading FusionSphere OpenStack system, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check whether the status of the host is normal.

    cps host-list | grep fault

    Check whether the host status is fault.

    If a host in the fault state exists, an exception has occurred in the environment. In this case, identify the cause of the node fault.

  3. Run the following command to check whether the neutron-l3-service-agent component is normal:

    cps template-instance-list --service neutron neutron-l3-service-agent01

    Confirm that the neutron-l3-service-agent component has been deployed and that its status is normal. If neutron-l3-service-agent is faulty, rectify that fault first.

  4. Run the following command to query the ID of the node where the neutron-ipv6-vrouter component is deployed:

    cps template-instance-list --service ipv6-vrouter neutron-ipv6-vrouter01

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component is located):

    cps host-list | grep 75AFAB2E-B994-A542-A8B3-7B38D77E9CDB

  5. Run the following command to manually start the neutron-ipv6-vrouter component:

    cps host-template-instance-operate --service ipv6-vrouter neutron-ipv6-vrouter01 --action start

  6. Wait for 1 minute and check whether the neutron-ipv6-vrouter component is restored. If no, contact technical support for assistance.

neutron-l3-nat-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component configuration is incorrect.
  • The system environment is damaged.
  • The dependent component is abnormal.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.
  4. Ensure that the dependent component is normal.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (in the command output, runsonhost indicates the host ID, and neutron-l3-nat-agent01 indicates the component name, which needs to be set based on site requirements):

    cps template-instance-list --service nat-server neutron-l3-nat-agent01

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component is located):

    cps host-list | grep 688DC013-69DE-904B-B5C4-B299966D019C

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service nat-server neutron-l3-nat-agent01 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command to check the configuration of the faulty component:

    cps template-params-show --service nat-server neutron-l3-nat-agent01

    Ensure that the host IDs in the host_ip_mapping configuration item are the same as the IDs of the hosts where the component is deployed, and that the value is in the format natserver_hostid1:IP11/mask11,natserver_hostid2:IP21/mask21.

    If any configuration is incorrect, rectify the fault and run the following command to start the component:

    cps host-template-instance-operate --service nat-server neutron-l3-nat-agent01 --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  4. Ensure that the statuses of neutron-l3-service-agent01 (the component name suffix can be 01, 02, or 03) and neutron-vrouter are normal (active). If an exception occurs, rectify the fault based on the corresponding guide and then run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-l3-service-agent01 --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  5. Run the following command to query the fip_mappings configuration of neutron-l3-service-agent (In the command, neutron-l3-service-agent01 indicates the component name, and the suffix can be 01, 02, or 03):

    cps template-params-show --service neutron neutron-l3-service-agent01

    Check whether the ID in the fip_mappings configuration item is the same as that of the node where the neutron-l3-nat-agent component is deployed and whether the seq value ranges from 1 to 16 and is unique. If no, run the following commands to modify the configuration:

    cps template-params-update --service neutron neutron-l3-service-agent01 --parameter fip_mappings="Node1 ID:seq1,Node 2 ID:seq2"

    For example: cps template-params-update --service neutron neutron-l3-service-agent01 --parameter fip_mappings="7A7C7D29-D21D-B211-AB16-0018E1C5D866:1,9236322A-D21D-B211-A515-0018E1C5D866:2"

    cps commit

    Wait for 1 minute and check whether the component fault is rectified.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  6. Invoke the health check tool to collect logs, including logs of the node where the faulty component is located and logs of the faulty component. Then contact technical support for assistance.
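Step 5 compares the node IDs in fip_mappings with the nodes where neutron-l3-nat-agent actually runs. A hedged sketch of that cross-check; both ID lists below are illustrative placeholders and would be taken on site from the cps commands in 2 and 5:

```shell
# Print every mapping node ID that is not in the deployed-host list.
missing_ids() {
  # $1: comma-separated "ID:seq" mappings; $2: newline-separated host IDs
  map_ids=$(printf '%s\n' "$1" | tr ',' '\n' | cut -d: -f1 | sort -u)
  for id in $map_ids; do
    printf '%s\n' "$2" | grep -qx "$id" || echo "$id"
  done
}

hosts='AAAA
BBBB'
missing_ids 'AAAA:1,BBBB:2' "$hosts"   # prints nothing: every ID is deployed
missing_ids 'AAAA:1,CCCC:2' "$hosts"   # prints: CCCC
```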

neutron-cascading-proxy Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (in the command output, runsonhost indicates the host ID, and neutron-cascading-proxy001 indicates the component name, which needs to be set based on site requirements):

    cps template-instance-list --service neutron neutron-cascading-proxy001

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component is located):

    cps host-list | grep 5C300214-D21D-B211-82D2-000000821800

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-cascading-proxy001 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides to check the configuration of the faulty component (in the command, neutron-cascading-proxy001 indicates the component name, which needs to be set based on site requirements):

    cps template-params-show --service neutron neutron-cascading-proxy001

    Ensure that configuration items, such as cascaded_region_name, host, and nova_hosts, are correctly configured and the corresponding cascading system has been interconnected with the corresponding AZ.

    If any configuration is incorrect or the cascading system is not interconnected with the cascaded system, rectify the fault and run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-cascading-proxy001 --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  4. Invoke the health check tool to collect logs, including logs of the node where the faulty component is located and logs of the faulty component. Then contact technical support for assistance.
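Several steps in this guide end with "wait for 1 minute and check whether the component is in the active state". A hedged helper that polls any status command until its output contains active or a timeout expires; the echo command in the example is a stub standing in for a real cps check:

```shell
# Poll a status command until its output contains "active" or time runs out.
wait_for_active() {
  # $1: command printing the status; $2: timeout in seconds (polls every 5s)
  elapsed=0
  while [ "$elapsed" -lt "$2" ]; do
    if eval "$1" | grep -q 'active'; then return 0; fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  return 1
}

# Stub status command standing in for a cps check:
wait_for_active "echo active" 60 && echo restored   # prints: restored
```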

neutron-openvswitch-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.
  • The dependent component is abnormal.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.
  4. Ensure that the dependent component is normal.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    In all systems except the cascaded system in the Region Type I scenario, run the following command:

    cps template-instance-list --service neutron neutron-openvswitch-agent

    In the cascaded system in the Region Type I scenario, run the following command:

    cps template-instance-list --service network-agent neutron-openvswitch-agent

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component is located):

    cps host-list | grep 5C300214-D21D-B211-82D2-000000821800

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      In all systems except the cascaded system in the Region Type I scenario, run the following command:

      cps host-template-instance-operate --service neutron neutron-openvswitch-agent --action start

      In the cascaded system in the Region Type I scenario, run the following command:

      cps host-template-instance-operate --service network-agent neutron-openvswitch-agent --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps template-instance-list --service neutron neutron-server

    Ensure that the status of the neutron-server component in the same AZ is normal. If any exception occurs, rectify the fault based on the corresponding guide and run the following command to start the component:

    In all systems except the cascaded system in the Region Type I scenario, run the following command:

    cps host-template-instance-operate --service neutron neutron-openvswitch-agent --action start

    In the cascaded system in the Region Type I scenario, run the following command:

    cps host-template-instance-operate --service network-agent neutron-openvswitch-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • Otherwise, go to the next step.
  4. Check whether the underlying Open vSwitch (OVS) process is normal. Run the following command on the node where the faulty component is deployed:

    ps -ef | grep ovs-vswitchd | grep -v grep

    If the process exists, go to the next step. If it does not exist, check the ovs-vswitchd process in the underlying OS and run the following commands to restart the component:

    In systems other than Region Type I cascaded systems, run the following commands:

    cps host-template-instance-operate --service neutron neutron-openvswitch-agent --action stop

    cps host-template-instance-operate --service neutron neutron-openvswitch-agent --action start

    In a Region Type I cascaded system, run the following commands:

    cps host-template-instance-operate --service network-agent neutron-openvswitch-agent --action stop

    cps host-template-instance-operate --service network-agent neutron-openvswitch-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • Otherwise, go to the next step.
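    The ovs-vswitchd check in step 4 can be wrapped in a small helper. This is a hypothetical sketch that mirrors the manual ps -ef | grep pipeline; the follow-up messages are illustrative only.

    ```shell
    #!/bin/sh
    # Hypothetical helper mirroring the manual check
    #   ps -ef | grep ovs-vswitchd | grep -v grep
    # It reads a "ps -ef"-style listing on stdin and succeeds if ovs-vswitchd
    # appears in it.
    ovs_running() {
        grep 'ovs-vswitchd' | grep -v grep >/dev/null
    }

    if ps -ef | ovs_running; then
        echo "ovs-vswitchd is running; go to the next step"
    else
        echo "ovs-vswitchd is missing; check the OVS process in the underlying OS"
    fi
    ```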
  5. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-garbage-collector Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    cps template-instance-list --service neutron neutron-garbage-collector

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 6A797A84-C386-119F-8567-000000821800

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-garbage-collector --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps template-instance-list --service gaussdb gaussdb

    Ensure that the status of the GaussDB component in the same AZ is normal. If any exception occurs, rectify the fault based on the corresponding guide and run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-garbage-collector --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • Otherwise, go to the next step.
  4. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-l3-dummy-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascaded FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    cps template-instance-list --service neutron neutron-l3-dummy-agent

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep E8649029-D21D-B211-8F5B-0018E1C5D866

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-l3-dummy-agent --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps template-params-show --service neutron neutron-l3-dummy-agent

    Ensure that the configuration items of the component are correct, such as bridge_mappings, host, mq_instance, and use_state_machine (which must be False).

    If the configuration is incorrect, run the following commands (you need to change XXX=YYY to the actual parameter and value) to rectify the configuration:

    cps template-params-update --service neutron neutron-l3-dummy-agent --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-l3-dummy-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  4. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-dvr-compute-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascaded FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    cps template-instance-list --service network-agent neutron-dvr-compute-agent

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 82598981-A719-E811-865C-785860655392

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service network-agent neutron-dvr-compute-agent --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps template-params-show --service network-agent neutron-dvr-compute-agent

    Ensure that the configuration items of the component are correct, such as enable_mq_cluster, interface_driver, and mq_instance.

    If the configuration is incorrect, run the following commands (you need to change XXX=YYY to the actual parameter and value) to rectify the configuration:

    cps template-params-update --service network-agent neutron-dvr-compute-agent --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    cps host-template-instance-operate --service network-agent neutron-dvr-compute-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • Otherwise, go to the next step.
  4. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-dhcp-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.

    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.

  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    In systems other than Region Type I cascaded systems, run the following command:

    cps template-instance-list --service neutron neutron-dhcp-agent

    In a Region Type I cascaded system, run the following command:

    cps template-instance-list --service network-agent neutron-dhcp-agent

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep E8649029-D21D-B211-8F5B-0018E1C5D866

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      In systems other than Region Type I cascaded systems, run the following command:

      cps host-template-instance-operate --service neutron neutron-dhcp-agent --action start

      In a Region Type I cascaded system, run the following command:

      cps host-template-instance-operate --service network-agent neutron-dhcp-agent --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to 3.

  3. Run the following command on the controller node in the AZ where the faulty component resides:

    In systems other than Region Type I cascaded systems, run the following command:

    cps template-params-show --service neutron neutron-dhcp-agent

    In a Region Type I cascaded system, run the following command:

    cps template-params-show --service network-agent neutron-dhcp-agent

    Ensure that the configuration items of the component are correct, such as enable_mq_cluster, interface_driver, and mq_instance. The dhcp_distributed parameter (which must be True) applies only to the cascaded FusionSphere OpenStack system.

    If the configuration is incorrect, run the following commands (you need to change XXX=YYY to the actual parameter and value) to rectify the configuration:

    In systems other than Region Type I cascaded systems, run the following commands:

    cps template-params-update --service neutron neutron-dhcp-agent --parameter XXX=YYY

    cps commit

    In a Region Type I cascaded system, run the following commands:

    cps template-params-update --service network-agent neutron-dhcp-agent --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    In systems other than Region Type I cascaded systems, run the following command:

    cps host-template-instance-operate --service neutron neutron-dhcp-agent --action start

    In a Region Type I cascaded system, run the following command:

    cps host-template-instance-operate --service network-agent neutron-dhcp-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to 4.
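    The configuration check in step 3 can also be scripted. The helper below is hypothetical: it assumes cps template-params-show prints a pipe-separated name | value table, which may differ from the real output format.

    ```shell
    #!/bin/sh
    # Hypothetical helper: verify that a parameter has the expected value in
    # "cps template-params-show"-style output (read on stdin). The pipe-separated
    # "name | value" layout is an assumption; adjust to the real table.
    check_param() {
        awk -F'|' -v n="$1" -v e="$2" '
            $2 ~ n { gsub(/ /, "", $3); if ($3 == e) ok = 1 }
            END { exit ok ? 0 : 1 }'
    }

    # Fabricated sample row; in the cascaded system dhcp_distributed must be True.
    sample='| dhcp_distributed | True |'
    if echo "$sample" | check_param dhcp_distributed True; then
        echo "dhcp_distributed is set correctly"
    else
        echo "fix it with: cps template-params-update ... --parameter dhcp_distributed=True"
    fi
    ```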

  4. neutron-dhcp-agent depends on the neutron-openvswitch-agent component during startup. If the neutron-openvswitch-agent component is abnormal, the neutron-dhcp-agent component becomes abnormal. Check whether the status of the neutron-openvswitch-agent component in the same AZ is normal (active). If an exception occurs, rectify the fault based on neutron-openvswitch-agent Component Troubleshooting and run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-dhcp-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to 5.

  5. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-metadata-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    In systems other than Region Type I cascaded systems, run the following command:

    cps template-instance-list --service neutron neutron-metadata-agent

    In a Region Type I cascaded system, run the following command:

    cps template-instance-list --service network-agent neutron-metadata-agent

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep E8649029-D21D-B211-8F5B-0018E1C5D866

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-metadata-agent --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    In systems other than Region Type I cascaded systems, run the following command:

    cps template-params-show --service neutron neutron-metadata-agent

    In a Region Type I cascaded system, run the following command:

    cps template-params-show --service network-agent neutron-metadata-agent

    Ensure that the configuration items of the component are correct, such as auth_url, enable_mq_cluster, mq_instance, and os_region_name. In Region Type I scenarios, os_region_name is used to distinguish the cascading and cascaded FusionSphere OpenStack systems.

    If the configuration is incorrect, run the following commands (you need to change XXX=YYY to the actual parameter and value). In the cascaded system, change neutron next to --service to network-agent.

    cps template-params-update --service neutron neutron-metadata-agent --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-metadata-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • Otherwise, go to the next step.
  4. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-vc-vswitch-agent Component Troubleshooting

Possible Causes

  • A network exception occurs.
  • The component is running improperly.
  • The system environment is damaged.
  • FusionSphere OpenStack cannot communicate with vSphere.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the node can ping vSphere.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command to check the status of the host:

    cps host-list | grep fault

    Check whether the host status is fault.

    • If yes, the environment is faulty. Check the cause of the node fault first.
    • If no, go to 3.
  3. Check whether the communication between FusionSphere OpenStack and vSphere is normal.

    If no, an exception has occurred on neutron-vc-vswitch-agent. This typically happens after a switchover between the production site and the DR site: the production site can reach vSphere, but the DR site cannot, so the neutron-vc-vswitch-agent component becomes abnormal. In this case, contact environment management personnel to restore the communication between the DR site and vSphere.

    If the fault persists, go to 4.

  4. Contact technical support for assistance.

neutron-elb-proxy Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the component resides (in the command output, the runsonhost column indicates the host ID, and neutron-elb-proxy01 is an example component name that you need to set based on site requirements):

    cps template-instance-list --service neutron neutron-elb-proxy01

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 382944B4-B8C4-C0A5-E811-40B1FEA672CF

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-elb-proxy01 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-l3-agent Component Troubleshooting

Possible Causes

The neutron-l3-agent component conflicts with the AC-DCN layer-3 plug-in function.

Troubleshooting Guideline

  1. If the component is connected to the AC-DCN, remove the component.

Procedure

  1. Import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Remove neutron-l3-agent.

    In scenarios where the AC-DCN is not connected, only the Layer 2 function is available and the component is not deployed, so no action is required.

    In scenarios where the AC-DCN is connected, the AC-DCN provides the Layer 3 function and neutron-l3-agent is not needed. The component becomes faulty due to the function conflict or the check failure before the upgrade, so you need to remove the component from the router role:

    • cps role-update --parameter template=neutron.neutron-metering-agent --name router
    • cps commit

    If the fault persists, go to 3.

  3. Contact technical support for assistance.

neutron-nat-gw-dataplane Component Troubleshooting

Possible Causes

  • A network fault occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the component configuration items are normal.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    cps template-instance-list --service nat-gateway neutron-nat-gw-dataplane

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 799E8894-247A-144E-B0D6-8B66C08AC849

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service nat-gateway neutron-nat-gw-dataplane --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps cluster-list

    In the command output, find the name of the cluster where the gateway is located, for example, natgw_01_1 or natgw_01_2, and then run the following command:

    cps template-params-show --service nat-gateway neutron-nat-gw-dataplane --cluster natgw_01_2

    Ensure that the configuration items of the component are correct, for example, vpc_vlan_vip, vpc_vlan_gateway, and eip_in_vlan.

    If the configuration is incorrect, run the following commands (you need to change XXX=YYY to the actual parameter and value) to rectify the configuration:

    cps template-params-update --service nat-gateway neutron-nat-gw-dataplane --cluster natgw_01_2 --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    cps host-template-instance-operate --service nat-gateway neutron-nat-gw-dataplane --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
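    Step 3 first finds the gateway cluster name with cps cluster-list. As a hypothetical sketch (the output layout is an assumption), the natgw_* cluster names can be filtered out like this:

    ```shell
    #!/bin/sh
    # Hypothetical helper: extract NAT gateway cluster names (natgw_*) from
    # "cps cluster-list"-style output read on stdin, so the right --cluster
    # value can be chosen for the follow-up commands.
    natgw_clusters() {
        grep -o 'natgw_[0-9_]*'
    }

    # Fabricated sample output resembling the table:
    sample='| natgw_01_1 | normal |
    | natgw_01_2 | normal |'
    echo "$sample" | natgw_clusters
    ```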
  4. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-nat-gw-data-agent Component Troubleshooting

Possible Causes

  • A network fault occurs.
  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the component configuration items are normal.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Run the following command on the controller node in the AZ where the faulty component resides to check the node where the faulty component resides (the runsonhost column indicates the host ID):

    cps template-instance-list --service nat-gateway neutron-nat-gw-data-agent

    Run the following command to check whether the host status is normal (you need to replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 799E8894-247A-144E-B0D6-8B66C08AC849

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). If an exception occurs, rectify the fault and then run the following command to start the component:

      cps host-template-instance-operate --service nat-gateway neutron-nat-gw-data-agent --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps cluster-list

    In the command output, find the name of the cluster where the gateway is located, for example, natgw_01_1 or natgw_01_2, and then run the following command:

    cps template-params-show --service nat-gateway neutron-nat-gw-data-agent --cluster natgw_01_2

    Ensure that the configuration items of the component are correct, such as local_ip.

    If the configuration is incorrect, run the following commands (you need to change XXX=YYY to the actual parameter and value) to rectify the configuration:

    cps template-params-update --service nat-gateway neutron-nat-gw-data-agent --cluster natgw_01_2 --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    cps host-template-instance-operate --service nat-gateway neutron-nat-gw-data-agent --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  4. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Contact technical support for assistance.

neutron-ipv6-service-agent Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to identify the node hosting the faulty component (in the command output, runsonhost indicates the host ID; replace neutron-ipv6-service-agent01 with the actual component name at your site):

    cps template-instance-list --service neutron neutron-ipv6-service-agent01

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 382944B4-B8C4-C0A5-E811-40B1FEA672CF

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). After rectifying the fault, run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-ipv6-service-agent01 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Then contact technical support for assistance.
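The host-status check in step 2 amounts to extracting one column from the cps host-list output. A minimal sketch, assuming a pipe-separated line layout (the real format may differ at your site):

```shell
#!/bin/sh
# Hypothetical line from `cps host-list | grep 382944B4-...`;
# the real column layout may differ at your site.
host_line='| 382944B4-B8C4-C0A5-E811-40B1FEA672CF | ctrl-node-1 | normal |'

# Pull the status column (3rd value, which is the 4th pipe-separated field).
host_status=$(printf '%s\n' "$host_line" \
  | awk -F'|' '{gsub(/ /,"",$4); print $4}')

case "$host_status" in
  normal) echo "host is normal; go to the next step" ;;
  fault)  echo "host is faulty; check power and network first" ;;
  *)      echo "unexpected status: $host_status" ;;
esac
```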

neutron-ngfw-agent Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.
  4. Check whether the network between the host and the firewall is normal.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to identify the node hosting the faulty component (in the command output, runsonhost indicates the host ID; replace neutron-ngfw-agent01 with the actual component name at your site):

    cps template-instance-list --service neutron neutron-ngfw-agent01

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 382944B4-B8C4-C0A5-E811-40B1FEA672CF

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). After rectifying the fault, run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-ngfw-agent01 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Run the following command on the controller node in the AZ where the faulty component resides:

    cps template-params-show --service neutron neutron-ngfw-agent01

    Check that the component configuration items, such as director, director_port, ngfw_username, and ngfw_password, are correct.

    If the configuration is incorrect, run the following commands (replace XXX=YYY with the actual parameter name and value) to rectify it:

    cps template-params-update --service neutron neutron-ngfw-agent01 --parameter XXX=YYY

    cps commit

    Run the following command to start the component:

    cps host-template-instance-operate --service neutron neutron-ngfw-agent01 --action start

    Wait for 1 minute and check whether the component is in the active state.

    • If the component status is normal, no further action is required.
    • If the component status is abnormal, go to the next step.
  4. Check whether the network between the host and the firewall is normal.

    Ping the IP address of the firewall from the node where the component is located, and run the display firewall session table verbose command on the NGFW to check whether a session exists. If no session exists, check the physical link configuration. If a session exists, check whether the NGFW is correctly configured. If the configuration is correct, go to the next step.

  5. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Then contact technical support for assistance.
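For the configuration fix in step 3, it can help to assemble the update, commit, and start commands in one place and review them before execution. The sketch below only builds the command strings; the parameter name and value are hypothetical examples and must be replaced with site-specific settings:

```shell
#!/bin/sh
# Build (but do not execute) the cps commands for fixing one parameter.
# PARAM and VALUE below are hypothetical placeholders; set them for your site.
SERVICE=neutron
COMPONENT=neutron-ngfw-agent01
PARAM=director_port     # example parameter name from the section above
VALUE=8443              # hypothetical value

update_cmd="cps template-params-update --service $SERVICE $COMPONENT --parameter $PARAM=$VALUE"
commit_cmd="cps commit"
start_cmd="cps host-template-instance-operate --service $SERVICE $COMPONENT --action start"

# Review the commands before running them, for example with: eval "$update_cmd"
printf '%s\n%s\n%s\n' "$update_cmd" "$commit_cmd" "$start_cmd"
```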

neutron-fw-proxy Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check the common configuration items of the component.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to identify the node hosting the faulty component (in the command output, runsonhost indicates the host ID; replace neutron-fw-proxy01 with the actual component name at your site):

    cps template-instance-list --service neutron neutron-fw-proxy01

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 382944B4-B8C4-C0A5-E811-40B1FEA672CF

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). After rectifying the fault, run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-fw-proxy01 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Then contact technical support for assistance.

neutron-ngfw-vpn-agent Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component is running improperly.
  • The component configuration is incorrect.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Log in to the controller node in the AZ where the faulty component resides, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Typically, the component is deployed only in the cascading FusionSphere OpenStack system in the Region Type I scenario. Run the following command on the controller node in the AZ where the faulty component resides to identify the node hosting the faulty component (in the command output, runsonhost indicates the host ID; replace neutron-ngfw-vpn-agent01 with the actual component name at your site):

    cps template-instance-list --service neutron neutron-ngfw-vpn-agent01

    Run the following command to check whether the host status is normal (replace the host ID in the command with the ID of the host where the faulty component resides):

    cps host-list | grep 382944B4-B8C4-C0A5-E811-40B1FEA672CF

    • If the status is fault, check the cause of the host fault. For example, check whether the host is powered on and whether the network is reachable (using the ping command). After rectifying the fault, run the following command to start the component:

      cps host-template-instance-operate --service neutron neutron-ngfw-vpn-agent01 --action start

      Wait for 1 minute and check whether the component is in the active state. If yes, no further action is required.

    • If the status is normal, go to the next step.
  3. Invoke the health check tool to collect logs, including the logs of the node where the faulty component resides and the logs of the faulty component. Then contact technical support for assistance.

neutron-sriov-nic-agent Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component configuration is abnormal.
  • The system environment is damaged.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.

Procedure

  1. Log in to the first host in the AZ, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If yes, the environment is faulty. Locate and rectify the cause of the node fault first.
    • If no, go to 3.
  3. Run the following command to check whether the neutron-sriov-nic-agent configuration items are normal:

    cps template-params-show --service neutron neutron-sriov-nic-agent

    Check whether the configuration items are different from those in the normal environment.

    • If yes, determine why the configuration items differ. If the cause cannot be identified, go to 4.
    • If no, go to 4.
  4. Contact technical support for assistance.
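The configuration comparison in step 3 can be done with a plain diff of two parameter dumps. The sketch below uses hypothetical sample dumps; on a live system, capture both files with cps template-params-show as shown in the step above:

```shell
#!/bin/sh
# Diff the current component parameters against a known-good capture.
# Both dumps below are hypothetical samples; on a live system produce them with:
#   cps template-params-show --service neutron neutron-sriov-nic-agent
good='physical_device_mappings=physnet1:eth4
polling_interval=2'
current='physical_device_mappings=physnet1:eth5
polling_interval=2'

printf '%s\n' "$good"    > /tmp/params_good.txt
printf '%s\n' "$current" > /tmp/params_current.txt

# diff exits non-zero when the files differ; record the outcome.
if diff -u /tmp/params_good.txt /tmp/params_current.txt; then
  result=same
else
  result=different   # investigate why the items differ before going further
fi
rm -f /tmp/params_good.txt /tmp/params_current.txt
echo "comparison result: $result"
```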

neutron-evs-agent Component Troubleshooting

Possible Causes

  • An exception occurred on the network.
  • The component is running improperly.
  • The system environment is damaged.
  • The hugepage memory is not configured.

Troubleshooting Guideline

  1. Check whether the node network is reachable.
  2. Check whether the node status is normal.
  3. Check whether the hugepage memory is configured.

Procedure

  1. Log in to the first host in the AZ, switch to user root, and import environment variables.
    1. Use PuTTY to log in to the first host in the AZ.

      Ensure that the reverse proxy IP address and username fsp are used to establish the connection.

    2. Run the following command and enter the password of user root to switch to user root:

      su - root

    3. Import environment variables. For details, see Importing Environment Variables.
  2. Check the status of the host.

    cps host-list | grep fault

    Check whether the host status is fault.

    • If yes, the environment is faulty. Locate and rectify the cause of the node fault first.
    • If no, go to 3.
  3. Check whether the hugepage memory is configured.

    Run the cat /proc/meminfo | grep Huge command and check whether the values of HugePages_Total and HugePages_Free in the output are both 0.

    • If yes, configure hugepage memory for the host group where the user-mode EVS bridge is deployed by choosing Configuration > Kernel Option on the web client, and then restart the host for the configuration to take effect.
    • If no, go to 4.
  4. Contact technical support for assistance.
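The hugepage check in step 3 can be scripted as follows. The sketch parses a saved sample for illustration; on a live host, read /proc/meminfo directly:

```shell
#!/bin/sh
# Check hugepage configuration the same way the manual step does.
# Reading from a saved sample here; on a live host use: cat /proc/meminfo
meminfo='HugePages_Total:    2048
HugePages_Free:     1024
Hugepagesize:       2048 kB'

# Pull the counters out of the meminfo text.
total=$(printf '%s\n' "$meminfo" | awk '/^HugePages_Total:/ {print $2}')
free=$(printf '%s\n'  "$meminfo" | awk '/^HugePages_Free:/  {print $2}')

if [ "$total" -eq 0 ] && [ "$free" -eq 0 ]; then
  echo "hugepages not configured: set them via Configuration > Kernel Option"
else
  echo "hugepages configured: total=$total free=$free"
fi
```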

ceilometer-agent-compute Component Troubleshooting

Possible Causes

  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

Check whether the node status is normal.

Procedure

  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    The system supports both password-based and public-private key pair authentication. If you use a key pair to log in, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

    NOTE:
    To obtain the IP address of the External OM plane, search for the required parameter on the Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
    • Region Type I scenario:

      Cascading system: Cascading-ExternalOM-Reverse-Proxy

      Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

    • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the cps host-list | grep fault command to check whether the host status is fault.

    • If no, go to 6.
    • If yes, go to 7.

  6. Run the cps host-template-instance-operate --service ceilometer ceilometer-agent-compute --action start command. Wait for 1 minute and check whether the faulty component is in the active status.

    • If yes, no further action is required.
    • If no, go to 7.

  7. Contact technical support for assistance.

ceilometer-agent-hardware Component Troubleshooting

Possible Causes

  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

Check whether the node status is normal.

Procedure

  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    The system supports both password-based and public-private key pair authentication. If you use a key pair to log in, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

    NOTE:
    To obtain the IP address of the External OM plane, search for the required parameter on the Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
    • Region Type I scenario:

      Cascading system: Cascading-ExternalOM-Reverse-Proxy

      Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

    • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the cps host-list | grep fault command to check whether the host status is fault.

    • If no, go to 6.
    • If yes, go to 7.

  6. Run the cps host-template-instance-operate --service ceilometer ceilometer-agent-hardware --action start command. Wait for 1 minute and check whether the faulty component is in the active status.

    • If yes, no further action is required.
    • If no, go to 7.

  7. Contact technical support for assistance.

MongoDB Component Troubleshooting

Possible Causes

  • The component is running improperly.
  • The system environment is damaged.

Troubleshooting Guideline

Check whether the node status is normal.

Procedure

  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    The system supports both password-based and public-private key pair authentication. If you use a key pair to log in, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

    NOTE:
    To obtain the IP address of the External OM plane, search for the required parameter on the Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
    • Region Type I scenario:

      Cascading system: Cascading-ExternalOM-Reverse-Proxy

      Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

    • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the cps host-list | grep fault command to check whether the host status is fault.

    • If no, go to 6.
    • If yes, contact technical support for assistance.

  6. Restore the MongoDB component status. For details, see "How Do I Resolve Database Status Exceptions?" in the HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide.

ceilometer-collector Component Troubleshooting

Possible Causes

  • The component is running improperly: its status is fault, or active/standby switchovers occur frequently.
  • The system environment is damaged.

Troubleshooting Guideline

Check whether the node status is normal.

Procedure

  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    The system supports both password-based and public-private key pair authentication. If you use a key pair to log in, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

    NOTE:
    To obtain the IP address of the External OM plane, search for the required parameter on the Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
    • Region Type I scenario:

      Cascading system: Cascading-ExternalOM-Reverse-Proxy

      Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

    • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the cps host-list | grep fault command to check whether the host status is fault.

    • If no, go to 6.
    • If yes, go to 9.

  6. Run the following command to query the extended configurations of the ceilometer-collector component:

    cps template-ext-params-show --service ceilometer ceilometer-collector

    Check whether the configuration that starts with event_http_dispatcher.event_http_target_url_ exists in the command output.

    • If yes, go to 7.
    • If no, go to 9.

  7. Log in to the faulty node and check whether the IP address in the target URL obtained in 6 can be pinged.

    • If yes, go to 8.
    • If no, configure the default gateway by following "Modify host network configurations" in section "Modifying System Configurations" in the HUAWEI CLOUD Stack 6.5.0 O&M Guide. After the configuration is complete, wait for 1 minute and check whether the component status is normal. If it is still abnormal, go to 8.

  8. Run the openstack endpoint list | grep keystone command to query the Keystone domain name.

    Using identity.az1.dc1.domainname.com as an example, run the ping identity.az1.dc1.domainname.com command to check whether the domain name can be pinged.
    • If yes, go to 9.
    • If no, configure the required information by following "Modify host network configurations" in section "Modifying System Configurations" in the HUAWEI CLOUD Stack 6.5.0 O&M Guide. After the configuration is complete, wait for 1 minute and check whether the component status is normal. If it is still abnormal, go to 9.

  9. Contact technical support for assistance.
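For step 8, the host name to ping can be extracted from the endpoint URL automatically. The endpoint line below is a made-up example; the real openstack endpoint list output has more columns, but the URL extraction works the same way:

```shell
#!/bin/sh
# Pull the host name out of a Keystone endpoint URL so it can be pinged.
# Hypothetical sample line; real `openstack endpoint list` output differs.
endpoint_line='| abc123 | region | keystone | identity | True | public | https://identity.az1.dc1.domainname.com:443/identity/v3 |'

# Strip everything up to the URL scheme, then cut at the first ':', '/' or space.
host=$(printf '%s\n' "$endpoint_line" \
  | sed -n 's#.*https\{0,1\}://\([^:/ ]*\).*#\1#p')

echo "would run: ping -c 3 $host"
```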
Updated: 2019-08-30

Document ID: EDOC1100062365