After the KunLun Server Is Forcibly Powered Off and Powered On, the OS Cannot Be Accessed

Publication Date: 2018-06-19
Issue Description

Conclusion: the root partition of the operating system spans local and remote storage, which is an improper configuration. With this configuration, the forcible power-off damaged the file system in the root partition. As a result, GRUB could not boot from the root partition during the restart, and the system failed to start.




Alarm Information

The SUSE operating system fails to start.

Handling Process

1. According to the system event log (SEL), two PSUs were faulty at 02:36 on February 7.

After communication with the customer, it was found that one UPS was faulty.



2. After finding that the power supply was abnormal, the customer prepared to rectify the power supply. Before the rectification, onsite maintenance personnel performed two power-off operations, at 16:10 and 16:17 on February 7. Neither operation took effect, and the reason has not been analyzed. At 16:22, a forced power-off operation was performed to forcibly power off the system.

3. At about 21:59 on February 7, the power supply was restored in the equipment room.


4. At about 12:47 on February 8, onsite maintenance personnel powered on the physical partition.



At about 18:36 on February 8, the frontline service personnel contacted Huawei R&D engineers and reported that the device failed to enter the OS and stopped at the grub interface.













Root Cause

For Linux operating systems such as SUSE Linux, forcible power-off and other abnormal power-off operations are risky. They interrupt the writing of data held in the file system cache. As a result, the metadata, data blocks, and log data in the file system become inconsistent, and the file system is damaged.
In addition, the root partition of the operating system is created on a volume group that contains both a local physical volume and a remote storage physical volume (as shown in Figure 1). SUSE, the OS vendor, has determined that this is an improper configuration. Data read from and written to the LVM volume is distributed across the local and remote PVs. Because physical attributes such as the data transmission rate and power supply differ between the local and remote ends, a forcible power-off is more likely to damage the integrity of the file system.
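
One way to check for this configuration is to list which physical volumes back each volume group. The following is a minimal sketch and is not part of the original case analysis. It parses text in the format printed by `pvs --noheadings -o pv_name,vg_name` and reports the volume groups that contain both local and remote PVs. The function name, the sample device names, and the idea of passing the remote devices as an explicit set are assumptions for illustration. LVM does not report whether a PV is local or remote, so the administrator must supply that information.

```python
from collections import defaultdict


def find_mixed_vgs(pvs_output, remote_pvs):
    """Return the sorted names of VGs that contain both local and remote PVs.

    pvs_output: text in the format of `pvs --noheadings -o pv_name,vg_name`
                (one "<pv_name> <vg_name>" pair per line).
    remote_pvs: set of PV device paths known to be on remote storage
                (supplied by the administrator; an assumption of this sketch).
    """
    vg_to_pvs = defaultdict(set)
    for line in pvs_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Skip blank lines and PVs that do not belong to any VG.
            continue
        pv_name, vg_name = fields
        vg_to_pvs[vg_name].add(pv_name)

    mixed = []
    for vg_name, pvs in vg_to_pvs.items():
        has_remote = any(pv in remote_pvs for pv in pvs)
        has_local = any(pv not in remote_pvs for pv in pvs)
        if has_remote and has_local:
            mixed.append(vg_name)
    return sorted(mixed)


# Hypothetical sample data: vg_os spans a local disk and a remote LUN.
SAMPLE_PVS_OUTPUT = """
  /dev/sda3            vg_os
  /dev/mapper/mpatha   vg_os
  /dev/mapper/mpathb   vg_data
"""
SAMPLE_REMOTE_PVS = {"/dev/mapper/mpatha", "/dev/mapper/mpathb"}

if __name__ == "__main__":
    print(find_mixed_vgs(SAMPLE_PVS_OUTPUT, SAMPLE_REMOTE_PVS))
```

In this hypothetical sample, only vg_os mixes a local PV with a remote PV, which corresponds to the improper configuration described above.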

Solution

1. After the system is started, it stops at the GRUB interface, as shown in the following figure.

2. Run the ls command to list the partitions recorded in the partition table.

Confirm with the onsite installation personnel that (lvm/vg_os-root) is the root partition of the system. The files that GRUB requires to boot the operating system are in the root partition.
3. When you run the ls (lvm/vg_os-root)/ command to view the directory structure of the root partition, the error message "Error: the server is not specified" is displayed, indicating that the LVM partition is incomplete. In addition, the output of the ls command shows that GRUB cannot read the root partition (only lvm/vg_os-usr_sap and lvm/vg_os-swap can be read).

The root partition is abnormal. As a result, GRUB cannot boot the system from the root partition.

4. Mount the Toolkit CD-ROM and boot the system from it. Use the file system repair command fsck to check and repair the root partition. If hundreds of errors are found, the file system of the root partition is severely damaged.



After the repair is complete, restart the system. The system still stops at the GRUB stage, with the same symptom as before the repair. This indicates that the damage to the root partition file system exceeds what fsck can recover, and the file system cannot be restored.
The symptom was reported to SUSE, the original manufacturer. The analysis conclusion received from SUSE is the same as that of Huawei.
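
When fsck is run as in step 4, it also reports its result through an exit status, which is a sum of the condition bits documented in the fsck(8) manual page. The following minimal sketch decodes that status. The function names are hypothetical, and the original case does not record which status fsck returned, so the values used here are illustrative only.

```python
# Exit status bits as documented in the fsck(8) manual page.
FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    # The status is a bitwise OR of the conditions that occurred.
    return [text for bit, text in sorted(FSCK_EXIT_BITS.items()) if status & bit]


def has_uncorrected_errors(status):
    """True if fsck reported errors that it could not correct (bit 4)."""
    return bool(status & 4)


if __name__ == "__main__":
    # Illustrative value only: 5 = errors corrected (1) + errors left uncorrected (4).
    for condition in decode_fsck_status(5):
        print(condition)
```

The exit status only describes what fsck found and repaired. It does not by itself show that GRUB can read the root partition again, so the restart check in step 4 is still required.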










Suggestions

For details about how to install SAP HANA and configure the VG, see the Huawei SAP HANA Appliance Broadwell Platform Installation Guide (http://support.huawei.com/enterprise/zh/doc/DOC1000152143?idPath=7919749%257C9856522%257C21782478%257C21782482%257C9332560).

END