
File System error in SAP HANA

Publication Date: 2017-08-07
Issue Description

The customer emailed us a screenshot of error messages indicating that the file system was corrupted.



We had previously shared the SAP HANA requirements with the customer.

Our check showed that /hana/log was smaller than required and that its usage had reached 100%, so we asked the customer when the issue had first appeared.
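A full /hana/log volume, as found here, can be caught earlier with a simple usage check on the HANA host. The following is a minimal sketch assuming Python 3 on that host; the 90% default threshold is an arbitrary example rather than an SAP figure, and because the percentage is computed as used/total it ignores root-reserved blocks, so it may read slightly lower than df.

```python
import shutil
import sys


def usage_percent(path: str) -> float:
    """Percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return 100.0 * usage.used / usage.total


def check_mount(path: str, threshold: float = 90.0) -> str:
    """Return an OK/WARN line for `path`; WARN once usage reaches `threshold` percent."""
    pct = usage_percent(path)
    status = "WARN" if pct >= threshold else "OK"
    return f"{status} {path}: {pct:.1f}% used"


if __name__ == "__main__":
    # Checks the mount points given on the command line, or "/" if none are given.
    for mount in sys.argv[1:] or ["/"]:
        print(check_mount(mount))
```

For example, running it as python3 check_usage.py /hana/log /hana/data (the script name is hypothetical) prints one OK or WARN line per mount point.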


Alarm Information

The customer reported alarms on two disks, as shown in the customer's screenshot.


Handling Process

We asked the customer to run fdisk -l and send us the output, which is shown below:

```text
hslecc-sandb:/hana/log1 # fdisk -l

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdb: 10794.0 GB, 10793999400960 bytes
255 heads, 63 sectors/track, 1312295 cylinders, total 21082030080 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb4               1           1           0+  ee  GPT
Partition 4 does not start on physical sector boundary.

Disk /dev/sdc: 1197.0 GB, 1196999835648 bytes
255 heads, 63 sectors/track, 145526 cylinders, total 2337890304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xffffffff

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sda: 599.0 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders, total 1169920000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00037ac5

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *   304078848   306182143     1051648   83  Linux
/dev/sda2       369092608   373286911     2097152   82  Linux swap / Solaris
/dev/sda3       373286912   478142463    52427776   83  Linux

Disk /dev/nvme0n1: 1600.3 GB, 1600321314816 bytes
64 heads, 32 sectors/track, 1526185 cylinders, total 3125627568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/nvme0n1 doesn't contain a valid partition table

Disk /dev/nvme1n1: 1600.3 GB, 1600321314816 bytes
64 heads, 32 sectors/track, 1526185 cylinders, total 3125627568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/nvme1n1 doesn't contain a valid partition table

Disk /dev/md0: 1600.2 GB, 1600187072512 bytes
2 heads, 4 sectors/track, 390670672 cylinders, total 3125365376 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

hslecc-sandb:/hana/log1 #
```
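Output like the fdisk listing above is easier to audit programmatically. The sketch below assumes the fdisk output layout shown above (newer util-linux releases print a different layout, so the patterns would need adjusting). It groups lines by disk, checks that each reported byte count equals total sectors × logical sector size (for /dev/sdb, 21,082,030,080 × 512 = 10,793,999,400,960 bytes), and flags disks that fdisk reports as GPT or as having no partition table. The SAMPLE text is an excerpt of the output above.

```python
import re
from dataclasses import dataclass

# Patterns for the fdisk -l layout shown above (older util-linux releases).
DISK_RE = re.compile(r"^Disk (/dev/\S+): .*?, (\d+) bytes$")
TOTAL_RE = re.compile(r"total (\d+) sectors$")
SECTOR_RE = re.compile(r"^Sector size \(logical/physical\): (\d+) bytes / (\d+) bytes$")
GPT_RE = re.compile(r"GPT \(GUID Partition Table\) detected on '(/dev/\S+)'")


@dataclass
class Disk:
    name: str
    size_bytes: int
    gpt: bool = False
    total_sectors: int = 0
    logical: int = 512
    physical: int = 512
    has_table: bool = True


def parse_fdisk(text: str) -> dict:
    """Group fdisk -l lines by disk and return {device: Disk}."""
    gpt_disks = set(GPT_RE.findall(text))
    disks = {}
    current = None
    for line in (raw.strip() for raw in text.splitlines()):
        if m := DISK_RE.match(line):
            current = Disk(m.group(1), int(m.group(2)), gpt=m.group(1) in gpt_disks)
            disks[current.name] = current
        elif current is None:
            continue  # lines before the first "Disk /dev/...: ..." header
        elif m := TOTAL_RE.search(line):
            current.total_sectors = int(m.group(1))
        elif m := SECTOR_RE.match(line):
            current.logical, current.physical = int(m.group(1)), int(m.group(2))
        elif "doesn't contain a valid partition table" in line:
            current.has_table = False
    return disks


def report(disks: dict) -> None:
    """Print one summary line per disk, including the size consistency check."""
    for d in disks.values():
        size_ok = d.size_bytes == d.total_sectors * d.logical
        label = "GPT (inspect with parted)" if d.gpt else ("MBR" if d.has_table else "none")
        print(f"{d.name}: {d.size_bytes:,} bytes, sectors {d.logical}/{d.physical} B, "
              f"table: {label}, size check: {'ok' if size_ok else 'MISMATCH'}")


SAMPLE = """\
WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/sdb: 10794.0 GB, 10793999400960 bytes
255 heads, 63 sectors/track, 1312295 cylinders, total 21082030080 sectors
Sector size (logical/physical): 512 bytes / 4096 bytes
Disk /dev/sdc: 1197.0 GB, 1196999835648 bytes
255 heads, 63 sectors/track, 145526 cylinders, total 2337890304 sectors
Sector size (logical/physical): 512 bytes / 4096 bytes
Disk /dev/sdc doesn't contain a valid partition table
"""

if __name__ == "__main__":
    report(parse_fdisk(SAMPLE))
```

A missing partition table is not an error by itself: /dev/md0 is a Linux software RAID device, and whole-disk volumes such as LVM physical volumes or RAID members are commonly used without one.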

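The "Partition 4 does not start on physical sector boundary" warning comes down to arithmetic: a partition is aligned when its start offset in bytes (start sector × logical sector size) is a multiple of the physical sector size. For /dev/sdb4, 1 × 512 = 512 bytes, which is not a multiple of 4096. That entry, however, has type 0xEE, which is most likely the protective MBR record a GPT disk carries at sector 1, so this particular warning is probably harmless; the real GPT partitions on /dev/sdb should be checked with GNU Parted, as fdisk itself advises. A minimal sketch of the check, using start sectors taken from the output above:

```python
def is_aligned(start_sector: int, logical: int, physical: int) -> bool:
    """True if a partition starting at `start_sector` begins on a physical sector boundary."""
    # The byte offset of the partition start must be a multiple of the physical sector size.
    return (start_sector * logical) % physical == 0


# (device, start sector, logical bytes, physical bytes), from the fdisk output above.
PARTITIONS = [
    ("/dev/sdb4", 1, 512, 4096),  # type 0xEE: GPT protective MBR entry
    ("/dev/sda1", 304078848, 512, 512),
    ("/dev/sda2", 369092608, 512, 512),
    ("/dev/sda3", 373286912, 512, 512),
]

if __name__ == "__main__":
    for dev, start, logical, physical in PARTITIONS:
        state = "aligned" if is_aligned(start, logical, physical) else "NOT aligned"
        print(f"{dev}: starts at byte {start * logical}, {state} for {physical}-byte sectors")
```

With 512-byte logical and 4096-byte physical sectors, a start sector is aligned when it is a multiple of 8; the common 1 MiB default (sector 2048) satisfies this.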

Based on this output, we advised the customer that the issue did not appear to be on the hardware side and asked them to contact their OS vendor.

We then asked the customer to collect the iBMC logs and arranged a remote support session.

We found that the disks had raised alarms and gone offline, and we brought them back online manually.

The virtual drive (VD) is now online.



Disks 2 and 5 are both online.
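From the operating system side, a SCSI disk that stops responding may also be marked offline by Linux, and that state is exposed in sysfs at /sys/block/<dev>/device/state ("running" when healthy). The sketch below lists that state for every sd* device; it assumes a Linux host and only shows the OS view, so the physical-disk state on the RAID controller still has to be confirmed through the iBMC or the controller's own tools.

```python
from pathlib import Path


def scsi_device_states(sys_block: str = "/sys/block") -> dict:
    """Map each sd* block device to the SCSI state in <sys_block>/<dev>/device/state."""
    states = {}
    root = Path(sys_block)
    if not root.is_dir():
        return states  # not a Linux host, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        state_file = dev / "device" / "state"
        # Only SCSI disks (sd*); other device types expose different state values.
        if dev.name.startswith("sd") and state_file.is_file():
            states[dev.name] = state_file.read_text().strip()
    return states


def not_running(states: dict) -> list:
    """Devices whose SCSI state is anything other than 'running'."""
    return [name for name, state in states.items() if state != "running"]


if __name__ == "__main__":
    states = scsi_device_states()
    for name, state in states.items():
        print(f"{name}: {state}")
    print("not running:", ", ".join(not_running(states)) or "none")
```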





We confirmed that sdb1 (/hana/data) was available again and asked the customer to check that all data was intact and that business operations had resumed.
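To confirm that a volume such as /hana/data is mounted again, and has not been left read-only after file system errors, the kernel mount table in /proc/mounts can be checked. This is a minimal sketch assuming a Linux host; it treats the "ro" mount option as read-only and does not decode mount points that contain escaped spaces.

```python
from typing import List, Optional, Tuple


def find_mount(mountpoint: str, mounts_text: str) -> Optional[Tuple[str, str, List[str]]]:
    """Return (device, fstype, options) for `mountpoint` from /proc/mounts text.

    The last matching line wins, because a later mount hides earlier ones.
    Mount points containing spaces (octal-escaped as \\040) are not decoded here.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = (fields[0], fields[2], fields[3].split(","))
    return found


def check_volume(mountpoint: str, mounts_text: str) -> str:
    """Describe whether `mountpoint` is mounted, and if so whether it is read-only."""
    entry = find_mount(mountpoint, mounts_text)
    if entry is None:
        return f"{mountpoint}: NOT mounted"
    device, fstype, options = entry
    mode = "read-only" if "ro" in options else "read-write"
    return f"{mountpoint}: {device} ({fstype}, {mode})"


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except FileNotFoundError:
        mounts = ""  # not a Linux host
    for mp in ("/hana/data", "/hana/log"):
        print(check_volume(mp, mounts))
```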



Root Cause

We collected the OS logs for further analysis. After the server was restarted, the disks went offline again on their own.

The logs showed that several disks raised alarms that cleared again after a few seconds.
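Alarms that raise and clear within seconds suggest an intermittent fault rather than a failed disk. Given alarm events exported as (timestamp, disk, event) records, short-lived alarms can be picked out by pairing each raise with the next clear for the same disk. The record layout, the "raise"/"clear" labels, and the 10-second window are hypothetical and do not reflect the actual iBMC log format.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical record layout: (ISO-8601 timestamp, disk identifier, "raise" or "clear").
Event = Tuple[str, str, str]


def transient_alarms(events: List[Event], window_s: float = 10.0) -> List[Tuple[str, float]]:
    """Return (disk, seconds) for each alarm that cleared within `window_s` seconds."""
    raised: Dict[str, datetime] = {}
    short: List[Tuple[str, float]] = []
    # ISO-8601 strings written in one fixed format sort chronologically.
    for ts, disk, kind in sorted(events):
        t = datetime.fromisoformat(ts)
        if kind == "raise":
            raised[disk] = t
        elif kind == "clear" and disk in raised:
            duration = (t - raised.pop(disk)).total_seconds()
            if duration <= window_s:
                short.append((disk, duration))
    return short
```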




sasraidlog.txt contains error codes on PHYs 24 to 31.




PHYs 0 to 23 correspond to hard disk slots 0 to 23; the remaining PHYs are link PHYs.


PHYs 24 to 31 logged many error codes, which points to a problem on the SAS link (the HDD backplane, the RAID card, and the SAS cables).
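The diagnosis above rests on a simple mapping: on this server PHYs 0 to 23 are disk slots and PHYs 24 to 31 are link PHYs, so errors concentrated on 24 to 31 implicate the shared path rather than individual disks. The sketch applies that mapping to per-PHY error counts. It assumes the counts have already been read out of sasraidlog.txt (its format is not shown here), the mapping may differ on other hardware, and the "any link error means suspect the link first" rule is a simplifying heuristic.

```python
from typing import Dict, List

DISK_PHYS = range(0, 24)   # PHY 0-23: hard disk slots 0-23 (mapping stated in this case)
LINK_PHYS = range(24, 32)  # PHY 24-31: SAS link (backplane, RAID card, SAS cables)


def locate_errors(error_counts: Dict[int, int]) -> Dict[str, List[int]]:
    """Sort PHYs that logged at least one error into disk, link and unknown groups."""
    groups: Dict[str, List[int]] = {"disk": [], "link": [], "unknown": []}
    for phy, count in sorted(error_counts.items()):
        if count <= 0:
            continue
        if phy in DISK_PHYS:
            groups["disk"].append(phy)
        elif phy in LINK_PHYS:
            groups["link"].append(phy)
        else:
            groups["unknown"].append(phy)
    return groups


def suspect(error_counts: Dict[int, int]) -> str:
    """Heuristic: errors on any link PHY point at the shared SAS path first."""
    groups = locate_errors(error_counts)
    if groups["link"]:
        return "link: check HDD backplane, RAID card and SAS cables"
    if groups["disk"]:
        return "disk slots: " + ", ".join(str(p) for p in groups["disk"])
    return "no errors"
```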

Link errors like these can be caused by poor or loose cable contact, dust, static electricity, or damaged parts.

We therefore replaced the parts on the SAS link: the 23HDDs backplane, the RAID card, and the SAS cables.

Solution

Replace the SAS cables, the HDD backplane, and the RAID card.

We arranged the following replacement parts through RMA:

2 units of SAS cable

1 unit of RAID card

1 unit of hard disk backplane

After the replacement, all disks are online, and our check confirmed that the path is available.

