Controller Reset Caused by Reference Count Overflow That Occurred in Oceanstor 6800V3 during Background Disk Scanning

Publication Date: 2016-08-30
Issue Description

Trigger condition

- A disk domain is created, where there are idle blocks for which LUN space is not allocated. (By default, no space is allocated to a thin LUN. Space is allocated to a thin LUN only when data is written to the thin LUN.)

- The background scanning policy is enabled for the storage system. (The policy is enabled by default.)

Version: Oceanstor 6800V300R001C10SPC100

Symptom

A controller resets.

Handling Process

Workarounds

On the master controller of the cluster, run CLI commands to disable background disk scanning.

Background disk scanning identifies bad sectors on disks and repairs them in advance. With background disk scanning disabled, the storage system loses this one proactive way of finding and repairing bad sectors. However, bad sector detection and repair are still triggered by host I/Os: if a read I/O encounters a bad sector, degraded read repair is performed, and if a write I/O encounters a bad sector, write repair is performed directly. In addition, disk fault detection functions such as disk media scanning, routine checks of S.M.A.R.T. information, and slow-disk checks keep working to ensure disk reliability.

Step 1     Use an SSH tool, such as Xshell 5, PuTTY 0.63, SecureCRT 6.7, or one of their later versions, to access the management network port of the storage system. Log in to the CLI as user admin.

Step 2     On the CLI, run the following commands to find out the master controller (whose role is master) of the cluster.

change user_mode current_mode user_mode=developer (If a password is required, enter debug@storage.)

debug

sys showcls

Step 3     On the master controller of the cluster, run the following command to disable background scanning.

change system media_scan status=stop

Step 4     On each controller, check whether background scanning has been disabled.

show system media_scan

 

----End

After upgrading the storage system or installing a patch, re-enable background scanning as follows.

Step 1     Use an SSH tool, such as Xshell 5, PuTTY 0.63, SecureCRT 6.7, or one of their later versions, to access the management network port of the storage system. Log in to the CLI as user admin.

Step 2     On the CLI, run the following commands to find out the master controller (whose role is master) of the cluster.

change user_mode current_mode user_mode=developer (If a password is required, enter debug@storage.)

debug

sys showcls

Step 3     On the master controller of the cluster, run the following command to enable background scanning.

change system media_scan status=start

Step 4     On each controller, check whether background scanning has been enabled.

show system media_scan

 

----End
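
The disable and enable procedures above run the same commands and differ only in the status value (stop or start). As a minimal sketch for building a runbook or checklist, the Python helper below returns the commands in the order the steps give them. The function name media_scan_commands is hypothetical, and the helper only builds strings: it does not open an SSH session, answer the developer-mode password prompt, or choose which controller to run each command on.

```python
from typing import List


def media_scan_commands(enable: bool) -> List[str]:
    """Return the CLI commands from the workaround steps, in document order.

    enable=False gives the sequence that disables background scanning;
    enable=True gives the sequence that re-enables it after the upgrade.
    """
    status = "start" if enable else "stop"
    return [
        # Step 2: switch to developer mode and find the master controller.
        "change user_mode current_mode user_mode=developer",
        "debug",
        "sys showcls",
        # Step 3: run on the master controller of the cluster.
        f"change system media_scan status={status}",
        # Step 4: run on each controller to verify the result.
        "show system media_scan",
    ]


if __name__ == "__main__":
    for command in media_scan_commands(enable=False):
        print(command)
```

Keeping both sequences in one helper makes it harder for the re-enable checklist to drift from the disable checklist.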

Root Cause

During background disk scanning, scanning I/Os are requested, dispatched, returned, and deleted. The I/O count value (the reference count) increases when an I/O is requested and decreases when the I/O is returned. If a region of a disk is not allocated to a LUN, that region is not scanned, and the corresponding I/O is deleted without being dispatched or returned, so the count is never decreased for that I/O. As a result, the I/O count value only increases. After a long enough time, the counter overflows, which leads to an abnormal controller reset.
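
The leak described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the controller firmware. The 16-bit counter width, the names ScanCounter, scan, and passes_until_overflow, and the use of an exception to stand in for the controller reset are all hypothetical. The fixed=True path shows one plausible correction (releasing the count when an I/O is deleted without dispatch), which is not necessarily what the patch does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

COUNTER_BITS = 16  # assumed width, kept small so the demo finishes quickly
COUNTER_MAX = (1 << COUNTER_BITS) - 1


class CounterOverflow(Exception):
    """Stands in for the abnormal controller reset."""


@dataclass
class ScanCounter:
    in_flight: int = 0

    def request(self) -> None:
        # Requesting a scan I/O increases the count.
        if self.in_flight == COUNTER_MAX:
            raise CounterOverflow("I/O count exceeded its maximum value")
        self.in_flight += 1

    def complete(self) -> None:
        # Returning a scan I/O decreases the count.
        self.in_flight -= 1


def scan(blocks: Sequence[bool], counter: ScanCounter, fixed: bool) -> None:
    """Run one scan pass; each entry is True if the block belongs to a LUN."""
    for allocated in blocks:
        counter.request()
        if allocated:
            counter.complete()  # dispatched and returned normally
        elif fixed:
            counter.complete()  # assumed fix: release the count on delete
        # Buggy path: the I/O is deleted without dispatch, count never drops.


def passes_until_overflow(blocks: Sequence[bool], max_passes: int) -> Optional[int]:
    """Return the pass number on which the leaky counter overflows, if any."""
    counter = ScanCounter()
    for pass_number in range(1, max_passes + 1):
        try:
            scan(blocks, counter, fixed=False)
        except CounterOverflow:
            return pass_number
    return None


if __name__ == "__main__":
    blocks = [True] * 900 + [False] * 100  # 100 unallocated blocks per pass
    print(passes_until_overflow(blocks, max_passes=1000))
```

With 100 unallocated blocks out of 1000, every pass in the buggy path leaves the count 100 higher than before. The more unallocated space a disk domain has, the sooner the counter reaches its limit, which matches the trigger condition of idle blocks without allocated LUN space.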

Solution

Patch V300R001C10SPH202 (intended for V300R001C10SPC200) and the patches of later versions resolve the problem.

Recommended solution versions or patches are as follows:

Upgrade all V300R001 versions affected by the problem to V300R001C20SPC100 or a later version.
