Fault symptom: The customer configured some file systems with a maximum of 3 to 5 timing (scheduled) snapshots and enabled automatic deletion of obsolete read-only snapshots. However, the customer found more than 100 snapshots on some of these file systems. When the customer tried to delete some of the older ones, the following error message was returned: "This operation fails to be performed because the snapshot is used by other services."
Version information: V300R003C20SPC200
Storage model: OceanStor 6800V3
1. Collect and check the storage event log. It contains many events like the ones below. All of the affected file system snapshots have names beginning with "Hyper_ndmp", which indicates that they were created by the NDMP function.
2018-01-02 14:20:22 0x200F001D0004 Event Informational -- None admin:10.178.17.237 failed to delete the file system snapshot (name Hyper_ndmp020171002233520, source file system Shares_IPTV_DR). Error code: 0x4000300e.
2018-01-02 14:20:22 0x200F001D0004 Event Informational -- None admin:10.178.17.237 failed to delete the file system snapshot (name Hyper_ndmp020171005050008, source file system Shares_IPTV_DR). Error code: 0x4000300e.
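To see how many NDMP snapshot deletions failed for each file system, the exported event lines can be scanned with a short script. The following is a minimal sketch, assuming the events are exported as plain-text lines in exactly the format shown above; the regular expression and the function name `count_ndmp_failures` are illustrative and not part of any storage tool.

```python
import re
from collections import Counter

# Pattern derived only from the two sample events above (an assumption);
# other event formats may need a different expression.
EVENT_RE = re.compile(
    r"failed to delete the file system snapshot "
    r"\(name ([^,]+), source file system ([^)]+)\)\. "
    r"Error code: 0x[0-9a-fA-F]+"
)

def count_ndmp_failures(lines, prefix="Hyper_ndmp"):
    """Count failed deletions of NDMP snapshots per source file system."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        # group(1) is the snapshot name, group(2) the source file system.
        if m and m.group(1).startswith(prefix):
            counts[m.group(2)] += 1
    return counts

sample = [
    "2018-01-02 14:20:22 0x200F001D0004 Event Informational -- None admin:10.178.17.237 failed to delete the file system snapshot (name Hyper_ndmp020171002233520, source file system Shares_IPTV_DR). Error code: 0x4000300e.",
    "2018-01-02 14:20:22 0x200F001D0004 Event Informational -- None admin:10.178.17.237 failed to delete the file system snapshot (name Hyper_ndmp020171005050008, source file system Shares_IPTV_DR). Error code: 0x4000300e.",
]
print(count_ndmp_failures(sample))  # Counter({'Shares_IPTV_DR': 2})
```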
2. Normally, after the NDMP backup function is enabled, the storage system creates a snapshot of the file system before each backup, and NDMP backs up the data from that snapshot. At the next NDMP backup, the storage system deletes the old NDMP snapshot and creates a new one, so only one NDMP snapshot per file system should remain in the system. The accumulation of NDMP snapshots therefore points to a software issue that prevents old NDMP snapshots from being deleted. R&D confirmed that this is a new bug in the storage system.
The storage system looks up NDMP snapshots by file system ID. Because the file system ID is stored as a string, the ID is matched by string comparison.
Because of a coding error, this comparison can match the wrong file system ID. For example, if there are two file systems with IDs 2 and 20, when the storage system needs to delete the NDMP snapshot of file system 2, it may match file system 20 instead. It then searches for the snapshot in the snapshot list of file system 20, where it obviously cannot be found. As a result, the old NDMP snapshot of file system 2 is not deleted, while a new one is still created at each backup, so NDMP snapshots keep accumulating.
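The case does not describe the faulty comparison itself. The sketch below assumes a prefix match (only the first `len(target)` characters are compared, similar to a `strncmp` call bounded by the length of the searched ID) and shows how such a defect reproduces the behavior described above. The data layout, function names and snapshot names are hypothetical.

```python
# Hypothetical model of the NDMP snapshot clean-up path. The prefix-match
# defect, data layout and names are illustrative assumptions.

def match_fs_id_buggy(fs_ids, target):
    # Assumed defect: only the first len(target) characters are compared,
    # so target "2" also matches "20" (and "21", "200", ...).
    for fs_id in fs_ids:
        if fs_id[:len(target)] == target:
            return fs_id
    return None

def match_fs_id_fixed(fs_ids, target):
    # Correct behavior: the whole ID string must be equal.
    for fs_id in fs_ids:
        if fs_id == target:
            return fs_id
    return None

def delete_old_ndmp_snapshot(snapshots, target, old_snapshot, match):
    """Delete `old_snapshot` of file system `target`; return True on success.

    `snapshots` maps each file system ID (a string) to its snapshot names.
    """
    fs_id = match(list(snapshots), target)
    if fs_id is None or old_snapshot not in snapshots[fs_id]:
        return False  # wrong file system matched: the snapshot is left behind
    snapshots[fs_id].remove(old_snapshot)
    return True

# File system "20" is listed before "2", so the prefix match picks it first.
snapshots = {"20": ["Hyper_ndmp_b"], "2": ["Hyper_ndmp_a"]}
print(delete_old_ndmp_snapshot(snapshots, "2", "Hyper_ndmp_a", match_fs_id_buggy))  # False
print(delete_old_ndmp_snapshot(snapshots, "2", "Hyper_ndmp_a", match_fs_id_fixed))  # True
```

In this model, whether the wrong file system is matched depends on the order in which the IDs are scanned: if "2" were listed before "20", the prefix match would still return the right one, which is consistent with the wrong match being only a possibility.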
The problem will be fixed in V300R006C10SPC100.