Restore from eBackup fails for a production server
Restoring a production server VM from eBackup returns a failure result.
The issue was reported on eBackup, so check the eBackup portal first and then the FusionCompute portal.
The failed backup/restore tasks involve different CNAs and different datastores.
We tried restoring the VM to another FusionCompute cluster, and after that tried restoring it to the original FusionCompute cluster.
The eBackup tasks failed because writing data blocks via the socket failed.
The parameter DPS_CHECK_DELAY_TIME is simply a factor that controls how many seconds FusionCompute subtracts from the timeout count. For example, if DPS_CHECK_DELAY_TIME is changed to 10, FusionCompute subtracts 10 from the current timeout count in each detection cycle (1 second). This means that if eBackup doesn’t update the timeout count in time, the task is closed after 6 minutes (3600 / 10 = 360 seconds). Changing the parameter has no other impact on the system.
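The arithmetic behind these figures can be sketched as follows. This is only an illustration of the calculation, not FusionCompute code: the function name seconds_until_close is hypothetical, the 3600-second maximum timeout count is taken from this case, and a value of 1 is assumed to correspond to the normal 60-minute timeout.

```python
# Illustrative arithmetic only; not taken from FusionCompute or eBackup source code.
MAX_TIMEOUT_COUNT = 3600  # maximum timeout count in seconds, as described in this case


def seconds_until_close(delay_time, timeout_count=MAX_TIMEOUT_COUNT):
    """Seconds until the count reaches zero if eBackup never refreshes it.

    FusionCompute is described as subtracting `delay_time` from the count
    in every 1-second detection cycle, so this is a ceiling division.
    """
    return -(-timeout_count // delay_time)


# A value of 1 is an assumption that matches the 60-minute figure in this case.
for value in (1, 10, 20):
    seconds = seconds_until_close(value)
    print(f"DPS_CHECK_DELAY_TIME={value}: closed after {seconds} s ({seconds / 60:g} min)")
```

With these inputs the sketch gives 60 minutes for 1, 6 minutes for 10, and 3 minutes for 20.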
eBackup keeps a timeout count for each backup task, with a maximum of 3600 seconds. eBackup checks the backup/restore task progress and the remaining timeout count every 3 minutes. If the backup/restore task can’t finish within the remaining count, eBackup resets the timeout count to 3600 so that the backup/restore task keeps working properly.
Meanwhile, FusionCompute checks the timeout count too, and decrements it every second. If FusionCompute finds that the timeout count has reached zero, it closes the backup/restore task as well as the socket between eBackup and FusionCompute.
The problem is that the current FusionCompute cluster has the parameter DPS_CHECK_DELAY_TIME set to 20. In this case, FusionCompute subtracts 20 from the timeout count of the backup/restore task in every 1-second cycle, which means the task times out in FusionCompute after 3 minutes (3600 / 20 = 180 seconds) instead of 60 minutes.
FusionCompute closed the backup/restore task before eBackup updated the timeout count. eBackup then found the socket closed before the backup/restore task had completed, and interrupted the backup/restore task.
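The race described above can be reproduced with a small second-by-second simulation. This is a simplified model under stated assumptions, not the real implementation: the names simulate_task, refresh_interval and task_duration are hypothetical, eBackup is assumed to reset the count at every 3-minute check, and when both sides act in the same second FusionCompute is assumed to act first, which matches what was observed in this case.

```python
# Simplified model of the timeout handshake; all names here are hypothetical.
def simulate_task(delay_time, max_count=3600, refresh_interval=180, task_duration=1200):
    """Return the second at which FusionCompute closes the task, or None if it survives.

    delay_time       -- value of DPS_CHECK_DELAY_TIME (subtracted every second)
    max_count        -- maximum timeout count in seconds (3600 in this case)
    refresh_interval -- how often eBackup checks and resets the count (3 minutes)
    task_duration    -- how long the backup/restore task needs, in seconds (example value)
    """
    count = max_count
    for second in range(1, task_duration + 1):
        # FusionCompute side: one detection cycle per second.
        count -= delay_time
        if count <= 0:
            return second  # task and socket are closed by FusionCompute
        # eBackup side: assumed to reset the count at every check.
        if second % refresh_interval == 0:
            count = max_count
    return None  # the task ran to completion without being closed


for value in (1, 10, 20):
    closed_at = simulate_task(value)
    if closed_at is None:
        print(f"DPS_CHECK_DELAY_TIME={value}: task completes")
    else:
        print(f"DPS_CHECK_DELAY_TIME={value}: closed by FusionCompute at {closed_at} s")
```

In this model, a value of 20 closes a 20-minute task at 180 seconds, exactly when the first eBackup check is due, while values of 1 and 10 leave enough margin for eBackup to reset the count in time.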
This issue is a coordination problem between FusionCompute and eBackup.
During VM backup and restore, eBackup only calls FusionCompute interfaces to carry out the task, so FusionCompute also needs to “manage” the task. For example, FusionCompute needs to close the task if eBackup crashes or the network is interrupted, so that the backup/restore task does not run out of control.