Hot Reset
Function
The npu-smi set -t reset -i id -c chip_id command resets an NPU device.
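- This product supports two working modes, SMP and AMP. In AMP mode, only the specified NPU is reset. In SMP mode, the command asks whether to reset all NPUs: enter y to reset all NPUs, or n to cancel the reset.
- To query the working mode, log in to the iBMC and run ipmcget -d npu, which reports the current NPU working mode.
- To switch the working mode, see the section "Querying and Setting the NPU Chip Working Mode" in the Atlas 900 Compute Node iBMC (V3.01.00.00 or Later) User Guide.
- After an NPU is reset successfully, wait at least 20 seconds before querying information about that NPU or any other NPU.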
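The wait of at least 20 seconds described above can be built into a small wrapper. The following is a minimal sketch, not part of npu-smi: the helper names `build_reset_command` and `reset_then_wait` are hypothetical, and the sketch assumes that npu-smi exits with status 0 when the reset succeeds, which this document does not state. The command inherits the terminal, so the operator answers the confirmation prompts directly.

```python
import subprocess
import time

# Minimum wait after a successful reset, taken from the note above.
MIN_WAIT_SECONDS = 20


def build_reset_command(npu_id, chip_id):
    """Return the hot-reset command line as an argument list."""
    return ["npu-smi", "set", "-t", "reset", "-i", str(npu_id), "-c", str(chip_id)]


def reset_then_wait(npu_id, chip_id, wait_seconds=MIN_WAIT_SECONDS):
    """Run the reset interactively, then pause before any further query.

    Assumption (not confirmed by this document): exit status 0 means success.
    """
    if wait_seconds < MIN_WAIT_SECONDS:
        raise ValueError("wait at least 20 seconds after a reset")
    # No stdin/stdout redirection: the operator answers the Y/N prompts.
    result = subprocess.run(build_reset_command(npu_id, chip_id))
    if result.returncode == 0:
        time.sleep(wait_seconds)
    return result.returncode == 0
```

Calling `reset_then_wait(2, 0)` would run the same command as in the examples for NPU 2, chip 0, and return only after the waiting period has elapsed.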
Syntax
npu-smi set -t reset -i id -c chip_id
Parameters
Parameter | Description |
---|---|
id | NPU unit ID |
chip_id | Chip ID |
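Constraints
- Stop all services running on the NPU before performing a hot reset.
- The hot reset command must be run as the root user on a physical machine. If it is run as a non-root user on a physical machine, or in a container or virtual machine, an error is returned.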
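The constraints above can be checked before the command is issued. The sketch below is illustrative only: `is_root` and `check_reset_preconditions` are hypothetical helper names, `services_stopped` is a flag the operator supplies (the sketch does not inspect NPU workloads), and no attempt is made to detect containers or virtual machines.

```python
import os


def is_root(euid=None):
    """Return True when the effective user ID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()  # available on Unix-like systems only
    return euid == 0


def check_reset_preconditions(euid=None, services_stopped=False):
    """Return a list of unmet constraints; an empty list means proceed."""
    problems = []
    if not is_root(euid):
        problems.append("hot reset must be run as root on the physical machine")
    if not services_stopped:
        problems.append("stop all services on the NPU before a hot reset")
    return problems
```

An empty list from `check_reset_preconditions(services_stopped=True)` indicates that the two constraints covered here are met. It does not guarantee that the command will succeed.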
Examples
# Perform a hot reset on NPU 2 in AMP mode.
npu-smi set -t reset -i 2 -c 0
Resetting a standard PCIe card or npu chip during service running may cause system hang or abnormal reset.
Are you sure you want to continue resetting?(Y/N) n
Status : Fail
Message : User aborts reset.

npu-smi set -t reset -i 2 -c 0
Resetting a standard PCIe card or npu chip during service running may cause system hang or abnormal reset.
Are you sure you want to continue resetting?(Y/N) y
Message : resetting ...
Status : OK
Message : Reset server successfully
# Perform a hot reset on NPU 2 in SMP mode.
npu-smi set -t reset -i 2 -c 0
Resetting a standard PCIe card or npu chip during service running may cause system hang or abnormal reset.
Are you sure you want to continue resetting?(Y/N) n
Status : Fail
Message : User aborts reset.

npu-smi set -t reset -i 2 -c 0
Resetting a standard PCIe card or npu chip during service running may cause system hang or abnormal reset.
Are you sure you want to continue resetting?(Y/N) y
It's SMP mode, it will reboot all devices, do you want to continue reboot? [y/n] n

npu-smi set -t reset -i 2 -c 0
Resetting a standard PCIe card or npu chip during service running may cause system hang or abnormal reset.
Are you sure you want to continue resetting?(Y/N) y
It's SMP mode, it will reboot all devices, do you want to continue reboot? [y/n] y
Message : resetting (about 150 seconds) ...
Status : OK
Message : Reset server successfully
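
When the command output is captured to a log, the Status and Message fields shown in the examples can be extracted for later checks. The following is a minimal sketch: `parse_reset_output` is a hypothetical helper, and it assumes that each Status or Message field is printed on its own line in the `Name : value` form seen above.

```python
import re

# Matches lines such as "Status : OK" or "Message : resetting ...".
_FIELD = re.compile(
    r"^[ \t]*(Status|Message)[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def parse_reset_output(text):
    """Collect the last Status value and all Message values, in order."""
    status = None
    messages = []
    for key, value in _FIELD.findall(text):
        if key == "Status":
            status = value
        else:
            messages.append(value)
    return {"status": status, "messages": messages}
```

A result whose `status` is `"OK"` corresponds to the successful runs above, and `"Fail"` with the message `User aborts reset.` corresponds to a run where the operator answered n.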