Troubleshooting process
Finding the blocked requests
[root@hh-yun-puppet-129021 ~]
HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
1 ops are blocked > 33554.4 sec
1 ops are blocked > 33554.4 sec on osd.16
1 osds have slow requests
The output shows that osd.16 has one blocked operation.
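When many OSDs are involved, picking the affected ones out of the health output by eye gets tedious. The following is a minimal sketch of how the per-OSD lines could be extracted with Python. It assumes the text has the same shape as the listing above (output of this shape is typically printed by `ceph health detail`, though the command itself is not shown in the listing); the function name `blocked_osds` is hypothetical.

```python
import re

# Matches per-OSD lines such as "1 ops are blocked > 33554.4 sec on osd.16".
# The cluster-wide line without "on osd.N" is intentionally not matched.
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec on osd\.(\d+)")


def blocked_osds(health_text):
    """Return {osd_id: [(op_count, seconds), ...]} for each per-OSD line."""
    result = {}
    for line in health_text.splitlines():
        match = BLOCKED_RE.search(line)
        if match:
            count = int(match.group(1))
            seconds = float(match.group(2))
            osd_id = int(match.group(3))
            result.setdefault(osd_id, []).append((count, seconds))
    return result


# Sample text copied from the listing above.
SAMPLE = """\
HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
1 ops are blocked > 33554.4 sec
1 ops are blocked > 33554.4 sec on osd.16
1 osds have slow requests
"""

if __name__ == "__main__":
    for osd_id, entries in blocked_osds(SAMPLE).items():
        for count, seconds in entries:
            hours = seconds / 3600
            print(f"osd.{osd_id}: {count} op(s) blocked for over {hours:.1f} h")
```

Converting the threshold to hours makes the severity easier to read: 33554.4 seconds is a little over nine hours.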
Solution
Find the host that owns the OSD
[root@hh-yun-puppet-129021 ~]
...
-2 40 host hh-yun-ceph-cinder015-128055
...
-3 40 host hh-yun-ceph-cinder016-128056
...
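The lookup from OSD to host can also be scripted. The sketch below is an assumption-laden illustration rather than a definitive parser: it assumes a whitespace-separated tree listing in which each `host` row is followed by the rows of its OSDs, as in the listing above. The `osd.16` row in the sample is illustrative only (its weight 3.64 appears later in this post, but the exact column layout is assumed), and the function name `osd_to_host` is hypothetical.

```python
def osd_to_host(tree_text):
    """Map each 'osd.N' name to the nearest 'host' row above it."""
    mapping = {}
    current_host = None
    for line in tree_text.splitlines():
        fields = line.split()
        # Host rows look like: "-3 40 host hh-yun-ceph-cinder016-128056".
        if len(fields) >= 4 and fields[2] == "host":
            current_host = fields[3]
            continue
        # Any other row that carries an "osd.N" token belongs to that host.
        for field in fields:
            if field.startswith("osd.") and current_host is not None:
                mapping[field] = current_host
                break
    return mapping


# The two host rows come from the listing above; the osd.16 row is an
# assumed, illustrative row and is not copied from real output.
SAMPLE = """\
-2 40 host hh-yun-ceph-cinder015-128055
-3 40 host hh-yun-ceph-cinder016-128056
16 3.64 osd.16 up 1
"""

if __name__ == "__main__":
    print(osd_to_host(SAMPLE).get("osd.16"))
```

With the sample above, `osd.16` resolves to `hh-yun-ceph-cinder016-128056`, the host on which the restart in the next step is performed.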
Restart the OSD
[root@hh-yun-ceph-cinder016-128056 ~]
...
Stopping Ceph osd.16 on hh-yun-ceph-cinder016-128056...kill 2799859...kill 2799859...done
[root@hh-yun-ceph-cinder016-128056 ~]
...
create-or-move updated item name 'osd.16' weight 3.64 at location {host=hh-yun-ceph-cinder016-128056,root=default} to crush map
Starting Ceph osd.16 on hh-yun-ceph-cinder016-128056...
Running as unit run-3126361.service.
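After the restart, the cluster runs recovery on this OSD. During recovery the blocked request is disconnected, so it is reissued: the request goes back to the mon nodes, obtains a new PG map, and learns the current location of the data, which resolves the problem above.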
Status after recovery
[root@hh-yun-puppet-129021 ~]
cluster dc4f91c1-8792-4948-b68f-2fcea75f53b9
...
monmap e3: 5 mons at {hh-yun-ceph-cinder015-128055=240.30.128.55:6789/0,hh-yun-ceph-cinder017-128057=240.30.128.57:6789/0,hh-yun-ceph-cinder024-128074=240.30.128.74:6789/0,hh-yun-ceph-cinder025-128075=240.30.128.75:6789/0,hh-yun-ceph-cinder026-128076=240.30.128.76:6789/0}, election epoch 216, quorum 0,1,2,3,4 hh-yun-ceph-cinder015-128055,hh-yun-ceph-cinder017-128057,hh-yun-ceph-cinder024-128074,hh-yun-ceph-cinder025-128075,hh-yun-ceph-cinder026-128076
osdmap e97981: 190 osds: 190 up, 190 in
pgmap v13669826: 20544 pgs, 2 pools, 77488 GB data, 19510 kobjects
228 TB used, 426 TB / 654 TB avail
...
3 active+clean+scrubbing+deep
client io 21801 kB/s rd, 66461 kB/s wr, 2328 op/s
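One quick check after the restart is that the restarted OSD has rejoined, that is, that the `osdmap` line reports every OSD as both up and in. The sketch below shows one way this could be verified from the status text. It assumes the `osdmap` line has exactly the format shown in the listing above; the function names `osd_counts` and `all_osds_up_and_in` are hypothetical.

```python
import re

# Matches a line such as "osdmap e97981: 190 osds: 190 up, 190 in".
OSDMAP_RE = re.compile(r"osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def osd_counts(status_text):
    """Return the osdmap epoch and OSD counts, or None if no line matches."""
    match = OSDMAP_RE.search(status_text)
    if match is None:
        return None
    epoch, total, up, in_ = (int(group) for group in match.groups())
    return {"epoch": epoch, "total": total, "up": up, "in": in_}


def all_osds_up_and_in(status_text):
    """True only when an osdmap line is present and total == up == in."""
    counts = osd_counts(status_text)
    return counts is not None and counts["total"] == counts["up"] == counts["in"]


# Sample line copied from the status listing above.
SAMPLE = "osdmap e97981: 190 osds: 190 up, 190 in"

if __name__ == "__main__":
    print(osd_counts(SAMPLE))
    print(all_osds_up_and_in(SAMPLE))
```

Returning `None` for unrecognized text, rather than zero counts, keeps a format mismatch from being mistaken for a healthy or an empty cluster.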
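Root cause
The warning "requests are blocked > 32 sec" may appear when a user is accessing a data block while that data is being migrated: if the data moves to another OSD before the access completes, the request is left blocked, and this does affect users.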