Hot Swap
Software Dependency
- OS: EulerOS V2.0 SP1
- Kernel: 3.10.0-229.20.1.30.hulk and later
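To check whether a host meets the kernel requirement, you can compare the release string reported by `uname -r` against 3.10.0-229.20.1.30.hulk. The Python sketch below is a hypothetical helper: `kernel_key` and `meets_minimum` are illustrative names, not EulerOS tools. It assumes a release string is a series of dot- or hyphen-separated numeric fields followed by a non-numeric suffix such as `.hulk`, and it compares only that leading numeric part.

```python
import platform
import re

# Minimum kernel listed under Software Dependency.
MIN_KERNEL = "3.10.0-229.20.1.30.hulk"


def kernel_key(release: str) -> tuple[int, ...]:
    """Return the leading numeric fields of a kernel release string.

    "3.10.0-229.20.1.30.hulk.x86_64" -> (3, 10, 0, 229, 20, 1, 30). Parsing stops at
    the first non-numeric field, so suffixes such as ".hulk" or an architecture are ignored.
    """
    key = []
    for field in re.split(r"[.-]", release):
        if not field.isdigit():
            break
        key.append(int(field))
    return tuple(key)


def meets_minimum(release: str, minimum: str = MIN_KERNEL) -> bool:
    """True if `release` is the same as or newer than `minimum` (numeric fields only)."""
    return kernel_key(release) >= kernel_key(minimum)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "OK" if meets_minimum(running) else "older than " + MIN_KERNEL)
```

Tuple comparison treats a shorter prefix as older, so a release such as 3.10.0-229.el7 is reported as earlier than the required 3.10.0-229.20.1.30.hulk.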
Hardware Dependency
Only KunLun9016 and KunLun9032 servers support hot swap.
Configurations
- Enable hot swap
Add "movable_node numa_zonelist_order=zone" to the kernel boot parameters in the /boot/grub2/grub.cfg file. On EFI systems, the configuration file is /boot/efi/EFI/euleros/grub.cfg.
- Enable address range memory mirroring
Kernel 3.10.0-229.20.1.30.hulk and later versions support address range memory mirroring by default. For kernel versions earlier than 3.10.0-229.20.1.30.hulk, you can enable address range memory mirroring manually with the mirrorable parameter.
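After you edit grub.cfg and reboot, /proc/cmdline shows the parameters the running kernel was booted with. You can use this to confirm that the hot swap parameters took effect. The minimal Python sketch below assumes both parameters appear exactly as written above. `missing_hotswap_params` is a hypothetical helper name.

```python
from pathlib import Path

# Boot parameters that the Configurations section says to add for hot swap.
REQUIRED_PARAMS = ("movable_node", "numa_zonelist_order=zone")


def missing_hotswap_params(cmdline: str) -> list[str]:
    """Return the required hot swap parameters that are absent from a kernel command line."""
    tokens = set(cmdline.split())
    return [param for param in REQUIRED_PARAMS if param not in tokens]


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")  # parameters of the running kernel (Linux only)
    if cmdline_path.exists():
        missing = missing_hotswap_params(cmdline_path.read_text())
        print("hot swap parameters present" if not missing else "missing: " + " ".join(missing))
```

The check splits the command line into whole tokens, so a parameter that only appears as part of another token is not counted as present.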
Principles for Using Hot Swap on the OS
- Hot swap must not interrupt services. Do not use hot swap if taking memory offline would cause an out-of-memory (OOM) condition and, ultimately, service interruption.
- Hot swap must work in each typical scenario, for example, when an Oracle database is installed on EulerOS.
- Hot swap is mainly used when hardware faults have or may have occurred.
Precautions
- Hot swap causes the OS kernel to run only on unmovable nodes (nodes that are not hot swappable). This makes the kernel more likely to access memory across nodes and therefore degrades performance.
- Hot swap causes EulerOS to ignore bindings between memory and nodes.
For example, suppose some applications are bound to node 1. When node 1 is being taken offline, their binding policy changes from bind to prefer. If the memory of node 1 is insufficient, memory from other nodes can then be allocated to these applications. Under a strict bind policy, the processes bound to node 1 would be killed once the memory of node 1 went offline.
- Hot swap causes EulerOS to ignore the overcommit_memory parameter of hugeTLB pages.
To ensure that hugeTLB pages can be migrated successfully while memory is being taken offline, the OS ignores overcommit_memory. In other words, while taking memory offline, the OS allocates hugeTLB pages beyond the preset threshold. This avoids OOM caused by a shortage of hugeTLB pages.
- If hot swap is enabled, system load must be moderate.
Take a 16P system as an example. Memory usage, defined as memused/(total - mem_tobe_removed), must be below 60%. High memory usage increases the time spent on memory migration and therefore the likelihood of migration failure.
- The attempt to take memory offline may fail during hot swap.
While applications are running, filemap_fault may occur because of memory locking or frequent file access, which can prevent memory from being taken offline. If memory fails to go offline, try again later.
The nodectl tool helps you determine why memory fails to go offline. It shows which processes are using memory on the node that could not be taken offline. To query which processes are using memory on node 1, run:
[root@localhost nodectl]# nodectl --procs 1
This following process(es) have memory page(s) allocated on node 1
9757, watch: pages = 119 (476 kb)
169177, nodectl: pages = 27 (108 kb)
173780, nodectl: pages = 3 (12 kb)
Use the -v option to query more information about memory page allocation:
[root@localhost nodectl]# nodectl --procs -v 1
This following process(es) have memory page(s) allocated on node 1
[ 3690, libvirtd] 7f9c7c000000 default anon=287 dirty=286 swapcache=1 active=282 N1=1 N4=1 N6=1 N7=284
[ 3690, libvirtd] 7f9c863d2000 default file=/usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so anon=1 dirty=1 N1=1
[ 3690, libvirtd] 7f9c9943f000 default file=/usr/lib64/libc-2.17.so anon=2 dirty=2 N0=1 N1=1
[ 3690, libvirtd] 7f9c9e3e4000 default heap anon=20 dirty=20 active=19 N0=12 N1=3 N4=2 N7=3
[ 9757, watch] 00e27000 default heap anon=437 dirty=437 N1=106 N2=1 N3=148 N4=8 N6=170 N7=4
[ 9757, watch] 7fc80568b000 default file=/usr/lib64/libc-2.17.so anon=2 dirty=2 N1=1 N3=1
[ 9757, watch] 7fc805abf000 default file=/usr/lib64/libtinfo.so.5.9 anon=1 dirty=1 N1=1
[ 9757, watch] 7ffc546fb000 default stack anon=3 dirty=3 N1=1 N3=2
[ 32838, systemd-udevd] 7f5df58a0000 default heap anon=78 dirty=69 mapmax=5 swapcache=9 active=69 N1=1 N5=24 N6=19 N7=34
[ 32838, systemd-udevd] 7ffd97b72000 default stack anon=14 dirty=14 mapmax=5 N1=2 N4=1 N5=7 N7=4
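The memory-usage guideline for a 16P system, memused/(total - mem_tobe_removed) below 60%, can be checked before you take a node offline. The Python sketch below is one possible reading of that formula, with two assumptions. First, memused is taken as MemTotal - MemAvailable from /proc/meminfo. Second, mem_tobe_removed is taken as the MemTotal of the node to be removed, read from /sys/devices/system/node/nodeN/meminfo. Both files are standard Linux interfaces. The function names are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo or a per-node meminfo file into {field: value}.

    Most values are in kB. Per-node lines start with "Node N ", and that prefix is dropped.
    """
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key_parts, tokens = key.split(), value.split()
        if key_parts and tokens:
            fields[key_parts[-1]] = int(tokens[0])
    return fields


def usage_after_removal(meminfo: dict[str, int], node_meminfo: dict[str, int]) -> float:
    """memused / (total - mem_tobe_removed), with memused = MemTotal - MemAvailable (assumption)."""
    total = meminfo["MemTotal"]
    mem_used = total - meminfo["MemAvailable"]
    remaining = total - node_meminfo["MemTotal"]  # memory left after the node goes offline
    if remaining <= 0:
        raise ValueError("removing this node would leave no memory")
    return mem_used / remaining


def safe_to_offline(ratio: float, threshold: float = 0.60) -> bool:
    """Apply the 60% guideline given for a 16P system."""
    return ratio < threshold


def usage_for_node(node: int) -> float:
    """Read live values for `node` on a Linux host (not called on import)."""
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    node_info = parse_meminfo(Path(f"/sys/devices/system/node/node{node}/meminfo").read_text())
    return usage_after_removal(meminfo, node_info)
```

On a live host, `safe_to_offline(usage_for_node(1))` gives a quick yes or no before you take node 1 offline. The threshold is a parameter because the 60% figure is stated only for 16P systems.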
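The advice to try again later when memory fails to go offline can be automated with a bounded retry loop. This section does not say how hot swap takes memory offline on KunLun servers. The Python sketch below therefore assumes the standard Linux memory-hotplug sysfs interface, in which writing "offline" to /sys/devices/system/memory/memoryN/state asks the kernel to offline one memory block. The kernel rejects the write with an error when the block cannot be offlined. `offline_with_retry` and its parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable

# Standard Linux memory-hotplug sysfs directory (an assumption for EulerOS hot swap).
MEMORY_SYSFS = Path("/sys/devices/system/memory")


def sysfs_offline(block: str) -> None:
    """Ask the kernel to offline one memory block, for example block="memory32" (needs root).

    Raises OSError if the kernel refuses, for example because pages could not be migrated.
    """
    (MEMORY_SYSFS / block / "state").write_text("offline")


def offline_with_retry(
    block: str,
    attempts: int = 5,
    delay_s: float = 30.0,
    offline: Callable[[str], None] = sysfs_offline,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to offline `block` up to `attempts` times, waiting `delay_s` seconds between tries.

    Returns True on success, or False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            offline(block)
            return True
        except OSError as exc:
            print(f"{block}: attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                sleep(delay_s)  # give running applications time to release the pages
    return False
```

`offline` and `sleep` are injectable so the loop can be exercised without root access. If every attempt fails, nodectl can show which processes still hold memory on the node.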
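The verbose nodectl output lists one line per memory mapping, with `Nk=count` fields giving the pages on each node. Summing those fields per process shows which processes hold the most pages on the node that failed to go offline. The Python sketch below assumes the line format shown in the sample output above. It also assumes 4 KB pages, which matches "pages = 119 (476 kb)" in the non-verbose output. `pages_on_node`, `report`, and `run_nodectl` are hypothetical helpers. The only nodectl flags used are the ones shown above.

```python
import re
import subprocess
from collections import defaultdict

# "[ pid, comm] <address> <policy> ... N<node>=<pages> ..." as printed by nodectl --procs -v
LINE_RE = re.compile(r"^\[\s*(\d+),\s*([^\]]*?)\s*\]\s+(.*)$")
NODE_RE = re.compile(r"N(\d+)=(\d+)")
PAGE_KB = 4  # assumption: 4 KB pages, consistent with "pages = 119 (476 kb)"


def pages_on_node(verbose_output: str, node: int) -> dict[tuple[int, str], int]:
    """Sum the N<node>=<pages> fields per (pid, command) across all mapping lines."""
    totals: dict[tuple[int, str], int] = defaultdict(int)
    for line in verbose_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # prompt or header line
        pid, comm, fields = int(match.group(1)), match.group(2), match.group(3)
        for token in fields.split():
            node_match = NODE_RE.fullmatch(token)
            if node_match and int(node_match.group(1)) == node:
                totals[(pid, comm)] += int(node_match.group(2))
    return dict(totals)


def report(totals: dict[tuple[int, str], int]) -> None:
    """Print processes in descending order of pages held on the node."""
    for (pid, comm), pages in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{pid:>7} {comm:<16} pages={pages} ({pages * PAGE_KB} kb)")


def run_nodectl(node: int) -> str:
    """Run `nodectl --procs -v <node>` and return its output (needs nodectl installed)."""
    result = subprocess.run(
        ["nodectl", "--procs", "-v", str(node)], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `report(pages_on_node(run_nodectl(1), 1))` lists the processes holding memory on node 1, largest first.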