KSM is a feature that allows identical memory pages to be shared between the different processes running on a system. We might presume that identical pages exist for particular reasons, for example, when multiple processes are spawned from the same binary, but there is no such rule. KSM scans for these identical memory pages and consolidates them into a single Copy-on-Write (COW) shared page. COW is simply a mechanism by which, when a process attempts to change a memory region that is shared with other processes, the writing process receives a fresh copy and the changes are saved to it.
Even though the consolidated COW shared page is accessible to all of the processes, whenever a process tries to change its content (that is, write to the page), the process gets a new copy holding its changes. By now, you will have understood that, by using KSM, we can reduce physical memory consumption. In the KVM context, this can really add value, because guest systems are
qemu-kvm processes on the host, and there is a high chance that all of the VM processes will have a good amount of similar memory.
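The COW behavior described above is easy to observe with a private anonymous mapping and fork(). The following Python sketch (illustrative only, not part of KSM itself) shows that a child's write lands in its own private copy while the parent's view stays untouched:

```python
import mmap
import os

# A private anonymous mapping: the kind of page KSM can merge, and the
# kind that fork() shares copy-on-write between parent and child.
buf = mmap.mmap(-1, mmap.PAGESIZE,
                flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
buf[0] = 1

pid = os.fork()
if pid == 0:
    buf[0] = 2        # this write triggers a private copy for the child
    os._exit(0)
os.waitpid(pid, 0)

parent_value = buf[0]  # still 1: the parent's page was never modified
print("parent sees", parent_value)
```

The parent prints `parent sees 1`: the kernel broke the sharing transparently at the moment the child wrote, which is exactly what happens when a process writes to a KSM-merged page.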
For KSM to work, the process/application has to register its memory pages with KSM. In KVM land, KSM allows guests to share identical memory pages, thus achieving an improvement in memory consumption. The shared data might be application data, a library, or anything else that is used frequently. The shared page or memory is marked as
copy-on-write. In short, KSM avoids memory duplication, and it is really useful when similar guest operating systems are present in a KVM environment.
Because the common shared data is more likely to already be resident in cache or main memory, KSM can also improve memory speed and utilization: the KVM guests see fewer cache misses. In addition, KSM reduces the overall guest memory footprint, so, in a way, it allows the user to overcommit memory in a KVM setup, supplying greater utilization of the available resources. However, we have to keep in mind that KSM requires more CPU resources to identify the duplicate pages and to perform tasks such as sharing/merging.
Previously, we mentioned that processes have to mark their pages to show that they are eligible candidates for KSM to operate on. A process marks its pages with the
MADV_MERGEABLE flag, which we will discuss in the next section. You can explore the use of this flag in the
madvise man page:
# man 2 madvise

MADV_MERGEABLE (since Linux 2.6.32)
    Enable Kernel Samepage Merging (KSM) for the pages in the range
    specified by addr and length. The kernel regularly scans those
    areas of user memory that have been marked as mergeable, looking
    for pages with identical content. These are replaced by a single
    write-protected page (that is automatically copied if a process
    later wants to update the content of the page). KSM merges only
    private anonymous pages (see mmap(2)).

    The KSM feature is intended for applications that generate many
    instances of the same data (e.g., virtualization systems such as
    KVM). It can consume a lot of processing power; use with care.
    See the Linux kernel source file Documentation/vm/ksm.txt for
    more details.

    The MADV_MERGEABLE and MADV_UNMERGEABLE operations are available
    only if the kernel was configured with CONFIG_KSM.
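The registration step the man page describes can be driven from Python as well, since Python 3.8 exposes madvise on mmap objects. A minimal sketch, which assumes a Linux host and tolerates kernels built without CONFIG_KSM:

```python
import mmap

# Map 16 private anonymous pages and fill them with identical content,
# making them ideal merge candidates for KSM.
length = 16 * mmap.PAGESIZE
buf = mmap.mmap(-1, length,
                flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
buf.write(b"\x42" * length)

# mmap.MADV_MERGEABLE is only defined where the platform supports it.
madv = getattr(mmap, "MADV_MERGEABLE", None)
registered = False
if madv is not None:
    try:
        # Register the whole range with KSM; this fails with EINVAL on
        # kernels built without CONFIG_KSM.
        buf.madvise(madv)
        registered = True
    except OSError:
        pass
print("registered with KSM:", registered)
```

From here on, the kernel's KSM thread may merge these pages with any other mergeable pages holding the same content; the process itself does nothing further.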
So, the kernel has to be configured with KSM. You can verify this as follows (the output should show CONFIG_KSM=y):
# grep CONFIG_KSM /boot/config-$(uname -r)
CONFIG_KSM=y
KSM gets deployed as a part of the
qemu-kvm package. Information about the KSM service can be fetched from the
sysfs filesystem, under /sys/kernel/mm/ksm. The different files in this location reflect the current KSM status. They are updated dynamically by the kernel and give a precise record of KSM usage and statistics, for example:
# grep . /sys/kernel/mm/ksm/*
In an upcoming section, we will discuss the
ksmtuned service and its configuration variables. As
ksmtuned is a service to control KSM, its configuration variables are analogous to the files we see in the
sysfs filesystem. For more details, you can check out https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html.
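The kernel document linked above notes that the pages_sharing to pages_shared ratio indicates how effective the sharing is. As a small illustration, the savings can be summarized from those two counters; the helper name and the sample values here are made up, but the counters are the real sysfs files:

```python
def ksm_summary(stats, page_size=4096):
    # pages_shared counts the merged pages themselves; pages_sharing
    # counts every mapping that references one, so their difference is
    # the number of duplicate pages KSM has eliminated.
    saved_pages = stats["pages_sharing"] - stats["pages_shared"]
    ratio = (stats["pages_sharing"] / stats["pages_shared"]
             if stats["pages_shared"] else 0.0)
    return saved_pages * page_size, ratio

# Hypothetical values, as read from /sys/kernel/mm/ksm/:
sample = {"pages_shared": 1000, "pages_sharing": 5000}
saved_bytes, ratio = ksm_summary(sample)
print(saved_bytes // 2**20, "MiB saved, ratio", ratio)
```

A ratio of 5.0 means that, on average, five identical pages now stand behind each physical page, which is the memory-overcommit effect discussed earlier.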
It is also possible to tune these parameters with the
virsh command. The
virsh node-memory-tune command does this job for us. For example, the following command specifies the number of pages to scan before the shared memory service goes to sleep:
# virsh node-memory-tune --shm-pages-to-scan number
As with any other service, the
ksmtuned service also has logs stored in a log file,
/var/log/ksmtuned. If we set DEBUG=1 in
/etc/ksmtuned.conf, we will have logging of any kind of KSM tuning action. Refer to https://www.kernel.org/doc/Documentation/vm/ksm.txt for more details.
Once we start the KSM service, as shown next, you can watch the values change depending on the KSM service in action:
# systemctl start ksm
We can then check the status of the
ksm service like this:
# systemctl status ksm
Once the KSM service is started and we have multiple VMs running on our host, we can check the changes by querying
sysfs with the following command multiple times:
# grep . /sys/kernel/mm/ksm/pages_*
Let’s explore the
ksmtuned service in more detail. The
ksmtuned service is designed so that it goes through a cycle of actions and adjusts KSM. This cycle of actions continues its work in a loop. Whenever a guest system is created or destroyed, libvirt will notify the
/etc/ksmtuned.conf file is the configuration file for the
ksmtuned service. Here is a brief explanation of the configuration parameters available. You can see these configuration parameters match with the KSM files in
# Configuration file for ksmtuned.

# How long ksmtuned should sleep between tuning adjustments
# KSM_MONITOR_INTERVAL=60

# Millisecond sleep between ksm scans for a 16Gb server.
# Smaller servers sleep more, bigger sleep less.
# KSM_SLEEP_MSEC=10

# KSM_NPAGES_BOOST is added to the `npages` value when `free memory`
# is less than `thres`.
# KSM_NPAGES_BOOST=300

# KSM_NPAGES_DECAY is subtracted from the `npages` value when `free
# memory` is greater than `thres`.
# KSM_NPAGES_DECAY=-50

# KSM_NPAGES_MIN is the lower limit for the `npages` value.
# KSM_NPAGES_MIN=64

# KSM_NPAGES_MAX is the upper limit for the `npages` value.
# KSM_NPAGES_MAX=1250

# KSM_THRES_COEF is the percentage of RAM used to calculate the
# `thres` value.
# KSM_THRES_COEF=20

# KSM_THRES_CONST - on a low-memory system, if the computed `thres`
# value is less than KSM_THRES_CONST, `thres` is reset to
# KSM_THRES_CONST.
# KSM_THRES_CONST=2048
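Reading the comments above, the core of the adjustment can be paraphrased as a simple rule: scan more pages per cycle when free memory drops below thres, back off otherwise, and clamp the result between the min/max bounds. A rough sketch of that rule follows; the function name is ours, and the real ksmtuned daemon is a shell script with more logic around this:

```python
def adjust_npages(npages, free_mem, thres,
                  boost=300, decay=-50,
                  npages_min=64, npages_max=1250):
    # Low free memory: add KSM_NPAGES_BOOST so KSM scans more pages per
    # cycle. Otherwise, apply KSM_NPAGES_DECAY (already negative) to
    # back off, and clamp between KSM_NPAGES_MIN and KSM_NPAGES_MAX.
    npages += boost if free_mem < thres else decay
    return max(npages_min, min(npages_max, npages))

print(adjust_npages(100, free_mem=1024, thres=2048))  # under pressure: 400
print(adjust_npages(100, free_mem=8192, thres=2048))  # plenty free: clamped to 64
```

This also makes the CPU trade-off concrete: the busier KSM is allowed to be (a larger npages), the more duplicate pages it can find per interval, at the cost of scanning work.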
KSM is designed to improve performance and allow memory overcommitting. It serves this purpose in most environments; however, KSM may introduce a performance overhead in some setups, for example, if you have a few VMs whose memory content is similar when they start, followed by lots of memory-intensive operations. This creates trouble because KSM first works hard to reduce the memory footprint, and then has to spend time breaking the sharing apart again to cover all of the memory content differences between the VMs. There is also a concern that KSM may open a side channel that could potentially be used to leak information across guests, as has been well documented in the past couple of years. If you have these concerns, or if you see that KSM is not helping to improve the performance of your workload, it can be disabled.
To disable KSM, stop the
ksm and
ksmtuned services on your system by executing the following:
# systemctl stop ksm
# systemctl stop ksmtuned
We have gone through the different tuning options for CPU and memory. The next big subject that we need to cover is NUMA configuration, where both CPU and memory configuration become a part of a larger story or context.