Since kernel 2.6.28, Linux has used a split Least Recently Used (LRU) page replacement strategy. Pages with a filesystem source, such as program text or shared libraries, belong to the file cache. Pages without filesystem backing are called anonymous pages and consist of runtime data such as the heap and stack space reserved for applications. Pages belonging to the file cache are typically cheaper to evict from memory, as they can simply be read back from disk when needed. Since anonymous pages have no filesystem backing, they must remain in memory as long as a program needs them, unless there is swap space to store them in.
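The two LRU sets are visible in /proc/meminfo, which gives a quick way to see how much memory currently sits on each list (field names as found in recent kernels):

```
# The kernel keeps separate active/inactive LRU lists for
# anonymous and file-backed pages; inspect their current sizes
grep -E '^(Active|Inactive)\((anon|file)\):' /proc/meminfo
```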
It is a common misconception that a swap partition would somehow slow down a system. Not having a swap partition does not mean that the kernel won’t evict pages from memory; it just means that the kernel has fewer choices about which pages to evict. The amount of swap available does not, by itself, affect how much of it is used.
Linux can cope with the absence of swap space because, by default, the kernel’s memory accounting policy may overcommit memory. The downside is that when physical memory is exhausted and the kernel cannot swap anonymous pages out to disk, the out-of-memory killer (OOM killer) mechanism will start killing memory-hogging “rogue” processes to free up memory for other processes.
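A quick way to see how a given machine is set up in this respect (the exact dmesg wording varies between kernel versions):

```
# Check whether any swap is configured at all
swapon --show

# 0 = heuristic overcommit (the default), 1 = always overcommit,
# 2 = strict accounting governed by vm.overcommit_ratio
sysctl vm.overcommit_memory

# Look for past OOM-killer activity in the kernel log
dmesg | grep -i 'out of memory'
```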
The vm.swappiness option is a modifier that shifts the balance between evicting file cache pages and evicting anonymous pages. The file cache is given an arbitrary priority value of 200, from which the vm.swappiness value is deducted (file_prio = 200 - vm.swappiness). Anonymous pages, by default, start out with a priority of 60 (anon_prio = vm.swappiness). This means that, by default, the priority weights stand moderately in favour of keeping anonymous pages in memory (file_prio = 200 - 60 = 140). The behaviour is defined in mm/vmscan.c in the kernel source tree.
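As a minimal illustration of that arithmetic (a sketch only; the actual reclaim balancing in mm/vmscan.c has grown considerably more elaborate over the years):

```
# Derive the two scan priorities from the current swappiness value,
# mirroring the file_prio/anon_prio calculation described above
swappiness=$(cat /proc/sys/vm/swappiness)
anon_prio=$swappiness
file_prio=$((200 - swappiness))
echo "anon_prio=$anon_prio file_prio=$file_prio"
```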
Setting vm.swappiness to 100 makes the priorities equal (anon_prio = 100, file_prio = 200 - 100 = 100). This could make sense for an I/O-heavy system where it is not desirable for pages from the file cache to be evicted in favour of anonymous pages.
Conversely, setting vm.swappiness to 0 will prevent the kernel from evicting anonymous pages in favour of pages from the file cache. This might be useful if programs do most of their caching themselves, which is the case with some databases. On desktop systems this might improve interactivity, but the downside is that I/O performance will likely take a hit.
The default value has most likely been chosen as an approximate middle ground between these two extremes. As with any performance parameter, adjusting vm.swappiness should be based on benchmark data comparable to real workloads, not just a gut feeling.
The problem is that no single default value suits all needs. Setting the swappiness option to 10 may be an appropriate setting for desktops, while the default value of 60 may be more suitable for servers. In other words, swappiness needs to be tuned according to the use case: desktop vs. server, application type, and so on.
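For example, trying out a lower value looks like this (the /etc/sysctl.d path is the conventional location on modern distributions; the file name here is arbitrary):

```
# Check the current value (60 by default)
sysctl vm.swappiness

# Change it at runtime; this does not survive a reboot
sudo sysctl -w vm.swappiness=10

# Persist the change across reboots
# (applied at boot, or immediately via: sudo sysctl --system)
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf
```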
Furthermore, the Linux kernel uses otherwise idle RAM for the disk cache; leaving that RAM empty would be neither efficient nor intended. Having disk data in the cache means that if something needs the same data again, it will likely be served from memory, which is much quicker than fetching it from disk again. The swappiness option controls how strongly the kernel prefers swapping out to disk over shrinking the disk cache: should it rather drop older data from the cache, or should it swap out some program pages?
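This trade-off is easy to watch: the "buff/cache" figure shrinks when the kernel reclaims file-backed pages, while swap usage grows when anonymous pages are pushed out. For instance:

```
# Memory overview: "buff/cache" is (mostly) the disk cache, and the
# Swap line shows how many anonymous pages have been written out
free -h

# The same information from the kernel's own counters
grep -E '^(Cached|AnonPages|SwapTotal|SwapFree):' /proc/meminfo
```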
This article may shed some light on the topic as well, in particular on how the swapping tendency is estimated.
Adding more detail to the answers above.
As VMs are used more and more, a Linux host may itself be a VM in one of these cloud environments. In examples 1 and 2 below we have a good idea of the applications running, and therefore of how much RAM they consume; in example 3, not so much.
- Example 1
A high-performance private cloud (think the sort most banks would pay millions for), where the disk is provided by a very expensive storage array with VERY good I/O. Part of that storage may be RAM (in the disk array), backed by SSDs, backed by regular spinning disks. In this situation the disk that the VM sees might be only a little slower than the RAM it can access, so for a single VM there is not much difference between swap and RAM.
- Example 2
The same as example 1, but instead of a single VM you have hundreds, thousands or more. In this situation we find that server (hypervisor) RAM is cheap and plentiful, while storage RAM is expensive, relatively speaking. If we split the RAM requirements between hypervisor RAM and swap provided by our very expensive storage array, we quickly use up all of the RAM in the storage array; blocks are then served by the SSDs and finally by the spindles, and suddenly everything gets really slow. In this case we probably want to assign plenty of RAM (from the hypervisor) to each VM and set swappiness to 0 (swap only to avoid out-of-memory conditions), because the cumulative swapping of all those VMs would hurt the performance of the storage. Setting the swappiness higher may give a perceived performance boost, since there will be more free RAM as apps that are not currently being interacted with get (mostly) swapped out, but here that gain comes at the storage array's expense. A way to verify how much swapping actually happens is sketched after these examples.
- Example 3
A modern laptop or desktop, probably with an SSD. The memory requirements are unknown: which browser will the user run, how many tabs will they have open, will they also be editing a document, a RAW image or possibly a video? All of these consume RAM. Setting the swappiness to a low value and performing other filesystem tweaks will mean fewer writes to the SSD, so it will last longer.
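In examples 2 and 3 alike, it is worth confirming how much swapping actually occurs before and after tuning. The si and so columns of vmstat report pages swapped in and out per interval:

```
# Report memory and swap activity every 5 seconds; sustained non-zero
# "si"/"so" columns mean pages are moving between RAM and swap
vmstat 5
```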