Home » Need explanation on Resident Set Size/Virtual Size

Need explanation on Resident Set Size/Virtual Size

Solutons:


RSS is how much memory this process currently has in main memory (RAM). VSZ is how much virtual memory the process has in total. This includes all types of memory, both in RAM and swapped out. These numbers can get skewed because they also include shared libraries and other types of memory. You can have five hundred instances of bash running, and the total size of their memory footprint won’t be the sum of their RSS or VSZ values.

If you need to get a more detailed idea about the memory footprint of a process, you have some options. You can go through /proc/$PID/map and weed out the stuff you don’t like. If it’s shared libraries, the calculation could get complex depending on your needs (which I think I remember).

If you only care about the heap size of the process, you can always just parse the [heap] entry in the map file. The size the kernel has allocated for the process heap may or may not reflect the exact number of bytes the process has asked to be allocated. There are minute details, kernel internals and optimisations which can throw this off. In an ideal world, it’ll be as much as your process needs, rounded up to the nearest multiple of the system page size (getconf PAGESIZE will tell you what it is — on PCs, it’s probably 4,096 bytes).

If you want to see how much memory a process has allocated, one of the best ways is to forgo the kernel-side metrics. Instead, you instrument the C library’s heap memory (de)allocation functions with the LD_PRELOAD mechanism. Personally, I slightly abuse valgrind to get information about this sort of thing. (Note that applying the instrumentation will require restarting the process.)

Please note, since you may also be benchmarking runtimes, that valgrind will make your programs very slightly slower (but probably within your tolerances).

Minimal runnable example

For this to make sense, you have to understand the basics of paging: https://stackoverflow.com/questions/18431261/how-does-x86-paging-work and in particular that the OS can allocate virtual memory via page tables / its internal memory book keeping (VSZ virtual memory) before it actually has a backing storage on RAM or disk (RSS resident memory).

Now to observe this in action, let’s create a program that:

  • allocates more RAM than our physical memory with mmap
  • writes one byte on each page to ensure that each of those pages goes from virtual only memory (VSZ) to actually used memory (RSS)
  • checks the memory usage of the process with one of the methods mentioned at: https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c

main.c

#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    unsigned long size,resident,share,text,lib,data,dt;
} ProcStatm;

/* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c/7212248#7212248 */
void ProcStat_init(ProcStatm *result) {
    const char* statm_path = "/proc/self/statm";
    FILE *f = fopen(statm_path, "r");
    if(!f) {
        perror(statm_path);
        abort();
    }
    if(7 != fscanf(
        f,
        "%lu %lu %lu %lu %lu %lu %lu",
        &(result->size),
        &(result->resident),
        &(result->share),
        &(result->text),
        &(result->lib),
        &(result->data),
        &(result->dt)
    )) {
        perror(statm_path);
        abort();
    }
    fclose(f);
}

int main(int argc, char **argv) {
    ProcStatm proc_statm;
    char *base, *p;
    char system_cmd[1024];
    long page_size;
    size_t i, nbytes, print_interval, bytes_since_last_print;
    int snprintf_return;

    /* Decide how many ints to allocate. */
    if (argc < 2) {
        nbytes = 0x10000;
    } else {
        nbytes = strtoull(argv[1], NULL, 0);
    }
    if (argc < 3) {
        print_interval = 0x1000;
    } else {
        print_interval = strtoull(argv[2], NULL, 0);
    }
    page_size = sysconf(_SC_PAGESIZE);

    /* Allocate the memory. */
    base = mmap(
        NULL,
        nbytes,
        PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS,
        -1,
        0
    );
    if (base == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }

    /* Write to all the allocated pages. */
    i = 0;
    p = base;
    bytes_since_last_print = 0;
    /* Produce the ps command that lists only our VSZ and RSS. */
    snprintf_return = snprintf(
        system_cmd,
        sizeof(system_cmd),
        "ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "%ju") print}'",
        (uintmax_t)getpid()
    );
    assert(snprintf_return >= 0);
    assert((size_t)snprintf_return < sizeof(system_cmd));
    bytes_since_last_print = print_interval;
    do {
        /* Modify a byte in the page. */
        *p = i;
        p += page_size;
        bytes_since_last_print += page_size;
        /* Print process memory usage every print_interval bytes.
         * We count memory using a few techniques from:
         * https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c */
        if (bytes_since_last_print > print_interval) {
            bytes_since_last_print -= print_interval;
            printf("extra_memory_committed %lu KiBn", (i * page_size) / 1024);
            ProcStat_init(&proc_statm);
            /* Check /proc/self/statm */
            printf(
                "/proc/self/statm size resident %lu %lu KiBn",
                (proc_statm.size * page_size) / 1024,
                (proc_statm.resident * page_size) / 1024
            );
            /* Check ps. */
            puts(system_cmd);
            system(system_cmd);
            puts("");
        }
        i++;
    } while (p < base + nbytes);

    /* Cleanup. */
    munmap(base, nbytes);
    return EXIT_SUCCESS;
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
sudo dmesg -c
./main.out 0x1000000000 0x200000000
echo $?
sudo dmesg

where:

  • 0x1000000000 == 64GiB: 2x my computer’s physical RAM of 32GiB
  • 0x200000000 == 8GiB: print the memory every 8GiB, so we should get 4 prints before the crash at around 32GiB
  • echo 1 | sudo tee /proc/sys/vm/overcommit_memory: required for Linux to allow us to make a mmap call larger than physical RAM: https://stackoverflow.com/questions/2798330/maximum-memory-which-malloc-can-allocate/57687432#57687432

Program output:

extra_memory_committed 0 KiB
/proc/self/statm size resident 67111332 768 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 1648

extra_memory_committed 8388608 KiB
/proc/self/statm size resident 67111332 8390244 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 8390256

extra_memory_committed 16777216 KiB
/proc/self/statm size resident 67111332 16778852 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 16778864

extra_memory_committed 25165824 KiB
/proc/self/statm size resident 67111332 25167460 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
  PID    VSZ   RSS
29827 67111332 25167472

Killed

Exit status:

137

which by the 128 + signal number rule means we got signal number 9, which man 7 signal says is SIGKILL, which is sent by the Linux out-of-memory killer.

Output interpretation:

  • VSZ virtual memory remains constant at printf '0x%Xn' 0x40009A4 KiB ~= 64GiB (ps values are in KiB) after the mmap.
  • RSS “real memory usage” increases lazily only as we touch the pages. For example:
    • on the first print, we have extra_memory_committed 0, which means we haven’t yet touched any pages. RSS is a small 1648 KiB which has been allocated for normal program startup like text area, globals, etc.
    • on the second print, we have written to 8388608 KiB == 8GiB worth of pages. As a result, RSS increased by exactly 8GIB to 8390256 KiB == 8388608 KiB + 1648 KiB
    • RSS continues to increase in 8GiB increments. The last print shows about 24 GiB of memory, and before 32 GiB could be printed, the OOM killer killed the process

See also: Need explanation on Resident Set Size/Virtual Size

OOM killer logs

Our dmesg commands have shown the OOM killer logs.

An exact interpretation of those has been asked at:

  • https://stackoverflow.com/questions/9199731/understanding-the-linux-oom-killers-logs but let’s have a quick look here.
  • https://serverfault.com/questions/548736/how-to-read-oom-killer-syslog-messages

The very first line of the log was:

[ 7283.479087] mongod invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

So we see that interestingly it was the MongoDB daemon that always runs in my laptop on the background that first triggered the OOM killer, presumably when the poor thing was trying to allocate some memory.

However, the OOM killer does not necessarily kill the one who awoke it.

After the invocation, the kernel prints a table or processes including the oom_score:

[ 7283.479292] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 7283.479303] [    496]     0   496    16126        6   172032      484             0 systemd-journal
[ 7283.479306] [    505]     0   505     1309        0    45056       52             0 blkmapd
[ 7283.479309] [    513]     0   513    19757        0    57344       55             0 lvmetad
[ 7283.479312] [    516]     0   516     4681        1    61440      444         -1000 systemd-udevd

and further ahead we see that our own little main.out actually got killed on the previous invocation:

[ 7283.479871] Out of memory: Kill process 15665 (main.out) score 865 or sacrifice child
[ 7283.479879] Killed process 15665 (main.out) total-vm:67111332kB, anon-rss:92kB, file-rss:4kB, shmem-rss:30080832kB
[ 7283.479951] oom_reaper: reaped process 15665 (main.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:30080832kB

This log mentions the score 865 which that process had, presumably the highest (worst) OOM killer score as mentioned at: How does the OOM killer decide which process to kill first?

Also interestingly, everything apparently happened so fast that before the freed memory was accounted, the oom was awoken again by the DeadlineMonitor process:

[ 7283.481043] DeadlineMonitor invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

and this time that killed some Chromium process, which is usually my computers normal memory hog:

[ 7283.481773] Out of memory: Kill process 11786 (chromium-browse) score 306 or sacrifice child
[ 7283.481833] Killed process 11786 (chromium-browse) total-vm:1813576kB, anon-rss:208804kB, file-rss:0kB, shmem-rss:8380kB
[ 7283.497847] oom_reaper: reaped process 11786 (chromium-browse), now anon-rss:0kB, file-rss:0kB, shmem-rss:8044kB

Tested in Ubuntu 19.04, Linux kernel 5.0.0.

Related Solutions

Joining bash arguments into single string with spaces

[*] I believe that this does what you want. It will put all the arguments in one string, separated by spaces, with single quotes around all: str="'$*'" $* produces all the scripts arguments separated by the first character of $IFS which, by default, is a space....

AddTransient, AddScoped and AddSingleton Services Differences

TL;DR Transient objects are always different; a new instance is provided to every controller and every service. Scoped objects are the same within a request, but different across different requests. Singleton objects are the same for every object and every...

How to download package not install it with apt-get command?

Use --download-only: sudo apt-get install --download-only pppoe This will download pppoe and any dependencies you need, and place them in /var/cache/apt/archives. That way a subsequent apt-get install pppoe will be able to complete without any extra downloads....

What defines the maximum size for a command single argument?

Answers Definitely not a bug. The parameter which defines the maximum size for one argument is MAX_ARG_STRLEN. There is no documentation for this parameter other than the comments in binfmts.h: /* * These are the maximum length and maximum number of strings...

Bulk rename, change prefix

I'd say the simplest it to just use the rename command which is common on many Linux distributions. There are two common versions of this command so check its man page to find which one you have: ## rename from Perl (common in Debian systems -- Ubuntu, Mint,...

Output from ls has newlines but displays on a single line. Why?

When you pipe the output, ls acts differently. This fact is hidden away in the info documentation: If standard output is a terminal, the output is in columns (sorted vertically) and control characters are output as question marks; otherwise, the output is...

mv: Move file only if destination does not exist

mv -vn file1 file2. This command will do what you want. You can skip -v if you want. -v makes it verbose - mv will tell you that it moved file if it moves it(useful, since there is possibility that file will not be moved) -n moves only if file2 does not exist....

Is it possible to store and query JSON in SQLite?

SQLite 3.9 introduced a new extension (JSON1) that allows you to easily work with JSON data . Also, it introduced support for indexes on expressions, which (in my understanding) should allow you to define indexes on your JSON data as well. PostgreSQL has some...

Combining tail && journalctl

You could use: journalctl -u service-name -f -f, --follow Show only the most recent journal entries, and continuously print new entries as they are appended to the journal. Here I've added "service-name" to distinguish this answer from others; you substitute...

how can shellshock be exploited over SSH?

One example where this can be exploited is on servers with an authorized_keys forced command. When adding an entry to ~/.ssh/authorized_keys, you can prefix the line with command="foo" to force foo to be run any time that ssh public key is used. With this...

Why doesn’t the tilde (~) expand inside double quotes?

The reason, because inside double quotes, tilde ~ has no special meaning, it's treated as literal. POSIX defines Double-Quotes as: Enclosing characters in double-quotes ( "" ) shall preserve the literal value of all characters within the double-quotes, with the...

What is GNU Info for?

GNU Info was designed to offer documentation that was comprehensive, hyperlinked, and possible to output to multiple formats. Man pages were available, and they were great at providing printed output. However, they were designed such that each man page had a...

Set systemd service to execute after fstab mount

a CIFS network location is mounted via /etc/fstab to /mnt/ on boot-up. No, it is not. Get this right, and the rest falls into place naturally. The mount is handled by a (generated) systemd mount unit that will be named something like mnt-wibble.mount. You can...

Merge two video clips into one, placing them next to each other

To be honest, using the accepted answer resulted in a lot of dropped frames for me. However, using the hstack filter_complex produced perfectly fluid output: ffmpeg -i left.mp4 -i right.mp4 -filter_complex hstack output.mp4 ffmpeg -i input1.mp4 -i input2.mp4...

How portable are /dev/stdin, /dev/stdout and /dev/stderr?

It's been available on Linux back into its prehistory. It is not POSIX, although many actual shells (including AT&T ksh and bash) will simulate it if it's not present in the OS; note that this simulation only works at the shell level (i.e. redirection or...

How can I increase the number of inodes in an ext4 filesystem?

It seems that you have a lot more files than normal expectation. I don't know whether there is a solution to change the inode table size dynamically. I'm afraid that you need to back-up your data, and create new filesystem, and restore your data. To create new...

Why doesn’t cp have a progress bar like wget?

The tradition in unix tools is to display messages only if something goes wrong. I think this is both for design and practical reasons. The design is intended to make it obvious when something goes wrong: you get an error message, and it's not drowned in...