From a performance perspective, one of the simplest questions is: how well is your system handling its current load? You can try to answer it by looking at many performance indicators, such as CPU usage, load average, I/O queue size, or memory usage, but most of them lead to the conclusion that it depends. More generally, we would like to measure saturation, one of the four golden signals. How can we do that on modern Linux? Since kernel version 4.20 there is a facility called PSI (Pressure Stall Information), which reports how much time your processes spend stalled on three resources: CPU, memory, and I/O.
This information is available system-wide under /proc/pressure/ and per cgroup in cgroups v2, and is expressed in the following format:
    $ cat /proc/pressure/memory
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
- some: at least some processes are stalled
- full: all non-idle processes are stalled at the same time
- avgX: percentage of time spent stalled over the last X seconds (10, 60, or 300)
- total: total time spent stalled, in microseconds
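To use these fields in scripts, it helps to split each line into separate key/value pairs. The following is a minimal sketch, not part of any PSI tooling: `psi_fields` is a hypothetical helper name, and it assumes the file follows the format shown above.

```shell
# psi_fields FILE
# Hypothetical helper: print one "kind.key value" pair per line
# (for example "some.avg10 0.00") from a PSI pressure file.
psi_fields() {
    awk '{
        kind = $1                      # "some" or "full"
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")         # "avg10=0.00" -> kv[1]="avg10", kv[2]="0.00"
            print kind "." kv[1], kv[2]
        }
    }' "$1"
}
```

It can be called as `psi_fields /proc/pressure/memory` on a kernel with PSI enabled.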
CPU pressure has only the some line here, because the CPU is always executing something and therefore cannot be completely stalled (newer kernels, from 5.13 on, also print a full line for CPU). You can also monitor pressure by registering a threshold, so that you are notified when it is exceeded.
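The kernel's own trigger mechanism works by writing a threshold and a time window to the pressure file and then waiting on it with poll(), which is awkward from a shell. As a simpler substitute, the sketch below checks the avg10 value against a limit and is meant to be called in a loop. This is plain polling, not the kernel trigger interface, and `psi_over_threshold` is a hypothetical helper name.

```shell
# psi_over_threshold FILE KIND LIMIT
# Hypothetical helper: exit status 0 if avg10 on the KIND line
# ("some" or "full") of FILE is above LIMIT (a percentage), 1 otherwise.
psi_over_threshold() {
    awk -v kind="$2" -v limit="$3" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "avg10") {
                    found = 1
                    over = (kv[2] + 0 > limit + 0)   # force numeric comparison
                }
            }
        }
        END {
            if (found && over) exit 0
            exit 1
        }
    ' "$1"
}
```

For example, `while sleep 5; do psi_over_threshold /proc/pressure/cpu some 50 && echo "CPU pressure is high"; done` prints a message whenever the 10-second average exceeds 50%. The limit and the sleep interval here are arbitrary illustration values.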
More detailed information:
- How to Monitor Server via PSI (Pressure Stall Information) and cgroupv2?
- PSI - Pressure Stall Information
Let's test this out. I will start a container, generate a load that the system can handle, and then add some more work to push the pressure values up. Docker Engine has supported cgroup v2 since version 20.10, but to make it work I had to switch my system to cgroup v2. To check whether Docker Engine is using cgroup v2:
    # docker info | grep -i "cgroup version"
    Cgroup Version: 2
Then generate some load:
    # nproc
    1
    # docker run --rm -ti ubuntu:latest bash
    root@cc8d07a3da81:/# dd if=/dev/zero of=/dev/null
So one process generates 100% CPU usage, and the system has only one CPU. Let's see what the PSI metrics look like:
    # mount | grep -i cgroup
    cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
    # docker ps
    CONTAINER ID   IMAGE           COMMAND   CREATED         STATUS         PORTS     NAMES
    cc8d07a3da81   ubuntu:latest   "bash"    2 minutes ago   Up 2 minutes             compassionate_tharp
    # cat /sys/fs/cgroup/system.slice/docker-cc8d07a3da81a308d842e9dedd0656cf4548f558642ffb2cdb5881742e5b1ec0.scope/cpu.pressure
    some avg10=0.00 avg60=0.00 avg300=0.00 total=15190
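The cgroup path in the last command embeds the full 64-character container ID, while docker ps shows only the short one. A small sketch for building that path is given below. It assumes the same layout as on this host (cgroup v2 mounted at /sys/fs/cgroup, containers placed under system.slice as docker-ID.scope); other cgroup drivers may use a different layout, and `docker_pressure_file` is a hypothetical helper name.

```shell
# docker_pressure_file FULL_CONTAINER_ID RESOURCE
# Hypothetical helper: print the path of the pressure file for one container.
# RESOURCE is cpu, memory, or io. Assumes the system.slice/docker-ID.scope
# layout seen on this host; adjust if your cgroup layout differs.
docker_pressure_file() {
    printf '/sys/fs/cgroup/system.slice/docker-%s.scope/%s.pressure\n' "$1" "$2"
}
```

The full ID can be obtained with docker inspect, for example: `cat "$(docker_pressure_file "$(docker inspect --format '{{.Id}}' compassionate_tharp)" cpu)"`.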
The container shows almost no CPU pressure, so I start another process and read the file again:
    # cat /sys/fs/cgroup/system.slice/docker-cc8d07a3da81a308d842e9dedd0656cf4548f558642ffb2cdb5881742e5b1ec0.scope/cpu.pressure
    some avg10=98.75 avg60=52.24 avg300=14.14 total=46233007
Now we have almost 100% CPU stall over the last 10 seconds: only one process can run at a time, so the other one has to wait. That is clearly too much for this system. PSI metrics can be collected by node_exporter; how to collect them from the container's perspective is something I will try to figure out in the next blog post.
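The avgX fields cover fixed windows, but the total counter can be turned into a stall percentage for any interval: take two readings, subtract them, and divide by the elapsed time. The sketch below shows the arithmetic, with `stall_percent` as a hypothetical helper name. The readings are in microseconds, as in the pressure file.

```shell
# stall_percent TOTAL_BEFORE TOTAL_AFTER ELAPSED_SECONDS
# Hypothetical helper: percentage of the interval spent stalled, computed
# from two readings of the "total" field (microseconds).
stall_percent() {
    awk -v before="$1" -v after="$2" -v secs="$3" 'BEGIN {
        elapsed_us = secs * 1000000
        printf "%.2f\n", (after - before) / elapsed_us * 100
    }'
}
```

For example, `stall_percent 0 5000000 10` prints 50.00: five seconds of stall in a ten-second interval. Between the two readings shown above, the counter grew by 46233007 - 15190 = 46217817 µs, roughly 46.2 seconds of stall. The time between those two readings was not recorded, so the matching percentage cannot be derived from them.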