List all processes and how much swap they use

Sometimes I see that my computer uses a lot of swap space, but none of the system monitors (e.g. ps, Gnome’s System Monitor, top, btop) show which processes are to blame. There’re several memory metrics (e.g. total swap usage), but never per process swap usage. 😠

There’s a StackExchange question that asks the same thing, but the solutions are all not well scripted, slow or “meh” for other reasons … so I tackled the problem myself. 😝

All the information is basically buried in the /proc/*/status files. Among other things they contain the amount of swap a process uses, its PID and even better: its name. So we have to go through all the files, grep the useful lines, extract the data from those lines and recombine them.

I tried different combinations of grep and sed and Bash string interpolation … and while it worked, it was even slower than the StackExchange suggestions. 😯😅😞 This looked more and more like a “if I knew grep/sed/awk better I wouldn’t need to invoke 4 sub shells/pipes/processes for each file” kind of problem. I remembered Bryan Cantrill making an offhanded remark once that awk had a simple and concise language and a great manual.

If you get the awk programming language manual…you’ll read it in about two hours and then you’re done. That’s it. You know all of awk.

Bryan Cantrill

I had put off diving in for too long. So I started reading and roughly 20 minutes in I knew enough to solve the whole problem (almost) entirely in awk (it still needs the shell for filename globbing 😕🤷). And … it’s blazingly fast! And … you can also combine it with sort and head to quickly find the worst offenders. 😎

You can find the code in this Gist. There’s a simpler version in an earlier revision. 😉

Usefulness of Swap Explained

Chris Down explains how swap’s main role is being the missing backing store for anonymous (i.e. allocated by malloc) pages. While all other kinds of data (e.g. paged-in files) can be reclaimed easily and later reloaded, because their “source of truth” is elsewhere. There’s no such source for anonymous pages hence these pages can “never” be reclaimed unless there’s swap space available (even if those pages aren’t “hot”).

Linux has historically had poor swap (and by extension OOM) handling with few and imprecise means for configuration. Chris describes the behavior of a machine with and without swap in different scenarios of memory contention. He thinks that poor swap performance is caused by having a poor measure of “memory pressure.” He explains how work on cgroups v2 might give the kernel (and thus admins) better measures for memory pressure and knobs for dealing with it.