Cgroups

From PostgreSQL wiki


cgroups (control groups) are a Linux kernel feature that allows fine-grained control over system resources for a set of processes. cgroups might help address several common challenges related to performance, resource allocation, and system stability. See https://docs.kernel.org/admin-guide/cgroup-v2.html for more information.


Problem statement

These are some related problems that cgroups might address, partially or fully.

OOM kills of PostgreSQL processes still happen / overcommit control

There are reports from time to time about memory getting out of control, such as [Postgresql OOM] and [Memory leak from ExecutorState context?]. These are only two examples from 2023-2024; the list is not exhaustive, and there are numerous other reports.

There are probably also many occurrences triggered by activity external to PostgreSQL, though evidence is not collected here.
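As a point of reference, cgroup v2 already counts OOM kills per cgroup in the memory.events file, which uses the "key value" per line layout described in the kernel documentation. The following is a minimal sketch, not a proposed implementation: the function names are hypothetical, and the cgroup directory passed in (for example a systemd service directory under /sys/fs/cgroup) is an assumption that depends on how PostgreSQL was started.

```python
from pathlib import Path


def read_flat_keyed(path):
    """Parse a cgroup v2 flat-keyed file ("key value" per line) into a dict of ints."""
    counters = {}
    for line in Path(path).read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            counters[key] = int(value)
    return counters


def oom_kill_count(cgroup_dir):
    """Return how many processes of this cgroup were killed by the OOM killer.

    cgroup_dir is an assumption: the cgroup v2 directory PostgreSQL runs in.
    """
    events = read_flat_keyed(Path(cgroup_dir) / "memory.events")
    return events.get("oom_kill", 0)
```

Polling this counter and comparing it with the previous value would show that an OOM kill happened in the cgroup, though not which backend was the victim.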


Gaining out of the box detailed insights into CPU, memory, and I/O usage at the system level

There are a lot of stats inside PostgreSQL, and also a lot on the system side (in /proc for example) and via top-like commands, so there is no real problem.

However, monitoring what happens inside the PostgreSQL cgroup (PostgreSQL is very often run in a cgroup today) cannot be achieved with the usual tools working with cgroups, like systemd-cgtop or systemd-cgls.
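To illustrate what the kernel exposes at the cgroup level, here is a minimal sketch that reads memory.current (a single value in bytes) and io.stat (one line per device, "MAJ:MIN key=value ...") from a cgroup v2 directory. The function names and the directory argument are hypothetical. It only reports totals for one cgroup, so per-backend detail would require one cgroup per process or per group of processes, which PostgreSQL does not set up today.

```python
from pathlib import Path


def parse_io_stat(text):
    """Parse cgroup v2 io.stat content into {device: {key: value}}."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        device, pairs = fields[0], fields[1:]
        stats = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # Counters such as rbytes/wbytes are integers; keep anything else as text.
            stats[key] = int(value) if value.isdigit() else value
        devices[device] = stats
    return devices


def cgroup_usage(cgroup_dir):
    """Return current memory use and per-device I/O counters for one cgroup.

    cgroup_dir is an assumption: the cgroup v2 directory to inspect.
    """
    base = Path(cgroup_dir)
    return {
        "memory_current": int((base / "memory.current").read_text()),
        "io": parse_io_stat((base / "io.stat").read_text()),
    }
```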


WAL sender processes require consistent CPU resources to ensure smooth (logical) replication.

The topic was discussed at the PGCon 2022 Developer Unconference: [Logical_Replication_High_Replay_Lag]

It is not possible to define a resource allocation policy

It is not possible to tune resource allocation based on workload. For example, balancing workloads between databases is not achievable today, other than by tuning the memory settings, which do not enforce a hard limit.
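For comparison, a cgroup v2 policy is expressed by writing values into control files of the cgroup directory. The sketch below is illustrative only: apply_policy and its parameters are hypothetical names, the directory is assumed to be a cgroup the caller is allowed to write to, and the cpu and memory controllers are assumed to be enabled for it by the parent cgroup. The file names (cpu.weight, memory.high, memory.max) are the ones documented for cgroup v2.

```python
from pathlib import Path


def apply_policy(cgroup_dir, cpu_weight=None, memory_high=None, memory_max=None):
    """Write resource knobs to a cgroup v2 directory; None leaves a knob untouched.

    memory_high and memory_max are byte counts or the string "max".
    """
    base = Path(cgroup_dir)
    if cpu_weight is not None:
        # cpu.weight accepts values in [1, 10000]; the kernel default is 100.
        if not 1 <= cpu_weight <= 10000:
            raise ValueError("cpu_weight must be between 1 and 10000")
        (base / "cpu.weight").write_text(f"{cpu_weight}\n")
    if memory_high is not None:
        # Above memory.high the cgroup is throttled and put under reclaim pressure.
        (base / "memory.high").write_text(f"{memory_high}\n")
    if memory_max is not None:
        # memory.max is the hard limit; exceeding it can trigger the OOM killer.
        (base / "memory.max").write_text(f"{memory_max}\n")
```

Giving one group of backends a cpu.weight of 200 and another the default of 100 would, under CPU contention, give the first group roughly twice the CPU time of the second.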


"slow" connection time

Backends are created only at connection time. Would it be possible to make this step faster?


Phantom processes

PostgreSQL has strong support for crashing quickly and reducing the window of possible misbehavior (due to shared memory locks and the like). However, every PID must be dealt with individually, and it may be preferable to kill them all at once.
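On kernels that provide it (Linux 5.14 and later), cgroup v2 has a cgroup.kill file: writing "1" to it sends SIGKILL to every process in the cgroup and its descendants. The sketch below shows the idea; the function names are hypothetical, and the directory is assumed to be a cgroup containing only the processes that should die together.

```python
from pathlib import Path


def list_pids(cgroup_dir):
    """Return the PIDs currently listed in the cgroup's cgroup.procs file."""
    text = (Path(cgroup_dir) / "cgroup.procs").read_text()
    return [int(pid) for pid in text.split()]


def kill_cgroup(cgroup_dir):
    """Ask the kernel to SIGKILL every process in the cgroup in one operation."""
    (Path(cgroup_dir) / "cgroup.kill").write_text("1\n")
```

A caller could invoke kill_cgroup() and then check list_pids() until it returns an empty list to confirm that no process is left behind.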


Is cgroup a solution?

Proposed Cgroup implementation

Proposed Cgroup Extension API

Benchmarks

Issues

The cgroup implementation in Linux is not perfect: it works, but it has some limitations. An interesting extract from the [Linux Cgroupv2 documentation] that affects PostgreSQL concerns memory and page tracking for writeback:

While this model is enough for most use cases where a given inode is mostly dirtied by a single cgroup even when the main writing cgroup changes over time, use cases where multiple cgroups write to a single inode simultaneously are not supported well. In such circumstances, a significant portion of IOs are likely to be attributed incorrectly. As memory controller assigns page ownership on the first use and doesn’t update it until the page is released, even if writeback strictly follows page ownership, multiple cgroups dirtying overlapping areas wouldn’t work as expected. It’s recommended to avoid such usage patterns.


PostgreSQL uses fork(), which means a child process must be migrated after creation if it should run in a different cgroup than its parent. clone3() with the CLONE_INTO_CGROUP flag (Linux 5.7 and later) can place the child directly in the target cgroup, so it might be more convenient to use.
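The fork-then-migrate pattern can be sketched as follows: the child writes its own PID to the cgroup.procs file of the target cgroup before doing any work. This is only an illustration of the migration step, written in Python rather than in PostgreSQL's C code; move_to_cgroup, fork_into_cgroup, and child_main are hypothetical names, and the target directory is assumed to be a cgroup the process is permitted to join.

```python
import os
from pathlib import Path


def move_to_cgroup(pid, cgroup_dir):
    """Migrate a process by writing its PID to the target cgroup's cgroup.procs."""
    (Path(cgroup_dir) / "cgroup.procs").write_text(f"{pid}\n")


def fork_into_cgroup(cgroup_dir, child_main):
    """fork(), then have the child migrate itself before running child_main()."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Between fork() and this write the child still runs in the
            # parent's cgroup; that window is what clone3() would avoid.
            move_to_cgroup(os.getpid(), cgroup_dir)
            child_main()
            status = 0
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    return pid
```

The parent gets the child PID back and is still responsible for reaping it, for example with os.waitpid().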

References

[Postgresql OOM]: https://www.postgresql.org/message-id/CAG4TxrizOVnkYx1v1a7rv6G3t4fMoZP6vbZn3yPLgjHrg5ETbw%40mail.gmail.com

[Memory leak from ExecutorState context?]: https://www.postgresql.org/message-id/flat/20230516200052.sbg6z4ghcmsas3wv@liskov#f6059259c7c9251fb8c17f5793a2d427

[Logical_Replication_High_Replay_Lag]: https://wiki.postgresql.org/wiki/PgCon_2022_Developer_Unconference#Logical_Replication_High_Replay_Lag

[Linux Cgroupv2 documentation]: https://docs.kernel.org/admin-guide/cgroup-v2.html