Prioritizing users, queries, or databases
PostgreSQL has no built-in facilities to limit the resources a particular user, query, or database consumes, or correspondingly to set priorities such that one user/query/database gets more resources than the others. Where limited prioritization is possible at all, it must be achieved through operating system facilities.
There are three main resources that PostgreSQL users, queries, and databases will contend for:
- CPU
- Disk I/O
- Memory
Of these, disk I/O is commonly a bottleneck for database applications, but that's not always the case. Some schema designs and queries are particularly CPU heavy. Others really benefit from having lots of memory to work with, typically for sorting.
Are priorities really the problem?
Before struggling too much with prioritizing your queries/users/databases, it's worthwhile to optimize your queries and tune your database. You may find that you can get perfectly acceptable performance without playing with priorities or taking extreme measures, using techniques such as:
- Improving your queries
- Tuning autovacuum to reduce bloat
- Generally polishing your cluster's performance
- Avoiding use of VACUUM FULL, which eats lots of memory, takes forever to scan, and wastes disk I/O bandwidth. See the wiki page on VACUUM FULL for more information.
Is CPU really the bottleneck?
People often complain of pegged (100%) CPU and assume that's the cause of database slowdowns. That's not necessarily the case - a system may show an apparent 100% CPU use, but in fact be mainly limited by I/O bandwidth. Consider the following test, which starts 20 `dd' processes, each reading a different 1GB block from the hard disk (here: /dev/md0) at 1GB offsets (run the test as root).
```shell
sync
sh -c "echo 3 > /proc/sys/vm/drop_caches"
for i in `seq 1 20`; do
  ( dd if=/dev/md0 bs=1M count=1000 skip=$(($i * 1000)) of=/dev/null & )
done
```
results in `top' output of:
```
top - 14:51:55 up 3 days,  2:09,  5 users,  load average: 10.92, 4.94, 2.93
Tasks: 259 total,   3 running, 256 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.6%us, 15.0%sy,  0.0%ni,  0.0%id, 78.6%wa,  0.8%hi,  4.0%si,  0.0%st
Mem:   4055728k total,  3843408k used,   212320k free,   749448k buffers
Swap:  2120544k total,     4144k used,  2116400k free,  2303356k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   33 root      15  -5     0    0    0 R    5  0.0   0:26.67 kswapd0
  904 root      20   0  4152 1772  628 D    5  0.0   0:00.62 dd
  874 root      20   0  4152 1768  628 D    3  0.0   0:00.74 dd
  908 root      20   0  4152 1768  628 D    3  0.0   0:00.80 dd
  888 root      20   0  4152 1772  628 D    3  0.0   0:00.44 dd
  906 root      20   0  4152 1772  628 D    3  0.0   0:00.56 dd
  894 root      20   0  4152 1768  628 D    2  0.0   0:00.49 dd
  902 root      20   0  4152 1772  628 D    2  0.0   0:00.46 dd
.... etc ....
```
... which could be confused for a busy CPU, but is really load caused by disk I/O. The key warning sign here is the high iowait CPU percentage ("%wa"), indicating that much of the apparent load is actually caused by delays in the I/O subsystem. Most of the `dd' processes are in 'D' state - i.e. uninterruptible sleep in a system call - and if you check "wchan" with "ps" you'll see that they're sleeping waiting for I/O.
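As a sketch of that check, the `ps' invocation below prints a process's state and its wait channel ("wchan", the kernel function a sleeping process is blocked in). It inspects the current shell's own PID purely as a stand-in; in practice you would substitute the PID of a suspect `dd' process or PostgreSQL backend:

```shell
# Show state and wait channel for one process. $$ (the current shell) is a
# stand-in PID; substitute the PID of the process you are investigating.
# A process stuck in 'D' state will usually show an I/O-related wchan.
ps -o pid,state,wchan:25,comm -p $$
```

A whole screen of processes blocked in I/O-related wait channels is a strong hint that the disk, not the CPU, is the bottleneck.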
Rather than assuming that CPU contention is the issue, it's a good idea to use the available Performance Analysis Tools to get a better idea of where your system bottlenecks really are.
For adjusting the CPU priority of PostgreSQL processes you can use "renice" (on UNIX systems), but it's a bit clumsy, since you need to renice the backend of interest, not the client program connected to that backend. You can get the backend's process ID with the SQL query "SELECT pg_backend_pid()" or by looking at the pg_stat_activity view.
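The sequence looks roughly like this (a sketch: the backend PID in the comment is hypothetical, and the commands below are demonstrated against the current shell's own PID, since *raising* a process's nice value needs no special privileges):

```shell
# In psql, find the backend serving your connection:
#   SELECT pg_backend_pid();   -- suppose it returns 12345 (hypothetical)
# Then, from a shell on the database server, lower that backend's CPU
# priority by raising its nice value. $$ (this shell) stands in for the PID:
renice -n 10 -p $$
ps -o ni= -p $$    # shows the new nice value (10, if it started at 0)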
One significant limitation of "renice", or any approach based on the setpriority() call, is that on most UNIX-like platforms one must be root to lower the numerical priority value (i.e. schedule the process to run more urgently) of a process.
Increasing the priority of important backends, via a root user's call to "renice", instead of lowering the priority of unimportant ones, may be more effective.
The prioritize extension lets users adjust the CPU priority, in the same way that "renice" does, via the SQL function set_backend_priority(). Normal users may increase the priority value of any backend process running under the same username. Superusers may increase the priority value of any backend process. Just like with using "renice" manually, it is not possible to lower a backend's priority value, since PostgreSQL will not be running as the "root" user.
If you know your application will be running an unimportant CPU-heavy query, you could have it call set_backend_priority(pg_backend_pid(), 20) after installing the "prioritize" module, so that the process is scheduled for the lowest possible urgency.
I/O is harder. Some operating systems offer I/O priorities for processes, like Linux's ionice, and you'd think you could use these in a similar way to how you use 'nice'. Unfortunately, that won't work particularly well, because a lot of the work PostgreSQL does - especially disk writes - is done via a separate background writer process working from memory shared by all backends. Similarly, the write-ahead logs are managed by their own process via shared memory. Because of those two, it's very hard to effectively give one user priority over another for writes. ionice should be moderately effective for reads, though.
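For example, on Linux you could drop a backend into the "idle" I/O scheduling class, so its reads are serviced only when no other process wants the disk (a sketch: $$ below stands in for the real backend PID obtained via pg_backend_pid(), and this only has an effect with an I/O scheduler that honours classes, such as CFQ/BFQ):

```shell
# Linux only: requires the ionice utility from util-linux.
# $$ (the current shell) is a stand-in for the backend PID.
ionice -c 3 -p $$    # class 3 = idle
ionice -p $$         # report the class back, e.g. "idle"
```

Remember that this only throttles reads done directly by that backend; its writes still flow through the shared background writer and WAL processes at their normal priority.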
As with "nice", effective control on a per-connection level will require the addition of appropriate helper functions, and user co-operation is required to achieve per-user priorities.
Better separation of I/O workloads requires cluster separation (running separate PostgreSQL instances), which has its own costs and is only effective at the per-database level.
PostgreSQL does have some tunable parameters for memory use that are per-client, particularly work_mem and maintenance_work_mem. These may be set within a given connection to allow that backend to use more than the usual amount of memory for things like sorts and index creation. You can set these to conservative, low values in postgresql.conf, then use the SET command to assign higher values to them for a particular backend, eg SET work_mem = '100MB';.

You can also set different values for work_mem and maintenance_work_mem using per-user GUC variables. For example:

ALTER USER myuser SET work_mem = '50MB';

You cannot affect the shared memory allocation done with settings like shared_buffers this way; that value is fixed at database startup time and can't be changed without restarting the server.
There's no easy way in most operating systems to prioritize memory allocations, so that for example the OS would prefer to swap one backend's memory out instead of another's.
Page initially by Ringerc 02:34, 26 November 2009 (UTC)