Priorities
From PostgreSQL wiki
Contents |
Prioritizing users, queries, or databases
PostgreSQL has no facilities to limit what resources a particular user, query, or database consumes, or correspondingly to set priorities such that one user/query/database gets more resources than others. It's necessary to use operating system facilities to achieve what limited prioritization is possible.
There are three main resources that PostgreSQL users, queries, and databases will contend for:
- Memory
- CPU
- Disk I/O
Of these, disk I/O is commonly a bottleneck for database applications, but that's not always the case. Some schema designs and queries are particularly CPU heavy. Others really benefit from having lots of memory to work with, typically for sorting.
Are priorities really the problem?
Before struggling too much with prioritizing your queries/users/databases, it's worthwhile to optimize your queries and tune your database. You may find that you can get perfectly acceptable performance without playing with priorities or taking extreme measures, using techniques such as:
- Improving your queries
- Tune autovacuum to reduce bloat
- Generally polishing your cluster's performance
- Avoiding use of VACUUM FULL. That can lead to bloated indexes that eat lots of memory and take forever to scan, wasting disk I/O bandwidth. See the wiki page on VACUUM FULL for more information.
Is CPU really the bottleneck?
People often complain of pegged (100%) CPU and assume that's the cause of database slowdowns. That's not necessarily the case - a system may show an apparent 100% CPU use, but in fact be mainly limited by I/O bandwidth. Consider the following test, which starts 20 `dd' processes, each reading a different 1Gb block from the hard disk at 1Gb offsets.
for i in `seq 1 20`; do ( dd if=/dev/md0 bs=1M count=1000 skip=$(($i * 1000)) of=/dev/null &) done
results in `top' output of:
top - 14:51:55 up 3 days, 2:09, 5 users, load average: 10.92, 4.94, 2.93 Tasks: 259 total, 3 running, 256 sleeping, 0 stopped, 0 zombie Cpu(s): 1.6%us, 15.0%sy, 0.0%ni, 0.0%id, 78.6%wa, 0.8%hi, 4.0%si, 0.0%st Mem: 4055728k total, 3843408k used, 212320k free, 749448k buffers Swap: 2120544k total, 4144k used, 2116400k free, 2303356k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 33 root 15 -5 0 0 0 R 5 0.0 0:26.67 kswapd0 904 root 20 0 4152 1772 628 D 5 0.0 0:00.62 dd 874 root 20 0 4152 1768 628 D 3 0.0 0:00.74 dd 908 root 20 0 4152 1768 628 D 3 0.0 0:00.80 dd 888 root 20 0 4152 1772 628 D 3 0.0 0:00.44 dd 906 root 20 0 4152 1772 628 D 3 0.0 0:00.56 dd 894 root 20 0 4152 1768 628 D 2 0.0 0:00.49 dd 902 root 20 0 4152 1772 628 D 2 0.0 0:00.46 dd .... etc ....
... which could be confused for a busy CPU, but is really load caused by disk I/O. The key warning sign here is the presence of a high iowait cpu percentage ("%wa"), indicating that much of the apparent load is actually caused by delays in the I/O subsystem. Most of the `dd' processes are in 'D' state - ie uninterruptable sleep in a system call - and if you check "wchan" with "ps" you'll see that they're sleeping waiting for I/O.
Rather than assuming that CPU contention is the issue, it's a good idea to use the available Performance Analysis Tools to get a better idea of where your system bottlenecks really are.
Prioritizing CPU
For CPU you can use "nice" (on UNIX systems), but it's a bit clumsy to do since you need to "nice" the backend of interest, not the client program connected to that backend. You can get the backend process id using the SQL query "SELECT pg_backend_pid()" or by looking at the pg_stat_activities view, but there's no way within a normal session to then renice that backend. For the situation where there's a particular large query you want to reduce the runtime of, you could have a monitoring process do this job outside of the database client itself by looking for the query/client of interest in pg_stat_activity.
Developing a nice backend function
One potential approach is writing and installing a C-language function "nice_backend(prio)" that renices the calling backend to "prio". Limit "prio" to >= 0, and provide a second function "nice_backend_super(pid,prio)" that can renice any backend pid and set any priority, but is usable only by the superuser. I don't know of any canned implementations of these, but it shouldn't be hard to whip them up based on the C user defined function documentation and examples, using the nice(2) system call to do the dirty work. If you write something like this, please submit it to the pgsql-performance mailing list and provide a link to that post here.
Even with these functions, you will still need co-operation from the user application to renice its self appropriately before running a CPU-heavy query, and it won't do you much good if there's contention from large numbers of backends. Increasing the priority of important backends instead of lowering the priority of unimportant ones may be more effective.
Prioritizing I/O
I/O is harder. Some operating systems offer I/O priorities for processes, like Linux's ionice, and you'd think you could use these in a similar way to how you use 'nice'. Unfortunately, that won't work particularly well, because a lot of the work PostgreSQL does - especially disk writes - are done via a separate background writer process working from memory shared by all backends. Similarly, the write-ahead logs are managed by their own process via shared memory. Because of those two, it's very hard to effectively give one user priority over another for writes. ionice should be moderately effective for reads, though.
As with "nice", effective control on a per-connection level will require the addition of appropriate helper functions, and user co-operation is required to achieve per-user priorities.
Better separation of I/O workloads will require cluster separation, which has its own costs and is only effective on the per-database level.
Prioritizing memory
PostgreSQL does have some tunable parameters for memory use that are per-client, particularly work_mem and maintenance_work_mem. These may be set within a given connection to allow that backend to use more than the usual amount of memory for things like sorts and index creation. You can set these to conservative, low values in postgresql.conf then use the SET command to assign higher values to them for a particular backend, eg SET work_mem = '100MB';.
You can set different values for work_mem and maintenance_work_mem using per-user GUC variables. For example:
ALTER USER myuser SET work_mem = '50MB';
You cannot affect the shared memory allocation done with settings like shared_buffers this way, that value is fixed at database startup time and can't be changed without restarting it.
There's no easy way in most operating systems to prioritize memory allocations, so that for example the OS would prefer to swap one backend's memory out instead of another's.
External links
Credits
Page initially by Ringerc 02:34, 26 November 2009 (UTC)
