Installation and Administration Best practices

Proposal

This page is under construction - and hasn't been updated since 2009 (except for a ill-advised amendment on April 2, 2014). Seek knowledge elsewhere, as the material in here predates Postgres 9 it seems.

This page will contain information about the best way to install and maintain a Postgresql database, including environment variables, paths and other relevant stuff.

Fell free to add your suggestions and knowledge to make it a 'hands on' resource.

Self compiled vs. package distributed

Before installing, consider whether to use packages distributed with your operating system or whether to roll your own. (Compiling PostgreSQL on Windows might be a difficult task. It's quite easy on Linux/Unix boxes.)

Let's compare using a pre-built package and compiling PostgreSQL yourself. (Note: This is a rather Linux-centric view. For Windows, you'll likely want to use the binary packages provided for each release.)

Using pre-built package from distribution	Compiling yourself.
Very easy to install - just use your package manager.	You might need to install gcc and some development packages just for building PostgreSQL.
Installation is dependent on distribution (location of config files, initial tablespace).	You may install everything in one place, just where you want it.
Startup-scripts are included and supposed to work.	You need to provide your own system startup scripts.
The packages might be out of date or new minor versions might not become available frequently.	You are free to use the latest stable version and perform upgrades at your will.
The package management knows about the PostgreSQL installation and will update it.	Your package management doesn't know anything about the installation. Dependent libraries might get uninstalled or replaced by newer, incompatible versions. (Note: This is rather unlikely. I've never seen it happen. PostgreSQL doesn't depend on any strange or fast-evolving packages.)

Compiling and installing in Solaris

TODO: Add a workaround for the most common issues.

Compiling in Solaris using Sun Studio 12

./configure --prefix=/usr/local/pgsql84 CC=/opt/SUNWspro/bin/cc 'CFLAGS=-xO3 -xarch=native -xspace -W0,-Lt -W2,-Rcond_elim -Xa -xildoff -xc99=none -xCC' --datadir=/usr/local/pgsql84/data84 --enable-dtrace --enable-cassert --with-perl --with-python --with-libxml --with-libxslt --with-ossp-uuid --without-readline

Notes:

OSol don't have readline library
Maybe you don't need --enable-cassert option.

Issues

UUID: There is a problem with UUID library in Open Solaris 200911. If some one have a workaround in this, please post it.

Multiple Versions on the same host

If you have to install multiple PostgreSQL versions at the same host, compile from source and call configure like this:

 ./configure --prefix=/opt/postgresql-8.2.11 --with-pgport=8200

That way, you never need to worry what version you are talking with - you just look at the port number.

Other way is changing port in postgresql.conf. Beware of that if you have am own init script, remeber to change values of PGDATA and PGUSER.

Making sure it starts up at system boot time

TODO: Provide a default init script (if there's not already one in contrib/). ==== For

Recommended values to be changed in big servers

In Linux, the SHMMAX value can often be set rather low, especially in older 32-bit distributions. Depending on your PostgreSQL configuration, you might need to tweak the values of SHMMAX and/or SHMALL.

Example of a high configuration:

/etc/sysctl.conf

 fs.file-max = 32768
 kernel.shmmax = 1073741824
 kernel.shmall = 536870912

How to calculate? One of the equations is: (FIXME: does this formula refer to SHMMAX?)

250kb + 8.2kb * shared_buffers +14.2kb * max_connections

The SHMMAX variable controls the maximum amount of memory to be allocated for shared memory use. If you try to assign high values for e.g. the shared_buffers GUC in PostgreSQL without adjusting SHMMAX, you might see an error message in Postgres' log like " ... Failed system call was shmget ... usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter", and you'll have to adjust SHMMAX upward accordingly.

More information about SHMMAX can be found here.

Directory Locations Recommended

WAL Directory

Write Ahead Logging (WAL) is a critical part of Postgres' operation. Due to Postgres' use of the WAL to ensure data consistency, the WAL will receive significant I/O activity. Especially if your PGDATA directory isn't located on a capable RAID array, you might consider relocating the WAL (pg_xlog directory) to a separate disk to ease I/O load on the rest of the database. The WAL should be written in perfectly sequential fashion, so the I/O savings from relocating this directory can be substantial. However, be aware of additional data consistency risks you will take on by relocating the WAL.

(FIXME: awkward and vague paragraph) Configuration: If you have the PGDATA in very different location, you maybe want to have a /etc/postgresql/data<number>. Or, if you have several versions for testing purposes you maybe want to have 'debian' like tree (/etc/postgresql/<version>/<pgdata_number>). The default way is have config files inside the PGDATA.

See here for information on offloading various PostgreSQL data onto different drives.

Versioning sql scripts and configuration files

As you are doing right now (versioning the sql scripts), other best practice is to version the configuration files. Not only a simple versioning, remember that you have several environments (Development, Test, Production).

Other good practice, is to have versioned the DBA modifications in a separated script (SET STORAGE modifications, special indexes and rules, etc).

TODO: Paste a example.

Backup and Recovery strategies

DO:

use pg_dump with -Fc to ensure that you are using a custom format. This is the only way to prepare to do a parallelized restore using pg_restore (with the -j parameter)
make sure to adjust your postgresql.conf file with settings that optimize for restoring (ie: increase maintenance_work_mem and turn off replication)

TODO: How to perform those tasks.

Users authentication

Please for more information, read article [1].

One recommended installation for access by host mode is the md5 method for encrypted password. We can draw something like this:

host    all          all     192.168.1.0/24   md5

This mode, reduces the the access to the IP addresses that are included in 192.168.1.x and using the correct password for the user. Adding to this restriction, you must remember that in the postgres.conf you should modify listen_addresses variable listen_addresses.

Remember that you can create users with VALID UNTIL option. When you create a user you can calculate the timestamp with something like this:

SELECT (CURRENT_DATE+1)::timestamp;

You might replace 1 with the number of days that you want to enable the user(7=1 week, 30=month, etc.). Then, copy result in the ALTER or CREATE statement.

Remember: The first step before change trust for other method based in passwords, you should assign one to the user.

Changes in the pg_hba.conf only need reload signal. So, there is no need downtime.

You can create superusers for the databases, but remember that if you want to restrict the access to the databases, the superusers still have permissions to drop the other BD's and other maintenance tasks. Try to reduce the number of superusers or almost one.

LDAP Auth

TODO: explain the best ways to put on work this method [2]

Monitoring Index and table accessing and use

TODO: Explain how to monitor the use of tables and indexes to apply modification to the way to storage them.