PostgreSQL wiki - User contributions [en] (user Jaytaylor; feed retrieved 2024-03-28; MediaWiki 1.35.13)
Tuning Your PostgreSQL Server (revision of 2013-01-08T20:05:45Z by Jaytaylor: /* listen_addresses */ state it in the positive)
<hr />
<div>__NOTOC__<br />
''by Greg Smith, Robert Treat, and Christopher Browne''<br />
<br />
{{Languages}}<br />
<br />
PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance. Odds are good the default parameters are very undersized for your system. Rather than get dragged into the details of everything you should eventually know (which you can find if you want it at the [http://www.pgcon.org/2008/schedule/events/104.en.html GUC Three Hour Tour]), here we're going to sprint through a simplified view of the basics, with a look at the most common things people new to PostgreSQL aren't aware of. You should click on the name of the parameter in each section to jump to the relevant documentation in the PostgreSQL manual for more details after reading the quick intro here. There is also additional information available about many of these parameters, as well as a list of parameters you shouldn't adjust, at [https://www.packtpub.com/article/server-configuration-tuning-postgresql Server Configuration Tuning].<br />
<br />
== Background Information on Configuration Settings ==<br />
<br />
PostgreSQL settings can be manipulated a number of different ways, but generally you will want to update them in your postgresql.conf file. The specific options available change from release to release; the definitive list for your version of PostgreSQL is in the source code at src/backend/utils/misc/guc.c (but the pg_settings view works well enough for most purposes).<br />
<br />
=== The types of settings ===<br />
<br />
There are several different types of configuration settings, divided up based on the possible inputs they take:<br />
<br />
* Boolean: true, false, on, off<br />
* Integer: Whole numbers (2112)<br />
* Float: Decimal values (21.12)<br />
* Memory / Disk: Integers (2112) or "computer units" (512MB, 2112GB). Avoid integers--you need to know the underlying unit to figure out what they mean. Computer units are only available starting in version 8.2.<br />
* Time: "Time units" aka d,m,s (30s). Sometimes the unit is left out; don't do that<br />
* Strings: Single quoted text ('pg_log')<br />
* ENUMs: Strings, but from a specific list ('WARNING', 'ERROR')<br />
* Lists: A comma separated list of strings ('"$user",public,tsearch2')<br />
<br />
=== When they take effect ===<br />
<br />
PostgreSQL settings have different levels of flexibility for when they can be changed, usually related to internal code restrictions. The complete list of levels is:<br />
<br />
* Postmaster: requires restart of server <br />
* Sighup: requires a HUP of the server, either by kill -HUP (usually -1), pg_ctl reload, or select pg_reload_conf();<br />
* User: can be set within individual sessions, take effect only within that session<br />
* Internal: set at compile time, can't be changed, mainly for reference<br />
* Backend: settings which must be set before session start<br />
* Superuser: can be set at runtime for the server by superusers<br />
<br />
Most of the time you'll only use the first of these, but the second can be useful if you have a server you don't want to take down, while the user session settings can be helpful for some special situations. You can tell which type of parameter a setting is by looking at the "context" field in the pg_settings view.<br />
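<br />
For example, to list every setting that can only be changed with a full server restart, you can filter on that context field directly (a minimal sketch):<br />
<br />
<code><pre><br />
-- Settings that require a full server restart to change<br />
SELECT name, setting<br />
FROM pg_settings<br />
WHERE context = 'postmaster'<br />
ORDER BY name;<br />
</pre></code><br />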
<br />
=== Important notes about postgresql.conf ===<br />
<br />
* You should be able to find it at $PGDATA/postgresql.conf; watch out for symbolic links and other trickiness<br />
* You can figure out the file location with ''SHOW config_file''<br />
* Lines with # are comments and have no effect. For a new database, this will mean the setting is using the default, but on running systems this may not hold true! '''In versions before 8.3, commenting out a setting does not restore it to the default.''' Even in versions after that, changes to the postgresql.conf do not take effect without a reload/restart, so it's possible for the system to be running something different than what is in the file. <br />
* If the same setting is listed multiple times, the last one wins<br />
<br />
=== Viewing the current settings === <br />
<br />
* Look in postgresql.conf. This works if you follow good practice, but it's not definitive!<br />
* ''show all'', ''show <setting>'' will show you the current value of the setting. Watch out for session specific changes<br />
* ''select * from pg_settings'' will label session specific changes as locally modified<br />
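<br />
As a quick sketch of putting these together in a psql session:<br />
<br />
<code><pre><br />
-- Current value of one setting, in this session<br />
SHOW work_mem;<br />
<br />
-- Settings that have been changed away from their defaults, and where from<br />
SELECT name, setting, source<br />
FROM pg_settings<br />
WHERE source <> 'default';<br />
</pre></code><br />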
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-LISTEN-ADDRESSES listen_addresses] ==<br />
<br />
By default, PostgreSQL only responds to connections from the local host. If you want your server to be accessible from other systems via standard TCP/IP networking, you need to change listen_addresses from its default. The usual approach is to set it to listen to all addresses like this:<br />
<br />
<code><pre><br />
listen_addresses = '*'<br />
</pre></code><br />
<br />
And then control who can and cannot connect via the [http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html pg_hba.conf] file.<br />
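<br />
A minimal sketch of the pair of changes (the subnet and authentication method here are placeholders--use whatever matches your network and security policy):<br />
<br />
<code><pre><br />
# postgresql.conf<br />
listen_addresses = '*'          # requires a server restart<br />
<br />
# pg_hba.conf -- allow md5 password logins from one example subnet<br />
# TYPE  DATABASE  USER  ADDRESS          METHOD<br />
host    all       all   192.168.1.0/24   md5<br />
</pre></code><br />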
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS max_connections]==<br />
<br />
max_connections sets exactly that: the maximum number of client connections allowed. This is very important to some of the below parameters (particularly work_mem) because there are some memory resources that are or can be allocated on a per-client basis, so the maximum number of clients suggests the maximum possible memory use. Generally, PostgreSQL on good hardware can support a few hundred connections. If you want to have thousands instead, you should consider using [[Replication, Clustering, and Connection Pooling|connection pooling software]] to reduce the connection overhead.<br />
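<br />
A quick way to see how close you currently are to the limit (a sketch):<br />
<br />
<code><pre><br />
-- Compare the number of open connections against max_connections<br />
SELECT count(*) AS open_connections,<br />
       current_setting('max_connections') AS max_connections<br />
FROM pg_stat_activity;<br />
</pre></code><br />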
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-SHARED-BUFFERS shared_buffers]==<br />
<br />
The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL to use for caching data. One reason the defaults are low is because on some platforms (like older Solaris versions and SGI), having large values requires invasive action like recompiling the kernel. Even on a modern Linux system, the stock kernel will likely not allow setting shared_buffers to over 32MB without adjusting kernel settings first.<br />
<br />
If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM you'll have to account more carefully for how much RAM the OS is taking up; closer to 15% is more typical there. There are some workloads where even larger settings for shared_buffers are effective, but given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount.<br />
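<br />
As a hypothetical example, on a dedicated server with 8GB of RAM you might start with:<br />
<br />
<code><pre><br />
# postgresql.conf -- hypothetical dedicated 8GB server<br />
shared_buffers = 2GB    # roughly 1/4 of system RAM; requires a restart<br />
</pre></code><br />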
<br />
Note that on Windows (and on PostgreSQL versions before 8.1), large values for shared_buffers aren't as effective, and you may find better results keeping it relatively low and using the OS cache more instead. On Windows the useful range is 64MB to 512MB, and for earlier than 8.1 versions the effective upper limit is near shared_buffers=50000 (just under 400MB--older versions before 8.2 don't allow using MB values for their settings, you specify this parameter in 8K blocks).<br />
<br />
It's likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value for shared_buffers this high. On UNIX-like systems, if you set it above what's supported, you'll get a message like this:<br />
<br />
<code><pre><br />
IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument <br />
<br />
This error usually means that PostgreSQL's request for a shared memory <br />
segment exceeded your kernel's SHMMAX parameter. You can either <br />
reduce the request size or reconfigure the kernel with larger SHMMAX. <br />
To reduce the request size (currently 415776768 bytes), reduce <br />
PostgreSQL's shared_buffers parameter (currently 50000) and/or <br />
its max_connections parameter (currently 12).<br />
</pre></code><br />
<br />
See [http://www.postgresql.org/docs/current/static/kernel-resources.html Managing Kernel Resources] for details on how to correct this.<br />
<br />
Changing this setting requires restarting the database. Also, this is a hard allocation of memory; the whole thing gets allocated out of virtual memory when the database starts.<br />
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE effective_cache_size]==<br />
<br />
effective_cache_size should be set to an estimate of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what's used by the OS itself and other applications. This is a guideline for how much memory you expect to be available in the OS and PostgreSQL buffer caches, not an allocation! This value is used only by the PostgreSQL query planner to figure out whether plans it's considering would be expected to fit in RAM or not. If it's set too low, indexes may not be used for executing queries the way you'd expect. The setting for shared_buffers is not taken into account here--only the effective_cache_size value is, so it should include memory dedicated to the database too.<br />
<br />
Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting, and 3/4 of memory is a more aggressive but still reasonable amount. You might find a better estimate by looking at your operating system's statistics. On UNIX-like systems, add the free+cached numbers from free or top to get an estimate. On Windows see the "System Cache" size in the Windows Task Manager's Performance tab. Changing this setting does not require restarting the database (HUP is enough).<br />
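<br />
Continuing the hypothetical 8GB server from above, a moderate setting would be:<br />
<br />
<code><pre><br />
# postgresql.conf -- same hypothetical 8GB server<br />
effective_cache_size = 6GB   # ~3/4 of RAM; a planner estimate, not an allocation<br />
</pre></code><br />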
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS checkpoint_segments checkpoint_completion_target]==<br />
<br />
PostgreSQL writes new transactions to the database in files called WAL segments that are 16MB in size. Every time checkpoint_segments worth of these files (3 by default) have been written, a checkpoint occurs. Checkpoints can be resource intensive, and on a modern system doing one every 48MB will be a serious performance bottleneck. Setting checkpoint_segments to a much larger value improves that. Unless you're running on a very small configuration, you'll almost certainly be better off setting this to at least 10, which also allows usefully increasing the completion target.<br />
<br />
For more write-heavy systems, values from 32 (checkpoint every 512MB) to 256 (every 4GB) are popular nowadays. Very large settings use a lot more disk and will cause your database to take longer to recover, so make sure you're comfortable with both those things before large increases. Normally the large settings (>64/1GB) are only used for bulk loading. Note that whatever you choose for the segments, you'll still get a checkpoint at least every 5 minutes unless you also increase checkpoint_timeout (which isn't necessary on most systems).<br />
<br />
;PostgreSQL 8.3 and newer<br />
<br />
Starting with PostgreSQL 8.3, the checkpoint writes are spread out a bit while the system starts working toward the next checkpoint. You can spread those writes out further, lowering the average write overhead, by increasing the checkpoint_completion_target parameter to its useful maximum of 0.9 (aim to finish by the time 90% of the next checkpoint is here) rather than the default of 0.5 (aim to finish when the next one is 50% done). A setting of 0 gives something similar to the behavior of the earlier versions. The main reason the default isn't just 0.9 is that you need a larger checkpoint_segments value than the default for broader spreading to work well. For lots more information on checkpoint tuning, see [http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm Checkpoints and the Background Writer] (where you'll also learn why tuning the background writer parameters, particularly those in 8.2 and below, is challenging to do usefully).<br />
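<br />
Putting the two settings together, a sketch for a moderately write-heavy server running 8.3 or later might be:<br />
<br />
<code><pre><br />
# postgresql.conf -- moderately write-heavy server, 8.3+<br />
checkpoint_segments = 32            # checkpoint roughly every 512MB of WAL<br />
checkpoint_completion_target = 0.9  # spread the checkpoint writes out<br />
</pre></code><br />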
<br />
==[http://www.postgresql.org/docs/current/static/routine-vacuuming.html#AUTOVACUUM autovacuum] [http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html#GUC-MAX-FSM-PAGES max_fsm_pages, max_fsm_relations]==<br />
<br />
The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.<br />
<br />
However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data.<br />
<br />
;PostgreSQL 8.4 and newer<br />
<br />
The FSM was rewritten for PostgreSQL 8.4, so earlier advice is no longer applicable. The <tt>max_fsm_pages</tt> and <tt>max_fsm_relations</tt> settings are gone, as the new FSM is self-adapting ([http://www.depesz.com/index.php/2008/10/04/waiting-for-84-new-fsm-free-space-map/ more info]). <tt>autovacuum</tt> is enabled by default and should remain so, as vacuum is much less invasive in 8.4 than before thanks to [http://www.depesz.com/index.php/2008/12/08/waiting-for-84-visibility-maps/ visibility maps].<br />
<br />
;PostgreSQL 8.3 and earlier<br />
<br />
As of 8.3, autovacuum is turned on by default, and you should keep it that way. In 8.1 and 8.2 you will have to turn it on yourself. Note that in those earlier versions, you may need to tweak its settings a bit to make it aggressive enough; it may not do enough work by default if you have a larger database or do lots of updates.<br />
<br />
You may also need to increase the value of max_fsm_pages and max_fsm_relations as needed. The Free Space Map is used to track where there are dead tuples (rows) that may be reclaimed. You will only get effective nonblocking VACUUM queries if the dead tuples can be listed in the Free Space Map. As a result, if you do not plan to run VACUUM frequently, and if you expect a lot of updates, you should ensure these values are usefully large (and remember, these values are cluster wide, not database wide). It should be easy enough to set max_fsm_relations high enough; the problem that will more typically occur is when max_fsm_pages is not set high enough. Once the Free Space Map is full, VACUUM will be unable to track further dead pages. In a busy database, this needs to be set much higher than 1000... also, remember that changing these settings requires a restart of the database, so it is wise to err on the side of setting comfortable margins for these settings. <br />
<br />
If you run VACUUM VERBOSE on your database, it'll tell you how many pages and relations are in use (and, under 8.3, what the current limits are). For example,<br />
<br />
<pre><br />
INFO: free space map contains 5293 pages in 214 relations<br />
DETAIL: A total of 8528 page slots are in use (including overhead).<br />
8528 page slots are required to track all free space.<br />
Current limits are: 204800 page slots, 1000 relations, using 1265 kB.<br />
</pre><br />
<br />
If you find that your settings are already too low, you will likely need to do aggressive vacuuming of your system, and reindexing or VACUUM FULL may be needed as well. If you're getting close to the limits for page slots, typical practice is to just double the current values, with perhaps a smaller percentage increase once you've gotten much higher (in the millions range). For the max relations setting, note that it includes all the databases in your cluster.<br />
<br />
One other situation to be aware of is that of a database approaching autovacuum_freeze_max_age. When a database approaches this point, it will begin to vacuum every table in the database that has not been vacuumed before. On some systems this may not result in much activity, but for systems where there are a lot of tables that are not modified often, this can be a more common occurrence (especially if the system has gone through a dump/restore, say for upgrading). The significance of all of this is that, even on a system with well set fsm settings, once your system begins vacuuming all of the additional tables, your old fsm setting may no longer be appropriate.<br />
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-logging.html logging]==<br />
There are many things you can log that may or may not be important to you. You should investigate the documentation on all of the options, but here are some tips & tricks to get you started:<br />
<br />
*pgFouine is a tool used to analyze PostgreSQL logs for performance tuning. If you plan to use this tool, it has specific logging requirements. Please check http://pgfouine.projects.postgresql.org/<br />
<br />
*a newer alternative to pgFouine is pgbadger: http://dalibo.github.com/pgbadger/<br />
<br />
*log_destination & log_directory (& log_filename): What you set these options to is not as important as knowing they can give you hints to determine where your database server is logging to. Best practice would be to try and make this as similar as possible across your servers. Note that in some cases, the init script starting your database may be customizing the log destination in the command line used to start the database, overriding what's in the postgresql.conf (and making it so you'll get different behavior if you run pg_ctl manually instead of using the init script).<br />
<br />
*log_min_error_statement: You should probably make sure this is at least on error, so that you will see any SQL commands which cause an error. This should be the default on recent versions. <br />
<br />
*log_min_duration_statement: Not necessary for everyday use, but this can generate [[Logging Difficult Queries|logs of "slow queries"]] on your system. <br />
<br />
*log_line_prefix: Appends information to the start of each line. A good generic recommendation is '%t:%r:%u@%d:[%p]: ' : %t=timestamp, %u=db user name, %r=host connecting from, %d=database connecting to, %p=PID of connection. It may not be obvious what the PID is useful for at first, but it can be vital for troubleshooting problems in the future, so it's better to put it in the logs from the start.<br />
<br />
*log_statement: Choices of none, ddl, mod, all. Using all in production leads to severe performance penalties. DDL can sometimes be helpful to discover rogue changes made outside of your recommended processes, by "cowboy DBAs" for example.<br />
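<br />
Tying these options together, a minimal logging sketch might look like the following (the 250ms threshold is an arbitrary placeholder; check your log-analysis tool's documentation for its exact log_line_prefix requirements):<br />
<br />
<code><pre><br />
# postgresql.conf -- example logging setup; tune thresholds to your workload<br />
log_min_duration_statement = 250ms        # log statements slower than this<br />
log_line_prefix = '%t:%r:%u@%d:[%p]: '    # timestamp, host, user@db, PID<br />
log_statement = 'ddl'                     # record schema changes<br />
</pre></code><br />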
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET default_statistics_target]==<br />
<br />
The database software collects statistics about each of the tables in your database to decide how to execute queries against it. In earlier versions of PostgreSQL, the default setting of 10 doesn't collect very much information, and if you're not getting good query execution plans, particularly on larger (or more varied) tables, you should increase default_statistics_target then ANALYZE the database again (or wait for autovacuum to do it for you). <br />
<br />
;PostgreSQL 8.4 and later<br />
<br />
The starting default_statistics_target value was raised from 10 to 100 in PostgreSQL 8.4. Increases beyond 100 may still be useful, but this increase makes for greatly improved statistics estimation in the default configuration. The maximum value for the parameter was also increased from 1000 to 10,000 in 8.4.<br />
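<br />
If only a few columns have problematic statistics, you can also raise the target per column instead of globally (a sketch; the table and column names are hypothetical):<br />
<br />
<code><pre><br />
-- Collect more detailed statistics for one skewed column, then re-analyze<br />
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;<br />
ANALYZE orders;<br />
</pre></code><br />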
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-WORK-MEM work_mem maintenance_work_mem]==<br />
<br />
If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.<br />
<br />
This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory.<br />
<br />
maintenance_work_mem is used for operations like vacuum. Using extremely large values here doesn't help very much, and because you essentially need to reserve that memory for when vacuum kicks in, a large value takes it away from more useful purposes. Something in the 256MB range has anecdotally been a reasonably large setting here.<br />
<br />
;PostgreSQL 8.3 and later<br />
<br />
In 8.3 you can use log_temp_files to figure out if sorts are using disk instead of fitting in memory. In earlier versions, you might instead just monitor the size of them by looking at how much space is being used in the various ''$PGDATA/base/<db oid>/pgsql_tmp'' files. You can see sorts to disk happen in EXPLAIN ANALYZE plans as well. For example, if you see a line like "Sort Method: external merge Disk: 7526kB" in there, you'd know a work_mem of at least 8MB would really improve how fast that query executed, by sorting in RAM instead of swapping to disk.<br />
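<br />
Since work_mem can be changed per session, a common pattern is to keep the server-wide value modest and raise it only for known-large queries (a sketch; the query and table are hypothetical):<br />
<br />
<code><pre><br />
-- Raise the sort memory just for this session's big report<br />
SET work_mem = '256MB';<br />
SELECT customer_id, sum(amount)<br />
FROM orders<br />
GROUP BY customer_id<br />
ORDER BY sum(amount) DESC;<br />
RESET work_mem;<br />
</pre></code><br />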
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-SYNC-METHOD wal_sync_method wal_buffers]==<br />
<br />
After every transaction, PostgreSQL forces a commit to disk out to its write-ahead log. This can be done a couple of ways, and on some platforms the other options are considerably faster than the conservative default. open_sync is the most common non-default setting switched to, on platforms that support it but default to one of the fsync methods. See [http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm Tuning PostgreSQL WAL Synchronization] for a lot of background on this topic. Note that open_sync writing is buggy on some platforms (such as [http://lwn.net/Articles/350219/ Linux]), and you should (as always) do plenty of tests under a heavy write load to make sure that you haven't made your system less stable with this change. [[Reliable Writes]] contains more information on this topic.<br />
<br />
Linux kernels starting with version 2.6.33 will cause earlier versions of PostgreSQL to default to wal_sync_method=open_datasync; before that kernel release the default picked was always fdatasync. This can cause a significant performance decrease when combined with small writes and/or small values for wal_buffers.<br />
<br />
Increasing wal_buffers from its tiny default of a small number of kilobytes is helpful for write-heavy systems. Benchmarking generally suggests that just increasing to 1MB is enough for some large systems, and given the amount of RAM in modern servers allocating a full WAL segment (16MB, the useful upper-limit here) is reasonable. Changing wal_buffers requires a database restart.<br />
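<br />
For example, on a write-heavy server before 9.1:<br />
<br />
<code><pre><br />
# postgresql.conf -- write-heavy pre-9.1 server; requires a restart<br />
wal_buffers = 16MB   # one full WAL segment, the useful upper limit here<br />
</pre></code><br />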
<br />
;PostgreSQL 9.1 and later<br />
<br />
Starting with PostgreSQL 9.1 wal_buffers defaults to being 1/32 of the size of shared_buffers, with an upper limit of 16MB (reached when shared_buffers=512MB).<br />
<br />
PostgreSQL 9.1 also changes the logic for selecting the default wal_sync_method such that on newer Linux kernels, it will still select fdatasync as its method--the same as on older Linux versions.<br />
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION constraint_exclusion]==<br />
<br />
;PostgreSQL 8.4 and later<br />
In 8.4, <tt>constraint_exclusion</tt> now defaults to a new choice: <tt>partition</tt>. This will only enable constraint exclusion for partitioned tables, which is the right thing to do in nearly all cases.<br />
<br />
;PostgreSQL 8.3 and earlier<br />
If you plan to use table partitioning, you need to turn on constraint exclusion. Since it does add overhead to query planning, it is recommended you leave this off outside of this scenario.<br />
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-MAX-PREPARED-TRANSACTIONS max_prepared_transactions]==<br />
<br />
This setting is used for managing two-phase commit. If you do not use two-phase commit (and if you don't know what it is, you don't use it), then you can set this value to 0. That will save a little bit of shared memory. For database systems with a large number (at least hundreds) of concurrent connections, be aware that this setting also affects the number of available lock-slots in pg_locks, so you may want to leave it at the default setting. There is a formula for how much memory gets allocated [http://www.postgresql.org/docs/current/static/kernel-resources.html#SHARED-MEMORY-PARAMETERS in the docs] and in the default postgresql.conf.<br />
<br />
Changing max_prepared_transactions requires a server restart.<br />
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT synchronous_commit]==<br />
PostgreSQL can only safely use a write cache if it has a battery backup. See [http://www.postgresql.org/docs/current/static/wal-reliability.html WAL reliability] for an essential introduction to this topic. No, really; go read that right now, it's vital to understand that if you want your database to work right.<br />
<br />
You may be limited to approximately 100 transaction commits per second per client in situations where you don't have such a durable write cache (and perhaps only 500/second even with lots of clients).<br />
<br />
;PostgreSQL 8.3 and later<br />
<br />
Asynchronous commit was introduced in PostgreSQL 8.3. For situations where a small amount of data loss is acceptable in return for a large boost in how many updates you can do to the database per second, consider switching synchronous commit off. This is particularly useful in the situation where you do not have a battery-backed write cache on your disk controller, because you could potentially get thousands of commits per second instead of just a few hundred.<br />
<br />
For earlier versions of PostgreSQL, you may find people recommending that you set ''fsync=off'' to speed up writes on busy systems. This is dangerous--a power loss could result in your database getting corrupted and not able to start again. Synchronous commit doesn't introduce the risk of ''corruption'', which is really bad, just some risk of data ''loss''.<br />
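<br />
Note that synchronous_commit can also be turned off for just one session or transaction, so you can take the risk only for data you can afford to lose (a sketch; the table is hypothetical):<br />
<br />
<code><pre><br />
BEGIN;<br />
SET LOCAL synchronous_commit = off;  -- only this transaction can be lost in a crash<br />
INSERT INTO page_hits (url, seen_at) VALUES ('/index.html', now());<br />
COMMIT;<br />
</pre></code><br />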
<br />
==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-RANDOM-PAGE-COST random_page_cost]==<br />
This setting suggests to the optimizer how long it will take your disks to seek to a random disk page, as a multiple of how long a sequential read (with a cost of 1.0) takes. If you have particularly fast disks, as commonly found with RAID arrays of SCSI disks, it may be appropriate to lower random_page_cost, which will encourage the query optimizer to use random access index scans. Some feel that 4.0 is always too large on current hardware; it's not unusual for administrators to standardize on always setting this between 2.0 and 3.0 instead. In some cases that behavior is a holdover from earlier PostgreSQL versions where having random_page_cost too high was more likely to screw up plan optimization than it is now (and setting at or below 2.0 was regularly necessary). Since these cost estimates are just that--estimates--it shouldn't hurt to try lower values.<br />
<br />
But this is not where you should start when searching for plan problems. Note that random_page_cost is pretty far down this list (at the end in fact). If you are getting bad plans, this shouldn't be the first thing you look at, even though lowering this value may be effective. Instead, you should start by making sure autovacuum is working properly, that you are collecting enough statistics, and that you have correctly sized the memory parameters for your server--all the things gone over above. After you've done all those much more important things, if you're still getting bad plans ''then'' you should see if lowering random_page_cost is still useful.<br />
<br />
[[Category:Administration]] [[Category:Performance]]</div>
Binary Replication Tutorial (revision of 2012-12-11T20:13:31Z by Jaytaylor: /* Cloning a Live Database */ type)
<hr />
<div>Welcome to the new PostgreSQL 9 replication and standby databases guide. This new set of features implements possibly the longest awaited functionality in PostgreSQL's history. As a result, a lot of people are going to be trying to deploy standby databases for the first time, and find the process rather unintuitive. This guide is here to help.<br />
<br />
'''Work in progress: only 40% complete'''<br />
<br />
= 5 Minutes to Simple Replication =<br />
<br />
This is the easiest way to set up replication between a master and standby. It requires shutting down the master; other methods are detailed later in this guide.<br />
<br />
What we're going to do is shut down the master and copy the files we need over to the slave server, creating a cloned copy of the master. Because the master is shut down, we don't have to worry about changes being made to it.<br />
<br />
Note: Both the '5 minutes' instructions and the '10 minutes' version which follows do not deal with the complications that arise with a database that uses tablespaces, specifically what to do about the pg_tblspc directory and its contents.<br />
<br />
== Prerequisites ==<br />
<br />
You must have the right setup to make this work:<br />
<br />
* 2 servers with similar operating systems (e.g. both Linux 64-bit).<br />
* The same release of PostgreSQL 9.0 installed on both servers.<br />
* PostgreSQL superuser shell access on both servers.<br />
* Knowledge of how to start, stop and reload Postgres.<br />
* PostgreSQL 9.0 running on Server1.<br />
* A database created and loaded on Server1.<br />
* A postgres user or root user who has network access between the two servers (you'll be copying files with rsync). <br />
<br />
See the full documentation for more information:<br />
<br />
* [http://www.postgresql.org/docs/9.0/static/warm-standby.html 9.0 Replication Documentation]<br />
* [http://www.postgresql.org/docs/9.1/static/warm-standby.html 9.1 Replication Documentation]<br />
<br />
== Binary Replication in 6 Steps ==<br />
<br />
This 6-step guide, and all of the examples in this tutorial, assume that you have a master server at 192.168.0.1 and a standby server at 192.168.0.2 and that your database and its configuration files are installed at /var/lib/postgresql/data. Replace those with whatever your actual server addresses and directories are.<br />
<br />
1. Edit postgresql.conf on the master to turn on streaming replication. Change these settings:<br />
<br />
listen_addresses = '*'<br />
wal_level = hot_standby<br />
max_wal_senders = 3<br />
<br />
2. Edit pg_hba.conf on the master in order to let the standby connect. <br />
<br />
host replication all 192.168.0.2/32 trust<br />
<br />
3. Edit recovery.conf and postgresql.conf on the standby to start up replication and hot standby. First, in postgresql.conf, change this line:<br />
<br />
hot_standby = on<br />
<br />
Then create a file in the standby's '''data directory''' (which is often the same directory as postgresql.conf and pg_hba.conf, except on some Linux distributions such as Debian and Ubuntu), called recovery.conf, with the following lines:<br />
<br />
standby_mode = 'on'<br />
primary_conninfo = 'host=192.168.0.1'<br />
<br />
4. Shut down the master and copy the files. You want to copy most but not all files between the two servers, excluding the configuration files and the pg_xlog directory. An example rsync script would be:<br />
<br />
rsync -av --exclude pg_xlog --exclude postgresql.conf data/* 192.168.0.2:/var/lib/postgresql/data/<br />
<br />
5. Start the standby first, so that they can't get out of sync. (Messages will be logged about not being able to connect to the primary server, that's OK.)<br />
<br />
6. Start the master.<br />
<br />
== Starting Replication with only a Quick Master Restart ==<br />
<br />
Is taking the master down for as long as the file copy takes unacceptable? Then you need the 10-minute version.<br />
<br />
What we're going to do this time is similar to what we did before, cloning the database by copying the files from the master to the slave server. However, because the database is only going to be shut down for a short period of time, long enough to activate the changes in the configuration file, after we've copied the data files we will need to copy additional files so that the slave will be an up-to-date copy of the master. <br />
<br />
So, we will tell the master we're running a backup, copy the data files (not quite the same set of files as before), tell the master the backup is complete, then copy the WAL files in the pg_xlog directory so that when the slave comes up it can make all the changes that were committed to the master database after the backup was started.<br />
<br />
First, start with the same prerequisites as above.<br />
<br />
1. Set the postgresql.conf variables the same in step (1) as above.<br />
<br />
2. Don't close the file yet. You'll need to set two other variables which control the size of your write-ahead-log (WAL). The first is wal_keep_segments, the second is checkpoint_segments. Unless you've already done so, you're going to need to increase these, which is usually a good idea for performance anyway. You want the WAL to be big enough to not get used up in 15 or 20 minutes. If you don't have a clear idea of that, here are some reasonable values, based on how busy and how large your database is. Also, a database with large blob objects may require a much larger setting. Remember, these logs will take up disk space, so make sure that you have enough available - space requirements are below.<br />
<br />
checkpoint_segments = 8 <br />
wal_keep_segments = 8 <br />
# light load 500MB<br />
<br />
checkpoint_segments = 16<br />
wal_keep_segments = 32<br />
# moderately busy 1.5GB <br />
<br />
checkpoint_segments = 64<br />
wal_keep_segments = 128<br />
# busy server 5GB<br />
<br />
You don't ''have'' to increase checkpoint_segments in order to increase wal_keep_segments, but it's generally a good idea. Now save the file. <br />
<br />
3. Edit pg_hba.conf as in (2) in the "Six Steps" above.<br />
<br />
4. Now you need to restart the master. Given the interruption in service, you should probably plan this ahead. <br />
<br />
5. Edit postgresql.conf and recovery.conf on the standby as in (3) above.<br />
<br />
6. Now, we're going to need to copy the files from the master and start the standby. Unlike in the 6-step version, this needs to be done quickly or the standby will fail to sync and you'll need to try again. First step, you need to tell the master you're starting a backup (see below for a more detailed explanation of this). Log in to psql as the database superuser.<br />
<br />
psql -U postgres<br />
# select pg_start_backup('clone',true);<br />
<br />
Note that the string you use as a backup label doesn't matter; use any string you want.<br />
<br />
7. Now, quickly copy all the database files. This rsync is slightly different from the 6-step version:<br />
<br />
rsync -av --exclude pg_xlog --exclude postgresql.conf --exclude postmaster.pid \ <br />
data/* 192.168.0.2:/var/lib/postgresql/data/<br />
<br />
8. As soon as that's done you need to stop the backup on the master:<br />
<br />
# select pg_stop_backup();<br />
<br />
9. As soon as that completes, you need to quickly copy the WAL files from the master to the standby.<br />
<br />
rsync -av data/pg_xlog 192.168.0.2:/var/lib/postgresql/data/<br />
<br />
10. Now, start the standby.<br />
<br />
If you've done this quickly enough, then the standby should catch up with the master and you should be replicating. If not, you'll get this message:<br />
<br />
(Future Revisions note: Message needs to go here)<br />
<br />
... which means you need to try again, possibly with checkpoint_segments and wal_keep_segments higher. If that still doesn't work, you're going to need to use the even more complex archiving method described below.<br />
<br />
Now, the rest of the guide will explain how to deal with more complex situations, such as archive logs, handling security, and maintaining availability, failover and standby promotion.<br />
<br />
= Introduction to Binary Replication =<br />
<br />
Binary replication is also called "Hot Standby" and "Streaming Replication", which are two separate, but complementary, features of PostgreSQL 9.0 and later. Here's some general information about how they work and what they are for.<br />
<br />
== What Can You Do With Binary Replication? ==<br />
<br />
* Have a simple and complete replica of your production database, preventing all but a couple seconds of data loss even under catastrophic circumstances.<br />
* Load-balance between your read/write master server and multiple read-only slave servers. (Note: This means that queries that aren't read-only cannot be run on a slave server. A common misconception has to do with finding the 'current' value of a sequence on a slave server, that is not possible.)<br />
* Run reporting or other long-running queries on a replica server, taking them off your main transaction-processing server.<br />
* Replicate all DDL, including table and index changes, and even creating new databases.<br />
* Replicate a hosted multi-tenant database, making no specific requirements for primary keys or database changes of your users.<br />
<br />
== What Can't You Do With Binary Replication? ==<br />
<br />
* Replicate a specific table, schema, or database. Binary replication is the entire Postgres instance (or "cluster").<br />
* Multi-master replication. Multi-master binary replication is probably technically impossible.<br />
* Replicate between different versions of PostgreSQL, or between different platforms.<br />
* Set up replication without administration rights on the server. Sorry, working on it.<br />
* Replicate data synchronously, guaranteeing zero data loss. And ... this has actually been available since the release of PostgreSQL 9.1!<br />
<br />
For the reasons above, we expect that Slony-I, Londiste, Bucardo, pgPool2 and other systems will continue to be used.<br />
<br />
== Transaction Logs and Log Shipping ==<br />
<br />
Users who are already familiar with the PostgreSQL transaction log and warm standby can skip this section.<br />
<br />
An individual "instance", "server", or (confusingly) "cluster" of PostgreSQL (hereafter Server) consists of a single postmaster server process connected to a single initialized PostgreSQL data directory (PGDATA), which in turn contains several databases. Each running Server has a transaction log, located in the PGDATA/pg_xlog directory. This transaction log consists of binary snapshots of data, written to record synchronously each change to all databases' data, in case of unexpected shutdown of the database server (such as in a power failure). This ensures that data is not corrupted and no completed transaction is lost.<br />
<br />
You can also use this log to allow a copy of the original database to replicate changes made to a master database. This was first implemented with the PITR feature in PostgreSQL 8.0, and is known as "log shipping". Log shipping is required for most forms of binary replication.<br />
<br />
This log consists of 16MB segments full of new data pages (8K each) of the database, and not of SQL statements. For this reason there is no before and after auditing possible via this log, as you cannot know exactly what has changed. Also, the log is treated as a buffer, being deleted as it is no longer needed for crash recovery. More importantly, the data page format of the log means that log segments can only be applied to a database which is binary-identical to the database which created the log. <br />
<br />
== PITR, Warm Standby, Hot Standby, and Streaming Replication ==<br />
<br />
For the rest of this tutorial, we will refer to the active read-write instance of the Server which generates transaction logs as the "Master" and the passive, read-only or offline instance (or instances) of the Server which receives transaction logs as the "Standby" (or "Standbys"). The term Master/Standby is equivalent to other terminology which may be used in the database industry, such as Master/Slave, Primary/Secondary or Primary/Replica.<br />
<br />
=== PITR ===<br />
<br />
In Point-In-Time Recovery (PITR), transaction logs are copied and saved to storage until needed. Then, when needed, the Standby server can be "brought up" (made active) and transaction logs applied, either stopping when they run out or at a prior point indicated by the administrator. PITR has been available since PostgreSQL version 8.0, and as such will not be documented here.<br />
<br />
PITR is primarily used for database forensics and recovery. It is also useful when you need to back up a very large database, as it effectively supports incremental backups, which pg_dump does not.<br />
<br />
=== Warm Standby ===<br />
<br />
In Warm Standby, transaction logs are copied from the Master and applied to the Standby immediately after they are received, or at a short delay. The Standby is offline (in "recovery mode") and not available for any query workload. This allows the Standby to be brought up to full operation very quickly. Warm Standby has been available since version 8.3, and will not be fully documented here.<br />
<br />
Warm Standby requires Log Shipping. It is primarily used for database failover.<br />
<br />
=== Hot Standby ===<br />
<br />
Hot Standby is identical to Warm Standby, except that the Standby is available to run read-only queries. This offers all of the advantages of Warm Standby, plus the ability to distribute some business workload to the Standby server(s). Hot Standby by itself requires Log Shipping.<br />
<br />
Hot Standby is used both for database failover, and can also be used for load-balancing. In contrast to Streaming Replication, it places no load on the master (except for disk space requirements) and is thus theoretically infinitely scalable. A WAL archive could be distributed to dozens or hundreds of servers via network storage. The WAL files could also easily be copied over a poor quality network connection, or by SFTP.<br />
<br />
However, since Hot Standby replicates by shipping 16MB logs, it is at best minutes behind and sometimes more than that. This can be problematic both from a failover and a load-balancing perspective.<br />
<br />
=== Streaming Replication ===<br />
<br />
Streaming Replication improves either Warm Standby or Hot Standby by opening a network connection between the Standby and the Master database, instead of copying 16MB log files. This allows data changes to be copied over the network almost immediately on completion on the Master. <br />
<br />
In Streaming Replication, the master and the standby have special processes called the walsender and walreceiver which transmit modified data pages over a network port. This requires one fairly busy connection per standby, imposing an incremental load on the master for each additional standby. Still, the load is quite low and a single master should be able to support multiple standbys easily.<br />
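<br />
On PostgreSQL 9.1 and later you can watch these walsender connections from the master side; a minimal sketch:<br />
<br />
 -- Run on the master: one row per connected standby's walsender (9.1+)<br />
 SELECT client_addr, state, sent_location, replay_location<br />
 FROM pg_stat_replication;<br />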
<br />
Streaming replication does not require log shipping in normal operation. It may, however, require log shipping to start replication, and can utilize log shipping in order to catch up standbys which fall behind.<br />
<br />
= How to Replicate =<br />
<br />
== Cloning a Live Database ==<br />
<br />
If your workload doesn't allow you to take the master down (and whose does?), things get a bit more complicated. You need to somehow take a "coherent snapshot" of the master, so that you don't have an inconsistent or corrupt database on the standby. Now, in some cases this can be done via filesystem snapshotting tools or similar tricks, but as that approach is tricky and platform-dependent, we're not going to cover it here.<br />
<br />
Instead, we're going to cover the built-in method, which involves keeping a log of all changes applied to the database which happen during the copying process. The steps are essentially the same, regardless of whether you're planning to use just hot standby, streaming replication, or both. There are two parts:<br />
<br />
* Cloning the database files<br />
* Copying the archive logs<br />
<br />
Unintuitive as it is, the latter needs to be set up first, so we're going to start with that.<br />
<br />
== Setting Up Archiving On The Master ==<br />
<br />
Archiving is the process of making an extra copy of each WAL file as it is completed. These log files then need to somehow be accessed by the standby. There are three basic ways to handle this, and you should decide in advance what method you're going to use:<br />
<br />
# Manually<br />
# Automatic file copying from master to standby using rsync or similar<br />
# Writing them to a common shared network file location<br />
<br />
The first method is only appropriate if you're archiving logs only to jump-start streaming replication, and you have a fairly low-traffic database or the ability to stop all writes. The third method is probably the easiest to manage if you have an appropriate network share; it can even be used to support multiple standbys with some extra thought and scripting. All of these methods will be explained below.<br />
<br />
This needs to be turned on on the master; if it's never been done before, that may require a restart (sorry, working on it), and will certainly require a reload. You'll need to set the following parameters:<br />
<br />
wal_level = hot_standby<br />
archive_mode = on<br />
archive_command = 'some command'<br />
<br />
What archive command you use depends on which archiving approach you are taking, of course. Here are three examples of commands you might use. Note that you will need to create the "archive" directories.<br />
<br />
# Manual: cp -f %p /var/lib/postgresql/data/archive/%f </dev/null<br />
# Automatic Copy: rsync -a %p 192.168.0.2:/var/lib/pgsql/data/archive/%f<br />
# Network Share: cp -f %p /shares/walarchive/archive/%f </dev/null<br />
<br />
In these commands, %p is replaced by postgres at invocation time with the full path and name of the WAL file, and %f with the name of the file alone. There are more escapes and parameters dealing with WAL archiving which will be detailed later in the tutorial. Note that, in real production, you are unlikely to want to use any commands as simple as the above. In general, you will want to have archive_command call an executable script which traps errors and can be disabled. Examples of such scripts are available in this tutorial.<br />
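<br />
A minimal sketch of such a wrapper script (the script path and archive directory are hypothetical placeholders):<br />
<br />
 #!/bin/bash<br />
 # Hypothetical wrapper, set as: archive_command = '/usr/local/bin/archive_wal.sh %p %f'<br />
 # $1 is the full path to the WAL file (%p); $2 is its file name alone (%f)<br />
 test ! -f "/shares/walarchive/archive/$2" || exit 1   # never overwrite an existing archive<br />
 cp "$1" "/shares/walarchive/archive/$2"<br />
 # Any non-zero exit status makes PostgreSQL keep the WAL file and retry later<br />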
<br />
Now, if archive_mode was originally "off" or if you had to change wal_level, you're going to need to restart the master (sorry, this will be fixed in a later version). If you just needed to change the archive_command, however, only a reload is required.<br />
<br />
Once you've restarted or reloaded, check the master's logs to make sure archiving is working. If it's failing, the master will complain extensively. You might also check that archive log files are being created; run the command "SELECT pg_switch_xlog();" as the superuser to force a new log to be written.<br />
<br />
== Setting Up Archiving on the Standby ==<br />
<br />
The standby needs to be configured to consume logs. This is simpler than the master's setup, and doesn't really change no matter what archive copying strategy you're using.<br />
<br />
== Recovery.conf ==<br />
<br />
On the standby, replication configuration is controlled through a file called, for historical reasons, recovery.conf. If this file is present in PostgreSQL's data directory when PostgreSQL is started, that server will assume it is a standby and attempt to obey it. Generally, there is an example file installed with the other PostgreSQL shared docs. However, that example file covers all of the various replication options at once, so it's often simpler to write your own file, from scratch. Any change to recovery.conf requires a restart of the standby.<br />
<br />
In recovery.conf, you need to add a command to copy the archived WAL files to the standby's own pg_xlog directory. This is the mirror image of the archive_command on the master. Generally, a simple cp command is sufficient:<br />
<br />
restore_command = 'cp -f /var/lib/postgresql/data/archive/%f %p </dev/null'<br />
restore_command = 'cp -f /shares/walarchive/%f %p </dev/null'<br />
<br />
Again, you might want to use a simple shell script which traps error messages, and, importantly, deletes archive files which are no longer needed. If you will be doing only hot standby and not using streaming replication, you probably want to compile the pg_standby binary provided in PostgreSQL's additional modules or "contrib", and use it instead:<br />
<br />
restore_command = 'pg_standby /shares/walarchive %f %p %r'<br />
<br />
More detail on pg_standby is in its documentation.<br />
<br />
== Cloning a Snapshot of the Master ==<br />
<br />
Once you have archiving working, you're ready to clone the master database. At this point, it's a simple process:<br />
<br />
# As superuser, issue the command "SELECT pg_start_backup('backup');" on the master.<br />
# Copy all of the database files to the standby.<br />
# Start the standby database.<br />
# Issue the command "SELECT pg_stop_backup();" on the master.<br />
<br />
Of course, each of those steps deserves a little more elaboration. pg_start_backup and pg_stop_backup are special commands you issue on the master in order to create, hold open, and close, a "snapshot" which is how we make sure your copy of the database is not inconsistent. They also write special files to the archive log which tell the standby when it has a complete snapshot.<br />
<br />
If you are using the "manual" method of synching the archive logs, immediately after step 4 you need to do one last rsync or copy of the archive logs to the standby.<br />
<br />
When you're done with the cloning, you should see output similar to the below:<br />
<br />
This means that you're up and replicating, and should now be able to run queries on the standby.<br />
<br />
== Failing Over To The Standby ==<br />
<br />
Of course, one of the major reasons to have a standby is in case something (planned or unplanned) causes the master server to shut down. Then you want to "fail over", or stop replication and change the standby to a full read-write master.<br />
<br />
The recommended method is the same regardless of the type of replication or standby: via "trigger file". First, you need to set a configuration option in recovery.conf on the standby:<br />
<br />
trigger_file = '/var/lib/postgresql/data/failover'<br />
<br />
Then, when it's time to fail over, you just create an empty file with that name, such as by using the "touch" command. The standby will notice the file, attempt to apply any remaining WAL records or files it has received, and then switch to read-write or "master" mode. When this happens, you will see a message like this in the Postgres log:<br />
<br />
PostgreSQL will also rename the recovery.conf file to recovery.done in order to prevent having the new master fail on restart. For this reason, the recovery.conf file should be owned by the same user which the server runs as (usually "postgres").<br />
<br />
The alternative to using a trigger file is to failover manually, by deleting or renaming the recovery.conf file and restarting the standby. This method is inferior because it requires a restart which would interrupt any read-only connections to the standby currently in use.<br />
<br />
In a high-availability system, the above activity should be managed automatically in order to avoid downtime. PostgreSQL itself supplies no tools to do this, but numerous third-party utilities such as "Linux heartbeat" are compatible with PostgreSQL replication.<br />
<br />
It's important to prevent the original master from restarting after failover, lest you end up with a "split brain" problem and data loss. There is a substantial body of literature on this, and third-party tools, so we won't discuss them here at this time.<br />
<br />
== Load Balancing ==<br />
<br />
== Managing Archive Logs ==<br />
<br />
== Tuning and Configuration of Binary Replication ==<br />
<br />
== Monitoring Replication ==<br />
<br />
<br />
[[Category:Replication]]</div>Jaytaylorhttps://wiki.postgresql.org/index.php?title=Binary_Replication_Tutorial&diff=18696Binary Replication Tutorial2012-12-11T20:12:39Z<p>Jaytaylor: /* What Can't You Do With Binary Replication? */</p>
<hr />
<div>Welcome to the new PostgreSQL 9 replication and standby databases guide. This new set of features implements possibly the longest awaited functionality in PostgreSQL's history. As a result, a lot of people are going to be trying to deploy standby databases for the first time, and find the process rather unintuitive. This guide is here to help.<br />
<br />
'''Work in progress: only 40% complete'''<br />
<br />
= 5 Minutes to Simple Replication =<br />
<br />
This is the easiest way to set up replication between a master and standby. It requires shutting down the master; other methods are detailed later in this guide.<br />
<br />
What we're going to do is shut down the master and copy the files we need over to the slave server, creating a cloned copy of the master. Because the master is shut down, we don't have to worry about changes being made to it.<br />
<br />
Note: Both the '5 minutes' instructions and the '10 minutes' version which follows do not deal with the complications that arise with a database that uses tablespaces, specifically what to do about the pg_tblspc directory and its contents.<br />
<br />
== Prerequisites ==<br />
<br />
You must have the right setup to make this work:<br />
<br />
* 2 servers with similar operating systems (e.g both Linux 64-bit).<br />
* The same release of PostgreSQL 9.0 installed on both servers.<br />
* PostgreSQL superuser shell access on both servers.<br />
* Knowledge of how to start, stop and reload Postgres.<br />
* PostgreSQL 9.0 running on Server1.<br />
* A database created and loaded on Server1.<br />
* A postgres user or root user who has network <br />
<br />
See the full documentation for more information:<br />
<br />
* [http://www.postgresql.org/docs/9.0/static/warm-standby.html 9.0 Replication Documentation]<br />
* [http://www.postgresql.org/docs/9.1/static/warm-standby.html 9.1 Replication Documentation]<br />
<br />
== Binary Replication in 6 Steps ==<br />
<br />
This 6-step guide, and all of the examples in this tutorial, assume that you have a master server at 192.168.0.1 and a standby server at 192.168.0.2 and that your database and its configuration files are installed at /var/lib/postgresql/data. Replace those with whatever your actual server addresses and directories are.<br />
<br />
1. Edit postgresql.conf on the master to turn on streaming replication. Change these settings:<br />
<br />
listen_address = '*'<br />
wal_level = hot_standby<br />
max_wal_senders = 3<br />
<br />
2. Edit pg_hba.conf on the master in order to let the standby connect. <br />
<br />
host replication all 192.168.0.2/32 trust<br />
<br />
3. Edit recovery.conf and postgresql.conf on the standby to start up replication and hot standby. First, in postgresql.conf, change this line:<br />
<br />
hot_standby = on<br />
<br />
Then create a file in the standby's '''data directory''' (which is often the same directory as postgresql.conf and pg_hba.conf, except on some Linux distributions such as Debian and Ubuntu), called recovery.conf, with the following lines:<br />
<br />
standby_mode = 'on'<br />
primary_conninfo = 'host=192.168.0.1'<br />
<br />
4. Shut down the master and copy the files. You want to copy most but not all of the files between the two servers, excluding the configuration files and the pg_xlog directory. An example rsync command would be:<br />
<br />
rsync -av --exclude pg_xlog --exclude postgresql.conf data/* 192.168.0.2:/var/lib/postgresql/data/<br />
<br />
5. Start the standby first, so that the two servers can't get out of sync. (The standby will log messages about being unable to connect to the primary server; that's OK.)<br />
<br />
6. Start the master.<br />
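<br />
Once both servers are running, you can confirm that replication is flowing by comparing WAL positions on the two machines. A minimal check (on 9.1 and later, you can also query the pg_stat_replication view on the master):<br />
<br />
# On the master:<br />
psql -U postgres -c "SELECT pg_current_xlog_location();"<br />
# On the standby:<br />
psql -U postgres -c "SELECT pg_last_xlog_receive_location();"<br />
<br />
If the standby's receive location keeps advancing as the master's current location does, replication is working.<br />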
<br />
== Starting Replication with only a Quick Master Restart ==<br />
<br />
Can't afford to take the master down for as long as it takes to copy the files? Then you need the 10-minute version.<br />
<br />
What we're going to do this time is similar to what we did before: clone the database by copying the files from the master to the slave server. However, because the master will only be shut down briefly (just long enough to activate the changes in the configuration file), after we've copied the data files we will also need to copy additional files so that the slave ends up an up-to-date copy of the master. <br />
<br />
So, we will tell the master we're running a backup, copy the data files (not quite the same set of files as before), tell the master the backup is complete, then copy the WAL files in the pg_xlog directory so that when the slave comes up it can make all the changes that were committed to the master database after the backup was started.<br />
<br />
First, start with the same prerequisites as above.<br />
<br />
1. Set the postgresql.conf variables the same in step (1) as above.<br />
<br />
2. Don't close the file yet. You'll need to set two other variables which control the size of your write-ahead log (WAL): wal_keep_segments and checkpoint_segments. Unless you've already done so, you're going to need to increase these, which is usually a good idea for performance anyway. You want the WAL to be big enough that it doesn't get used up in 15 or 20 minutes. If you don't have a clear idea of what that means for your system, here are some reasonable values based on how busy and how large your database is. A database with large blob objects may require much larger settings. Remember, these logs take up disk space, so make sure you have enough available; approximate space requirements are given with each example below.<br />
<br />
# light load, roughly 500MB of WAL<br />
checkpoint_segments = 8<br />
wal_keep_segments = 8<br />
<br />
# moderately busy, roughly 1.5GB of WAL<br />
checkpoint_segments = 16<br />
wal_keep_segments = 32<br />
<br />
# busy server, roughly 5GB of WAL<br />
checkpoint_segments = 64<br />
wal_keep_segments = 128<br />
<br />
You don't ''have'' to increase checkpoint_segments in order to increase wal_keep_segments, but it's generally a good idea. Now save the file. <br />
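<br />
To sanity-check the disk space: each WAL segment is 16MB, and the documentation's rule of thumb is that the server keeps roughly checkpoint_segments + wal_keep_segments + 1 segments around (or (2 + checkpoint_completion_target) * checkpoint_segments + 1, if that is larger). The "moderately busy" values above therefore work out to about (16 + 32 + 1) * 16MB, or roughly 800MB; the figures given with each example add a safety margin on top of that.<br />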
<br />
3. Edit pg_hba.conf as in (2) in the "Six Steps" above.<br />
<br />
4. Now you need to restart the master. Given the interruption in service, you should probably plan this in advance. <br />
<br />
5. Edit postgresql.conf and recovery.conf on the standby as in (3) above.<br />
<br />
6. Now we need to copy the files from the master and start the standby. Unlike in the 6-step version, this needs to be done quickly, or the standby will fail to sync and you'll need to try again. First, tell the master that you're starting a backup (see below for a more detailed explanation of this). Log in to psql as the database superuser.<br />
<br />
psql -U postgres<br />
# select pg_start_backup('clone',true);<br />
<br />
Note that the string you use as a backup label doesn't matter; use any string you want.<br />
<br />
7. Now, quickly copy all the database files. This rsync is slightly different from the 6-step version:<br />
<br />
rsync -av --exclude pg_xlog --exclude postgresql.conf --exclude postmaster.pid \ <br />
data/* 192.168.0.2:/var/lib/postgresql/data/<br />
<br />
8. As soon as that's done you need to stop the backup on the master:<br />
<br />
# select pg_stop_backup();<br />
<br />
9. As soon as that completes, you need to quickly copy the WAL files from the master to the standby.<br />
<br />
rsync -av data/pg_xlog 192.168.0.2:/var/lib/postgresql/data/<br />
<br />
10. Now, start the standby.<br />
<br />
If you've done this quickly enough, then the standby should catch up with the master and you should be replicating. If not, the standby's log will complain that a WAL segment it needs has already been removed from the master, which means you need to try again, possibly with checkpoint_segments and wal_keep_segments set higher. If that still doesn't work, you're going to need to use the even more complex archiving method described below.<br />
<br />
Now, the rest of the guide will explain how to deal with more complex situations, such as archive logs, handling security, and maintaining availability, failover and standby promotion.<br />
<br />
= Introduction to Binary Replication =<br />
<br />
Binary replication is also called "Hot Standby" and "Streaming Replication", which are two separate but complementary features of PostgreSQL 9.0 and later. Here's some general information about how they work and what they are for.<br />
<br />
== What Can You Do With Binary Replication? ==<br />
<br />
* Have a simple and complete replica of your production database, limiting data loss to at most a couple of seconds' worth even under catastrophic circumstances.<br />
* Load-balance between your read/write master server and multiple read-only slave servers. (Note: this means that queries that aren't read-only cannot be run on a slave server. A common misconception involves finding the "current" value of a sequence on a slave server; that is not possible.)<br />
* Run reporting or other long-running queries on a replica server, taking them off your main transaction-processing server.<br />
* Replicate all DDL, including table and index changes, and even creating new databases.<br />
* Replicate a hosted multi-tenant database, making no specific requirements for primary keys or database changes of your users.<br />
<br />
== What Can't You Do With Binary Replication? ==<br />
<br />
* Replicate a specific table, schema, or database. Binary replication is the entire Postgres instance (or "cluster").<br />
* Do multi-master replication. Multi-master binary replication is probably technically impossible.<br />
* Replicate between different versions of PostgreSQL, or between different platforms.<br />
* Set up replication without administration rights on the server. Sorry, working on it.<br />
* Replicate data synchronously, guaranteeing zero data loss. (This limitation has been lifted since the release of PostgreSQL 9.1, which added synchronous replication!)<br />
<br />
For the reasons above, we expect that Slony-I, Londiste, Bucardo, pgPool2 and other systems will continue to be used.<br />
<br />
== Transaction Logs and Log Shipping ==<br />
<br />
Users who are already familiar with the PostgreSQL transaction log and warm standby can skip this section.<br />
<br />
An individual "instance", "server", or (confusingly) "cluster" of PostgreSQL (hereafter Server) consists of a single postmaster server process connected to a single initialized PostgreSQL data directory (PGDATA), which in turn contains several databases. Each running Server has a transaction log, located in the PGDATA/pg_xlog directory. This transaction log consists of binary snapshots of data, written to record synchronously each change to all databases' data, in case of unexpected shutdown of the database server (such as in a power failure). This ensures that data is not corrupted and no completed transaction is lost.<br />
<br />
You can also use this log to allow a copy of the original database to replicate changes made to a master database. This was first implemented with the PITR feature in PostgreSQL 8.0, and is known as "log shipping". Log shipping is required for most forms of binary replication.<br />
<br />
This log consists of 16MB segments full of new data pages (8K each) from the database, not of SQL statements. For this reason no before-and-after auditing is possible via this log, as you cannot know exactly what has changed in SQL terms. Also, the log is treated as a buffer, with segments deleted as they are no longer needed for crash recovery. More importantly, the data page format of the log means that log segments can only be applied to a database which is binary-identical to the database which created the log. <br />
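<br />
You can see these segments directly on a running server; using the example data directory from earlier in this tutorial:<br />
<br />
ls -l /var/lib/postgresql/data/pg_xlog<br />
# Each segment is a 16777216-byte (16MB) file with a 24-digit<br />
# hexadecimal name, such as 000000010000000000000003.<br />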
<br />
== PITR, Warm Standby, Hot Standby, and Streaming Replication ==<br />
<br />
For the rest of this tutorial, we will refer to the active read-write instance of the Server which generates transaction logs as the "Master" and the passive, read-only or offline instance (or instances) of the Server which receives transaction logs as the "Standby" (or "Standbys"). The term Master/Standby is equivalent to other terminology which may be used in the database industry, such as Master/Slave, Primary/Secondary or Primary/Replica.<br />
<br />
=== PITR ===<br />
<br />
In Point-In-Time Recovery (PITR), transaction logs are copied and saved to storage until needed. Then, when needed, the Standby server can be "brought up" (made active) and transaction logs applied, either stopping when they run out or at a prior point indicated by the administrator. PITR has been available since PostgreSQL version 8.0, and as such will not be documented here.<br />
<br />
PITR is primarily used for database forensics and recovery. It is also useful when you need to back up a very large database, as it effectively supports incremental backups, which pg_dump does not.<br />
<br />
=== Warm Standby ===<br />
<br />
In Warm Standby, transaction logs are copied from the Master and applied to the Standby immediately after they are received, or after a short delay. The Standby is offline (in "recovery mode") and not available for any query workload. This allows the Standby to be brought up to full operation very quickly. Warm Standby has been available since version 8.3, and will not be fully documented here.<br />
<br />
Warm Standby requires Log Shipping. It is primarily used for database failover.<br />
<br />
=== Hot Standby ===<br />
<br />
Hot Standby is identical to Warm Standby, except that the Standby is available to run read-only queries. This offers all of the advantages of Warm Standby, plus the ability to distribute some business workload to the Standby server(s). Hot Standby by itself requires Log Shipping.<br />
<br />
Hot Standby is used for database failover, and can also be used for load-balancing. In contrast to Streaming Replication, it places almost no load on the master (aside from the archiving itself and its disk space requirements) and is thus theoretically infinitely scalable: a WAL archive could be distributed to dozens or hundreds of servers via network storage. The WAL files could also easily be copied over a poor-quality network connection, or by SFTP.<br />
<br />
However, since Hot Standby replicates by shipping 16MB logs, it is at best minutes behind and sometimes more than that. This can be problematic both from a failover and a load-balancing perspective.<br />
<br />
=== Streaming Replication ===<br />
<br />
Streaming Replication improves either Warm Standby or Hot Standby by opening a network connection between the Standby and the Master database, instead of copying 16MB log files. This allows data changes to be copied over the network almost immediately on completion on the Master. <br />
<br />
In Streaming Replication, the master and the standby have special processes called the walsender and walreceiver which transmit modified data pages over a network port. This requires one fairly busy connection per standby, imposing an incremental load on the master for each additional standby. Still, the load is quite low and a single master should be able to support multiple standbys easily.<br />
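<br />
As a quick sanity check, these processes are visible in the process list (a sketch; the exact process titles vary somewhat between versions):<br />
<br />
ps -ef | grep 'wal sender'      # on the master<br />
ps -ef | grep 'wal receiver'    # on the standby<br />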
<br />
Streaming replication does not require log shipping in normal operation. It may, however, require log shipping to start replication, and can utilize log shipping in order to catch up standbys which fall behind.<br />
<br />
= How to Replicate =<br />
<br />
== Cloning a Live Database ==<br />
<br />
If your workload doesn't allow you to take the master down (and whose does?), things get a bit more complicated. You need to somehow take a "coherent snapshot" of the master, so that you don't end up with an inconsistent or corrupt database on the standby. In some cases this can be done via filesystem snapshotting tools or similar tricks, but as that approach is tricky and platform-dependent, we're not going to cover it here.<br />
<br />
Instead, we're going to cover the built-in method, which involves keeping a log of all changes applied to the database which happen during the copying process. The steps are essentially the same, regardless of whether you're planning to use just hot standby, streaming replication, or both. There are two parts:<br />
<br />
* Cloning the database files<br />
* Copying the archive logs<br />
<br />
Unintuitive as it is, the latter needs to be set up first, so we're going to start with that.<br />
<br />
== Setting Up Archiving On The Master ==<br />
<br />
Archiving is the process of making an extra copy of each WAL file as it is completed. These log files then need to somehow be accessed by the standby. There are three basic ways to handle this, and you should decide in advance what method you're going to use:<br />
<br />
# Manually<br />
# Automatic file copying from master to standby using rsync or similar<br />
# Writing them to a common shared network file location<br />
<br />
The first method is only appropriate if you're archiving logs only to jump-start streaming replication, and you have a fairly low-traffic database or the ability to stop all writes. The third method is probably the easiest to manage if you have an appropriate network share; it can even be used to support multiple standbys with some extra thought and scripting. All of these methods will be explained below.<br />
<br />
Archiving needs to be turned on on the master. If it has never been enabled before, that may require a restart (sorry, working on it); at minimum it will require a reload. You'll need to set the following parameters:<br />
<br />
wal_level = hot_standby<br />
archive_mode = on<br />
archive_command = 'some command'<br />
<br />
What archive command you use depends on which archiving approach you are taking, of course. Here are three examples of commands you might use. Note that you will need to create the "archive" directories.<br />
<br />
# Manual: cp -f %p /var/lib/postgresql/data/archive/%f </dev/null<br />
# Automatic Copy: rsync -a %p 192.168.0.2:/var/lib/pgsql/data/archive/%f<br />
# Network Share: cp -f %p /shares/walarchive/archive/%f </dev/null<br />
<br />
In these commands, %p is replaced by postgres at invocation time with the full path and name of the WAL file, and %f with the name of the file alone. There are more escapes and parameters dealing with WAL archiving, which will be detailed later in the tutorial. Note that in real production you are unlikely to want to use commands as simple as the above. In general, you will want archive_command to call an executable script which traps errors and can be disabled. Examples of such scripts are available in this tutorial; a minimal sketch follows.<br />
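<br />
Here is one minimal sketch of such a wrapper (the script path, archive directory, and "disabled" flag file are illustrative assumptions, not fixed names). Save it as, say, /usr/local/bin/archive_wal.sh, make it executable, and set archive_command = '/usr/local/bin/archive_wal.sh %p %f':<br />
<br />
#!/bin/bash<br />
# Sketch of an archive_command wrapper: copy one WAL segment to the archive.<br />
# $1 is %p (path of the WAL file, relative to the data directory);<br />
# $2 is %f (the file name alone).<br />
ARCHIVE_DIR=/var/lib/postgresql/data/archive<br />
<br />
# Hypothetical kill switch: create this file to disable archiving without<br />
# editing postgresql.conf. (Postgres will then treat segments as archived.)<br />
test -f /var/lib/postgresql/archiving_disabled && exit 0<br />
<br />
# Never overwrite an already-archived segment; exiting non-zero makes<br />
# postgres retry and log the failure instead of silently losing WAL.<br />
test -f "$ARCHIVE_DIR/$2" && exit 1<br />
<br />
cp "$1" "$ARCHIVE_DIR/$2"<br />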
<br />
Now, if archive_mode was originally "off" or if you had to change wal_level, you're going to need to restart the master (sorry, this will be fixed in a later version). If you just needed to change the archive_command, however, only a reload is required.<br />
<br />
Once you've restarted or reloaded, check the master's logs to make sure archiving is working. If it's failing, the master will complain extensively. You might also check that archive log files are being created; run the command "SELECT pg_switch_xlog();" as the superuser to force a new log to be written.<br />
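<br />
For example, using the archive directory from the commands above (adjust the paths to your own layout):<br />
<br />
psql -U postgres -c "SELECT pg_switch_xlog();"<br />
ls /var/lib/postgresql/data/archive<br />
<br />
A newly archived WAL segment should appear within a few seconds.<br />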
<br />
== Setting Up Archiving on the Standby ==<br />
<br />
The standby needs to be configured to consume logs. This is simpler than the master's setup, and doesn't really change no matter what archive copying strategy you're using.<br />
<br />
== Recovery.conf ==<br />
<br />
On the standby, replication configuration is controlled through a file called, for historical reasons, recovery.conf. If this file is present in PostgreSQL's data directory when PostgreSQL is started, that server will assume it is a standby and attempt to obey it. Generally, there is an example file installed with the other PostgreSQL shared docs. However, that example file covers all of the various replication options at once, so it's often simpler to write your own file, from scratch. Any change to recovery.conf requires a restart of the standby.<br />
<br />
In recovery.conf, you need to add a command to copy the archived WAL files into the standby's own pg_xlog directory. This is the mirror image of the archive_command on the master. Generally, a simple cp command is sufficient:<br />
<br />
restore_command = 'cp -f /var/lib/postgresql/data/archive/%f %p </dev/null'<br />
restore_command = 'cp -f /shares/walarchive/%f %p </dev/null'<br />
<br />
Again, you might want to use a simple shell script which traps error messages, and, importantly, deletes archive files which are no longer needed. If you will be doing only hot standby and not using streaming replication, you probably want to compile the pg_standby binary provided in PostgreSQL's additional modules or "contrib", and use it instead:<br />
<br />
restore_command = 'pg_standby /shares/walarchive %f %p %r'<br />
<br />
More detail on pg_standby is in its documentation.<br />
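<br />
If you are using streaming replication rather than pg_standby, one way to handle deleting old archives (available since 9.0) is the archive_cleanup_command parameter in recovery.conf, together with the pg_archivecleanup tool from PostgreSQL's additional modules. For example, with the network share layout from above:<br />
<br />
restore_command = 'cp /shares/walarchive/%f %p'<br />
archive_cleanup_command = 'pg_archivecleanup /shares/walarchive %r'<br />
<br />
The %r escape is replaced with the name of the oldest WAL file that must still be kept, and pg_archivecleanup deletes everything older than it.<br />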
<br />
== Cloning a Snapshot of the Master ==<br />
<br />
Once you have archiving working, you're ready to clone the master database. At this point, it's a simple process:<br />
<br />
# As superuser, issue the command "SELECT pg_start_backup('backup');" on the master.<br />
# Copy all of the database files to the standby.<br />
# Start the standby database.<br />
# Issue the command "SELECT pg_stop_backup();" on the master.<br />
<br />
Of course, each of those steps deserves a little more elaboration. pg_start_backup and pg_stop_backup are special commands you issue on the master in order to create, hold open, and close a "snapshot", which is how we make sure your copy of the database is not inconsistent. They also write special files to the archive log which tell the standby when it has a complete snapshot.<br />
<br />
If you are using the "manual" method of syncing the archive logs, immediately after step 4 you need to do one last rsync or copy of the archive logs to the standby.<br />
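<br />
Putting the four steps together, and reusing the example addresses and paths from earlier, a clone might look like this (a sketch, not a production script):<br />
<br />
psql -U postgres -c "SELECT pg_start_backup('clone');"<br />
rsync -av --exclude pg_xlog --exclude postgresql.conf --exclude postmaster.pid \<br />
data/* 192.168.0.2:/var/lib/postgresql/data/<br />
# start the standby on 192.168.0.2, then:<br />
psql -U postgres -c "SELECT pg_stop_backup();"<br />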
<br />
When you're done with the cloning, the standby's log should report that it has reached a consistent recovery state and is accepting read-only connections (the exact messages vary by version). This means that you're up and replicating, and should now be able to run queries on the standby.<br />
<br />
== Failing Over To The Standby ==<br />
<br />
Of course, one of the major reasons to have a standby is in case something (planned or unplanned) causes the master server to shut down. Then you want to "fail over", or stop replication and change the standby to a full read-write master.<br />
<br />
The recommended method is the same regardless of the type of replication or standby: via "trigger file". First, you need to set a configuration option in recovery.conf on the standby:<br />
<br />
trigger_file = '/var/lib/postgresql/data/failover'<br />
<br />
Then, when it's time to fail over, you just create an empty file with that name, for example using the "touch" command as shown below. The standby will notice the file, attempt to apply any remaining WAL records or files it has received, and then switch to read-write or "master" mode, logging that the trigger file was found and that recovery is complete.<br />
<br />
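Using the trigger_file path from the example above, failing over is just:<br />
<br />
touch /var/lib/postgresql/data/failover<br />
<br />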
PostgreSQL will also rename the recovery.conf file to recovery.done, so that the new master does not go back into recovery if it is restarted. For this reason, the recovery.conf file should be owned by the same user the server runs as (usually "postgres").<br />
<br />
The alternative to using a trigger file is to failover manually, by deleting or renaming the recovery.conf file and restarting the standby. This method is inferior because it requires a restart which would interrupt any read-only connections to the standby currently in use.<br />
<br />
In a high-availability system, the above activity should be managed automatically in order to avoid downtime. PostgreSQL itself supplies no tools to do this, but numerous third-party utilities such as "Linux heartbeat" are compatible with PostgreSQL replication.<br />
<br />
It's important to prevent the original master from restarting after failover, lest you end up with a "split brain" problem and data loss. There is a substantial body of literature on this, as well as third-party tools, so we won't discuss it further here.<br />
<br />
== Load Balancing ==<br />
<br />
== Managing Archive Logs ==<br />
<br />
== Tuning and Configuration of Binary Replication ==<br />
<br />
== Monitoring Replication ==<br />
<br />
<br />
[[Category:Replication]]