PostgreSQL wiki - User contributions [en]

Tuning Your PostgreSQL Server

2015-05-16T13:14:48Z

Natmaka: /* random_page_cost */

__NOTOC__
''by Greg Smith, Robert Treat, and Christopher Browne''

{{Languages}}

PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance. Odds are good the default parameters are very undersized for your system. Rather than get dragged into the details of everything you should eventually know (which you can find if you want it at the [http://www.pgcon.org/2008/schedule/events/104.en.html GUC Three Hour Tour]), here we're going to sprint through a simplified view of the basics, with a look at the most common things people new to PostgreSQL aren't aware of. You should click on the name of the parameter in each section to jump to the relevant documentation in the PostgreSQL manual for more details after reading the quick intro here. There is also additional information available about many of these parameters, as well as a list of parameters you shouldn't adjust, at [https://www.packtpub.com/article/server-configuration-tuning-postgresql Server Configuration Tuning].

== Background Information on Configuration Settings ==

PostgreSQL settings can be manipulated a number of different ways, but generally you will want them changed in your configuration files, either directly or, starting with PostgreSQL 9.4, through [http://www.postgresql.org/docs/current/static/sql-altersystem.html <tt>ALTER SYSTEM</tt>]. The specific options available change from release to release; the definitive list is in the source code at src/backend/utils/misc/guc.c for your version of PostgreSQL (but the pg_settings view works well enough for most purposes).

=== The types of settings ===

There are several different types of configuration settings, divided up based on the possible inputs they take:

* Boolean: true, false, on, off
* Integer: Whole numbers (2112)
* Float: Decimal values (21.12)
* Memory / Disk: Integers (2112) or "computer units" (512MB, 2112GB). Avoid integers--you need to know the underlying unit to figure out what they mean.
* Time: "Time units" such as ms, s, min, h, d (30s). Sometimes the unit is left out; don't do that.
* Strings: Single quoted text ('pg_log')
* ENUMs: Strings, but from a specific list ('WARNING', 'ERROR')
* Lists: A comma separated list of strings ('"$user",public,tsearch2')

=== When they take effect ===

PostgreSQL settings have different levels of flexibility for when they can be changed, usually related to internal code restrictions. The complete list of levels is:

* Postmaster: requires restart of server
* Sighup: requires a HUP of the server, either by kill -HUP (usually -1), pg_ctl reload, or <tt>SELECT pg_reload_conf();</tt>
* User: can be set within individual sessions, take effect only within that session
* Internal: set at compile time, can't be changed, mainly for reference
* Backend: settings which must be set before session start
* Superuser: can be set at runtime for the server by superusers

Most of the time you'll only use the first of these, but the second can be useful if you have a server you don't want to take down, while the user session settings can be helpful for some special situations. You can tell which type of parameter a setting is by looking at the "context" field in the pg_settings view.

=== Important notes about configuration files ===

* Command line options override postgresql.auto.conf settings, which in turn override postgresql.conf settings.
* If the same setting is listed multiple times, the last one wins.
* You can figure out the postgresql.conf location with <tt>SHOW config_file</tt>. It will generally be $PGDATA/postgresql.conf (<tt>SHOW data_directory</tt>), but watch out for symbolic links, [http://www.postgresql.org/docs/current/static/app-pg-ctl.html#AEN93617 postmaster.opts] and other trickiness.
* Lines with # are comments and have no effect. For a new database, this will mean the setting is using the default, but on running systems this may not hold true! Changes to the configuration files do not take effect without a reload/restart, so it's possible for the system to be running something different from what is in the file.

=== Viewing the current settings ===

* Look at the configuration files. This is generally not definitive!
* <tt>SHOW ALL</tt>, <tt>SHOW <setting></tt> will show you the current value of the setting. Watch out for session specific changes.
* <tt>SELECT * FROM pg_settings</tt> will label session specific changes as locally modified

==[http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-LISTEN-ADDRESSES listen_addresses] ==

By default, PostgreSQL only responds to connections from the local host. If you want your server to be accessible from other systems via standard TCP/IP networking, you need to change listen_addresses from its default. The usual approach is to set it to listen to all addresses like this:

<code><pre>
listen_addresses = '*'
</pre></code>

And then control who can and cannot connect via the [http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html pg_hba.conf] file.

==[http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS max_connections]==

max_connections sets exactly that: the maximum number of client connections allowed. This is very important to some of the below parameters (particularly work_mem) because there are some memory resources that are or can be allocated on a per-client basis, so the maximum number of clients suggests the maximum possible memory use. Generally, PostgreSQL on good hardware can support a few hundred connections. If you want to have thousands instead, you should consider using [[Replication, Clustering, and Connection Pooling|connection pooling software]] to reduce the connection overhead.

==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-SHARED-BUFFERS shared_buffers]==

The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL to use for caching data. One reason the defaults are low is that on some platforms (like older Solaris versions and SGI), having large values requires invasive action like recompiling the kernel. Even on a modern Linux system, the stock kernel will likely not allow setting shared_buffers to over 32MB without adjusting kernel settings first. (PostgreSQL 9.3 and later use a different shared memory mechanism, so kernel settings will usually not have to be adjusted there.)

If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM you'll have to account more carefully for how much RAM the OS is taking up; closer to 15% is more typical there. There are some workloads where even larger settings for shared_buffers are effective, but given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount.
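
The rule of thumb above can be written out as a small calculation. This is only a sketch of the starting-point arithmetic described in this section (1/4 of RAM at 1GB or more, roughly 15% below that); the function name is made up for illustration and is not part of PostgreSQL, and the result still needs to be checked against your actual workload.

```python
def suggested_shared_buffers_mb(total_ram_mb):
    """Starting value for shared_buffers in MB, following the guideline above.

    Hypothetical helper: 1/4 of RAM when the system has 1GB or more,
    about 15% of RAM on smaller systems.
    """
    if total_ram_mb >= 1024:
        return total_ram_mb // 4
    return total_ram_mb * 15 // 100


for ram_mb in (512, 4096, 16384):
    print(f"{ram_mb}MB RAM -> shared_buffers = {suggested_shared_buffers_mb(ram_mb)}MB")
```

For a 4GB machine this gives 1024MB, which you would then write into postgresql.conf as <tt>shared_buffers = 1024MB</tt>.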

Be aware that if your system or PostgreSQL build is 32-bit, it might not be practical to set shared_buffers above 2 ~ 2.5GB. See [http://rhaas.blogspot.jp/2011/05/sharedbuffers-on-32-bit-systems.html this blog post] for details.

Note that on Windows, large values for shared_buffers aren't as effective, and you may find better results keeping it relatively low and using the OS cache more instead. On Windows the useful range is 64MB to 512MB.

If you are running PostgreSQL 9.2 or earlier, it's likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value for shared_buffers this high. On UNIX-like systems, if you set it above what's supported, you'll get a message like this:

<code><pre>
IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument

This error usually means that PostgreSQL's request for a shared memory
segment exceeded your kernel's SHMMAX parameter. You can either
reduce the request size or reconfigure the kernel with larger SHMMAX.
To reduce the request size (currently 415776768 bytes), reduce
PostgreSQL's shared_buffers parameter (currently 50000) and/or
its max_connections parameter (currently 12).
</pre></code>

See [http://www.postgresql.org/docs/current/static/kernel-resources.html Managing Kernel Resources] for details on how to correct this.

Changing this setting requires restarting the database. Also, this is a hard allocation of memory; the whole thing gets allocated out of virtual memory when the database starts.

==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE effective_cache_size]==

effective_cache_size should be set to an estimate of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what's used by the OS itself and other applications. This is a guideline for how much memory you expect to be available in the OS and PostgreSQL buffer caches, not an allocation! This value is used only by the PostgreSQL query planner to figure out whether plans it's considering would be expected to fit in RAM or not. If it's set too low, indexes may not be used for executing queries the way you'd expect. The setting for shared_buffers is not taken into account here--only the effective_cache_size value is, so it should include memory dedicated to the database too.

Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting, and 3/4 of memory is a more aggressive but still reasonable amount. You might find a better estimate by looking at your operating system's statistics. On UNIX-like systems, add the free+cached numbers from free or top to get an estimate. On Windows see the "System Cache" size in the Windows Task Manager's Performance tab. Changing this setting does not require restarting the database (HUP is enough).
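
As a rough illustration of the range suggested above, the following sketch computes the conservative (1/2 of RAM) and aggressive (3/4 of RAM) values. The function name is an assumption made for this example; an estimate taken from your operating system's own statistics is likely to be better than either number.

```python
def effective_cache_size_range_mb(total_ram_mb):
    """Return (conservative, aggressive) effective_cache_size values in MB.

    Hypothetical helper: conservative is 1/2 of total memory,
    aggressive is 3/4 of total memory, as suggested in the text.
    """
    conservative = total_ram_mb // 2
    aggressive = total_ram_mb * 3 // 4
    return conservative, aggressive


low, high = effective_cache_size_range_mb(8192)
print(f"effective_cache_size between {low}MB and {high}MB")
```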

==[http://www.postgresql.org/docs/current/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS checkpoint_segments checkpoint_completion_target]==

PostgreSQL writes new transactions to the database in files called WAL segments that are 16MB in size. Every time checkpoint_segments worth of these files have been written, by default 3, a checkpoint occurs. Checkpoints can be resource intensive, and on a modern system doing one every 48MB will be a serious performance bottleneck. Setting checkpoint_segments to a much larger value improves that. Unless you're running on a very small configuration, you'll almost certainly be better off setting this to at least 10, which also allows usefully increasing the completion target.

For more write-heavy systems, values from 32 (checkpoint every 512MB) to 256 (every 4GB) are popular nowadays. Very large settings use a lot more disk and will cause your database to take longer to recover, so make sure you're comfortable with both those things before large increases. Normally the large settings (>64/1GB) are only used for bulk loading. Note that whatever you choose for the segments, you'll still get a checkpoint at least every 5 minutes unless you also increase checkpoint_timeout (which isn't necessary on most systems).
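
The sizes quoted above all come from multiplying checkpoint_segments by the 16MB segment size. A minimal sketch of that arithmetic, with a helper name invented for this example:

```python
WAL_SEGMENT_MB = 16  # size of one WAL segment, as stated above


def wal_mb_between_checkpoints(checkpoint_segments):
    """MB of WAL written before checkpoint_segments forces a checkpoint."""
    return checkpoint_segments * WAL_SEGMENT_MB


# The default (3), the suggested minimum (10), and the write-heavy range
for segments in (3, 10, 32, 64, 256):
    print(f"checkpoint_segments = {segments}: "
          f"checkpoint every {wal_mb_between_checkpoints(segments)}MB")
```

This only covers the segment-driven trigger; checkpoint_timeout can still force a checkpoint earlier.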

Checkpoint writes are spread out a bit while the system starts working toward the next checkpoint. You can spread those writes out further, lowering the average write overhead, by increasing the checkpoint_completion_target parameter to its useful maximum of 0.9 (aim to finish by the time 90% of the next checkpoint is here) rather than the default of 0.5 (aim to finish when the next one is 50% done). A setting of 0 gives something similar to the behavior of obsolete versions. The main reason the default isn't just 0.9 is that you need a larger checkpoint_segments value than the default for broader spreading to work well. For lots more information on checkpoint tuning, see [http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm Checkpoints and the Background Writer] (where you'll also learn why tuning the background writer parameters is challenging to do usefully).

==[http://www.postgresql.org/docs/current/static/routine-vacuuming.html#AUTOVACUUM autovacuum]==

The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.

However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data.

==[http://www.postgresql.org/docs/current/static/runtime-config-logging.html logging]==
There are many things you can log that may or may not be important to you. You should investigate the documentation on all of the options, but here are some tips & tricks to get you started:

*pgFouine is a tool used to analyze PostgreSQL logs for performance tuning. If you plan to use this tool, it has specific logging requirements. Please check http://pgfouine.projects.postgresql.org/

*pgFouine has been obsoleted by [http://dalibo.github.com/pgbadger PgBadger].

*[https://github.com/darold/pgcluu PgCluu] is a handy PostgreSQL performance monitoring and auditing tool from the author of PgBadger.

*log_destination & log_directory (& log_filename): What you set these options to is not as important as knowing they can give you hints to determine where your database server is logging to. Best practice would be to try and make this as similar as possible across your servers. Note that in some cases, the init script starting your database may be customizing the log destination in the command line used to start the database, overriding what's in the configuration files (and making it so you'll get different behavior if you run pg_ctl manually instead of using the init script).

*log_min_error_statement: You should probably make sure this is at least at error, so that you will see any SQL commands which cause an error. This should be the default on recent versions.

*log_min_duration_statement: Not necessary for everyday use, but this can generate [[Logging Difficult Queries|logs of "slow queries"]] on your system.

*log_line_prefix: Prepends information to the start of each line. A good generic recommendation is '%t:%r:%u@%d:[%p]: ' : %t=timestamp, %r=host connecting from, %u=db user name, %d=database connecting to, %p=PID of connection. It may not be obvious at first what the PID is useful for, but it can be vital for troubleshooting problems in the future, so it is better to put it in the logs from the start.

*log_statement: Choices of none, ddl, mod, all. Using all in production leads to severe performance penalties. DDL can sometimes be helpful to discover rogue changes made outside of your recommended processes, by "cowboy DBAs" for example.

==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET default_statistics_target]==

The database software collects statistics about each of the tables in your database to decide how to execute queries against it. If you're not getting good query execution plans, particularly on larger (or more varied) tables, you should increase default_statistics_target and then ANALYZE the database again (or wait for autovacuum to do it for you).

;PostgreSQL 8.4 and later

The starting default_statistics_target value was raised from 10 to 100 in PostgreSQL 8.4. Increases beyond 100 may still be useful, but this increase makes for greatly improved statistics estimation in the default configuration. The maximum value for the parameter was also increased from 1000 to 10,000 in 8.4.

==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-WORK-MEM work_mem]==

If you do a lot of complex sorts, and have a lot of memory, then increasing the <code>work_mem</code> parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.

This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory.
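
To make the multiplication above concrete, here is a small sketch of the worst-case estimate. The function and its parameter names are invented for this example, and real usage depends on how many sorts actually run at once, so treat the result as an upper bound for planning rather than a prediction.

```python
def worst_case_sort_memory_mb(work_mem_mb, active_sessions, sorts_per_query=1):
    """Upper bound on sort memory in MB.

    Hypothetical helper: every session runs a query at the same time and
    each query uses sorts_per_query buffers of work_mem each.
    """
    return work_mem_mb * active_sessions * sorts_per_query


# 50MB work_mem with 30 users: the roughly 1.5GB case from the text
print(worst_case_sort_memory_mb(50, 30))

# The same users each running a merge sort across 8 tables
print(worst_case_sort_memory_mb(50, 30, sorts_per_query=8))
```

Plugging in max_connections for <code>active_sessions</code> gives the most pessimistic figure to compare against the memory you actually have.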

[http://www.postgresql.org/docs/9.3/static/runtime-config-logging.html#GUC-LOG-TEMP-FILES log_temp_files] can be used to log sorts, hashes, and temp files which can be useful in figuring out if sorts are spilling to disk instead of fitting in memory. You can see sorts spilling to disk using <code>EXPLAIN ANALYZE</code> plans as well. For example, if you see a line like <code>Sort Method: external merge Disk: 7526kB</code> in the output of EXPLAIN ANALYZE, a <code>work_mem</code> of at least 8MB would keep the intermediate data in memory and likely improve the query response time.

==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM maintenance_work_mem]==

Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. It defaults to 16 megabytes (16MB). Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger than work_mem. Larger settings might improve performance for vacuuming and for restoring database dumps.

==[http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-SYNC-METHOD wal_sync_method wal_buffers]==

After every transaction, PostgreSQL forces a commit out to its write-ahead log on disk. This can be done a couple of ways, and on some platforms the other options are considerably faster than the conservative default. open_sync is the most common non-default setting switched to, on platforms that support it but default to one of the fsync methods. See [http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm Tuning PostgreSQL WAL Synchronization] for a lot of background on this topic. Note that open_sync writing is buggy on some platforms (such as [http://lwn.net/Articles/350219/ Linux]), and you should (as always) do plenty of tests under a heavy write load to make sure that you haven't made your system less stable with this change. [[Reliable Writes]] contains more information on this topic.

Linux kernels starting with version 2.6.33 will cause earlier versions of PostgreSQL to default to wal_sync_method=open_datasync; before that kernel release the default picked was always fdatasync. This can cause a significant performance decrease when combined with small writes and/or small values for wal_buffers.

Increasing wal_buffers from its tiny default of a small number of kilobytes is helpful for write-heavy systems. Benchmarking generally suggests that just increasing to 1MB is enough for some large systems, and given the amount of RAM in modern servers allocating a full WAL segment (16MB, the useful upper-limit here) is reasonable. Changing wal_buffers requires a database restart.

;PostgreSQL 9.1 and later

Starting with PostgreSQL 9.1 wal_buffers defaults to being 1/32 of the size of shared_buffers, with an upper limit of 16MB (reached when shared_buffers=512MB).
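
The automatic default described above is easy to check by hand. The sketch below models only the two facts stated in this section (1/32 of shared_buffers, capped at 16MB); the function name is made up for illustration, and any other limits the server applies are not modelled here.

```python
WAL_BUFFERS_CAP_KB = 16 * 1024  # upper limit of 16MB, one WAL segment


def default_wal_buffers_kb(shared_buffers_mb):
    """Automatic wal_buffers value in kB for a given shared_buffers in MB.

    Hypothetical helper: 1/32 of shared_buffers, but never more than 16MB.
    """
    one_thirty_second_kb = shared_buffers_mb * 1024 // 32
    return min(one_thirty_second_kb, WAL_BUFFERS_CAP_KB)


for shared_mb in (128, 512, 2048):
    print(f"shared_buffers = {shared_mb}MB -> "
          f"wal_buffers = {default_wal_buffers_kb(shared_mb)}kB")
```

With shared_buffers at 512MB or more the result stays at the 16MB cap, matching the note above.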

PostgreSQL 9.1 also changes the logic for selecting the default wal_sync_method such that on newer Linux kernels, it will still select fdatasync as its method--the same as on older Linux versions.

==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION constraint_exclusion]==

Starting in PostgreSQL 8.4, <tt>constraint_exclusion</tt> defaults to a new choice: <tt>partition</tt>. This only enables constraint exclusion for partitioned tables, which is the right thing to do in nearly all cases.

==[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-MAX-PREPARED-TRANSACTIONS max_prepared_transactions]==

This setting is used for managing two-phase commit. If you do not use two-phase commit (and if you don't know what it is, you don't use it), then you can set this value to 0. That will save a little bit of shared memory. For database systems with a large number (at least hundreds) of concurrent connections, be aware that this setting also affects the number of available lock-slots in pg_locks, so you may want to leave it at the default setting. There is a formula for how much memory gets allocated [http://www.postgresql.org/docs/current/static/kernel-resources.html#SHARED-MEMORY-PARAMETERS in the docs] and in the default postgresql.conf.

Changing max_prepared_transactions requires a server restart.

==[http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT synchronous_commit]==
PostgreSQL can only safely use a write cache if it has a battery backup. See [http://www.postgresql.org/docs/current/static/wal-reliability.html WAL reliability] for an essential introduction to this topic. No, really; go read that right now, it's vital to understand that if you want your database to work right.

You may be limited to approximately 100 transaction commits per second per client in situations where you don't have such a durable write cache (and perhaps only 500/second even with lots of clients).

For situations where a small amount of data loss is acceptable in return for a large boost in how many updates you can do to the database per second, consider switching synchronous commit off. This is particularly useful in the situation where you do not have a battery-backed write cache on your disk controller, because you could potentially get thousands of commits per second instead of just a few hundred.

For obsolete versions of PostgreSQL, you may find people recommending that you set ''fsync=off'' to speed up writes on busy systems. This is dangerous--a power loss could result in your database getting corrupted and not able to start again. Synchronous commit doesn't introduce the risk of ''corruption'', which is really bad, just some risk of data ''loss''.

==[http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-RANDOM-PAGE-COST random_page_cost]==
This setting suggests to the optimizer how long it will take your disks to seek to a random disk page, as a multiple of how long a sequential read (with a cost of 1.0) takes. If you have particularly fast disks, as commonly found with RAID arrays of SCSI disks, it may be appropriate to lower random_page_cost, which will encourage the query optimizer to use random access index scans. Some feel that 4.0 is always too large on current hardware; it's not unusual for administrators to standardize on always setting this between 2.0 and 3.0 instead. In some cases that behavior is a holdover from earlier PostgreSQL versions where having random_page_cost too high was more likely to screw up plan optimization than it is now (and setting at or below 2.0 was regularly necessary). Since these cost estimates are just that--estimates--it shouldn't hurt to try lower values.

But this is not where you should start to search for plan problems. Note that random_page_cost is pretty far down this list (at the end in fact). If you are getting bad plans, this shouldn't be the first thing you look at, even though lowering this value may be effective. Instead, you should start by making sure autovacuum is working properly, that you are collecting enough statistics, and that you have correctly sized the memory parameters for your server--all the things gone over above. After you've done all those much more important things, if you're still getting bad plans ''then'' you should see if lowering random_page_cost is still useful.

[[Category:Administration]] [[Category:Performance]]

Synchronous replication

2012-12-30T17:51:02Z

Natmaka: /* CODE */

Synchronous replication is available starting in PostgreSQL 9.1 by enabling the [http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#GUC-SYNCHRONOUS-STANDBY-NAMES synchronous_standby_names] parameter. It includes user-controlled durability specified on the master using the [http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT synchronous_commit] parameter. The design also provides high throughput by allowing concurrent processes to handle the WAL stream.

=WHAT'S DIFFERENT ABOUT THIS PATCH?=

The implementation in 9.1 includes several innovations, beyond [http://wiki.postgresql.org/wiki/Streaming_Replication Fujii Masao's work] providing an earlier synchronous replication implementation for PostgreSQL 9.0:

* Low complexity of code on Standby
* User control: All decisions to wait take place on master, allowing fine-grained control of synchronous replication. Max replication level can also be set on the standby.
* Low bandwidth: Very small response packet size with no increase in number of responses when system is under high load means very little additional bandwidth required
* Performance: Standby processes work concurrently to give good overall throughput on standby and minimal latency in all modes. 4 performance options don't interfere with each other, so offer different levels of performance/durability alongside each other.

These are major wins for the PostgreSQL project over and above the basic sync rep feature.

=SYNCHRONOUS REPLICATION OVERVIEW=

Synchronous replication offers the guarantee that all changes made by a
transaction have been transferred to remote standby nodes. This is an
extension to the standard level of durability offered by a transaction
commit.

When synchronous replication is requested the transaction will wait
after it commits until it receives confirmation that the transfer has
been successful. Waiting for confirmation increases the user's certainty
that the transfer has taken place but it also necessarily increases the
response time for the requesting transaction. Synchronous replication
usually requires carefully planned and placed standby servers to ensure
applications perform acceptably. Waiting doesn't utilise system
resources, but transaction locks continue to be held until the transfer
is confirmed. As a result, incautious use of synchronous replication
will lead to reduced performance for database applications.

It may seem that there is a simple choice between durability and
performance. However, there is often a close relationship between the
importance of data and how busy the database needs to be, so this is
seldom a simple choice. With this patch, PostgreSQL now provides a range
of features designed to allow application architects to design a system
that has both good overall performance and yet good durability of the
most important data assets.

PostgreSQL allows the application designer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for individual transactions.
This makes it possible to selectively provide the highest levels of
protection for critical data.

For example, an application might consist of two types of work:
* 10% of changes are changes to important customer details
* 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users.

With sync replication options specified at the application level (on the
master) we can offer sync rep for the most important changes, without
slowing down the bulk of the total workload. Application level options
are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications.

Without sync rep options specified at app level, we would have to choose
between slowing down 90% of the workload because 10% of it is
important, giving up our durability goals because of performance, or
splitting those two functions onto separate database servers so that we
can set options differently on each. None of those 3 options is truly
attractive.

PostgreSQL also allows the system administrator the ability to specify
the service levels offered by standby servers. This allows multiple
standby servers to work together in various roles within a server farm.

''Note: the information about the parameters used here reflects an earlier version of this feature, and needs to be updated to reflect the form in which it was committed into 9.1.''

Control of this feature relies on just 3 parameters.

On the master we can set:

* synchronous_replication
* synchronous_replication_timeout

On the standby we can set:

* synchronous_replication_service

These are explained in more detail in the following sections.

=USER'S OVERVIEW=

Two new USERSET parameters on the master control this:
* synchronous_replication = async (default) | recv | fsync | apply
* synchronous_replication_timeout = 0+ (0 means never time out; the default timeout is 10 seconds)

synchronous_replication = async is the default and means that no
synchronisation is requested and so the commit will not wait. This is the
fastest setting. The word async is short for "asynchronous" and you may
see the term asynchronous replication discussed.

Other settings refer to progressively higher levels of durability. The
higher the level of durability requested, the longer the wait for that
level of durability to be achieved.

The precise meaning of the synchronous_replication settings is:
* async - commit does not wait for a standby before replying to user
* recv - commit waits until standby has received WAL
* fsync - commit waits until standby has received and fsynced WAL
* apply - commit waits until standby has received, fsynced and applied WAL

This provides a simple, easily understood mechanism - and one that in
its default form is very similar to other RDBMS (e.g. Oracle).

Note that in apply mode it is possible that the changes could be
accessible on the standby before the transaction that made the change
has been notified that the change is complete. This is a minor issue.

Network delays may occur and the standby may also crash. If no reply is
received within the timeout we raise a NOTICE and then return successful
commit (no other action is possible). Note that it is possible to
request that we never time out, so if no standby is available we wait for
one to appear.

When user commits, if the master does not have a currently connected
standby offering the required level of replication it will pick the next
best available level of replication. It is up to the sysadmin to provide
sufficient range of standby nodes to ensure at least one is available to
meet the requested service levels.

If multiple standbys exist, the first standby to reply that the desired
level of durability has been achieved will release the waiting commit on
the master. Other options are also available via a plugin.

==ADMINISTRATOR'S OVERVIEW==

On the standby we specify the highest type of replication service
offered by this standby server. This information is passed to the master
server when the standby connects for replication.

This allows sysadmins to designate preferred standbys. It also allows
sysadmins to completely refuse to offer a synchronous replication
service, allowing a master to explicitly avoid synchronisation across
low bandwidth or high latency links.

An additional parameter can be set in recovery.conf on the standby:

* synchronous_replication_service = async (def) | recv | fsync | apply
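
The interaction between the level a transaction requests and the levels the connected standbys offer can be sketched as follows. This is one possible reading of "pick the next best available level" from the user's overview, not the actual server logic: it assumes the four levels are strictly ordered, that a standby offering a level can also serve every weaker one, and that async is always available as a fallback. All names in the example are made up for illustration.

```python
# Durability levels from weakest to strongest, as listed in the user's overview
LEVELS = ["async", "recv", "fsync", "apply"]


def effective_level(requested, offered_by_standbys):
    """Level a commit would actually wait for, under the assumptions above.

    requested: the synchronous_replication value asked for by the transaction.
    offered_by_standbys: the synchronous_replication_service value of each
    connected standby (the highest level that standby offers).
    """
    wanted = LEVELS.index(requested)
    # Each standby can serve up to its own maximum, but never more than requested
    servable = (min(LEVELS.index(offer), wanted) for offer in offered_by_standbys)
    # With no standby connected, fall back to async (rank 0)
    return LEVELS[max(servable, default=0)]


print(effective_level("apply", ["recv", "fsync"]))   # best available is fsync
print(effective_level("fsync", ["apply"]))           # apply standby can serve fsync
print(effective_level("recv", []))                   # no standby: async
```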

= IMPLEMENTATION =

Some aspects can be changed without significantly altering the basic
proposal; for example, master-specified standby registration wouldn't
really alter this very much.

== STANDBY ==

Master-controlled sync rep means that all user wait logic is centred on
the master. The details of sync rep requests on the master are not sent
to the standby, so there is no additional master to standby traffic nor
standby-side bookkeeping overheads. It also reduces complexity of
standby code.

On the standby side the WAL Writer now operates during recovery. This
frees the WALReceiver to spend more time sending and receiving messages,
thereby minimising latency for users choosing the "recv" option. We now
have 3 processes handling WAL in an asynchronous pipeline: WAL Receiver
reads WAL data from the libpq connection then writes it to the WAL file,
the WAL Writer then fsyncs the WAL file and then the Startup process
replays the WAL. These processes act independently, so WAL pointers
(LSNs) are defined as WALReceiverLSN >= WALWriterLSN >= StartupLSN

For each new message WALReceiver gets from master we issue a reply. Each
reply sends the current state of the 3 LSNs, so the reply message size
is only 28 bytes. Replies are sent half-duplex, i.e. we don't reply
while a new message is arriving.

Note that there is absolutely not one reply per transaction on the
master. The standby knows nothing about what has been requested on the
master - replies always refer to the latest standby state and
effectively batch the responses.

We act according to the requested synchronous_replication_service
* async - no replies are sent
* recv - replies are sent upon receipt only
* fsync - replies are sent upon receipt and following fsync only
* apply - replies are sent following receipt, fsync and apply.

Replies are sent at the next available opportunity.

In apply mode, when the WALReceiver is completely quiet this means we
send 3 reply messages - one at recv, one at fsync and one at apply. When
WALreceiver is busy the volume of messages does *not* increase since the
reply can't be sent until the current incoming message has been
received, after which we were going to reply anyway so it is not an
additional message. This means we piggyback an "apply" response onto a
later "recv" reply. As a result we get minimum response times in *all*
modes and maximum throughput is not impaired at all.

When each new message arrives from the master the WALreceiver will write
the new data to the WAL file, wake the WALwriter and then reply. Each
new message from master receives a reply. If no further WAL data has
been received the WALreceiver waits on the latch. If the WALReceiver is
woken by WALWriter or Startup then it will reply to master with a
message, even if no new WAL has been received.

So in the recv, fsync and apply cases a message is sent as soon as
possible to the master, and in all cases the wait time is minimised.

When WALwriter is woken it sees if there is outstanding WAL data and if
so fsyncs it and wakes both WALreceiver and Startup. When no WAL remains
it waits on the latch.

Startup process will wake WALreceiver when it has got to the end of the
latest chunk of WAL. If no further WAL is available then it waits on its
latch.

== MASTER ==

When user backends request sync rep they wait in a queue ordered by
requested LSN. A separate queue exists for each request mode.

WALSender receives the 3 LSNs from the standby. It then wakes backends
in sequence from each queue.

We provide a single wakeup rule: first WALSender to reply with the
requested XLogRecPtr will wake the backend. This guarantees that the WAL
data for the commit is transferred as requested to at least one standby.
That is sufficient for the use cases we have discussed.

More complex wakeup rules would be possible via a plugin.

Wait timeout would be set by individual backends with a timer, just as
we do for statement_timeout.

= CODE =

Total code to implement this is low. Breaks down into 5 areas
* Zoltan's libpq changes, included almost verbatim; fairly modular, so easy to replace with something we like better
* A new module syncrep.c and syncrep.h handle the backend wait/wakeup
* Light changes to allow streaming rep to make appropriate calls
* Small amount of code to allow WALWriter to be active in recovery
* Parameter code
No docs yet.

The patch works on top of latches, though does not rely upon them for
its bulk performance characteristics. Latches only improve response time
for very low transaction rates; latches provide no additional throughput
for medium to high transaction rates.

= PERFORMANCE ANALYSIS =

Since we reply to each new chunk sent from master, "recv" mode has
absolutely minimal latency, especially since WALreceiver no longer
performs the majority of fsyncs, as it did in the 9.0 code. WALreceiver does not wait
for fsync or apply actions to complete before we reply, so fsync and
apply modes will always wait at least 2 standby->master messages which
is appropriate because those actions will typically occur much later.

This response mechanism offers highest responsive performance achievable
in "recv" mode and very good throughput under load. Note that the
different modes do not interfere with each other and can co-exist
happily while providing highest performance.

Starting WALWriter is helpful no matter which
synchronous_replication_service is specified.

Can we optimise the sending of reply messages so that only chunks that
contain a commit deserve a reply? We could, but then we'd need to do
extra work on the master to do bookkeeping of that. It would need to be
demonstrated that there is a performance issue big enough to be worth
the overhead on master and extra code.

Is there an optimisation from reducing the number of options the standby
provides? The architecture on the standby side doesn't rely heavily on
the service level specified, nor does it rely in any way on the actual
sync rep mode specified on master. No further simplification is
possible.

= NOT YET IMPLEMENTED =

* Timeout code & NOTICE
* Code and test plugin
* Loops in walsender, walwriter and receiver treat shutdown incorrectly

I haven't yet looked at Fujii's code for this, not even sure where it
is, though hope to do so in the future. Zoltan's libpq code is the only
part of that patch used.

So far I have spent 3.5 days on this and expect to complete tomorrow. I
think that throws out the argument that this proposal is too complex to
develop in this release.

= OTHER ISSUES =

* How should master behave when we shut it down?
* How should standby behave when we shut it down?

[[Category:Replication]]

Synchronous replication

2012-12-30T17:50:21Z

Natmaka: /* SYNCHRONOUS REPLICATION OVERVIEW */

Synchronous replication is available starting in PostgreSQL 9.1 by enabling the [http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#GUC-SYNCHRONOUS-STANDBY-NAMES synchronous_standby_names] parameter. It includes user-controlled durability specified on the master using the [http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT synchronous_commit] parameter. The design also provides high throughput by allowing concurrent processes to handle the WAL stream.

=WHAT'S DIFFERENT ABOUT THIS PATCH?=

The implementation in 9.1 includes several innovations, beyond [http://wiki.postgresql.org/wiki/Streaming_Replication Fujii Masao's work] providing an earlier synchronous replication implementation for PostgreSQL 9.0:

* Low complexity of code on Standby
* User control: All decisions to wait take place on master, allowing fine-grained control of synchronous replication. Max replication level can also be set on the standby.
* Low bandwidth: Very small response packet size with no increase in number of responses when system is under high load means very little additional bandwidth required
* Performance: Standby processes work concurrently to give good overall throughput on standby and minimal latency in all modes. 4 performance options don't interfere with each other, so offer different levels of performance/durability alongside each other.

These are major wins for the PostgreSQL project over and above the basic sync rep feature.

=SYNCHRONOUS REPLICATION OVERVIEW=

Synchronous replication offers the guarantee that all changes made by a
transaction have been transferred to remote standby nodes. This is an
extension to the standard level of durability offered by a transaction
commit.

When synchronous replication is requested the transaction will wait
after it commits until it receives confirmation that the transfer has
been successful. Waiting for confirmation increases the user's certainty
that the transfer has taken place but it also necessarily increases the
response time for the requesting transaction. Synchronous replication
usually requires carefully planned and placed standby servers to ensure
applications perform acceptably. Waiting doesn't utilise system
resources, but transaction locks continue to be held until the transfer
is confirmed. As a result, incautious use of synchronous replication
will lead to reduced performance for database applications.

It may seem that there is a simple choice between durability and
performance. However, there is often a close relationship between the
importance of data and how busy the database needs to be, so this is
seldom a simple choice. With this patch, PostgreSQL now provides a range
of features designed to allow application architects to design a system
that has both good overall performance and yet good durability of the
most important data assets.

PostgreSQL allows the application designer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for individual transactions.
This allows the highest levels of protection to be provided selectively for
critical data.

For example, an application might consist of two types of work:
* 10% of changes are changes to important customer details
* 90% of changes are less important data that the business can more easily survive if it is lost, such as chat messages between users.

With sync replication options specified at the application level (on the
master) we can offer sync rep for the most important changes, without
slowing down the bulk of the total workload. Application level options
are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications.

Without sync rep options specified at app level, we would have to choose
between slowing down 90% of the workload because 10% of it is
important, giving up our durability goals because of performance, or
splitting those two functions onto separate database servers so that we
can set options differently on each. None of those 3 options is truly
attractive.

PostgreSQL also allows the system administrator the ability to specify
the service levels offered by standby servers. This allows multiple
standby servers to work together in various roles within a server farm.

''Note: the information about the parameters used here reflects an earlier version of this feature, and needs to be updated to reflect the form in which it was committed into 9.1.''

Control of this feature relies on just 3 parameters:
On the master we can set

* synchronous_replication
* synchronous_replication_timeout

On the standby we can set

* synchronous_replication_service

These are explained in more detail in the following sections.

=USER'S OVERVIEW=

Two new USERSET parameters on the master control this
* synchronous_replication = async (default) | recv | fsync | apply
* synchronous_replication_timeout = 0+ (0 means never timeout)
(default timeout 10sec)

synchronous_replication = async is the default and means that no
synchronisation is requested and so the commit will not wait. This is the
fastest setting. The word async is short for "asynchronous" and you may
see the term asynchronous replication discussed.

Other settings refer to progressively higher levels of durability. The
higher the level of durability requested, the longer the wait for that
level of durability to be achieved.

The precise meaning of the synchronous_replication settings is
* async - commit does not wait for a standby before replying to user
* recv - commit waits until standby has received WAL
* fsync - commit waits until standby has received and fsynced WAL
* apply - commit waits until standby has received, fsynced and applied
This provides a simple, easily understood mechanism - and one that in
its default form is very similar to other RDBMS (e.g. Oracle).
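
The four levels can be read as a rule about which standby position must have passed the commit's WAL position. The following Python sketch illustrates that rule only; it is not code from the patch. The function name <code>commit_may_return</code> is hypothetical, and LSNs are modelled as plain integers.

```python
def commit_may_return(mode, commit_lsn, received, fsynced, applied):
    """Return True once one standby has reached the level the commit asked for.

    received / fsynced / applied are the latest positions reported by the
    standby; all positions are plain integers in this sketch.
    """
    if mode == "async":
        return True                    # never waits for a standby
    if mode == "recv":
        return received >= commit_lsn
    if mode == "fsync":
        return fsynced >= commit_lsn
    if mode == "apply":
        return applied >= commit_lsn
    raise ValueError("unknown synchronous_replication mode: %r" % (mode,))


# A standby that has received up to 300, fsynced up to 200, applied up to 100.
satisfied = [m for m in ("async", "recv", "fsync", "apply")
             if commit_may_return(m, 150, received=300, fsynced=200, applied=100)]
print(satisfied)  # ['async', 'recv', 'fsync']
```

A commit at position 150 is released in every mode except apply, because the standby has only replayed up to 100.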

Note that in apply mode it is possible that the changes could be
accessible on the standby before the transaction that made the change
has been notified that the change is complete. Minor issue.

Network delays may occur and the standby may also crash. If no reply is
received within the timeout we raise a NOTICE and then return successful
commit (no other action is possible). Note that it is possible to
request that we never time out, so if no standby is available we wait for
one to appear.
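
The timeout behaviour described above can be sketched as follows. This is an illustration in Python, not the backend code (the timeout code is listed as not yet implemented below). The helper name <code>wait_for_standby</code> and the returned strings are hypothetical; the only behaviour taken from the text is that 0 means "never time out" and that a timeout still reports a successful commit.

```python
import threading


def wait_for_standby(confirmed, timeout_s):
    """Wait for a standby confirmation, modelled as a threading.Event.

    timeout_s == 0 means never time out, as with
    synchronous_replication_timeout = 0.
    """
    got_reply = confirmed.wait(None if timeout_s == 0 else timeout_s)
    if got_reply:
        return "commit: confirmed by standby"
    # No reply in time: raise a NOTICE, then report a successful commit anyway.
    return "NOTICE: no standby reply within timeout; commit: successful"


answered = threading.Event()
answered.set()                       # the standby has already replied
silent = threading.Event()           # the standby never replies

print(wait_for_standby(answered, 0))
print(wait_for_standby(silent, 0.05))
```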

When a user commits, if the master does not have a currently connected
standby offering the required level of replication it will pick the next
best available level of replication. It is up to the sysadmin to provide
a sufficient range of standby nodes to ensure at least one is available to
meet the requested service levels.
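
One way to read "next best available level" is sketched below in Python. It assumes the levels are totally ordered async < recv < fsync < apply and that the master knows what each connected standby offers; the function name <code>effective_level</code> is hypothetical. The case with no connected standby is left to the timeout rules above, so the sketch rejects it.

```python
LEVELS = ("async", "recv", "fsync", "apply")   # increasing durability


def effective_level(requested, offered):
    """Highest level not above `requested` that some connected standby offers.

    `offered` lists the synchronous_replication_service value of each
    connected standby.
    """
    if not offered:
        raise ValueError("no standby connected; the timeout rules apply instead")
    best_offered = max(LEVELS.index(level) for level in offered)
    return LEVELS[min(LEVELS.index(requested), best_offered)]


print(effective_level("apply", ["recv", "fsync"]))  # fsync
print(effective_level("recv", ["apply"]))           # recv
```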

If multiple standbys exist, the first standby to reply that the desired
level of durability has been achieved will release the waiting commit on
the master. Other options are available also via a plugin.

==ADMINISTRATOR'S OVERVIEW==

On the standby we specify the highest type of replication service
offered by this standby server. This information is passed to the master
server when the standby connects for replication.

This allows sysadmins to designate preferred standbys. It also allows
sysadmins to completely refuse to offer a synchronous replication
service, allowing a master to explicitly avoid synchronisation across
low bandwidth or high latency links.

An additional parameter can be set in recovery.conf on the standby

* synchronous_replication_service = async (def) | recv | fsync | apply

= IMPLEMENTATION =

Some aspects can be changed without significantly altering the basic
proposal; for example, master-specified standby registration wouldn't
really alter this very much.

== STANDBY ==

Master-controlled sync rep means that all user wait logic is centred on
the master. The details of sync rep requests on the master are not sent
to the standby, so there is no additional master to standby traffic nor
standby-side bookkeeping overheads. It also reduces complexity of
standby code.

On the standby side the WAL Writer now operates during recovery. This
frees the WALReceiver to spend more time sending and receiving messages,
thereby minimising latency for users choosing the "recv" option. We now
have 3 processes handling WAL in an asynchronous pipeline: WAL Receiver
reads WAL data from the libpq connection then writes it to the WAL file,
the WAL Writer then fsyncs the WAL file and then the Startup process
replays the WAL. These processes act independently, so WAL pointers
(LSNs) are defined as WALReceiverLSN >= WALWriterLSN >= StartupLSN
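
The ordering of the three pointers can be checked with a toy model. The Python sketch below is illustrative only: the class <code>StandbyPipeline</code> is hypothetical, LSNs are byte counts, and each stage simply catches up to the stage before it when it runs.

```python
class StandbyPipeline:
    """Toy model of the receiver -> writer -> startup pipeline."""

    def __init__(self):
        self.receiver_lsn = 0   # WAL received and written to the WAL file
        self.writer_lsn = 0     # WAL fsynced by the WAL writer
        self.startup_lsn = 0    # WAL replayed by the startup process

    def receive(self, nbytes):
        self.receiver_lsn += nbytes
        self._check()

    def fsync(self):
        self.writer_lsn = self.receiver_lsn    # fsync everything written so far
        self._check()

    def replay(self):
        self.startup_lsn = self.writer_lsn     # replay everything fsynced so far
        self._check()

    def reply(self):
        """The three LSNs that a reply to the master would carry."""
        return (self.receiver_lsn, self.writer_lsn, self.startup_lsn)

    def _check(self):
        assert self.receiver_lsn >= self.writer_lsn >= self.startup_lsn


p = StandbyPipeline()
p.receive(100)
p.fsync()
p.receive(50)      # more WAL arrives before the next fsync
p.replay()
print(p.reply())   # (150, 100, 100)
```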

For each new message WALReceiver gets from master we issue a reply. Each
reply sends the current state of the 3 LSNs, so the reply message size
is only 28 bytes. Replies are sent half-duplex, i.e. we don't reply
while a new message is arriving.

Note that there is absolutely not one reply per transaction on the
master. The standby knows nothing about what has been requested on the
master - replies always refer to the latest standby state and
effectively batch the responses.

We act according to the requested synchronous_replication_service
* async - no replies are sent
* recv - replies are sent upon receipt only
* fsync - replies are sent upon receipt and following fsync only
* apply - replies are sent following receipt, fsync and apply.

Replies are sent at the next available opportunity.
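
The list above amounts to a small lookup table. A minimal Python sketch, with the hypothetical names <code>REPLY_EVENTS</code> and <code>should_reply</code>:

```python
# Which standby-side events trigger a reply, per synchronous_replication_service.
REPLY_EVENTS = {
    "async": frozenset(),
    "recv": frozenset({"recv"}),
    "fsync": frozenset({"recv", "fsync"}),
    "apply": frozenset({"recv", "fsync", "apply"}),
}


def should_reply(service, event):
    """True if a standby offering `service` replies after `event`."""
    return event in REPLY_EVENTS[service]


print([s for s in REPLY_EVENTS if should_reply(s, "fsync")])  # ['fsync', 'apply']
```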

In apply mode, when the WALReceiver is completely quiet this means we
send 3 reply messages - one at recv, one at fsync and one at apply. When
WALreceiver is busy the volume of messages does *not* increase since the
reply can't be sent until the current incoming message has been
received, after which we were going to reply anyway so it is not an
additional message. This means we piggyback an "apply" response onto a
later "recv" reply. As a result we get minimum response times in *all*
modes and maximum throughput is not impaired at all.
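
The piggybacking argument can be made concrete by counting replies. The Python sketch below models apply mode only and is an assumption-laden illustration: the function <code>count_replies</code> and the event encoding are hypothetical, and "busy" simply means a message from the master is arriving when the fsync or apply completes.

```python
def count_replies(events):
    """Count standby->master replies in apply mode.

    `events` is a list of (kind, receiver_busy) pairs, where kind is
    "recv", "fsync" or "apply".
    """
    replies = 0
    for kind, receiver_busy in events:
        if kind == "recv" or not receiver_busy:
            replies += 1   # reply to a message, or report progress while idle
        # Otherwise the progress rides on the reply to the message now arriving.
    return replies


# Quiet receiver: one message, then separate fsync and apply replies.
quiet = [("recv", False), ("fsync", False), ("apply", False)]
# Busy receiver: three messages; fsync/apply complete while messages arrive.
busy = [("recv", False), ("fsync", True), ("recv", False),
        ("apply", True), ("recv", False)]

print(count_replies(quiet), count_replies(busy))  # 3 3
```

In the busy case three messages from the master produce three replies, so fsync and apply progress adds no extra traffic.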

When each new message arrives from the master the WALreceiver will write
the new data to the WAL file, wake the WALwriter and then reply. Each
new message from master receives a reply. If no further WAL data has
been received the WALreceiver waits on the latch. If the WALReceiver is
woken by WALWriter or Startup then it will reply to master with a
message, even if no new WAL has been received.

So in the recv, fsync and apply cases a message is sent as soon as
possible to the master, and in all cases the wait time is minimised.

When WALwriter is woken it sees if there is outstanding WAL data and if
so fsyncs it and wakes both WALreceiver and Startup. When no WAL remains
it waits on the latch.

Startup process will wake WALreceiver when it has got to the end of the
latest chunk of WAL. If no further WAL is available then it waits on its
latch.

== MASTER ==

When user backends request sync rep they wait in a queue ordered by
requested LSN. A separate queue exists for each request mode.

WALSender receives the 3 LSNs from the standby. It then wakes backends
in sequence from each queue.

We provide a single wakeup rule: first WALSender to reply with the
requested XLogRecPtr will wake the backend. This guarantees that the WAL
data for the commit is transferred as requested to at least one standby.
That is sufficient for the use cases we have discussed.
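
The queueing and wakeup rule can be sketched with one priority queue per mode. This Python model is illustrative and is not the syncrep.c implementation; the class <code>SyncRepQueues</code>, its method names and the integer LSNs are all hypothetical.

```python
import heapq

MODES = ("recv", "fsync", "apply")


class SyncRepQueues:
    """One queue per request mode, each ordered by requested LSN."""

    def __init__(self):
        self._queues = {mode: [] for mode in MODES}

    def enqueue(self, mode, commit_lsn, backend_id):
        heapq.heappush(self._queues[mode], (commit_lsn, backend_id))

    def on_standby_reply(self, recv_lsn, fsync_lsn, apply_lsn):
        """Wake, in LSN order, every backend whose requested LSN is now covered."""
        reached = {"recv": recv_lsn, "fsync": fsync_lsn, "apply": apply_lsn}
        woken = []
        for mode in MODES:
            queue = self._queues[mode]
            while queue and queue[0][0] <= reached[mode]:
                woken.append(heapq.heappop(queue)[1])
        return woken


q = SyncRepQueues()
q.enqueue("recv", 100, "b1")
q.enqueue("fsync", 100, "b2")
q.enqueue("apply", 250, "b3")
q.enqueue("recv", 300, "b4")

first = q.on_standby_reply(200, 150, 50)
second = q.on_standby_reply(300, 300, 300)
print(first, second)  # ['b1', 'b2'] ['b4', 'b3']
```

Because a woken backend is removed from its queue, the first reply that covers its LSN releases it, and later replies from other standbys find nothing left to wake for that commit.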

More complex wakeup rules would be possible via a plugin.

Wait timeout would be set by individual backends with a timer, just as
we do for statement_timeout.

= CODE =

Total code to implement this is low. Breaks down into 5 areas
* Zoltan's libpq changes, included almost verbatim; fairly modular, so easy to replace with something we like better
* A new module syncrep.c and syncrep.h handle the backend wait/wakeup
* Light changes to allow streaming rep to make appropriate calls
* Small amount of code to allow WALWriter to be active in recovery
* Parameter code
No docs yet.

The patch works on top of latches, though does not rely upon them for
its bulk performance characteristics. Latches only improve response time
for very low transaction rates; latches provide no additional throughput
for medium to high transaction rates.

= PERFORMANCE ANALYSIS =

Since we reply to each new chunk sent from master, "recv" mode has
absolutely minimal latency, especially since WALreceiver no longer
performs the majority of fsyncs, as it did in the 9.0 code. WALreceiver does not wait
for fsync or apply actions to complete before we reply, so fsync and
apply modes will always wait at least 2 standby->master messages which
is appropriate because those actions will typically occur much later.

This response mechanism offers highest responsive performance achievable
in "recv" mode and very good throughput under load. Note that the
different modes do not interfere with each other and can co-exist
happily while providing highest performance.

Starting WALWriter is helpful no matter which
synchronous_replication_service is specified.

Can we optimise the sending of reply messages so that only chunks that
contain a commit deserve a reply? We could, but then we'd need to do
extra work on the master to do bookkeeping of that. It would need to be
demonstrated that there is a performance issue big enough to be worth
the overhead on master and extra code.

Is there an optimisation from reducing the number of options the standby
provides? The architecture on the standby side doesn't rely heavily on
the service level specified, nor does it rely in any way on the actual
sync rep mode specified on master. No further simplification is
possible.

= NOT YET IMPLEMENTED =

* Timeout code & NOTICE
* Code and test plugin
* Loops in walsender, walwriter and receiver treat shutdown incorrectly

I haven't yet looked at Fujii's code for this, not even sure where it
is, though hope to do so in the future. Zoltan's libpq code is the only
part of that patch used.

So far I have spent 3.5 days on this and expect to complete tomorrow. I
think that throws out the argument that this proposal is too complex to
develop in this release.

= OTHER ISSUES =

* How should master behave when we shut it down?
* How should standby behave when we shut it down?

[[Category:Replication]]

Priorities

2012-12-02T17:17:14Z

Natmaka: /* Is CPU really the bottleneck? */

== Prioritizing users, queries, or databases ==

PostgreSQL has no facilities to limit what resources a particular user, query, or database consumes, or correspondingly to set priorities such that one user/query/database gets more resources than others. It's necessary to use operating system facilities to achieve what limited prioritization is possible.

There are three main resources that PostgreSQL users, queries, and databases will contend for:

* Memory
* CPU
* Disk I/O

Of these, disk I/O is commonly a bottleneck for database applications, but that's not always the case. Some schema designs and queries are particularly CPU heavy. Others really benefit from having lots of memory to work with, typically for sorting.

== Are priorities really the problem? ==

Before struggling too much with prioritizing your queries/users/databases, it's worthwhile to optimize your queries and [[Tuning_Your_PostgreSQL_Server|tune your database]]. You may find that you can get perfectly acceptable performance without playing with priorities or taking extreme measures, using techniques such as:

* [[Using_EXPLAIN|Improving your queries]]
* Tune autovacuum to reduce bloat
* [[Performance_Optimization|Generally polishing your cluster's performance]]
* Avoiding use of [[VACUUM FULL]]. That can lead to bloated indexes that eat lots of memory and take forever to scan, wasting disk I/O bandwidth. See the wiki page on [[VACUUM FULL]] for more information.

=== Is CPU really the bottleneck? ===

People often complain of pegged (100%) CPU and assume that's the cause of database slowdowns. That's not necessarily the case - a system may show an apparent 100% CPU use, but in fact be mainly limited by I/O bandwidth. Consider the following test, which starts 20 `dd' processes, each reading a different 1GB block from the hard disk (here: /dev/md0) at 1GB offsets (run the test as root).

<pre>
sync ; sh -c "echo 3 > /proc/sys/vm/drop_caches"
for i in `seq 1 20`; do
( dd if=/dev/md0 bs=1M count=1000 skip=$(($i * 1000)) of=/dev/null &)
done
</pre>

results in `top' output of:

<pre>
top - 14:51:55 up 3 days, 2:09, 5 users, load average: 10.92, 4.94, 2.93
Tasks: 259 total, 3 running, 256 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.6%us, 15.0%sy, 0.0%ni, 0.0%id, 78.6%wa, 0.8%hi, 4.0%si, 0.0%st
Mem: 4055728k total, 3843408k used, 212320k free, 749448k buffers
Swap: 2120544k total, 4144k used, 2116400k free, 2303356k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
33 root 15 -5 0 0 0 R 5 0.0 0:26.67 kswapd0
904 root 20 0 4152 1772 628 D 5 0.0 0:00.62 dd
874 root 20 0 4152 1768 628 D 3 0.0 0:00.74 dd
908 root 20 0 4152 1768 628 D 3 0.0 0:00.80 dd
888 root 20 0 4152 1772 628 D 3 0.0 0:00.44 dd
906 root 20 0 4152 1772 628 D 3 0.0 0:00.56 dd
894 root 20 0 4152 1768 628 D 2 0.0 0:00.49 dd
902 root 20 0 4152 1772 628 D 2 0.0 0:00.46 dd
.... etc ....
</pre>

... which could be confused for a busy CPU, but is really load caused by disk I/O. The key warning sign here is the presence of a high iowait CPU percentage ("%wa"), indicating that much of the apparent load is actually caused by delays in the I/O subsystem. Most of the `dd' processes are in 'D' state - i.e. uninterruptible sleep in a system call - and if you check "wchan" with "ps" you'll see that they're sleeping waiting for I/O.
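
On Linux the iowait figure can also be computed directly from two samples of the aggregate "cpu" line in /proc/stat. The Python sketch below is a minimal illustration: the function names are hypothetical, and the two sample lines are made-up values chosen to reproduce the percentages in the `top' output above, not measurements.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: a[name] - b[name] for name in a}
    total = sum(delta.values())
    return 100.0 * delta["iowait"] / total if total else 0.0


# Hypothetical samples taken a few seconds apart.
sample_1 = "cpu  100 0 100 1000 200 0 0 0"
sample_2 = "cpu  116 0 250 1000 986 8 40 0"
print("%.1f%% iowait" % iowait_percent(sample_1, sample_2))  # 78.6% iowait
```

To sample a live system, read the first line of /proc/stat twice with a short sleep in between and pass both lines to <code>iowait_percent</code>.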

Rather than assuming that CPU contention is the issue, it's a good idea to use the available [[Performance Analysis Tools]] to get a better idea of where your system bottlenecks really are.

== Prioritizing CPU ==

For adjusting the CPU priority of PostgreSQL processes, you can use "renice" (on UNIX systems), but it's a bit clumsy to do since you need to "renice" the backend of interest, not the client program connected to that backend. You can get the backend process id using the SQL query "SELECT pg_backend_pid()" or by looking at the pg_stat_activity view.

One significant limitation of "renice", or any approach based on the setpriority() call, is that on most UNIX-like platforms one must be root to lower the numerical priority value (i.e. schedule the process to run more urgently) of a process.

Increasing the priority of important backends, via a root user's call to "renice", instead of lowering the priority of unimportant ones, may be more effective.
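
The same adjustment that "renice" performs can be made from a script once the backend's process id is known. The Python sketch below is a hedged illustration: the helper <code>deprioritize</code> is hypothetical, the cap of 19 assumes the usual Linux nice range, and the caller must run as the operating system user that owns the backend (or as root). Obtaining the pid, for example from pg_stat_activity, is left out.

```python
import os


def deprioritize(pid, increment=5):
    """Raise the nice value of `pid` (make it less urgent); return the new value.

    Raising the value needs no special privilege for one's own processes;
    lowering it again normally requires root.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    target = min(19, current + increment)     # 19: usual Linux maximum (assumed)
    os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

For example, <code>deprioritize(backend_pid)</code> would be called with the pid of the backend running the unimportant query, not the pid of the client program.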

== prioritize module ==
The [http://pgxn.org/dist/prioritize/ prioritize] extension lets users adjust the CPU priority, in the same way that "renice" does, via the SQL function set_backend_priority(). Normal users may increase the priority value of any backend process running under the same username. Superusers may increase the priority value of any backend process. Just like with using "renice" manually, it is not possible to lower a backend's priority value, since PostgreSQL will not be running as the "root" user.

If you know your application will be running an unimportant CPU-heavy query, you could have it call set_backend_priority(pg_backend_pid(), 20) after installing the "prioritize" module, so that the process is scheduled for the lowest possible urgency.

== Prioritizing I/O ==

I/O is harder. Some operating systems offer I/O priorities for
processes, like Linux's ionice, and you'd think you could use these in a
similar way to how you use 'nice'. Unfortunately, that won't work particularly well,
because a lot of the work PostgreSQL does - especially disk writes - is
done via a separate background writer process working from memory shared
by all backends. Similarly, the write-ahead logs are managed by their
own process via shared memory. Because of those two, it's very hard to effectively give one
user priority over another for writes. ionice should be moderately
effective for reads, though.

As with "nice", effective control on a per-connection level will require the addition of appropriate helper
functions, and user co-operation is required to achieve per-user priorities.

Better separation of I/O workloads will require [[Prioritizing_databases_by_separating_into_multiple_clusters|cluster separation]], which has its own costs and is only effective on the per-database level.

== Prioritizing memory ==

PostgreSQL does have some [[Tuning_Your_PostgreSQL_Server|tunable parameters]] for memory use that are per-client, particularly <code>[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-WORK-MEM work_mem]</code> and <code>[http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM maintenance_work_mem]</code>. These may be set within a given connection to allow that backend to use more than the usual amount of memory for things like sorts and index creation. You can set these to conservative, low values in <code>postgresql.conf</code> then use the <code>SET</code> command to assign higher values to them for a particular backend, eg <code>SET work_mem = '100MB';</code>.

You can set different values for <code>work_mem</code> and <code>maintenance_work_mem</code> using per-user GUC variables. For example:

<pre>
ALTER USER myuser SET work_mem = '50MB';
</pre>

You cannot affect the shared memory allocation done with settings like shared_buffers this way; that value is fixed at server startup and can't be changed without a restart.

There's no easy way in most operating systems to prioritize memory allocations, so that for example the OS would prefer to swap one backend's memory out instead of another's.

== External links ==

* [http://www.cs.cmu.edu/~harchol/Papers/actual-icde-submission.pdf CMU article studying CPU priorities on Postgres and DB2 on TPC-C and TPC-W workloads]

== Credits ==
Page initially by [[User:Ringerc|Ringerc]] 02:34, 26 November 2009 (UTC)

[[Category:FAQ]] [[Category:Performance]]

Performance Optimization

2012-06-12T16:02:49Z

Natmaka: /* General Setup and Optimization */ link update

== General Setup and Optimization ==
* [[Tuning Your PostgreSQL Server]] by Greg Smith, Robert Treat, and Christopher Browne
* [http://www.revsys.com/writings/postgresql-performance.html Performance Tuning PostgreSQL] by Frank Wiles
* [http://www.pgcon.org/2008/schedule/events/104.en.html GUCs: A Three Hour Tour] by Josh Berkus. Also useful here is his [http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc tuning OpenOffice spreadsheet], which suggests tuning values for 5 different types of workloads.
* [http://linuxfinances.info/info/quickstart.html QuickStart Guide to Tuning PostgreSQL] by Christopher Browne
* [http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm 5-Minute Introduction to PostgreSQL Performance] by Greg Smith
* [http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html Annotated postgresql.conf] by Josh Berkus and Shridhar Daithankar (older V7.4 targeted version of material covered in the GUC tour referenced above)
* [http://www.varlena.com/GeneralBits/Tidbits/perf.html Performance Tuning] by Josh Berkus and Shridhar Daithankar
* [http://www.zope.org/Members/pupq/pg_in_aggregates Replacing Slow Loops in PostgreSQL] by Joel Burton
* [http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/ PostgreSQL Hardware Performance Tuning] by Bruce Momjian
* [http://www.targeted.org/articles/databases/fragmentation.html The effects of data fragmentation in a mixed load database] by Dmitry Dvoinikov
* [http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/Postgres_Performance_Update83.pdf PostgreSQL Performance Features in 8.3] by Simon Riggs
* [http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/Postgres_Performance_Update84.pdf PostgreSQL Performance Features in 8.4] by Simon Riggs

Performance courses are available from a number of companies. Check [http://www.postgresql.org/about/eventarchive events and trainings] for further details.

==Critical maintenance for performance==
*[[Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT]] by Jim Nasby.
*[[VACUUM FULL]] and why you should avoid it
*[[Planner Statistics]]
*[[Using EXPLAIN]]
*[[Logging Difficult Queries]]
*[[Logging Checkpoints]]
*[http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm Checkpoints and the Background Writer: PostgreSQL 8.3 Improvements and Migration] by Greg Smith
*[[Bulk Loading and Restores]]
*[[Performance Analysis Tools]] by Craig Ringer

== Database architecture ==
* [[Priorities|Limiting and prioritizing user/query/database resource usage]] by Craig Ringer
* [[Prioritizing databases by separating into multiple clusters]] by Craig Ringer
* [[Clustering]]
* [[Shared Storage]]

==Database Hardware Selection and Setup==
* [[Database Hardware]]
* [[Reliable Writes]]

==Benchmark Workloads==
* [[:Category:Benchmarking]]

[[Category:Administration]][[Category:Performance]]
[[Category:General articles and guides]]

Working with Dates and Times in PostgreSQL

2011-05-12T13:45:02Z

Natmaka: /* I need to display a DATE as text, or convert text into a DATE or INTERVAL */

by Josh Berkus

This FAQ is intended to answer the following questions:

'''Q: Where are the DATEADD() and DATEDIFF() functions in PostgreSQL?''' 
'''Q: How do I tell the amount of time between X and Y?'''

KEYWORDS: date, datetime, timestamp, operator, dateadd, datediff, interval

=== First, the legalese ===

Copyright 2001 Josh Berkus (http://www.agliodbs.com). Permission granted to use in any public forum for which no fee is charged if this copyright notice appears in the document, or alternately in any published for-fee work if 1% or more of the proceeds of such work are donated or paid to benefit PostgreSQL development. This advice is provided with no warranty whatsoever, including any warranty of fitness for a particular purpose. Use at your own risk.

=== INTRODUCTION ===
One of PostgreSQL's joys is a robust support of a variety of date and time data types and their associated operators. This has allowed me to write calendaring applications in PostgreSQL that would have been considerably more difficult on other platforms.

Before we get down to the nuts and bolts, I need to explain a few things to the many who have come to us from database applications which are less compliant with ANSI SQL-92 than PostgreSQL (particularly Microsoft SQL Server, Sybase, and Microsoft Access). If you already know this background, you'll want to skip down to "Working with DATETIME, DATE, and INTERVAL values".

(BTW, I am not on an anti-Microsoft tirade here. I use MS SQL Server as an example of a non-standards-compliant database because I am a certified MS SQL Server admin and know its problems quite well. There are plenty of other non-compliant databases on the market.)

=== ANSI SQL and OPERATORS ===

In the ANSI SQL world, operators (such as + - * % || !) are defined only in the context of the data types being operated upon. Thus the division of two integers ( INT / INT ) does not function in the same way as the division of two float values (FLOAT / FLOAT). More dramatically, you may subtract one integer (INT - INT) from another, but you may not subtract one string from another (VARCHAR - VARCHAR), let alone subtract a string from an integer (INT - VARCHAR). The subtraction operator (-) in these two operations, while it looks the same, is in fact not the same owing to a different datatype context. In the absence of a predefined context, the operator does not function at all and you get an error message.

This fundamental rule has a number of tedious consequences. Frequently you must CAST two values to the same data type in order to work with them. For example, try adding a FLOAT and a NUMERIC value; you will get an error until you help out the database by defining them both as FLOAT or both as NUMERIC (CAST(FLOAT AS NUMERIC) + NUMERIC). Even more so, appending an integer to the end of a string requires a type conversion function (to_char(INT, '00000')). Further, if you want to define your own data types, you must spend the hours necessary to define all possible operators for them as well.

Some database developers, in a rush to get their products to market, saw the above "user-unfriendly" behaviour and cut it out of the system by defining all operators to work in a context-insensitive way. Thus, in Microsoft Transact-SQL, you may add a DOUBLE and an INTEGER, or even append an INTEGER directly to a string in some cases. The database can handle the implicit conversions for you, because they have been simplified.

However, the Transact-SQL developers disregarded the essential reason for including context-sensitive operators in the SQL standard. Only with real, context-sensitive operators can you handle special data types that do not follow arithmetic or concatenation rules. PostgreSQL's ability to handle IP addresses, geometric shapes, and, most importantly for our discussion, dates and times, is dependent on this robust operator implementation. Non-compliant dialects of SQL, such as Transact-SQL, are forced to resort to proprietary functions like DATEADD() and DATEDIFF() in order to work with dates and times, and cannot handle more complex data types at all.

Thus, to answer the first question :

'''Q. Where are the DATEADD and DATEDIFF functions in PostgreSQL?''' 
'''A.''' There are none. PostgreSQL does not need them. Use the + and - operators instead. Read on.

=== WORKING with DATETIME, DATE, and INTERVAL VALUES ===

[http://www.postgresql.org/docs/current/interactive/datatype-datetime.html Complete docs on date/time data types exist], so I will not attempt to reproduce them here. Instead, I will simply try to explain what a beginner needs to know to actually work with dates, times, and intervals.

==== Types ====

;DATETIME or TIMESTAMP:Structured "real" date and time values, containing year, month, day, hour, minute, second, and fractional seconds (to microsecond resolution) for all useful date & time values (4713 BC to over 100,000 AD).

;DATE:Simplified integer-based representation of a date defining only year, month, and day.

;INTERVAL:Structured value showing a period of time, including any/all of years, months, weeks, days, hours, minutes, seconds, and fractional seconds. "1 day", "42 minutes 10 seconds", and "2 years" are all INTERVAL values.

==== What about TIMESTAMP WITH TIME ZONE? ====

An important topic, that I don't want to get into here. Eventually someone will document this. Suffice it to say that all TIMESTAMP values carry TIMEZONE data as well which you may safely ignore if you don't need to handle different time zones.

==== Which do I want to use: DATE or TIMESTAMP? I don't need minutes or hours in my value ====

That depends. DATE is easier to work with for arithmetic (e.g. something reoccurring at a random interval of days), takes less storage space, and doesn't trail "00:00:00" strings you don't need when printed. However, TIMESTAMP is far better for real calendar calculations (e.g. something that happens on the 15th of each month or the 2nd Thursday of leap years). More below.

Now, to work with TIMESTAMP and INTERVAL, you need to understand these few simple rules :

===== 1. The difference between two TIMESTAMPs is always an INTERVAL =====

TIMESTAMP '1999-12-30' - TIMESTAMP '1999-12-11' = INTERVAL '19 days'

===== 2. You may add or subtract an INTERVAL to a TIMESTAMP to produce another TIMESTAMP =====

TIMESTAMP '1999-12-11' + INTERVAL '19 days' = TIMESTAMP '1999-12-30'

===== 3. You may add or subtract two INTERVALS =====

INTERVAL '1 month' + INTERVAL '1 month 3 days' = INTERVAL '2 months 3 days'

===== 4. Multiplication and division of INTERVALS is under development and discussion at this time =====

It is suggested that you avoid it until implementation is complete or you may get unexpected results.

===== 5. You may NOT (ever) perform Addition, Multiplication, or Division operations with two TIMESTAMPS =====

TIMESTAMP '2001-03-24' + TIMESTAMP '2001-10-01' = OPERATION ERROR

===== 6. Many larger INTERVAL values, like the calendar values they reflect, are '''not constant''' in length when expressed in smaller INTERVAL values =====

For example (differences bolded):

TIMESTAMP '2001-0'''7'''-02' + INTERVAL '1 month' = TIMESTAMP '2001-08-02' 
TIMESTAMP '2001-0'''7'''-02' + INTERVAL '31 days' = TIMESTAMP '2001-08-0'''2'''' 

'''but:''' 

TIMESTAMP '2001-0'''2'''-02' + INTERVAL '1 month' = TIMESTAMP '2001-03-02' 
TIMESTAMP '2001-0'''2'''-02' + INTERVAL '31 days' = TIMESTAMP '2001-03-0'''5''''

This makes the TIMESTAMP/INTERVAL combination ideal, for example, for scheduling an event which must reoccur every month on the 8th regardless of the length of the month, but problematic if you are trying to figure out the number of days in the last 3.5 months. Keep it in mind!
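
The rules above can be tried directly in psql. What follows is only a minimal sketch, assuming a reasonably recent PostgreSQL release with the default output settings (ISO DateStyle); the results in the comments are what those defaults should print, and other settings will format them differently.

<pre>
-- Rule 1: TIMESTAMP - TIMESTAMP yields an INTERVAL
SELECT TIMESTAMP '1999-12-30' - TIMESTAMP '1999-12-11';   -- 19 days

-- Rule 2: TIMESTAMP + INTERVAL yields a TIMESTAMP
SELECT TIMESTAMP '1999-12-11' + INTERVAL '19 days';       -- 1999-12-30 00:00:00

-- Rule 3: INTERVAL + INTERVAL yields an INTERVAL
SELECT INTERVAL '1 month' + INTERVAL '1 month 3 days';    -- 2 mons 3 days

-- Rule 5: adding two TIMESTAMPs is rejected with an error
-- SELECT TIMESTAMP '2001-03-24' + TIMESTAMP '2001-10-01';

-- Rule 6: '1 month' follows the calendar, '31 days' does not
SELECT TIMESTAMP '2001-02-02' + INTERVAL '1 month';       -- 2001-03-02 00:00:00
SELECT TIMESTAMP '2001-02-02' + INTERVAL '31 days';       -- 2001-03-05 00:00:00
</pre>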

The DATE datatype, however, is simpler to deal with if less powerful.

==== Operations with DATEs ====

===== 1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference =====

DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19

===== 2. You may add or subtract an INTEGER to a DATE to produce another DATE =====

DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'

===== 3. Because the difference of two DATES is an INTEGER, this difference may be added, subtracted, divided, multiplied, or even used with modulo (%) =====

===== 4. As with TIMESTAMP, you may NOT perform Addition, Multiplication, or Division operations with two DATES =====

===== 5. DATE/INTEGER cannot figure out the varying lengths of months and years =====

Because DATE differences are always calculated as whole numbers of days, DATE/INTEGER cannot figure out the varying lengths of months and years. Thus, you cannot use DATE/INTEGER to schedule something for the 5th of every month without some very fancy length-of-month calculating on the fly. This makes DATE ideal for calendar applications involving a lot of calculating based on numbers of days (e.g. "For how many 14-day periods has employee "x" been employed?") but poor for actual calendaring apps. Keep it in mind.
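
As a rough illustration of the DATE rules, here is a short sketch that should run as-is in psql. The hire date and reference date in the last two queries are made-up values chosen only for the example.

<pre>
-- DATE - DATE yields an INTEGER number of days
SELECT DATE '1999-12-30' - DATE '1999-12-11';           -- 19

-- DATE + INTEGER yields a DATE
SELECT DATE '1999-12-11' + 19;                          -- 1999-12-30

-- The integer difference supports ordinary arithmetic:
-- complete 14-day periods between a hypothetical hire date and a reference date
SELECT (DATE '2001-06-01' - DATE '2001-01-01') / 14;    -- 10 (151 days, integer division)

-- days left over after the last complete period
SELECT (DATE '2001-06-01' - DATE '2001-01-01') % 14;    -- 11
</pre>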

==== I'm porting an app from MS SQL Server, and I need to support the DATEDIFF and DATEADD functions so that my stored views will work ====

Proceed to PostgreSQL TechDocs (http://techdocs.postgresql.org). There are many porting resources there, and I'd be surprised if someone hasn't already re-created these functions under PostgreSQL.
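
If you only need the most common cases, the operators described above may be enough. The following is a sketch rather than a full port: the function name <tt>datediff_days</tt> is hypothetical (it is not a built-in), and it assumes that counting whole calendar days between the two values is the behaviour your views expect. Check it against your actual Transact-SQL usage before relying on it.

<pre>
-- Rough equivalent of DATEADD(month, 1, some_value): add an INTERVAL
SELECT TIMESTAMP '2001-07-02' + INTERVAL '1 month';     -- 2001-08-02 00:00:00

-- Rough equivalent of DATEDIFF(day, start, end): subtract the two values as DATEs
SELECT CAST(TIMESTAMP '1999-12-30 01:00' AS date)
     - CAST(TIMESTAMP '1999-12-11 23:00' AS date);      -- 19

-- Hypothetical wrapper so existing views can call a function instead
CREATE FUNCTION datediff_days(timestamp, timestamp) RETURNS integer AS $$
    SELECT CAST($2 AS date) - CAST($1 AS date)
$$ LANGUAGE sql IMMUTABLE;

SELECT datediff_days(TIMESTAMP '1999-12-11 23:00', TIMESTAMP '1999-12-30 01:00');   -- 19
</pre>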

==== I need to display a DATE as text, or convert text into a DATE or INTERVAL ====

You want the [http://www.postgresql.org/docs/current/interactive/functions-formatting.html to_date(), to_char()], and [http://www.postgresql.org/docs/current/interactive/functions-datetime.html interval()] functions.
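
A few hedged examples of these conversions follow. The format patterns are just one possible choice, and the interval conversion is written here as a standard CAST; see the linked documentation for the full list of patterns.

<pre>
-- DATE to text
SELECT to_char(DATE '2001-02-16', 'DD Mon YYYY');   -- 16 Feb 2001

-- text to DATE
SELECT to_date('16 Feb 2001', 'DD Mon YYYY');       -- 2001-02-16

-- text to INTERVAL
SELECT CAST('1 day 2 hours' AS interval);           -- 1 day 02:00:00
</pre>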

==== What if I want to get the month as an integer out of a date?====

You want the [http://www.postgresql.org/docs/current/interactive/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT extract()] function. It can also return other numeric fields from a timestamp, including the Unix epoch time, e.g. EXTRACT(EPOCH FROM some_date).
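
For instance, as a small sketch: the exact numeric type and formatting of the result varies between PostgreSQL versions, so an explicit cast is shown for the case where you really need an INTEGER.

<pre>
-- Month number of a date
SELECT EXTRACT(MONTH FROM DATE '2001-02-16');                     -- 2

-- The same value, explicitly as an INTEGER
SELECT CAST(EXTRACT(MONTH FROM DATE '2001-02-16') AS integer);    -- 2

-- Seconds since 1970-01-01 00:00:00; this value is exactly one day later
SELECT EXTRACT(EPOCH FROM TIMESTAMP '1970-01-02 00:00:00');       -- 86400
</pre>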

Working with Dates and Times in PostgreSQL

2011-05-12T13:40:55Z

Natmaka: /* What if I want to get the month as an integer out of a date? */

by Josh Berkus

This FAQ is intended to answer the following questions:

'''Q: Where are the DATEADD() and DATEDIFF() functions in PostgreSQL?''' 
'''Q: How do I tell the amount of time between X and Y?'''

KEYWORDS: date, datetime, timestamp, operator, dateadd, datediff, interval

=== First, the legalese ===

Copyright 2001 Josh Berkus (http://www.agliodbs.com). Permission granted to use in any public forum for which no fee is charged if this copyright notice appears in the document, or alternately in any published for-fee work if 1% or more of the proceeds of such work are donated or paid to benefit PostgreSQL development. This advice is provided with no warranty whatsoever, including any warranty of fitness for a particular purpose. Use at your own risk.

=== INTRODUCTION ===
One of PostgreSQL's joys is a robust support of a variety of date and time data types and their associated operators. This has allowed me to write calendaring applications in PostgreSQL that would have been considerably more difficult on other platforms.

Before we get down to the nuts and bolts, I need to explain a few things to the many who have come to us from database applications which are less compliant with ANSI SQL-92 than PostgreSQL (particularly Microsoft SQL Server, Sybase, and Microsoft Access). If you already know this background, you'll want to skip down to "Working with DATETIME, DATE, and INTERVAL values".

(BTW, I am not on an anti-Microsoft tirade here. I use MS SQL Server as an example of a non-standards-compliant database because I am a certified MS SQL Server admin and know its problems quite well. There are plenty of other non-compliant databases on the market.)

=== ANSI SQL and OPERATORS ===

In the ANSI SQL world, operators (such as + - * % || !) are defined only in the context of the data types being operated upon. Thus the division of two integers ( INT / INT ) does not function in the same way as the division of two float values (FLOAT / FLOAT). More dramatically, you may subtract one integer (INT - INT) from another, but you may not subtract one string from another (VARCHAR - VARCHAR), let alone subtract a string from an integer (INT - VARCHAR). The subtraction operator (-) in these two operations, while it looks the same, is in fact not the same owing to a different datatype context. In the absence of a predefined context, the operator does not function at all and you get an error message.

This fundamental rule has a number of tedious consequences. Frequently you must CAST two values to the same data type in order to work with them. For example, try adding a FLOAT and a NUMERIC value; you will get an error until you help out the database by defining them both as FLOAT or both as NUMERIC (CAST(FLOAT AS NUMERIC) + NUMERIC). Even more so, appending an integer to the end of a string requires a type conversion function (to_char(INT, '00000')). Further, if you want to define your own data types, you must spend the hours necessary to define all possible operators for them as well.

Some database developers, in a rush to get their products to market, saw the above "user-unfriendly" behaviour and cut it out of the system by defining all operators to work in a context-insensitive way. Thus, in Microsoft Transact-SQL, you may add a DOUBLE and an INTEGER, or even append an INTEGER directly to a string in some cases. The database can handle the implicit conversions for you, because they have been simplified.

However, the Transact-SQL developers disregarded the essential reason for including context-sensitive operators in the SQL standard. Only with real, context-sensitive operators can you handle special data types that do not follow arithmetic or concatenation rules. PostgreSQL's ability to handle IP addresses, geometric shapes, and, most importantly for our discussion, dates and times, is dependent on this robust operator implementation. Non-compliant dialects of SQL, such as Transact-SQL, are forced to resort to proprietary functions like DATEADD() and DATEDIFF() in order to work with dates and times, and cannot handle more complex data types at all.

Thus, to answer the first question :

'''Q. Where are the DATEADD and DATEDIFF functions in PostgreSQL?''' 
'''A.''' There are none. PostgreSQL does not need them. Use the + and - operators instead. Read on.

=== WORKING with DATETIME, DATE, and INTERVAL VALUES ===

[http://www.postgresql.org/docs/current/interactive/datatype-datetime.html Complete docs on date/time data types exist], so I will not attempt to reproduce them here. Instead, I will simply try to explain what a beginner needs to know to actually work with dates, times, and intervals.

==== Types ====

;DATETIME or TIMESTAMP:Structured "real" date and time values, containing year, month, day, hour, minute, second, and fractional seconds (to microsecond resolution) for all useful date & time values (4713 BC to over 100,000 AD).

;DATE:Simplified integer-based representation of a date defining only year, month, and day.

;INTERVAL:Structured value showing a period of time, including any/all of years, months, weeks, days, hours, minutes, seconds, and fractional seconds. "1 day", "42 minutes 10 seconds", and "2 years" are all INTERVAL values.

==== What about TIMESTAMP WITH TIME ZONE? ====

An important topic, that I don't want to get into here. Eventually someone will document this. Suffice it to say that all TIMESTAMP values carry TIMEZONE data as well which you may safely ignore if you don't need to handle different time zones.

==== Which do I want to use: DATE or TIMESTAMP? I don't need minutes or hours in my value ====

That depends. DATE is easier to work with for arithmetic (e.g. something reoccurring at a random interval of days), takes less storage space, and doesn't trail "00:00:00" strings you don't need when printed. However, TIMESTAMP is far better for real calendar calculations (e.g. something that happens on the 15th of each month or the 2nd Thursday of leap years). More below.

Now, to work with TIMESTAMP and INTERVAL, you need to understand these few simple rules :

===== 1. The difference between two TIMESTAMPs is always an INTERVAL =====

TIMESTAMP '1999-12-30' - TIMESTAMP '1999-12-11' = INTERVAL '19 days'

===== 2. You may add or subtract an INTERVAL to a TIMESTAMP to produce another TIMESTAMP =====

TIMESTAMP '1999-12-11' + INTERVAL '19 days' = TIMESTAMP '1999-12-30'

===== 3. You may add or subtract two INTERVALS =====

INTERVAL '1 month' + INTERVAL '1 month 3 days' = INTERVAL '2 months 3 days'

===== 4. Multiplication and division of INTERVALS is under development and discussion at this time =====

It is suggested that you avoid it until implementation is complete or you may get unexpected results.

===== 5. You may NOT (ever) perform Addition, Multiplication, or Division operations with two TIMESTAMPS =====

TIMESTAMP '2001-03-24' + TIMESTAMP '2001-10-01' = OPERATION ERROR

===== 6. Many larger INTERVAL values, like the calendar values they reflect, are '''not constant''' in length when expressed in smaller INTERVAL values =====

For example (differences bolded):

TIMESTAMP '2001-0'''7'''-02' + INTERVAL '1 month' = TIMESTAMP '2001-08-02' 
TIMESTAMP '2001-0'''7'''-02' + INTERVAL '31 days' = TIMESTAMP '2001-08-0'''2'''' 

'''but:''' 

TIMESTAMP '2001-0'''2'''-02' + INTERVAL '1 month' = TIMESTAMP '2001-03-02' 
TIMESTAMP '2001-0'''2'''-02' + INTERVAL '31 days' = TIMESTAMP '2001-03-0'''5''''

This makes the TIMESTAMP/INTERVAL combination ideal, for example, for scheduling an event which must reoccur every month on the 8th regardless of the length of the month, but problematic if you are trying to figure out the number of days in the last 3.5 months. Keep it in mind!

The DATE datatype, however, is simpler to deal with if less powerful.

==== Operations with DATEs ====

===== 1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference =====

DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19

===== 2. You may add or subtract an INTEGER to a DATE to produce another DATE =====

DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'

===== 3. Because the difference of two DATES is an INTEGER, this difference may be added, subtracted, divided, multiplied, or even used with modulo (%) =====

===== 4. As with TIMESTAMP, you may NOT perform Addition, Multiplication, or Division operations with two DATES =====

===== 5. DATE/INTEGER cannot figure out the varying lengths of months and years =====

Because DATE differences are always calculated as whole numbers of days, DATE/INTEGER cannot figure out the varying lengths of months and years. Thus, you cannot use DATE/INTEGER to schedule something for the 5th of every month without some very fancy length-of-month calculating on the fly. This makes DATE ideal for calendar applications involving a lot of calculating based on numbers of days (e.g. "For how many 14-day periods has employee "x" been employed?") but poor for actual calendaring apps. Keep it in mind.

==== I'm porting an app from MS SQL Server, and I need to support the DATEDIFF and DATEADD functions so that my stored views will work ====

Proceed to PostgreSQL TechDocs (http://techdocs.postgresql.org). There are many porting resources there, and I'd be surprised if someone hasn't already re-created these functions under PostgreSQL.

==== I need to display a DATE as text, or convert text into a DATE or INTERVAL ====

You want the to_date(), to_char(), and interval() functions. See "functions and operators" in the PostgreSQL docs: http://www.postgresql.org/docs/current/interactive/functions.html

==== What if I want to get the month as an integer out of a date?====

You want the [http://www.postgresql.org/docs/current/interactive/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT extract()] function. It can also return other numeric fields from a timestamp, including the Unix epoch time, e.g. EXTRACT(EPOCH FROM some_date).

User:Natmaka

2011-05-12T12:45:13Z

Natmaka: Created page with "[http://makarevitch.org Nat Makarevitch]"

[http://makarevitch.org Nat Makarevitch]

Working with Dates and Times in PostgreSQL

2011-05-12T12:44:44Z

Natmaka: /* WORKING with DATETIME, DATE, and INTERVAL VALUES */

by Josh Berkus

This FAQ is intended to answer the following questions:

'''Q: Where are the DATEADD() and DATEDIFF() functions in PostgreSQL?''' 
'''Q: How do I tell the amount of time between X and Y?'''

KEYWORDS: date, datetime, timestamp, operator, dateadd, datediff, interval

=== First, the legalese ===

Copyright 2001 Josh Berkus (http://www.agliodbs.com). Permission granted to use in any public forum for which no fee is charged if this copyright notice appears in the document, or alternately in any published for-fee work if 1% or more of the proceeds of such work are donated or paid to benefit PostgreSQL development. This advice is provided with no warranty whatsoever, including any warranty of fitness for a particular purpose. Use at your own risk.

=== INTRODUCTION ===
One of PostgreSQL's joys is a robust support of a variety of date and time data types and their associated operators. This has allowed me to write calendaring applications in PostgreSQL that would have been considerably more difficult on other platforms.

Before we get down to the nuts and bolts, I need to explain a few things to the many who have come to us from database applications which are less compliant with ANSI SQL-92 than PostgreSQL (particularly Microsoft SQL Server, Sybase, and Microsoft Access). If you already know this background, you'll want to skip down to "Working with DATETIME, DATE, and INTERVAL values".

(BTW, I am not on an anti-Microsoft tirade here. I use MS SQL Server as an example of a non-standards-compliant database because I am a certified MS SQL Server admin and know its problems quite well. There are plenty of other non-compliant databases on the market.)

=== ANSI SQL and OPERATORS ===

In the ANSI SQL world, operators (such as + - * % || !) are defined only in the context of the data types being operated upon. Thus the division of two integers ( INT / INT ) does not function in the same way as the division of two float values (FLOAT / FLOAT). More dramatically, you may subtract one integer (INT - INT) from another, but you may not subtract one string from another (VARCHAR - VARCHAR), let alone subtract a string from an integer (INT - VARCHAR). The subtraction operator (-) in these two operations, while it looks the same, is in fact not the same owing to a different datatype context. In the absence of a predefined context, the operator does not function at all and you get an error message.

This fundamental rule has a number of tedious consequences. Frequently you must CAST two values to the same data type in order to work with them. For example, try adding a FLOAT and a NUMERIC value; you will get an error until you help out the database by defining them both as FLOAT or both as NUMERIC (CAST(FLOAT AS NUMERIC) + NUMERIC). Even more so, appending an integer to the end of a string requires a type conversion function (to_char(INT, '00000')). Further, if you want to define your own data types, you must spend the hours necessary to define all possible operators for them as well.

Some database developers, in a rush to get their products to market, saw the above "user-unfriendly" behaviour and cut it out of the system by defining all operators to work in a context-insensitive way. Thus, in Microsoft Transact-SQL, you may add a DOUBLE and an INTEGER, or even append an INTEGER directly to a string in some cases. The database can handle the implicit conversions for you, because they have been simplified.

However, the Transact-SQL developers disregarded the essential reason for including context-sensitive operators in the SQL standard. Only with real, context-sensitive operators can you handle special data types that do not follow arithmetic or concatenation rules. PostgreSQL's ability to handle IP addresses, geometric shapes, and, most importantly for our discussion, dates and times, is dependent on this robust operator implementation. Non-compliant dialects of SQL, such as Transact-SQL, are forced to resort to proprietary functions like DATEADD() and DATEDIFF() in order to work with dates and times, and cannot handle more complex data types at all.

Thus, to answer the first question :

'''Q. Where are the DATEADD and DATEDIFF functions in PostgreSQL?''' 
'''A.''' There are none. PostgreSQL does not need them. Use the + and - operators instead. Read on.

=== WORKING with DATETIME, DATE, and INTERVAL VALUES ===

[http://www.postgresql.org/docs/current/interactive/datatype-datetime.html Complete docs on date/time data types exist], so I will not attempt to reproduce them here. Instead, I will simply try to explain what a beginner needs to know to actually work with dates, times, and intervals.

==== Types ====

;DATETIME or TIMESTAMP:Structured "real" date and time values, containing year, month, day, hour, minute, second, and fractional seconds (to microsecond resolution) for all useful date & time values (4713 BC to over 100,000 AD).

;DATE:Simplified integer-based representation of a date defining only year, month, and day.

;INTERVAL:Structured value showing a period of time, including any/all of years, months, weeks, days, hours, minutes, seconds, and fractional seconds. "1 day", "42 minutes 10 seconds", and "2 years" are all INTERVAL values.

==== What about TIMESTAMP WITH TIME ZONE? ====

An important topic, that I don't want to get into here. Eventually someone will document this. Suffice it to say that all TIMESTAMP values carry TIMEZONE data as well which you may safely ignore if you don't need to handle different time zones.

==== Which do I want to use: DATE or TIMESTAMP? I don't need minutes or hours in my value ====

That depends. DATE is easier to work with for arithmetic (e.g. something reoccurring at a random interval of days), takes less storage space, and doesn't trail "00:00:00" strings you don't need when printed. However, TIMESTAMP is far better for real calendar calculations (e.g. something that happens on the 15th of each month or the 2nd Thursday of leap years). More below.

Now, to work with TIMESTAMP and INTERVAL, you need to understand these few simple rules :

===== 1. The difference between two TIMESTAMPs is always an INTERVAL =====

TIMESTAMP '1999-12-30' - TIMESTAMP '1999-12-11' = INTERVAL '19 days'

===== 2. You may add or subtract an INTERVAL to a TIMESTAMP to produce another TIMESTAMP =====

TIMESTAMP '1999-12-11' + INTERVAL '19 days' = TIMESTAMP '1999-12-30'

===== 3. You may add or subtract two INTERVALS =====

INTERVAL '1 month' + INTERVAL '1 month 3 days' = INTERVAL '2 months 3 days'

===== 4. Multiplication and division of INTERVALS is under development and discussion at this time =====

It is suggested that you avoid it until implementation is complete or you may get unexpected results.

===== 5. You may NOT (ever) perform Addition, Multiplication, or Division operations with two TIMESTAMPS =====

TIMESTAMP '2001-03-24' + TIMESTAMP '2001-10-01' = OPERATION ERROR

===== 6. Many larger INTERVAL values, like the calendar values they reflect, are '''not constant''' in length when expressed in smaller INTERVAL values =====

For example (differences bolded):

TIMESTAMP '2001-0'''7'''-02' + INTERVAL '1 month' = TIMESTAMP '2001-08-02' 
TIMESTAMP '2001-0'''7'''-02' + INTERVAL '31 days' = TIMESTAMP '2001-08-0'''2'''' 

'''but:''' 

TIMESTAMP '2001-0'''2'''-02' + INTERVAL '1 month' = TIMESTAMP '2001-03-02' 
TIMESTAMP '2001-0'''2'''-02' + INTERVAL '31 days' = TIMESTAMP '2001-03-0'''5''''

This makes the TIMESTAMP/INTERVAL combination ideal, for example, for scheduling an event which must reoccur every month on the 8th regardless of the length of the month, but problematic if you are trying to figure out the number of days in the last 3.5 months. Keep it in mind!

The DATE datatype, however, is simpler to deal with if less powerful.

==== Operations with DATEs ====

===== 1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference =====

DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19

===== 2. You may add or subtract an INTEGER to a DATE to produce another DATE =====

DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'

===== 3. Because the difference of two DATES is an INTEGER, this difference may be added, subtracted, divided, multiplied, or even used with modulo (%) =====

===== 4. As with TIMESTAMP, you may NOT perform Addition, Multiplication, or Division operations with two DATES =====

===== 5. DATE/INTEGER cannot figure out the varying lengths of months and years =====

Because DATE differences are always calculated as whole numbers of days, DATE/INTEGER cannot figure out the varying lengths of months and years. Thus, you cannot use DATE/INTEGER to schedule something for the 5th of every month without some very fancy length-of-month calculating on the fly. This makes DATE ideal for calendar applications involving a lot of calculating based on numbers of days (e.g. "For how many 14-day periods has employee "x" been employed?") but poor for actual calendaring apps. Keep it in mind.

==== I'm porting an app from MS SQL Server, and I need to support the DATEDIFF and DATEADD functions so that my stored views will work ====

Proceed to PostgreSQL TechDocs (http://techdocs.postgresql.org). There are many porting resources there, and I'd be surprised if someone hasn't already re-created these functions under PostgreSQL.

==== I need to display a DATE as text, or convert text into a DATE or INTERVAL ====

You want the to_date(), to_char(), and interval() functions. See "functions and operators" in the PostgreSQL docs: http://www.postgresql.org/docs/current/interactive/functions.html

==== What if I want to get the month as an integer out of a date?====

You want the extract() function. It can also return other numeric fields from a timestamp, including the Unix epoch time, e.g. EXTRACT(EPOCH FROM some_date).