<div style="margin: 1ex 1em; float: right;"><br />
__TOC__<br />
</div><br />
<br />
This list contains '''known PostgreSQL bugs and feature requests''' and we hope it is complete. If you would like to work on an item, please read the [[Developer FAQ]] first. There is also a [[Development_information|development information page]].<br />
<br />
* {{TodoPending}} - marks ordinary, incomplete items<br />
* {{TodoEasy}} - marks items that are easier to implement<br />
* {{TodoDone}} - marks changes that are done, and will appear in the PostgreSQL 9.5 release.<br />
<br />
For help on editing this list, please see [[Talk:Todo]]. <b>Please do not add items here without discussion on the mailing list.</b><br />
<br />
<b>For Developers:</b> Unfortunately this list does not contain all the information necessary for someone to start coding a feature. Some of these items might have become unnecessary since they were added; others might be desirable but have no clear implementation. When selecting items listed below, be prepared to first discuss the value of the feature. Do not assume that you can select one, code it, and then expect it to be committed. Always discuss design on the pgsql-hackers mailing list before starting to code. The flow should be:<br />
<br />
Desirability -> Design -> Implement -> Test -> Review -> Commit<br />
<br />
<div style="padding: 1ex 4em;"><br />
== Administration ==<br />
<br />
{{TodoItem<br />
|Allow administrators to cancel multi-statement idle transactions<br />
|This allows locks to be released, but it is complex to report the cancellation back to the client.<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php <nowiki>Cancelling idle in transaction state</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00441.php <nowiki>Re: Cancelling idle in transaction state</nowiki>]<br />
}}<br />
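<br />
Until such a cancel facility exists, a common workaround is to terminate the offending sessions outright. A minimal sketch, assuming superuser (or equivalent) rights; the ten-minute threshold is illustrative:<br />
<pre>
-- Terminate sessions that have sat idle in a transaction for over 10 minutes.
-- Note this drops the whole connection; the item above asks for a gentler
-- cancel that only rolls back the transaction and releases its locks.
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND state_change < now() - interval '10 minutes';
</pre>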
<br />
{{TodoItem<br />
|Allow administrators to cancel long-lived prepared transactions<br />
* [http://www.postgresql.org/message-id/20961.1403630269@sss.pgh.pa.us Re: idle_in_transaction_timeout]<br />
}}<br />
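<br />
Today this is a manual chore; a sketch of what an administrator must do by hand (the one-day threshold and the gid are illustrative):<br />
<pre>
-- Find prepared transactions that have been pending for more than a day ...
SELECT gid, prepared, owner, database
  FROM pg_prepared_xacts
 WHERE prepared < now() - interval '1 day';

-- ... then roll each one back by its global identifier:
ROLLBACK PREPARED 'some_gid';  -- 'some_gid' stands in for a gid from the query
</pre>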
<br />
{{TodoItem<br />
|Check for unreferenced table files created by transactions that were in-progress when the server terminated abruptly<br />
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00096.php <nowiki>Removing unreferenced files</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Set proper permissions on non-system schemas during db creation<br />
|Currently all schemas are owned by the super-user because they are copied from the template1 database. However, since all objects are inherited from the template database, it is not clear that setting schemas to the db owner is correct.}}<br />
<br />
{{TodoItem<br />
|Allow log_min_messages to be specified on a per-module basis<br />
|This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. Another idea is to allow separate configuration files for each module, or allow arbitrary SET commands to be passed to them. See also [[Logging Brainstorm]].}}<br />
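<br />
For contrast, the granularity available today, plus a purely hypothetical per-module form (not implemented; shown only to illustrate the idea):<br />
<pre>
-- Current granularity: global (postgresql.conf), per-role, or per-session.
SET log_min_messages = 'debug2';                        -- whole session gets noisier
ALTER ROLE batch_user SET log_min_messages = 'notice';  -- batch_user is illustrative
-- A hypothetical per-module syntax this item might enable (NOT implemented):
-- SET log_min_messages.autovacuum = 'debug2';
</pre>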
<br />
{{TodoItem<br />
|Simplify creation of partitioned tables<br />
|This would allow creation of partitioned tables without requiring creation of triggers or rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection. Options could include range and hash partition selection. See also [[Table partitioning]]<br />
}}<br />
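<br />
The boilerplate this item would eliminate looks roughly like the following, a sketch of the documented trigger-based approach (table name and date range are illustrative):<br />
<pre>
CREATE TABLE measurement (logdate date, value int);

-- One child table per partition, with a CHECK constraint for exclusion:
CREATE TABLE measurement_y2014 (
    CHECK (logdate >= DATE '2014-01-01' AND logdate < DATE '2015-01-01')
) INHERITS (measurement);

-- A trigger routes inserts on the parent into the right child:
CREATE OR REPLACE FUNCTION measurement_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO measurement_y2014 VALUES (NEW.*);
    RETURN NULL;  -- suppress the insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert();
</pre>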
<br />
{{TodoItem<br />
|Allow custom variables to appear in pg_settings()<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00850.php <nowiki>Re: count(*) performance improvement ideas</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Have custom variables be transaction-safe<br />
* {{MessageLink|4B577E9F.8000505@dunslane.net|Custom GUCs still a bit broken}}<br />
}}<br />
<br />
{{TodoItem<br />
|Implement the SQL-standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-05/msg00010.php <nowiki>Re: Grantor name gets lost when grantor role dropped</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Prevent query cancel packets from being replayed by an attacker, especially when using SSL<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg00345.php <nowiki>Replay attack of query cancel</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Provide a way to query the log collector subprocess to determine the name of the currently active log file<br />
* [http://archives.postgresql.org/pgsql-general/2008-11/msg00418.php <nowiki>Current log files when rotating?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow simpler reporting of the unix domain socket directory and allow easier configuration of its default location<br />
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg01555.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg01482.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow custom daemons to be automatically stopped/started along with the postmaster<br />
|This allows easier administration of daemons like user job schedulers or replication-related daemons.<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01701.php <nowiki>Re: scheduler in core</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve logging of prepared transactions recovered during startup<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg00092.php <nowiki>&quot;recovering prepared transaction&quot; after server restart message</nowiki>]<br />
}}<br />
<br />
=== Configuration files ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Consider normalizing fractions in postgresql.conf, perhaps using '%'<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00550.php <nowiki>Fractions in GUC variables</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow Kerberos to disable stripping of realms so we can check the username@realm against multiple realms<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00009.php <nowiki>krb_match_realm patch</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve LDAP authentication configuration options<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01745.php <nowiki>Proposed Patch - LDAPS support for servers on port 636 w/o TLS</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add external tool to auto-tune some postgresql.conf parameters<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00000.php <nowiki>Re: Overhauling GUCS</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00033.php <nowiki>Simple postgresql.conf wizard</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add 'hostgss' pg_hba.conf option to allow GSS link-level encryption<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01454.php <nowiki>Re: Plans for 8.4</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Process pg_hba.conf keywords case-insensitively<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00432.php <nowiki>More robust pg_hba.conf parsing/error logging</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow pg_hba.conf to process include files<br />
* [http://www.postgresql.org/message-id/86fvnm5t44.fsf@jerry.enova.com HBA files w/include support]<br />
}}<br />
<br />
{{TodoItem<br />
|Create utility to compute accurate random_page_cost value<br />
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00162.php<br />
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00362.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow configuration files to be independently validated<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php<br />
* http://archives.postgresql.org/message-id/12666.1310774573@sss.pgh.pa.us<br />
}}<br />
<br />
{{TodoItem<br />
|Allow postgresql.conf settings to be accepted by backends even if some settings are invalid for those backends<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow all backends to receive postgresql.conf setting changes at the same time<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow synchronous_standby_names to be disabled after communication failure with all synchronous standby servers exceeds some timeout<br />
|This also requires successful execution of a synchronous notification command.<br />
* http://archives.postgresql.org/pgsql-hackers/2012-07/msg00409.php<br />
* [http://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB7713DD9A622@SZXEML508-MBX.china.huawei.com Standalone synchronous master]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block<br />
* Currently it displays zero.<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Tablespaces ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow a database in tablespace t1 with tables created in tablespace t2 to be used as a template for a new database created with default tablespace t2<br />
|Currently all objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If you mix default tablespace tables and tablespace-specified tables in the same directory, creating a new database from such a mixed directory would create a new database with tables that had incorrect explicit tablespaces. To fix this would require modifying pg_class in the newly copied database, which we don't currently do.}}<br />
<br />
{{TodoItem<br />
|Allow reporting of which objects are in which tablespaces<br />
|This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.}}<br />
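<br />
The two existing building blocks such a tool would combine, sketched (my_tablespace is illustrative):<br />
<pre>
-- The server-side function mentioned above: which databases use each tablespace.
SELECT spcname, pg_tablespace_databases(oid) AS database_oid
  FROM pg_tablespace;

-- Then, connected to each such database, list its objects in that tablespace
-- (reltablespace is 0 for objects in the database's default tablespace):
SELECT c.relname
  FROM pg_class c
  JOIN pg_tablespace t ON t.oid = c.reltablespace
 WHERE t.spcname = 'my_tablespace';
</pre>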
<br />
{{TodoItem<br />
|Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original}}<br />
<br />
{{TodoItem<br />
|Allow per-tablespace quotas}}<br />
<br />
{{TodoItem<br />
|Allow tablespaces on RAM-based partitions for unlogged tables<br />
* http://archives.postgresql.org/pgsql-advocacy/2011-05/msg00033.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow toast tables to be moved to a different tablespace<br />
* [http://archives.postgresql.org/pgsql-hackers/2011-05/msg00980.php]<br />
* {{messageLink|CAFEQCbH756DyyAPQ1ykh3+b+kE1-EhWRww1WO_x5v38C-uLnUg@mail.gmail.com|patch : Allow toast tables to be moved to a different tablespace}} (issues remain)<br />
* [http://archives.postgresql.org/message-id/CAFEQCbEq07OopgE5xFYv2Q3eMq45hRSJkjCBO+kvpJq9NEVhow@mail.gmail.com Allow toast tables to be moved to a different tablespace]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Statistics Collector ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow statistics last vacuum/analyze execution times to be displayed without requiring track_counts to be enabled<br />
* [http://archives.postgresql.org/pgsql-docs/2007-04/msg00028.php <nowiki>row-level stats and last analyze time</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Clear table counters on TRUNCATE<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php <nowiki>Small TRUNCATE glitch</nowiki>]<br />
}}<br />
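<br />
Until this happens automatically, the counters can be reset by hand after a truncate; a sketch (my_table is illustrative):<br />
<pre>
TRUNCATE my_table;
-- Reset this one table's statistics so stale row counts do not mislead autovacuum:
SELECT pg_stat_reset_single_table_counters('my_table'::regclass);
</pre>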
<br />
{{TodoItem<br />
|Track number of WAL files ready to be archived in pg_stat_archiver <br />
* [http://www.postgresql.org/message-id/CAB7nPqSCrcZGGy_SmpT7ubSzVGNMtphYU1JJZYyapHuN46E-Tw@mail.gmail.com <nowiki>pg_stat_archiver missing feature</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== SSL ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow SSL authentication/encryption over unix domain sockets<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00924.php <nowiki>Re: Spoofing as the postmaster</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow SSL key file permission checks to be optionally disabled when sharing SSL keys with other applications<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00069.php <nowiki>BUG #3809: SSL &quot;unsafe&quot; private key permissions bug</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow SSL CRL files to be re-read during configuration file reload, rather than requiring a server restart<br />
|Unlike SSL CRT files, CRL (Certificate Revocation List) files are updated frequently.<br />
* [http://archives.postgresql.org/pgsql-general/2008-12/msg00832.php <nowiki>Automatic CRL reload</nowiki>]<br />
Alternatively or additionally, supporting OCSP (Online Certificate Status Protocol) would provide real-time revocation discovery without reloading.<br />
}}<br />
<br />
{{TodoItem<br />
| Allow automatic selection of SSL client certificates from a certificate store<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00406.php <nowiki>Allow multiple certificates or keys in the postgresql.crt/.key files</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
| Send the full certificate server chain to the client<br />
* [http://archives.postgresql.org/pgsql-bugs/2009-12/msg00145.php BUG #5245: Full Server Certificate Chain Not Sent to client]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Point-In-Time Recovery (PITR) ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow archive_mode to be changed without server restart?<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01655.php <nowiki>Enabling archive_mode without restart</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider avoiding WAL switching via archive_timeout if there has been no database activity<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg01469.php <nowiki>archive_timeout behavior for no activity</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg00395.php <nowiki>Re: archive_timeout behavior for no activity</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow base backup from standby to continue when the standby is promoted.<br />
* [http://archives.postgresql.org/pgsql-hackers/2012-10/msg00239.php <nowiki>Re: Promoting a standby during base backup</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add recovery target option to stop as soon as consistency is reached.<br />
* [http://archives.postgresql.org/message-id/5188F87D.1080908@vmware.com <nowiki>Re: Recovery target 'immediate'</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Standby server mode ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
| Allow pg_xlogfile_name() to be used in recovery mode<br />
* [http://archives.postgresql.org/message-id/3f0b79eb1001190135vd9f62f1sa7868abc1ea61d12@mail.gmail.com <nowiki>Streaming replication and pg_xlogfile_name()</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
| Prevent variables inherited from the server environment from being used for making streaming replication connections.<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01011.php <nowiki>Re: Parameter name standby_mode</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
| Change walsender so that it applies per-role settings<br />
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg00642.php<br />
}}<br />
<br />
{{TodoItem<br />
| Restructure configuration parameters for standby mode<br />
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01820.php<br />
}}<br />
<br />
{{TodoItem<br />
| Allow time-delayed application of logs on the standby<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00992.php<br />
}}<br />
<br />
{{TodoItem<br />
| Add -X parameter to pg_basebackup to specify a different directory for pg_xlog, as initdb's -X option does<br />
}}<br />
<br />
{{TodoItem<br />
| Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period<br />
|This would require some type of command to be executed to alert administrators of this change.<br />
* http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Data Types ==<br />
<br />
{{TodoItem<br />
|Fix data types where equality comparison is not intuitive, e.g. box<br />
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg01643.php<br />
}}<br />
<br />
{{TodoItem<br />
|Add support for public SYNONYMs<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00519.php <nowiki>Proposal for SYNONYMS</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php<br />
* http://archives.postgresql.org/pgsql-general/2010-12/msg00139.php<br />
}}<br />
<br />
{{TodoItem<br />
|Add support for SQL-standard GENERATED/IDENTITY columns<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-07/msg00543.php <nowiki>Re: Three weeks left until feature freeze</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-08/msg00038.php <nowiki>GENERATED ... AS IDENTITY, Was: Re: Feature Freeze</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00344.php <nowiki>Behavior of GENERATED columns per SQL2003</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00076.php <nowiki>Re: [HACKERS] Behavior of GENERATED columns per SQL2003</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00604.php <nowiki>IDENTITY/GENERATED patch</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider placing all sequences in a single table, or create a system view<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php <nowiki>Re: newbie: renaming sequences task</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2012-02/msg00258.php Removing special case OID generation]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider a special data type for regular expressions<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php <nowiki>Why is there a tsquery data type?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow renaming and deleting enumerated values from an existing enumerated data type<br />
}}<br />
<br />
{{TodoItem<br />
|Support scoped IPv6 addresses in the inet type<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-05/msg00111.php <nowiki>strange problem with ip6</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider improving performance of computing CHAR() value lengths<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00900.php <nowiki>char() overhead on read-only workloads not so insignifcant as the docs claim it is...</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01787.php <nowiki>Re: [PATCH] backend: compare word-at-a-time in bcTruelen</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add overlaps geometric operators that ignore point overlaps<br />
* http://archives.postgresql.org/pgsql-hackers/2010-03/msg00861.php<br />
}}<br />
<br />
{{TodoItem<br />
|Remove or improve rounding in geometric comparison operators<br />
* http://archives.postgresql.org/message-id/9804.1346187849@sss.pgh.pa.us<br />
}}<br />
<br />
{{TodoItem<br />
| Add IMMUTABLE column attribute<br />
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00623.php<br />
}}<br />
<br />
=== Domains ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow functions defined as casts to domains to be called during casting<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-05/msg00072.php <nowiki>bug? non working casts for domain</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-09/msg01681.php <nowiki>TODO: Fix CREATE CAST on DOMAINs</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow values to be cast to domain types<br />
* [http://archives.postgresql.org/pgsql-hackers/2003-06/msg01206.php <nowiki>Domain casting still doesn't work right</nowiki>] <br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00289.php <nowiki>domain casting?</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00812.php<br />
}}<br />
<br />
{{TodoItem<br />
|Make domains work better with polymorphic functions<br />
* [http://archives.postgresql.org/message-id/4887.1228700773@sss.pgh.pa.us Polymorphic types vs. domains]<br />
* [http://archives.postgresql.org/message-id/15535.1238774571@sss.pgh.pa.us some difficulties with fixing it]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Dates and Times ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow infinite intervals just like infinite timestamps<br />
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00076.php<br />
}}<br />
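<br />
A quick illustration of the current asymmetry:<br />
<pre>
SELECT 'infinity'::timestamptz;  -- accepted: timestamps support infinity
SELECT 'infinity'::interval;     -- rejected today; this item would allow it
</pre>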
<br />
{{TodoItem<br />
|Determine how to represent date/time field extraction on infinite timestamps<br />
* [http://archives.postgresql.org/message-id/CA+mi_8bda-Fnev9iXeUbnqhVaCWzbYhHkWoxPQfBca9eDPpRMw@mail.gmail.com extract(epoch from infinity) is not 0]<br />
* [http://archives.postgresql.org/message-id/CADAkt-icuESH16uLOCXbR-dKpcvwtUJE4JWXnkdAjAAwP6j12g@mail.gmail.com converting between infinity timestamp and float8]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow TIMESTAMP WITH TIME ZONE to store the original timezone information, either zone name or offset from UTC<br />
|If the TIMESTAMP value is stored with a time zone name, interval computations should adjust based on the time zone rules. <br />
* [http://archives.postgresql.org/pgsql-hackers/2004-10/msg00705.php <nowiki>timestamp with time zone a la sql99</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Have timestamp subtraction not call justify_hours()?<br />
* [http://archives.postgresql.org/pgsql-sql/2006-10/msg00059.php <nowiki>timestamp subtraction (was Re: formatting intervals with to_char)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add function to allow the creation of timestamps using parameters<br />
* http://archives.postgresql.org/pgsql-performance/2010-06/msg00232.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow a comma to denote fractional seconds in ISO-8601-compliant times (and timestamps)<br />
* http://www.postgresql.org/message-id/7D5AC9AB-238D-4FE7-8857-18D98190A4D9@justatheory.com<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Arrays ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add support for arrays of domains<br />
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00114.php <nowiki>Re: updated WIP: arrays of composites</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow single-byte header storage for array elements}}<br />
<br />
{{TodoItem<br />
|Add function to detect if an array is empty<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00475.php <nowiki>Re: array_length()</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve handling of NULLs in arrays<br />
* [http://archives.postgresql.org/pgsql-bugs/2008-11/msg00009.php <nowiki>BUG #4509: array_cat's null behaviour is inconsistent</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01040.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Binary Data ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Improve vacuum of large objects, like contrib/vacuumlo?}}<br />
<br />
{{TodoItem<br />
|Auto-delete large objects when referencing row is deleted<br />
|contrib/lo offers this functionality.}}<br />
<br />
{{TodoItem<br />
|Allow read/write into TOAST values like large objects<br />
|Writing might require the TOAST column to be stored EXTERNAL.<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00049.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== MONEY Data Type ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add locale-aware MONEY type, and support multiple currencies<br />
* [http://archives.postgresql.org/pgsql-general/2005-08/msg01432.php <nowiki>A real currency type</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01181.php <nowiki>Money type todos?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|MONEY dumps in a locale-specific format, making it difficult to restore to a system with a different locale}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Text Search ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow dictionaries to change the token that is passed on to later dictionaries<br />
* [http://archives.postgresql.org/pgsql-patches/2007-11/msg00081.php <nowiki>a tsearch2 (8.2.4) dictionary that only filters out stopwords</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Exact phrase search<br />
* [http://www.sai.msu.su/~megera/wiki/2009-08-12 <nowiki>Algebra for full-text queries</nowiki>]<br />
* [http://www.sai.msu.su/~megera/postgres/talks/2009.pdf <nowiki>Some recent advances in full-text search</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider a function-based API for '@@' searches<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00511.php <nowiki>Some recent advances in full-text search</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve text search error messages<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00966.php <nowiki>Poorly designed tsearch NOTICEs</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg01146.php <nowiki>Re: Poorly designed tsearch NOTICEs</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider changing error to warning for strings larger than one megabyte<br />
* [http://archives.postgresql.org/pgsql-bugs/2008-02/msg00190.php <nowiki>BUG #3975: tsearch2 index should not bomb out of 1Mb limit</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-03/msg00062.php <nowiki>Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|tsearch and tsdicts regression tests fail in Turkish locale on glibc<br />
* [http://archives.postgresql.org/message-id/49749645.5070801@gmx.net tsearch with Turkish locale]<br />
}}<br />
<br />
{{TodoItem<br />
|tsquery negator operator treated as part of lexeme<br />
* [http://archives.postgresql.org/pgsql-bugs/2009-06/msg00346.php BUG #4887: inclusion operator (@>) on tsqeries behaves not conforming to documentation]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve handling of dash and plus signs in email address user names, and perhaps improve URL parsing<br />
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00772.php<br />
* [http://archives.postgresql.org/message-id/E1Ri8il-0008Ct-9p@wrigleys.postgresql.org tsearch does not recognize all valid emails]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve default parser, to more easily allow adding new tokens<br />
* http://archives.postgresql.org/message-id/23485.1297727826@sss.pgh.pa.us<br />
}}<br />
<br />
{{TodoItem<br />
|Add additional support functions<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00319.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== XML ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow XML arrays to be cast to other data types<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00981.php <nowiki>proposal casting from XML[] to int[], numeric[], text[]</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00231.php <nowiki>Re: proposal casting from XML[] to int[], numeric[], text[]</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00471.php <nowiki>Re: proposal casting from XML[] to int[], numeric[], text[]</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add XML Schema validation and xmlvalidate functions (SQL:2008)}}<br />
<br />
{{TodoItem<br />
|Add xmlvalidatedtd variant to support validating against a DTD?}}<br />
<br />
{{TodoItem<br />
|Relax-NG validation; libxml2 supports this already}}<br />
<br />
{{TodoItem<br />
|Allow reliable XML operation with non-UTF8 server encodings (xpath(), in particular, is known to not work)<br />
* [http://archives.postgresql.org/pgsql-bugs/2009-01/msg00135.php <nowiki>BUG #4622: xpath only work in utf-8 server encoding</nowiki>] <br />
* http://archives.postgresql.org/message-id/4110.1238973350@sss.pgh.pa.us}}<br />
<br />
{{TodoItem<br />
|Add functions from SQL:2006: XMLDOCUMENT, XMLCAST, XMLTEXT}}<br />
<br />
{{TodoItem<br />
|Add XMLNAMESPACES support in XMLELEMENT and elsewhere}}<br />
<br />
{{TodoItem<br />
|Move XSLT from contrib/xml2 to a more reasonable location<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00539.php<br />
}}<br />
<br />
{{TodoItem<br />
|Report errors returned by the XSLT library<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00562.php<br />
}}<br />
<br />
{{TodoItem<br />
|Improve the XSLT parameter passing API<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00416.php<br />
}}<br />
<br />
{{TodoItem<br />
|XML Canonical: Convert XML documents to canonical form to compare them. libxml2 has support for this.}}<br />
<br />
{{TodoItem<br />
|Add pretty-printed XML output option<br />
|Parse a document and serialize it back in some indented form. libxml2 might support this.}}<br />
<br />
{{TodoItem<br />
|Add XMLQUERY (from the SQL/XML standard)}}<br />
<br />
{{TodoItem<br />
|Allow XML shredding<br />
|In some cases shredding could be a better option (e.g. if there is no need to keep XML docs entirely, or if we have already developed tools that understand only relational data). This would be a separate module that implements the annotated schema decomposition technique, similar to DB2 and SQL Server functionality.}}<br />
<br />
{{TodoItem<br />
|Fix nested or repeated xpath() calls, which apparently mess up namespaces [http://archives.postgresql.org/pgsql-bugs/2008-03/msg00097.php] [http://archives.postgresql.org/pgsql-bugs/2008-03/msg00144.php] [http://archives.postgresql.org/pgsql-general/2008-03/msg00295.php] [http://archives.postgresql.org/pgsql-bugs/2008-07/msg00054.php] [http://archives.postgresql.org/message-id/004f01c90e91$138e9d10$3aabd730$@anstett@iaas.uni-stuttgart.de]}}<br />
<br />
{{TodoItem<br />
|XPath: Adding the <x> at the root causes problems [http://archives.postgresql.org/pgsql-bugs/2008-05/msg00184.php] [http://archives.postgresql.org/pgsql-bugs/2008-07/msg00054.php] [http://archives.postgresql.org/pgsql-general/2008-07/msg00613.php]}}<br />
<br />
{{TodoItem<br />
|xpath_table needs to be implemented/implementable to get rid of contrib/xml2 [http://archives.postgresql.org/pgsql-general/2008-05/msg00823.php]}}<br />
<br />
{{TodoItem<br />
|xpath_table is pretty broken anyway [http://archives.postgresql.org/pgsql-hackers/2010-02/msg02424.php]}}<br />
<br />
{{TodoItem<br />
|Better handling of XPath data types [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php] [http://archives.postgresql.org/message-id/004a01c90e90$4b986d90$e2c948b0$@anstett@iaas.uni-stuttgart.de]}}<br />
<br />
{{TodoItem<br />
|Improve handling of PIs and DTDs in xmlconcat() [http://archives.postgresql.org/message-id/200904211211.n3LCB09p008988@wwwmaster.postgresql.org]}}<br />
<br />
{{TodoItem<br />
|Restructure XML and /contrib/xml2 functionality<br />
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02314.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00017.php<br />
}}<br />
<br />
{{TodoItem<br />
|Verify XPath escaping behavior<br />
* [http://www.postgresql.org/message-id/E1VOXZv-0008Q9-0Z@wrigleys.postgresql.org Xpath behaviour unintuitive / arguably wrong]<br />
* [http://www.postgresql.org/message-id/CAAY5AM1L83y79rtOZAUJioREO6n4%3DXAFKcGu6qO3hCZE1yJytg@mail.gmail.com xpath missing entity decoding - bug or feature]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Functions ==<br />
<br />
{{TodoItem<br />
|Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg01403.php <nowiki>Re: BUG #2917: spi_prepare doesn't accept typename aliases</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg01160.php <nowiki>RFC for adding typmods to functions</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix IS OF so it matches the ISO specification, and add documentation<br />
* [http://archives.postgresql.org/pgsql-patches/2003-08/msg00060.php <nowiki>Re: [HACKERS] IS OF</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00060.php <nowiki>ToDo: add documentation for operator IS OF</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Implement Boyer-Moore searching in LIKE queries<br />
* {{messageLink|27645.1220635769@sss.pgh.pa.us|TODO item: Implement Boyer-Moore searching (First time hacker)}}<br />
}}<br />
<br />
{{TodoItem<br />
|Prevent malicious functions from being executed with the permissions of unsuspecting users<br />
|Index functions are safe, so VACUUM and ANALYZE are safe too. Triggers, CHECK and DEFAULT expressions, and rules are still vulnerable. <br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00268.php <nowiki>Some notes about the index-functions security vulnerability</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce memory usage of aggregates in set returning functions<br />
* [http://archives.postgresql.org/pgsql-performance/2008-01/msg00031.php <nowiki>Re: Performance of aggregates over set-returning functions</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix /contrib/ltree operator<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php <nowiki>BUG #3720: wrong results at using ltree</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix /contrib/btree_gist's implementation of inet indexing<br />
* [http://archives.postgresql.org/pgsql-bugs/2010-10/msg00099.php <nowiki>BUG #5705: btree_gist: Index on inet changes query result</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|<nowiki>Fix inconsistent precedence of =, &gt;, and &lt; compared to &lt;&gt;, &gt;=, and &lt;=</nowiki><br />
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00145.php <nowiki>BUG #3822: Nonstandard precedence for comparison operators</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix regular expression bug when using complex back-references<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php <nowiki>BUG #3645: regular expression back references seem broken</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Have /contrib/dblink reuse unnamed connections<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php <nowiki>dblink un-named connection doesn't get re-used</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve formatting of pg_get_viewdef() output<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg01648.php <nowiki>pg_get_viewdef formattiing</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01885.php <nowiki>Re: pretty print viewdefs</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2011-12/msg00906.php reprise: pretty print viewdefs]<br />
}}<br />
<br />
{{TodoItem<br />
|Add function to dump pg_depend information cleanly<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00226.php <nowiki>Elementary dependency look-up</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add function to allow easier transaction id comparisons<br />
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00786.php<br />
}}<br />
<br />
=== Character Formatting ===<br />
<br />
{{TodoSubsection}}<br />
{{TodoItem<br />
|Allow to_date() and to_timestamp() to accept localized month names}}<br />
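<br />
The asymmetry in question: to_char() can already emit localized month names via the TM ("translation mode") prefix, but to_date() ignores TM on input. A sketch, assuming the de_DE.UTF-8 locale is installed:<br />
<pre>
SET lc_time = 'de_DE.UTF-8';
SELECT to_char(DATE '2014-07-14', 'TMMonth YYYY');  -- emits 'Juli 2014'
-- The reverse is what this item asks for; today TM is ignored for input:
-- to_date('Juli 2014', 'TMMonth YYYY')
</pre>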
<br />
{{TodoItem<br />
|Add missing parameter handling in to_char()<br />
* [http://archives.postgresql.org/pgsql-hackers/2005-12/msg00948.php <nowiki>Re: to_char and i18n</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Throw an error from to_char() instead of printing a string of "#" when a number doesn't fit in the desired output format.<br />
* discussed in [http://archives.postgresql.org/message-id/37ed240d0907290836w42187222n18664dfcbcb445b1@mail.gmail.com "to_char, support for EEEE format"]<br />
}}<br />
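<br />
The current behavior the item objects to, using the example from the documentation:<br />
<pre>
SELECT to_char(485, '99');  -- returns ' ##' (overflow marker) instead of erroring
</pre>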
<br />
{{TodoItem<br />
|Allow to_char() on interval values to accumulate the highest unit requested<br />
|2= Some special format flag would be required to request such accumulation. Such functionality could also be added to EXTRACT. Prevent accumulation that crosses the month/day boundary because of the uneven number of days in a month.<br />
* to_char(INTERVAL '1 hour 5 minutes', 'MI') =&gt; 65<br />
* to_char(INTERVAL '43 hours 20 minutes', 'MI' ) =&gt; 2600<br />
* to_char(INTERVAL '43 hours 20 minutes', 'WK:DD:HR:MI') =&gt; 0:1:19:20<br />
* to_char(INTERVAL '3 years 5 months','MM') =&gt; 41<br />
}}<br />
<br />
{{TodoItem<br />
|Fix to_number() handling for values not matching the format string<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg01447.php <nowiki>Re: numeric_to_number() function skipping some digits</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Multi-Language Support ==<br />
<br />
{{TodoItem<br />
|Add NCHAR (as distinguished from ordinary varchar)<br />
* [http://www.postgresql.org/message-id/A756FAD7EDC2E24F8CAB7E2F3B5375E918B12BC0@FALEX03.au.fjanz.com UTF8 national character data type support WIP patch and list of open issues.]<br />
}}<br />
<br />
{{TodoItem<br />
|Add a cares-about-collation column to pg_proc, so that unresolved-collation errors can be thrown at parse time<br />
* [http://archives.postgresql.org/pgsql-hackers/2011-03/msg01520.php <nowiki>Open issues for collations</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Integrate collations with text search configurations<br />
* [http://archives.postgresql.org/message-id/28887.1303579034@sss.pgh.pa.us <nowiki>Some TODO items for collations</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Integrate collations with to_char() and related functions<br />
* [http://archives.postgresql.org/message-id/28887.1303579034@sss.pgh.pa.us <nowiki>Some TODO items for collations</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Support collation-sensitive equality and hashing functions<br />
* [http://archives.postgresql.org/pgsql-hackers/2011-06/msg00472.php <nowiki> contrib/citext versus collations</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add a LOCALE option to CREATE DATABASE, as a shorthand<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00119.php <nowiki> Re: 8.4 open items list</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Support multiple simultaneous character sets, per SQL:2008}}<br />
<br />
{{TodoItem<br />
|Improve UTF8 combined character handling?}}<br />
<br />
{{TodoItem<br />
|Add octet_length_server() and octet_length_client()}}<br />
<br />
{{TodoItem<br />
|Make octet_length_client() the same as octet_length()?}}<br />
<br />
{{TodoItem<br />
|Fix problems with wrong runtime encoding conversion for NLS message files}}<br />
<br />
{{TodoItem<br />
|Add URL to more complete multi-byte regression tests<br />
* [http://archives.postgresql.org/pgsql-hackers/2005-07/msg00272.php <nowiki>Multi-byte and client side character encoding tests for copy command..</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix contrib/fuzzystrmatch to work with multibyte encodings<br />
* [http://archives.postgresql.org/pgsql-bugs/2009-04/msg00047.php <nowiki> soundex function returns UTF-16 characters</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00138.php <nowiki> dmetaphone woes</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Change memory allocation for multi-byte functions so memory is allocated inside conversion functions<br />
|Currently we preallocate memory based on worst-case usage.}}<br />
<br />
{{TodoItem<br />
|Add ability to use case-insensitive regular expressions on multi-byte characters<br />
|Currently it works for UTF-8, but not other multi-byte encodings<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00433.php <nowiki>Regexps vs. locale</nowiki>]<br />
* {{MessageLink|20091201210024.B1393753FB7@cvs.postgresql.org|A partial solution for UTF-8}}<br />
}}<br />
<br />
{{TodoItem<br />
|Improve encoding of connection startup messages sent to the client<br />
|Currently some authentication error messages are sent in the server encoding<br />
* [http://archives.postgresql.org/pgsql-general/2008-12/msg00801.php <nowiki>encoding of PostgreSQL messages</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-general/2009-01/msg00005.php <nowiki>Re: encoding of PostgreSQL messages</nowiki>]<br />
* [http://www.postgresql.org/message-id/20131220030725.GA1411150@tornado.leadboat.com multibyte messages are displayed incorrectly on the client]<br />
}}<br />
<br />
{{TodoItem<br />
|More sensible support for Unicode combining characters, normal forms<br />
* http://archives.postgresql.org/message-id/200904141532.44618.peter_e@gmx.net<br />
}}<br />
<br />
== Views and Rules ==<br />
<br />
{{TodoItem<br />
|Allow VIEW/RULE recompilation when the underlying tables change<br />
|This is both difficult and controversial.<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01723.php Re: About "Allow VIEW/RULE recompilation when the underlying tables change"]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01724.php Re: About "Allow VIEW/RULE recompilation when the underlying tables change2"]<br />
* [http://archives.postgresql.org/message-id/CACk%3DU9NFSzWrEba8G5dZ%3DTZLy3_hx3QXGyCcKVWT%3D4iA1FjMuA@mail.gmail.com VIEW still referring to old name of field]<br />
* [http://www.postgresql.org/message-id/87mwe4k46y.fsf@commandprompt.com Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?]<br />
}}<br />
{{TodoItem<br />
|Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00577.php <nowiki>RETURNING and DO INSTEAD ... Intentional or not?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve ability to modify views via ALTER TABLE<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00691.php <nowiki>Re: idea: storing view source in system catalogs</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01410.php <nowiki>modifying views</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg00300.php <nowiki>Re: patch: Add columns via CREATE OR REPLACE VIEW</nowiki>]<br />
}}<br />
<br />
== SQL Commands ==<br />
<br />
{{TodoItem<br />
|Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT<br />
* [http://dissipatedheat.com/2011/11/10/how-not-to-write-a-patch-for-postgresql/ How not to write this patch.]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve type determination of unknown (NULL or quoted literal) result columns for UNION/INTERSECT/EXCEPT<br />
* [http://archives.postgresql.org/message-id/9799.1302719551@sss.pgh.pa.us <nowiki>UNION construct type cast gives poor error message</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add ROLLUP, CUBE, GROUPING SETS options to GROUP BY<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00838.php <nowiki>WIP: grouping sets support</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00466.php <nowiki>Implementation of GROUPING SETS (T431: Extended grouping capabilities)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow prepared transactions with temporary tables created and dropped in the same transaction, and when an ON COMMIT DELETE ROWS temporary table is accessed<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00047.php <nowiki>Re: &quot;could not open relation 1663/16384/16584: No such file or directory&quot; in a specific combination of transactions with temp tables</nowiki>]<br />
* [http://archives.postgresql.org/message-id/492543D5.9050904@enterprisedb.com A suggestion on how to implement this]<br />
}}<br />
<br />
{{TodoItem<br />
|Add a GUC variable to warn about non-standard SQL usage in queries}}<br />
<br />
{{TodoItem<br />
|Add SQL-standard MERGE/REPLACE/UPSERT command<br />
|MERGE is typically used to merge two tables. A REPLACE or UPSERT command does an UPDATE, or on failure, an INSERT. See [[SQL MERGE]] for notes on the implementation details.<br />
}}<br />
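<br />
For reference, the hand-rolled retry loop that a native command would replace, following the example in the PostgreSQL documentation (assumes a table db(a int PRIMARY KEY, b text)):<br />
<pre>
CREATE OR REPLACE FUNCTION merge_db(key int, data text) RETURNS void AS $$
BEGIN
    LOOP
        -- First try to update an existing row.
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- Not there: try to insert, but a concurrent session may beat us to it.
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Lost the race; loop back and try the UPDATE again.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
</pre>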
<br />
{{TodoItem<br />
|Add NOVICE output level for helpful messages<br />
|For example, have it warn about unjoined tables. This could also control automatic sequence/index creation messages.<br />
}}<br />
<br />
{{TodoItem<br />
|Allow NOTIFY in rules involving conditionals}}<br />
<br />
{{TodoItem<br />
|Allow LISTEN on patterns<br />
* http://www.postgresql.org/message-id/52693FC5.7070507@gmail.com<br />
}}<br />
<br />
{{TodoItem<br />
|Allow EXPLAIN to identify tables that were skipped because of constraint_exclusion<br />
}}<br />
<br />
{{TodoItem<br />
|Simplify dropping roles that have objects in several databases}}<br />
<br />
{{TodoItem<br />
|Allow the count returned by SELECT, etc. to be represented as an int64 to allow a higher range of values}}<br />
<br />
{{TodoItem<br />
|Add support for WITH RECURSIVE ... CYCLE<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00291.php <nowiki>WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows</nowiki>]}}<br />
<br />
{{TodoItem<br />
|Add DEFAULT .. AS OWNER so permission checks are done as the table owner<br />
|This would be useful for SERIAL nextval() calls and CHECK constraints.}}<br />
<br />
{{TodoItem<br />
|Allow DISTINCT to work in multiple-argument aggregate calls}}<br />
<br />
{{TodoItem<br />
|Add comments on system tables/columns using the information in catalogs.sgml<br />
|Ideally the information would be pulled from the SGML file automatically.}}<br />
<br />
{{TodoItem<br />
|Prevent the specification of conflicting transaction read/write options<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00684.php <nowiki>Re: SET TRANSACTION and SQL Standard</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow DELETE and UPDATE to be used with LIMIT and ORDER BY<br />
* http://archives.postgresql.org/pgadmin-hackers/2010-04/msg00078.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01997.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00021.php<br />
}}<br />
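<br />
The usual workaround today routes the limit through a subquery on ctid; a sketch (queue and created_at are illustrative):<br />
<pre>
-- DELETE ... ORDER BY ... LIMIT is not accepted directly, so:
DELETE FROM queue
 WHERE ctid IN (SELECT ctid
                  FROM queue
                 ORDER BY created_at
                 LIMIT 10);
</pre>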
<br />
{{TodoItem<br />
|Allow PREPARE of cursors}}<br />
<br />
{{TodoItem<br />
|Have DISCARD PLANS discard plans cached by functions<br />
|DISCARD ALL should do the same.<br />
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00431.php<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid multiple-evaluation of BETWEEN and IN arguments containing volatile expressions<br />
* http://archives.postgresql.org/message-id/4D95B605.2020709@enterprisedb.com<br />
}}<br />
<br />
{{TodoItem<br />
|Fix nested CASE-WHEN constructs<br />
* http://archives.postgresql.org/message-id/4DDCEEB8.50602@enterprisedb.com<br />
}}<br />
<br />
{{TodoItem<br />
|IS NULL testing of nested ROW() values is inconsistent<br />
* http://www.postgresql.org/message-id/50B3D11F.20408@2ndQuadrant.com<br />
}}<br />
<br />
=== CREATE ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow CREATE TABLE AS to determine column lengths for complex expressions like SELECT col1 || col2}}<br />
<br />
{{TodoItem<br />
|Have WITH CONSTRAINTS also create constraint indexes<br />
* [http://archives.postgresql.org/pgsql-patches/2007-04/msg00149.php <nowiki>Re: CREATE TABLE LIKE INCLUDING INDEXES support</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Move NOT NULL constraint information to pg_constraint<br />
|Currently NOT NULL constraints are stored in pg_attribute without any designation of their origins, e.g. primary keys. One manifest problem is that dropping a PRIMARY KEY constraint does not remove the NOT NULL constraint designation. Another issue is that we should probably force NOT NULL to be propagated from parent tables to children, just as CHECK constraints are. (But then does dropping PRIMARY KEY affect children?)<br />
* http://archives.postgresql.org/message-id/19768.1238680878@sss.pgh.pa.us<br />
* http://archives.postgresql.org/message-id/200909181005.n8IA5Ris061239@wwwmaster.postgresql.org<br />
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg01223.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg00358.php<br />
}}<br />
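<br />
The surprising behavior this would fix, sketched:<br />
<pre>
CREATE TABLE t (id int PRIMARY KEY);
ALTER TABLE t DROP CONSTRAINT t_pkey;
-- The NOT NULL marking created by the primary key survives, because it is
-- stored in pg_attribute with no link back to the constraint:
INSERT INTO t VALUES (NULL);  -- still fails with a not-null violation
</pre>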
<br />
{{TodoItem<br />
|Prevent concurrent CREATE TABLE from sometimes returning a cryptic error message<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00169.php <nowiki>BUG #3692: Conflicting create table statements throw unexpected error</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add CREATE SCHEMA ... LIKE that copies a schema}}<br />
<br />
{{TodoItem<br />
|Fix CREATE OR REPLACE FUNCTION to not leave objects depending on the function in inconsistent state<br />
* [http://archives.postgresql.org/pgsql-general/2008-08/msg00985.php indexes on functions and create or replace function]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow temporary tables to exist as empty by default in all sessions<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00006.php <nowiki>what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg01329.php <nowiki>idea: global temp tables</nowiki>]<br />
* [http://archives.postgresql.org//pgsql-hackers/2009-05/msg00016.php <nowiki>Re: idea: global temp tables</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg01098.php <nowiki>global temporary tables</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2012-04/msg01148.php Temporary tables under hot standby]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow the creation of "distinct" types<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01647.php <nowiki>Distinct types</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider analyzing temporary tables when they are first used in a query<br />
|Autovacuum cannot analyze or vacuum temporary tables.<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00416.php <nowiki>autovacuum and temp tables support</nowiki>]<br />
}}<br />
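<br />
Until then, sessions must analyze their own temporary tables by hand; a sketch (big_table and flag are illustrative):<br />
<pre>
CREATE TEMP TABLE scratch AS
    SELECT * FROM big_table WHERE flag;
-- Autovacuum cannot see temporary tables, so without this the planner
-- works from default estimates:
ANALYZE scratch;
</pre>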
<br />
{{TodoItem<br />
|Allow an unlogged table to be changed to logged<br />
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00437.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00323.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00237.php<br />
* [http://www.postgresql.org/message-id/CAFcNs+peg3VPG2%3Dv6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ@mail.gmail.com make an unlogged table logged]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== UPDATE ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|<nowiki>Allow UPDATE tab SET ROW (col, ...) = (SELECT...)</nowiki><br />
* [http://archives.postgresql.org/pgsql-hackers/2006-07/msg01308.php <nowiki>Re: [PATCHES] extension for sql update</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00865.php <nowiki>UPDATE using sub selects</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2007-04/msg00315.php <nowiki>UPDATE using sub selects</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-03/msg00237.php <nowiki>Re: UPDATE using sub selects</nowiki>]<br />
}}<br />
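<br />
A sketch of the difference (tab, src, and the columns are illustrative):<br />
<pre>
-- Today each target column needs its own scalar subquery:
UPDATE tab
   SET col1 = (SELECT x FROM src WHERE src.id = tab.id),
       col2 = (SELECT y FROM src WHERE src.id = tab.id);
-- The requested SQL-standard form fetches the row once:
-- UPDATE tab SET (col1, col2) = (SELECT x, y FROM src WHERE src.id = tab.id);
</pre>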
<br />
{{TodoItem<br />
|Research self-referential UPDATEs that see inconsistent row versions in read-committed mode<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00507.php <nowiki>Concurrently updating an updatable view</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00016.php <nowiki>Re: Do we need a TODO? (was Re: Concurrently updating anupdatable view)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve performance of EvalPlanQual mechanism that rechecks already-updated rows<br />
|This is related to the previous item, which questions whether it even has the right semantics.<br />
* [http://archives.postgresql.org/pgsql-bugs/2008-09/msg00045.php <nowiki>BUG #4401: concurrent updates to a table blocks one update indefinitely</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-bugs/2009-07/msg00302.php <nowiki>BUG #4945: Parallel update(s) gone wild</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== ALTER ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Have ALTER TABLE RENAME of a SERIAL column rename the sequence<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php <nowiki>Re: newbie: renaming sequences task</nowiki>]<br />
* [http://archives.postgresql.org/message-id/CADLWmXUV4LbLhMZL8rYMhCy72aZZLB5BSARPQVgoX0BrxA0FFg@mail.gmail.com renaming implicit sequences]<br />
}}<br />
<br />
{{TodoItem<br />
|Have ALTER SEQUENCE RENAME rename the sequence name stored in the sequence table<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-09/msg00092.php <nowiki>BUG #3619: Renaming sequence does not update its 'sequence_name' field</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00007.php <nowiki>Re: BUG #3619: Renaming sequence does not update its 'sequence_name' field</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php <nowiki>Re: newbie: renaming sequences task</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add ALTER DOMAIN to modify the underlying data type}}<br />
<br />
{{TodoItem<br />
|Allow ALTER TABLESPACE to move the tablespace to different directories}}<br />
<br />
{{TodoItem<br />
|Allow moving system tables to other tablespaces, where possible<br />
|Currently non-global system tables must be in the default database tablespace. Global system tables can never be moved.}}<br />
<br />
{{TodoItem<br />
|Have ALTER INDEX update the name of a constraint using that index}}<br />
<br />
{{TodoItem<br />
|Allow column display reordering by recording a display, storage, and permanent id for every column?<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00782.php <nowiki>Re: column ordering, was Re: [PATCHES] Enums patch v2</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg01029.php <nowiki>Column reordering in pg_dump</nowiki>]<br />
* http://archives.postgresql.org/message-id/1324412114-sup-9608@alvh.no-ip.org<br />
* [http://www.postgresql.org/message-id/CAApHDvqhnuznxd4xVMFDcGn+nHVYyUtJ-TvbRsOuR%3DPaVbbGqw@mail.gmail.com logical column order and physical column order]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow deactivating (and reactivating) indexes via ALTER TABLE<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg01191.php<br />
}}<br />
<br />
{{TodoItem<br />
|Add ALTER OPERATOR ... RENAME<br />
|Needs to consider the effects of changing operator precedence.<br />
* [http://archives.postgresql.org/message-id/1322948781.26266.9.camel@vanquo.pezone.net Missing rename support]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== CLUSTER ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Automatically maintain clustering on a table<br />
|This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization. Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too. A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.<br />
* [http://archives.postgresql.org/pgsql-performance/2004-08/msg00350.php <nowiki>Equivalent praxis to CLUSTERED INDEX?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00155.php <nowiki>Re: Grouped Index Tuples</nowiki>]<br />
* http://community.enterprisedb.com/git/<br />
* [http://archives.postgresql.org/pgsql-performance/2009-10/msg00346.php <nowiki>Re: maintain_cluster_order_v5.patch</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
| Allow CLUSTER to be used on a partial index<br />
* http://www.postgresql.org/message-id/CAMkU%3D1zYwoHHsqJ8wfK3GdG_t_a6t4RK-GFDSKymQ0EGP%3DtypA@mail.gmail.com<br />
}} <br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== COPY ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow COPY to report error lines and continue<br />
|This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure. <br />
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00572.php <nowiki>Re: VLDB Features</nowiki>]<br />
}}<br />
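<br />
Until COPY can do this itself, the per-row savepoint cost can be seen in a client-side emulation; the sketch below (table names are illustrative) relies on PL/pgSQL's EXCEPTION clause, which sets an implicit savepoint around each row:<br />
<pre>
DO $$
DECLARE
    rec record;
BEGIN
    FOR rec IN SELECT * FROM staging LOOP
        BEGIN
            INSERT INTO target VALUES (rec.a::int, rec.b::date);
        EXCEPTION WHEN others THEN
            RAISE NOTICE 'skipped bad row: %', rec;  -- report the failure and continue
        END;
    END LOOP;
END;
$$;
</pre>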
<br />
{{TodoItem<br />
|Allow COPY FROM to create index entries in bulk<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php <nowiki>Batch update of indexes on data loading</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve COPY performance<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00954.php <nowiki>Re: 8.3 / 8.2.6 restore comparison</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01882.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow COPY to report errors sooner<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php <nowiki>Timely reporting of COPY errors</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow COPY to handle other number formats<br />
|E.g. the German decimal notation. Best would be something like WITH DECIMAL ',' (sketched below).<br />
}}<br />
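<br />
A hypothetical syntax sketch; the DECIMAL option does not exist, and the file and table names are illustrative:<br />
<pre>
COPY prices FROM '/tmp/preise.csv' WITH (FORMAT csv, DECIMAL ',');
-- German-style values such as 1234,56 would then load into numeric columns.
</pre>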
<br />
{{TodoItem<br />
|Allow a stalled COPY to exit if the backend is terminated<br />
* [http://archives.postgresql.org/pgsql-bugs/2009-04/msg00067.php <nowiki>Re: possible bug not in open items</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow COPY "text" format to output a header<br />
* http://www.postgresql.org/message-id/CACfv+pJ31tesLvncJyP24quo8AE+M0GP6p6MEpwPv6yV8%3DsVHQ@mail.gmail.com<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== GRANT/REVOKE ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow SERIAL sequences to inherit permissions from the base table?}}<br />
<br />
{{TodoItem<br />
|Allow dropping of a role that has connection rights<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00736.php <nowiki>DROP ROLE dependency tracking ...</nowiki>]<br />
}}<br />
{{TodoEndSubsection}}<br />
<br />
=== DECLARE CURSOR ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Prevent DROP TABLE from dropping a table referenced by its own open cursor?}}<br />
<br />
{{TodoItem<br />
|Provide some guarantees about the behavior of cursors that invoke volatile functions<br />
* [http://archives.postgresql.org/message-id/20997.1244563664@sss.pgh.pa.us Re: Cursor with hold emits the same row more than once across commits in 8.3.7]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== INSERT ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow INSERT/UPDATE of the system-generated oid value for a row}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== SHOW/SET ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and CLUSTER}}<br />
<br />
{{TodoItem<br />
|Rationalize the discrepancy between settings that use values in bytes and SHOW that returns the object count<br />
* [http://archives.postgresql.org/pgsql-docs/2008-07/msg00007.php <nowiki>Re: [ADMIN] shared_buffers and shmmax</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== ANALYZE ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified percentage}}<br />
<br />
{{TodoItem<br />
|Have EXPLAIN ANALYZE report rows as floating-point numbers<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg01363.php <nowiki>explain analyze rows=%.0f</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00108.php <nowiki>Re: explain analyze rows=%.0f</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve how ANALYZE computes in-doubt tuples<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00771.php <nowiki>VACUUM/ANALYZE counting of in-doubt tuples</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Window Functions ===<br />
See {{messageLink|357.1230492361@sss.pgh.pa.us|TODO items for window functions}}.<br />
{{TodoSubsection}}<br />
{{TodoItem<br />
|Support creation of user-defined window functions<br />
|We have the ability to create new window functions written in C. Is it<br />
worth the effort to create an API that would let them be written in PL/pgSQL, etc.?}}<br />
<br />
{{TodoItem<br />
|Implement full support for window framing clauses<br />
|In addition to the frame clauses already supported (described in the [http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS latest doc]), the following clauses are not implemented yet (see the sketch after this list).<br />
* RANGE BETWEEN ... PRECEDING/FOLLOWING<br />
* EXCLUDE<br />
}}<br />
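<br />
A sketch of the missing clauses in context, using the SQL-standard syntax (not yet accepted; table and column names are illustrative):<br />
<pre>
-- RANGE with an explicit offset:
SELECT sum(amount) OVER (ORDER BY paid_at
    RANGE BETWEEN interval '1 hour' PRECEDING AND CURRENT ROW)
FROM payments;

-- Frame exclusion:
SELECT avg(amount) OVER (ORDER BY paid_at
    ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING EXCLUDE CURRENT ROW)
FROM payments;
</pre>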
<br />
{{TodoItem<br />
|Investigate tuplestore performance issues<br />
|The tuplestore_in_memory() change is just a band-aid; we ought to solve the problem properly. tuplestore_advance seems like a weak spot as well.<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00152.php <nowiki>tuplestore potential performance problem</nowiki>]<br />
}}<br />
<br />
{{TodoItem|Do we really need so much duplicated code between Agg and WindowAgg?}}<br />
<br />
{{TodoItem<br />
|Teach planner to evaluate multiple windows in the optimal order<br />
|Currently windows are always evaluated in the query-specified order.<br />
* http://archives.postgresql.org/message-id/3CDAD71E9D70417290FCF66F0178D1E1@amd64<br />
}}<br />
<br />
{{TodoItem<br />
|Implement DISTINCT clause in window aggregates<br />
|Some proprietary RDBMSs have implemented it already, so it helps with porting from those.}}<br />
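<br />
A sketch of the target syntax, which currently raises an error (names are illustrative):<br />
<pre>
SELECT count(DISTINCT department) OVER (PARTITION BY region)
FROM employees;
</pre>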
<br />
{{TodoEndSubsection}}<br />
<br />
== Integrity Constraints ==<br />
=== Keys ===<br />
<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Improve deferrable unique constraints for cases with many conflicts<br />
|The current implementation fires a trigger for each potentially conflicting row. This might not scale well for an update that changes many key values at once.<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Referential Integrity ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add MATCH PARTIAL referential integrity}}<br />
<br />
{{TodoItem<br />
|Change foreign key constraint for array -&gt; element to mean element in array?<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-10/msg01814.php <nowiki>foreign keys for array/period contains relationships</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix problem when cascading referential triggers make changes on cascaded tables, seeing the tables in an intermediate state<br />
* [http://archives.postgresql.org/pgsql-hackers/2005-09/msg00174.php <nowiki>Re: [PATCHES] Work-in-progress referential action trigger timing</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Are ri_KeysEqual checks in the RI enforcement triggers still necessary?<br />
* [http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php <nowiki>Re: Effects of cascading references in foreign keys</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Check Constraints ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Run check constraints only when affected columns are changed<br />
* http://archives.postgresql.org/message-id/1326055327.15293.13.camel@vanquo.pezone.net<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Server-Side Languages ==<br />
<br />
{{TodoItem<br />
|Add support for polymorphic arguments and return types to languages other than PL/PgSQL}}<br />
<br />
{{TodoItem<br />
|Add support for OUT and INOUT parameters to languages other than PL/PgSQL}}<br />
<br />
{{TodoItem<br />
|Add more fine-grained specification of functions taking arbitrary data types<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00367.php <nowiki>RfD: more powerful &quot;any&quot; types</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Implement stored procedures<br />
|This might involve control of transaction state and returning multiple result sets<br />
* [http://archives.postgresql.org/pgsql-general/2008-10/msg00454.php <nowiki>PL/pgSQL stored procedure returning multiple result sets (SELECTs)?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01375.php <nowiki>Proposal: real procedures again (8.4)</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php<br />
* [http://archives.postgresql.org/pgsql-hackers/2011-04/msg01149.php <nowiki>Gathering specs and discussion on feature (post 9.1)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow holdable cursors in SPI}}<br />
<br />
=== SQL-Language Functions ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Rethink query plan caching and timing of parse analysis within SQL-language functions<br />
|They should work more like plpgsql functions do ...<br />
* [http://archives.postgresql.org/pgsql-bugs/2011-05/msg00078.php <nowiki>Re: BUG #6019: invalid cached plan on inherited table</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== PL/pgSQL ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]}}<br />
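<br />
A sketch of the desired declaration, currently a syntax error (names are illustrative):<br />
<pre>
CREATE FUNCTION collect_codes() RETURNS void AS $$
DECLARE
    codes items.code%TYPE[];  -- desired: an array of the column's type
BEGIN
    SELECT array_agg(code) INTO codes FROM items;
END;
$$ LANGUAGE plpgsql;
</pre>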
<br />
{{TodoItem<br />
|<nowiki>Allow listing of record column names, and access to record columns via variables, e.g. columns := r.(*), tval2 := r.(colname)</nowiki><br />
* [http://archives.postgresql.org/pgsql-patches/2005-07/msg00458.php <nowiki>Re: PL/PGSQL: Dynamic Record Introspection</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2006-05/msg00302.php <nowiki>Re: PL/PGSQL: Dynamic Record Introspection</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00031.php <nowiki>Re: PL/PGSQL: Dynamic Record Introspection</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables<br />
|Because a row is not scalar, do not allow assignment from NULL-valued scalars.<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg00070.php <nowiki>NULL and plpgsql rows</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider keeping separate cached copies when search_path changes<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg01009.php <nowiki>pl/pgsql Plan Invalidation and search_path</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve handling of NULL row values vs. NULL rows<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-09/msg01758.php <nowiki>Null row vs. row of nulls in plpgsql</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg01973.php<br />
}}<br />
<br />
{{TodoItem<br />
|Improve PERFORM handling of WITH queries or document limitation<br />
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00309.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== PL/Perl ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow regex operations in plperl using UTF8 characters in non-UTF8 encoded databases}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== PL/Python ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Develop a trusted variant of PL/Python.}}<br />
<br />
{{TodoItem<br />
|Create a new restricted execution class that will allow passing function arguments in as locals. Passing them as globals means functions cannot be called recursively.<br />
* [http://archives.postgresql.org/pgsql-hackers/2011-02/msg01468.php <nowiki>Re: pl/python do not delete function arguments</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add a DB-API compliant interface on top of the SPI interface<br />
* http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html<br />
}}<br />
<br />
{{TodoItem<br />
|For functions returning a setof record with a composite type, cache the I/O functions for the composite type<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02007.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== PL/Tcl ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add table function support}}<br />
<br />
{{TodoItem<br />
|Check encoding validity of values passed back to Postgres in function returns, trigger tuple changes, and SPI calls.}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Clients ==<br />
<br />
{{TodoItem<br />
|Add a function like pg_get_indexdef() that reports more detailed index information (see the example below)<br />
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00166.php <nowiki>BUG #3829: Wrong index reporting from pgAdmin III (v1.8.0 rev 6766-6767)</nowiki>]<br />
}}<br />
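<br />
For comparison, the existing function returns the whole definition as a single string; the proposed function would report the pieces (columns, opclasses, predicate, etc.) separately:<br />
<pre>
SELECT pg_get_indexdef('my_index'::regclass);  -- index name is illustrative
</pre>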
<br />
{{TodoItem<br />
|Split out pg_resetxlog output into pre- and post-sections<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg02040.php<br />
}}<br />
<br />
=== pg_ctl ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Improve pg_ctl's detection of running postmasters<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00000.php<br />
* http://archives.postgresql.org/pgsql-committers/2011-06/msg00001.php<br />
}}<br />
<br />
{{TodoItem<br />
|Add additional shutdown modes, and change the default?<br />
* http://archives.postgresql.org/pgsql-hackers/2012-04/msg01283.php<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== psql ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Have psql \ds show all sequences and their settings<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00916.php <nowiki>Re: TODO item: Have psql show current values for a sequence</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00401.php <nowiki>Quick patch: Display sequence owner</nowiki>]<br />
}}<br />
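<br />
A workaround available today: a sequence's settings can be read from the sequence relation itself (sequence name is illustrative):<br />
<pre>
SELECT sequence_name, last_value, increment_by, max_value, min_value, is_cycled
FROM my_seq;
</pre>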
<br />
{{TodoItem<br />
|Move psql backslash database information into the backend, use mnemonic commands?<br />
|This would allow non-psql clients to pull the same information out of the database as psql. <br />
* [http://archives.postgresql.org/pgsql-hackers/2004-01/msg00191.php <nowiki>Re: psql \d option list overloaded</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Make psql's \d commands more consistent in their handling of schemas<br />
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00014.php <nowiki>Re: psql and schemas</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Make psql's \d commands distinguish default privileges from no privileges<br />
|ACL displays were visibly different for the two cases before we "improved" them by using array_to_string.<br />
* [http://archives.postgresql.org/pgsql-bugs/2011-05/msg00082.php <nowiki>BUG #6021: There is no difference between default and empty access privileges with \dp</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consistently display privilege information for all objects in psql}}<br />
<br />
{{TodoItemEasy<br />
|\s without arguments (display history) fails with libedit, doesn't use pager either<br />
* [http://archives.postgresql.org/pgsql-bugs/2011-06/msg00114.php <nowiki> psql \s not working - OS X</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add a \set variable to control whether \s displays line numbers<br />
|Another option is to add \# which lists line numbers, and allows command execution.<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00255.php <nowiki>Re: psql possible TODO</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Include the symbolic SQLSTATE name in verbose error reports<br />
* [http://archives.postgresql.org/pgsql-general/2007-09/msg00438.php <nowiki>Re: Checking is TSearch2 query is valid</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add prompt escape to display the client and server versions<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00310.php <nowiki>WIP patch for TODO Item: Add prompt escape to display the client and server versions</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add option to wrap column values at whitespace boundaries, rather than chopping them at a fixed width.<br />
|Currently, &quot;wrapped&quot; format chops values into fixed widths. Perhaps the word wrapping could use the same algorithm documented in the W3C specification. <br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00404.php <nowiki>Re: psql wrapped format default for backslash-d commands</nowiki>]<br />
* http://www.w3.org/TR/CSS21/tables.html#auto-table-layout}}<br />
<br />
{{TodoItem<br />
|Support the ReST table output format<br />
|Details about the ReST format: http://docutils.sourceforge.net/rst.html#reference-documentation<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg01007.php <nowiki>Proposal: new border setting in psql</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00518.php <nowiki>Re: Proposal: new border setting in psql</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00609.php <nowiki>Re: Proposal: new border setting in psql</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add option to print advice for people familiar with other databases<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg01845.php <nowiki>MySQL-ism help patch for psql</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add ability to edit views with \ev<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00023.php <nowiki>Adding \ev view editor?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix FETCH_COUNT to handle SELECT ... INTO and WITH queries<br />
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01565.php<br />
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00192.php<br />
}}<br />
<br />
{{TodoItem<br />
|Prevent psql from sending remaining single-line multi-statement queries after reconnecting<br />
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01283.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider having psql -c read .psqlrc, for consistency<br />
|psql -f already reads .psqlrc<br />
}}<br />
<br />
{{TodoItem<br />
|Allow processing of multiple -f (file) options<br />
* http://www.postgresql.org/message-id/AANLkTikFpzrTRl6392GhatQdwlCWQTXFdSMxh5CP51iv@mail.gmail.com<br />
}}<br />
<br />
{{TodoItem<br />
|Improve line drawing characters<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00386.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider improving the continuation prompt<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01772.php<br />
}}<br />
<br />
{{TodoItem<br />
|Improve speed of tab completion by using LIKE<br />
* http://www.postgresql.org/message-id/20120821174847.GL1267@tamriel.snowman.net<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== pg_dump / pg_restore ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItemEasy<br />
|<nowiki>Add full object name to the tag field. eg. for operators we need '=(integer, integer)', instead of just '='.</nowiki>}}<br />
<br />
{{TodoItemEasy<br />
|Modify pg_dump to create skeleton views for reload (which are then updated via CREATE OR REPLACE VIEW) when views have circular dependencies. This should eliminate the need for the CREATE RULE "_RETURN" hack currently used to address this issue. Thread and additional information here:<br />
* [http://www.postgresql.org/message-id/25554.1360895028@sss.pgh.pa.us Description of change]<br />
|}}<br />
<br />
{{TodoItem<br />
|Add pg_dumpall custom format dumps?<br />
* [http://archives.postgresql.org/pgsql-general/2010-05/msg00509.php pg_dumpall custom format]<br />
|}}<br />
<br />
{{TodoItem<br />
|Avoid using platform-dependent locale names in pg_dumpall output<br />
|Using native locale names puts roadblocks in the way of porting a dump to another platform. One possible solution is to get<br />
CREATE DATABASE to accept some agreed-on set of locale names and fix them up to meet the platform's requirements.<br />
* http://archives.postgresql.org/message-id/21396.1241716688@sss.pgh.pa.us<br />
}}<br />
<br />
{{TodoItem<br />
|In a selective dump, allow dumping of an object and all its dependencies}}<br />
<br />
{{TodoItem<br />
|Add options like pg_restore -l and -L to pg_dump}}<br />
<br />
{{TodoItem<br />
|Stop dumping CASCADE on DROP TYPE commands in clean mode}}<br />
<br />
{{TodoItem<br />
|Allow pg_restore to load different parts of the COPY data for a single table simultaneously}}<br />
<br />
{{TodoItem<br />
|Remove support for dumping from pre-7.3 servers<br />
|In 7.3 and later, we can get accurate dependency information from the server. pg_dump still contains a lot of crufty code<br />
that tries to cope with the lack of dependency info in older servers, but the usefulness of maintaining that code is diminishing.}}<br />
<br />
{{TodoItem<br />
|Refactor handling of database attributes between pg_dump and pg_dumpall<br />
|Currently only pg_dumpall emits database attributes, such as ALTER DATABASE SET commands and database-level GRANTs.<br />
Many people wish that pg_dump would do that. One proposal is to let pg_dump issue such commands if the -C switch was used,<br />
but it's unclear whether that will satisfy the demand.<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg01031.php <nowiki>ALTER DATABASE vs pg_dump</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-bugs/2010-05/msg00010.php summary of the issues]<br />
}}<br />
<br />
{{TodoItem<br />
|Change pg_dump so that a comment on the dumped database is applied to the loaded database, even if the database has a different name.<br />
|This will require new backend syntax, perhaps COMMENT ON CURRENT DATABASE. This is related to the previous item.}}<br />
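<br />
A sketch of the new syntax this item proposes (hypothetical; the comment text is illustrative):<br />
<pre>
COMMENT ON CURRENT DATABASE IS 'restored from the 2014-07-01 production dump';
</pre>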
<br />
{{TodoItem<br />
|Allow parallel restore of tar dumps<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-02/msg01154.php <nowiki>Re: parallel restore</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Preserve sparse storage of large objects over dump/restore<br />
* [http://archives.postgresql.org/message-id/18789.1349750451@sss.pgh.pa.us <nowiki>TODO item: teach pg_dump about sparsely-stored large objects</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Prevent PL/pgSQL comment from throwing an error in a non-superuser restore<br />
* [http://www.postgresql.org/message-id/E1VuYH7-0008Rz-SV@wrigleys.postgresql.org Reloading dump fails at COMMENT ON EXTENSION plpgsql]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== ecpg ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Docs<br />
|Document differences between ecpg and the SQL standard and information about the Informix-compatibility module.}}<br />
<br />
{{TodoItem<br />
|Solve cardinality &gt; 1 for input descriptors / variables?}}<br />
<br />
{{TodoItem<br />
|Add a semantic check level, e.g. check if a table really exists}}<br />
<br />
{{TodoItem<br />
|Fix handling of DB attributes that are arrays}}<br />
<br />
{{TodoItem<br />
|Fix nested C comments}}<br />
<br />
{{TodoItemEasy<br />
|sqlwarn[6] should be set to 'W' if a PRECISION or SCALE value is specified}}<br />
<br />
{{TodoItem<br />
|Make SET CONNECTION thread-aware, non-standard?}}<br />
<br />
{{TodoItem<br />
|Allow multidimensional arrays}}<br />
<br />
{{TodoItem<br />
|Implement COPY FROM STDIN}} <br />
<br />
{{TodoItem<br />
|Provide a way to specify size of a bytea parameter<br />
* [http://archives.postgresql.org/message-id/200906192131.n5JLVoMo044178@wwwmaster.postgresql.org <nowiki>BUG #4866: ECPG and BYTEA</nowiki>]<br />
}}<br />
<br />
{{TodoItemEasy<br />
|Fix small memory leaks in ecpg<br />
|Memory leaks in a short-running application like ecpg are not really a problem, but they make debugging more complicated}}<br />
<br />
{{TodoItem<br />
|Allow reuse of cursor name variables<br />
* [http://archives.postgresql.org/message-id/20100329113435.GA3430@feivel.credativ.lan <nowiki>Problems with variable cursorname in ecpg</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== libpq ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Prevent PQfnumber() from lowercasing unquoted column names<br />
|PQfnumber() should never have been lowercasing, but historically it has, so we need a way to prevent it}}<br />
<br />
{{TodoItem<br />
|Consider disallowing multiple queries in PQexec() as an additional barrier to SQL injection attacks<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg00184.php <nowiki>Re: InitPostgres and flatfiles question</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add PQexecf() that allows complex parameter substitution<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01803.php <nowiki>Last minute mini-proposal (I know, know) for PQexecf()</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add SQLSTATE and severity to errors generated within libpq itself<br />
* [http://archives.postgresql.org/pgsql-interfaces/2007-11/msg00015.php <nowiki>v8.1: Error severity on libpq PGconn*</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01425.php<br />
}}<br />
<br />
{{TodoItem<br />
|Add support for interface/ipaddress binding to libpq<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01811.php <nowiki>SR/libpq - outbound interface/ipaddress binding</nowiki>]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== HTTP===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow access to the database via HTTP<br />
|See [[HTTP_API]]}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Triggers ==<br />
<br />
{{TodoItem<br />
|Improve storage of deferred trigger queue<br />
|Right now all deferred trigger information is stored in backend memory. This could exhaust memory for very large trigger queues. This item involves dumping large queues into files, or doing some kind of join to process all the triggers, some bulk operation, or a bitmap. <br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00876.php <nowiki>Re: BUG #4204: COPY to table with FK has memory leak</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg00464.php <nowiki>Scaling up deferred unique checks and the after trigger queue</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2011-08/msg00023.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow triggers to be disabled in only the current session.<br />
|This is currently possible by starting a multi-statement transaction, modifying the system tables, performing the desired SQL, restoring the system tables, and committing the transaction. ALTER TABLE ... TRIGGER requires a table lock so it is not ideal for this usage.}}<br />
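<br />
A partial workaround that exists today: session_replication_role disables ordinarily-enabled triggers for the current session only, without catalog changes or a table lock (note that it also skips foreign-key enforcement triggers):<br />
<pre>
SET session_replication_role = replica;   -- ordinary triggers stop firing in this session
-- ... perform the desired DML ...
SET session_replication_role = DEFAULT;   -- restore normal trigger firing
</pre>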
<br />
{{TodoItem<br />
|With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY<br />
|If the dump is known to be valid, allow foreign keys to be added without revalidating the data.}}<br />
<br />
{{TodoItem<br />
|Allow statement-level triggers to access modified rows}}<br />
<br />
{{TodoItem<br />
|When statement-level triggers are defined on a parent table, have them fire only on the parent table, and fire child table triggers only where appropriate<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg01883.php <nowiki>Statement-level triggers and inheritance</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Tighten trigger permission checks<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00564.php <nowiki>Security leak with trigger functions?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow BEFORE INSERT triggers on views<br />
* [http://archives.postgresql.org/pgsql-general/2007-02/msg01466.php <nowiki>Re: Why can't I put a BEFORE EACH ROW trigger on a view?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add database and transaction-level triggers<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00451.php <nowiki>Proposal for db level triggers</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00620.php <nowiki>triggers on prepare, commit, rollback... ?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce locking requirements for creating a trigger<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00635.php <nowiki>Re: Change lock requirements for adding a trigger</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid requirement for "AFTER" trigger functions to return a value<br />
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02384.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow creation of inline triggers<br />
* http://archives.postgresql.org/pgsql-hackers/2012-02/msg00708.php<br />
}}<br />
<br />
== Inheritance ==<br />
<br />
{{TodoItem<br />
|Allow inherited tables to inherit indexes, UNIQUE constraints, and primary/foreign keys<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-05/msg00285.php <nowiki>Partitioning/inherited tables vs FKs</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00039.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00305.php<br />
}}<br />
<br />
{{TodoItem<br />
|Honor UNIQUE INDEX on base column in INSERTs/UPDATEs on inherited table, e.g. INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail<br />
|The main difficulty with this item is the problem of creating an index that can span multiple tables.}}<br />
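<br />
A minimal sketch of the behavior this item wants to prevent:<br />
<pre>
CREATE TABLE parent (id int UNIQUE);
CREATE TABLE child () INHERITS (parent);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (1);   -- succeeds today: the parent's unique index does not span child
SELECT id FROM parent;          -- returns 1 twice despite the UNIQUE constraint
</pre>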
<br />
{{TodoItem<br />
|Determine whether ALTER TABLE / SET SCHEMA should work on inheritance hierarchies (and thus support ONLY). If yes, implement it.}}<br />
<br />
{{TodoItem<br />
|ALTER TABLE variants sometimes support recursion and sometimes do not, but this is poorly documented or not documented at all, and the ONLY marker can be silently ignored. Clarify the documentation, and reject ONLY where it is not supported.}}<br />
<br />
== Indexes ==<br />
<br />
{{TodoItem<br />
|Prevent index uniqueness checks when UPDATE does not modify the column<br />
|Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE.<br />
However, HOT already short-circuits this in common cases, so more work might not be helpful.<br />
* http://www.postgresql.org/message-id/CA+TgmoZOyaTanfEvNUdiHBCuu9Zh0JVP1e_UTVbx6Rvj9vFC9Q@mail.gmail.com<br />
}}<br />
<br />
{{TodoItem<br />
|Allow the creation of on-disk bitmap indexes which can be quickly combined with other bitmap indexes<br />
|Such indexes could be more compact if there are only a few distinct values. Such indexes can also be compressed. Keeping such indexes updated can be costly.<br />
* [http://archives.postgresql.org/pgsql-patches/2005-07/msg00512.php <nowiki>Re: Bitmap index AM</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg01107.php <nowiki>Bitmap index thoughts</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00265.php <nowiki>Stream bitmaps</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01214.php <nowiki>Re: Bitmapscan changes - Requesting further feedback</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00013.php <nowiki>Updated bitmap index patch</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00741.php <nowiki>Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01023.php <nowiki>Bitmap Indexes: request for feedback</nowiki>]<br />
* http://archives.postgresql.org/message-id/800923.27831.qm@web29010.mail.ird.yahoo.com <br />
}}<br />
<br />
{{TodoItem<br />
|Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics<br />
* [http://archives.postgresql.org/pgsql-performance/2006-10/msg00222.php <nowiki>Re: Simple join optimized badly?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01131.php <nowiki>Stats for multi-column indexes</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00741.php <nowiki>Cross-column statistics revisited</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg01431.php <nowiki>Multi-Dimensional Histograms</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00913.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02179.php <br />
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00459.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02054.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01731.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00894.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-09/msg00679.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider having a larger statistics target for indexed columns and expression indexes. <br />
}}<br />
<br />
{{TodoItem<br />
|Consider smaller indexes that record a range of values per heap page, rather than having one index entry for every heap row<br />
|This is useful if the heap is clustered by the indexed values. <br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php <nowiki>Grouped Index Tuples</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php <nowiki>Grouped Index Tuples</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php <nowiki>Grouped Index Tuples / Clustered Indexes</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php <nowiki>Bitmapscan changes</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00014.php <nowiki>Re: GIT patch</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00487.php <nowiki>Re: Index Tuple Compression Approach?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01589.php <nowiki>Re: Index AM change proposals, redux</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY<br />
|This is difficult because you must upgrade to an exclusive table lock to replace the existing index file. CREATE INDEX CONCURRENTLY does not have this complication. This would allow index compaction without downtime. <br />
* [http://archives.postgresql.org/pgsql-performance/2007-08/msg00289.php <nowiki>Re: When/if to Reindex</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2012-09/msg00911.php<br />
* http://archives.postgresql.org/pgsql-hackers/2012-10/msg00128.php<br />
* [http://www.postgresql.org/message-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg@mail.gmail.com Support for REINDEX CONCURRENTLY]<br />
* [https://wiki.postgresql.org/wiki/Reindex_concurrently Wiki page listing current situation on the matter]<br />
}}<br />
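<br />
A sketch of the proposed command, together with the manual workaround available today for plain (non-constraint) indexes; all names are illustrative:<br />
<pre>
-- Proposed (not implemented):
REINDEX INDEX CONCURRENTLY my_index;

-- Manual workaround today:
CREATE INDEX CONCURRENTLY my_index_new ON my_table (col);
DROP INDEX CONCURRENTLY my_index;            -- available since 9.2
ALTER INDEX my_index_new RENAME TO my_index;
</pre>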
<br />
{{TodoItem<br />
|Allow multiple indexes to be created concurrently, ideally via a single heap scan<br />
|pg_restore allows parallel index builds, but it is done via subprocesses, and there is no SQL interface for this.<br />
CLUSTER could definitely benefit from this.<br />
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00093.php<br />
* http://www.postgresql.org/message-id/CADVWZZJ5AS%3DXVrDwfTQqQP_V1+_fTYcZhq%3Dd5CbCXoALCjObbg@mail.gmail.com<br />
}}<br />
<br />
{{TodoItem<br />
|Consider sorting entries before inserting into btree index<br />
* [http://archives.postgresql.org/pgsql-general/2008-01/msg01010.php <nowiki>Re: ATTN: Clodaldo was Performance problem. Could it be related to 8.3-beta4?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow creation of an index that can do comparisons to test if a value is between two column values<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00757.php <nowiki>Proposal: temporal extension &quot;period&quot; data type</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider using "effective_io_concurrency" for index scans<br />
|Currently only bitmap scans use this, which might be fine because most multi-row index scans use bitmap scans.<br />
* [http://www.postgresql.org/message-id/CAGTBQpZzf70n0PYJ%3DVQLd+jb3wJGo%3D2TXmY+SkJD6G_vjC5QNg@mail.gmail.com Prefetch index pages for B-Tree index scans]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix problem with btree page splits during checkpoints<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00052.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-09/msg00184.php<br />
}}<br />
<br />
{{TodoItem<br />
|[http://archives.postgresql.org/pgsql-hackers/2012-05/msg00669.php Support amgettuple() in GIN (useful for exclusion constraints)]<br />
}}<br />
<br />
{{TodoItem<br />
| Allow "loose" or "skip" scans on btree indexes in which the first column has low cardinality<br />
* http://archives.postgresql.org/pgsql-performance/2012-08/msg00159.php<br />
}}<br />
<br />
{{TodoItem<br />
| Make the planner's "special index operator" mechanism extensible<br />
* http://www.postgresql.org/message-id/27270.1364700924@sss.pgh.pa.us<br />
}}<br />
<br />
{{TodoItem<br />
| Allow index-only counting for indexes that don't support index-only scans<br />
}}<br />
<br />
{{TodoItem<br />
|Improve GIN performance<br />
* [http://www.postgresql.org/message-id/52F373CC.4050800@vmware.com Small GIN optimizations (after 9.4)]<br />
}}<br />
<br />
=== GIST ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add more GIST index support for geometric data types}}<br />
<br />
{{TodoItem<br />
|Allow GIST indexes to create certain complex index types, like digital trees (see Aoki)}}<br />
<br />
{{TodoItem<br />
|Fix performance issues in contrib/seg and contrib/cube GiST support<br />
* [http://archives.postgresql.org/message-id/alpine.DEB.2.00.0904161633160.4053@aragorn.flymine.org GiST index performance]<br />
* [http://archives.postgresql.org/message-id/alpine.DEB.2.00.0904221704470.22330@aragorn.flymine.org draft patch]<br />
* [http://archives.postgresql.org/pgsql-performance/2009-05/msg00069.php <nowiki>Re: GiST index performance</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-performance/2009-06/msg00068.php <nowiki>GiST index performance</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|[http://archives.postgresql.org/message-id/4DC8D284-05CF-4E3D-9670-AC9A32C37A36@justatheory.com GiST index support for arrays]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow index only scan for GIST indexes (when possible)}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Hash ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Add UNIQUE capability to hash indexes}}<br />
<br />
{{TodoItem<br />
|Add hash WAL logging for crash recovery<br />
* http://archives.postgresql.org/pgsql-performance/2011-09/msg00196.php<br />
* [http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com Save Hash Indexes]<br />
* [http://www.postgresql.org/message-id/CAM3SWZRBpAz%3DbZYCxvQDSGKR5OA5yEhGVOCit7AyStUtq2cBDA@mail.gmail.com GSoC on WAL-logging hash indexes]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow multi-column hash indexes}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Sorting ==<br />
<br />
{{TodoItem<br />
|Consider whether duplicate keys should be sorted by block/offset<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00558.php <nowiki>Remove hacks for old bad qsort() implementations?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider being smarter about memory and external files used during sorts<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg01101.php <nowiki>Sorting Improvements for 8.4</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00045.php <nowiki>Re: Sorting Improvements for 8.4</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider detoasting keys before sorting}}<br />
<br />
{{TodoItem<br />
|Allow sorts of skinny tuples to use even more available memory.<br />
* Now that it is not limited by MaxAllocSize, don't limit by INT_MAX either.<br />
* http://www.postgresql.org/message-id/CA+U5nMKkRMin1pV8VMpS6_n7hcOWSG0kZS3oFL9JOa8DV6vJyQ@mail.gmail.com<br />
}}<br />
<br />
== Fsync ==<br />
<br />
{{TodoItem<br />
|Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options and whether fsync does anything<br />
|Ideally this requires a separate test program like /contrib/pg_test_fsync that can be run at initdb time or optionally later.<br />
}}<br />
<br />
{{TodoItem<br />
|Consider sorting writes during checkpoint<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php <nowiki>Sorted writes in checkpoint</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00050.php <nowiki>Re: Sorting writes during checkpoint</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg02012.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00278.php<br />
* http://archives.postgresql.org/pgsql-hackers/2012-01/msg00493.php<br />
}}<br />
<br />
== Cache Usage ==<br />
<br />
{{TodoItem<br />
|Provide a way to calculate an &quot;estimated COUNT(*)&quot;<br />
|Perhaps by using the optimizer's cardinality estimates or random sampling.<br />
* [http://archives.postgresql.org/pgsql-hackers/2005-11/msg00943.php <nowiki>Re: Improving count(*)</nowiki>]<br />
* http://wiki.postgresql.org/wiki/Slow_Counting<br />
}}<br />
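<br />
One such estimate is already available from the planner's statistics; its accuracy depends on how recently the table was vacuumed or analyzed (table name is illustrative):<br />
<pre>
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'my_table'::regclass;
</pre>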
<br />
{{TodoItem<br />
|Consider automatic caching of statements at various levels:<br />
* Parsed query tree<br />
* Query execute plan<br />
* Query results <br />
<br />
:<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg00823.php <nowiki>Cached Query Plans (was: global prepared statements)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider increasing internal areas (NUM_CLOG_BUFFERS) when shared buffers is increased<br />
* [http://archives.postgresql.org/pgsql-hackers/2005-10/msg01419.php <nowiki>Re: slru.c race condition (was Re: TRAP: FailedAssertion(&quot;!((itemid)-&gt;lp_flags &amp; 0x01)&quot;,)</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00030.php <nowiki>clog_buffers to 64 in 8.3?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-performance/2007-08/msg00024.php <nowiki>CLOG Patch</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider decreasing the amount of memory used by PrivateRefCount<br />
|<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php <nowiki>PrivateRefCount (for 8.3)</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php <nowiki>Re: PrivateRefCount (for 8.3)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider allowing higher priority queries to have referenced buffer cache pages stay in memory longer<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00562.php <nowiki>Re: How to keep a table in memory?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve cache lookup speed for sessions accessing many relations<br />
* http://archives.postgresql.org/pgsql-hackers/2012-11/msg00356.php<br />
}}<br />
<br />
{{TodoItem<br />
|Fix memory leak caused by negative catcache entries<br />
* [http://www.postgresql.org/message-id/51C0A1FF.2050404@vmware.com <nowiki>Re: Memory leak in PL/pgSQL function which CREATE/SELECT/DROP a temporary table</nowiki>]<br />
}}<br />
<br />
== Vacuum ==<br />
<br />
{{TodoItem<br />
|Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php <nowiki>Dead Space Map</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00011.php <nowiki>Re: Automatic free space map filling</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow concurrent inserts to use recently created pages rather than creating new ones<br />
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider having single-page pruning update the visibility map<br />
* <nowiki>https://commitfest.postgresql.org/action/patch_view?id=75</nowiki><br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php <nowiki>Re: visibility maps and heap_prune</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow VACUUM FULL and CLUSTER to update the visibility map<br />
* [http://www.postgresql.org/message-id/20130112191404.255800@gmx.com index-only scans : abnormal heap fetches after VACUUM FULL]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve tracking of total relation tuple counts now that vacuum doesn't always scan the whole heap<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00531.php Partial vacuum versus pg_class.reltuples]<br />
}}<br />
<br />
{{TodoItem<br />
|Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php <nowiki>FSM search modes</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider a more compact data representation for dead tuple locations within VACUUM<br />
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php <nowiki>Re: Have vacuum emit a warning when it runs out of maintenance_work_mem</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Provide more information in order to improve user-side estimates of dead space bloat in relations<br />
* [http://archives.postgresql.org/pgsql-general/2009-05/msg01039.php <nowiki>Re: Bloated Table</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve locking behaviour of vacuum during trailing page truncation<br />
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00319.php<br />
* http://archives.postgresql.org/message-id/4D8DF88E.7080205@Yahoo.com<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce the number of table scans performed by vacuum<br />
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg01119.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00605.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg00624.php<br />
}}<br />
<br />
{{TodoItem<br />
|Vacuum GIN indexes in physical order rather than logical order<br />
* http://archives.postgresql.org/pgsql-hackers/2012-04/msg00443.php<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid creation of the free space map for small tables<br />
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg01751.php<br />
* http://archives.postgresql.org/pgsql-hackers/2012-08/msg00552.php<br />
* http://archives.postgresql.org/pgsql-hackers/2012-08/msg00615.php<br />
}}<br />
<br />
=== Auto-vacuum ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Issue log message to suggest VACUUM FULL if a table is nearly empty?<br />
*[http://www.postgresql.org/message-id/F40B0968DB0A904DA78A924E633BE78645FAAF@SYDEXCHTMP2.au.fjanz.com discussion]<br />
}}<br />
<br />
{{TodoItem<br />
|Prevent long-lived temporary tables from causing frozen-xid advancement starvation<br />
|The problem is that autovacuum cannot vacuum them to set frozen xids; only the session that created them can do that. <br />
* [http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php <nowiki>Re: AutoVacuum Behaviour Question</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Prevent autovacuum from running if an old transaction is still running from the last vacuum<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00899.php <nowiki>Re: Autovacuum and OldestXmin</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Have autoanalyze of parent tables occur when child tables are modified<br />
* http://archives.postgresql.org/pgsql-performance/2010-06/msg00137.php<br />
* http://archives.postgresql.org/pgsql-performance/2010-10/msg00271.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow visibility map all-visible bits to be set even when an auto-ANALYZE is running<br />
* http://archives.postgresql.org/pgsql-hackers/2012-01/msg00356.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow parallel cores to be used by vacuumdb<br />
* [http://archives.postgresql.org/message-id/4F10A728.7090403@agliodbs.com vacuumdb -j]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve autovacuum tuning<br />
* http://www.postgresql.org/message-id/5078AD6B.8060802@agliodbs.com<br />
* http://www.postgresql.org/message-id/20130124215715.GE4528@alvh.no-ip.org<br />
}}<br />
<br />
{{TodoItem<br />
|Improve setting of visibility map bits for read-only and insert-only workloads<br />
* http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Locking ==<br />
<br />
{{TodoItem<br />
|Fix priority ordering of read and write light-weight locks<br />
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00893.php <nowiki>lwlocks and starvation</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00905.php <nowiki>Re: lwlocks and starvation</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix problem when multiple subtransactions of the same outer transaction hold different types of locks, and one subtransaction aborts<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg01011.php <nowiki>FOR SHARE vs FOR UPDATE locks</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00001.php <nowiki>Re: FOR SHARE vs FOR UPDATE locks</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00435.php <nowiki>Re: [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00773.php <nowiki>Re: savepoints and upgrading locks</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add idle_in_transaction_timeout GUC so locks are not held for long periods of time}}<br />
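<br />
A usage sketch for the proposed GUC (hypothetical until implemented; the name follows this item and the value is illustrative):<br />
<pre>
SET idle_in_transaction_timeout = '5min';  -- would abort transactions idle this long, releasing their locks
</pre>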
<br />
{{TodoItem<br />
|Improve deadlock detection when a page cleaning lock conflicts with a shared buffer that is pinned<br />
* [http://archives.postgresql.org/pgsql-bugs/2008-01/msg00138.php <nowiki>BUG #3883: Autovacuum deadlock with truncate?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00873.php <nowiki>Thoughts about bug #3883</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-committers/2008-01/msg00365.php <nowiki>Re: pgsql: Add checks to TRUNCATE, CLUSTER, and REINDEX to prevent</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Detect deadlocks involving LockBufferForCleanup()<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00873.php <nowiki>Thoughts about bug #3883</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow finer control over who is cancelled in a deadlock<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01727.php<br />
}}<br />
<br />
== Startup Time Improvements ==<br />
<br />
{{TodoItem<br />
|Experiment with multi-threaded backend for backend creation<br />
|This would prevent the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading. Also explore the idea of a single session using multiple threads to execute a statement faster.}}<br />
<br />
{{TodoItem<br />
|Allow backends to change their database without restart<br />
|This would make connection startup faster.<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00843.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00336.php<br />
}}<br />
<br />
== Write-Ahead Log ==<br />
<br />
{{TodoItem<br />
|Eliminate need to write full pages to WAL before page modification<br />
|Currently, to protect against partial disk page writes, we write full page images to WAL before they are modified so we can correct any partial page writes during recovery. These pages can also be eliminated from point-in-time archive files. <br />
* [http://archives.postgresql.org/pgsql-hackers/2002-06/msg00655.php <nowiki>Re: Index Scans become Seq Scans after VACUUM ANALYSE</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg01191.php<br />
* [http://archives.postgresql.org/message-id/20120105061916.GB21048@fetter.org WIP double writes]<br />
* [http://archives.postgresql.org/message-id/4EFC449F02000025000441CD@gw.wicourts.gov double writes]<br />
* [http://archives.postgresql.org/message-id/20120110214344.GB21106@fetter.org Double-write with Fast Checksums]<br />
* [http://archives.postgresql.org/message-id/1962493974.656458.1327703514780.JavaMail.root@zimbra-prod-mbox-4.vmware.com double writes using "double-write buffer" approach]<br />
* http://archives.postgresql.org/pgsql-hackers/2012-10/msg01463.php<br />
}}<br />
<br />
{{TodoItem<br />
|When full page writes are off, write CRC to WAL and check file system blocks on recovery<br />
|If CRC check fails during recovery, remember the page in case a later CRC for that page properly matches. The difficulty is that hint bits are not WAL logged, meaning a valid page might not match the earlier CRC.}}<br />
<br />
{{TodoItem<br />
|Write full pages during file system write and not when the page is modified in the buffer cache<br />
|This allows most full page writes to happen in the background writer. It might cause problems for applying WAL on recovery into a partially-written page, but later the full page will be replaced from WAL.<br />
* [http://archives.postgresql.org/message-id/CAGvK12UST-tPhyLrSLuSpwFxZbAO79yYrhV2xaLmS2MkUxNUVQ@mail.gmail.com Page Checksums + Double Writes]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider compression of full page writes<br />
* [http://www.postgresql.org/message-id/CAHGQGwGqG8e9YN0fNCUZqTTT%3DhNr7Ly516kfT5ffqf4pp1qnHg@mail.gmail.com Compression of full-page-writes]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow WAL information to recover corrupted pg_controldata<br />
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php <nowiki>Re: [HACKERS] pg_resetxlog -r flag</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Find a way to reduce rotational delay when repeatedly writing last WAL page<br />
|Currently fsync of WAL requires the disk platter to perform a full rotation to fsync again. One idea is to write the WAL to different offsets that might reduce the rotational delay. <br />
* [http://archives.postgresql.org/pgsql-hackers/2002-11/msg00483.php <nowiki>500 tpsQL + WAL log implementation</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Speed WAL recovery by allowing more than one page to be prefetched<br />
|This should be done utilizing the same infrastructure used for prefetching in general to avoid introducing complex error-prone code in WAL replay. <br />
* [http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php <nowiki>Slow PITR restore</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php <nowiki>Re: [GENERAL] Slow PITR restore</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg01279.php <nowiki>Read-ahead and parallelism in redo recovery</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve WAL concurrency by increasing lock granularity<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00556.php <nowiki>Reworking WAL locking</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Be more aggressive about creating WAL files<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01325.php <nowiki>Re: PANIC caused by open_sync on Linux</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2004-07/msg01075.php <nowiki>PreallocXlogFiles</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2005-04/msg00556.php <nowiki>WAL/PITR additional items</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Have resource managers report the duration of their status changes<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01468.php <nowiki>Recovery of Multi-stage WAL actions</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Close deleted WAL files held open in *nix by long-lived read-only backends<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg01754.php <nowiki>Deleted WAL files held open by backends in Linux</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00060.php <nowiki>Re: Deleted WAL files held open by backends in Linux</nowiki>]<br />
}}<br />
<br />
== Optimizer / Executor ==<br />
<br />
{{TodoItem<br />
|Improve selectivity functions for geometric operators}}<br />
<br />
{{TodoItem<br />
|Consider increasing the default values of from_collapse_limit, join_collapse_limit, and/or geqo_threshold<br />
* [http://archives.postgresql.org/message-id/4136ffa0905210551u22eeb31bn5655dbe7c9a3aed5@mail.gmail.com from_collapse_limit vs. geqo_threshold]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve ability to display optimizer analysis using OPTIMIZER_DEBUG<br />
* http://archives.postgresql.org/pgsql-hackers/2012-08/msg00597.php<br />
}}<br />
<br />
{{TodoItem<br />
|Log statements where the optimizer row estimates were dramatically different from the number of rows actually found?}}<br />
<br />
{{TodoItem<br />
|Consider compressed annealing to search for query plans<br />
|This might replace GEQO.<br />
* http://archives.postgresql.org/message-id/15658.1241278636%40sss.pgh.pa.us<br />
}}<br />
<br />
{{TodoItem<br />
|Improve use of expression indexes for ORDER BY <br />
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01553.php <nowiki>Resjunk sort columns, Heikki's index-only quals patch, and bug #5000</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Modify the planner to better estimate caching effects<br />
* http://archives.postgresql.org/pgsql-performance/2010-11/msg00117.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow shared buffer cache contents to affect index cost computations<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg01140.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow the CTE (Common Table Expression) optimization fence to be optionally disabled (see the example below)<br />
* http://archives.postgresql.org/pgsql-hackers/2012-09/msg00700.php<br />
* http://archives.postgresql.org/pgsql-performance/2012-11/msg00161.php<br />
}}<br />
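<br />
For illustration (table and column names are hypothetical), a query of this shape is currently evaluated with the CTE materialized in full, because the outer <tt>WHERE</tt> clause is not pushed down past the fence:<br />
<br />
 -- big_table is a hypothetical large table; only one row is wanted,<br />
 -- but the fence forces the whole CTE to be computed<br />
 WITH cte AS (<br />
     SELECT id, payload<br />
     FROM big_table<br />
 )<br />
 SELECT * FROM cte WHERE id = 42;<br />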
<br />
{{TodoItem<br />
|Teach the planner how to better use partial indexes for index-only scans<br />
* http://www.postgresql.org/message-id/25141.1345072858@sss.pgh.pa.us<br />
* http://www.postgresql.org/message-id/79C7D74D-59B0-4D97-A5E5-55553EF299AA@justatheory.com<br />
}}<br />
<br />
=== Hashing ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Consider using a hash for joining to a large IN (VALUES ...) list (illustrated below)<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00450.php <nowiki>Planning large IN lists</nowiki>]<br />
}}<br />
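<br />
For illustration (names hypothetical), the construct in question looks like the following; the idea is for the planner to build a hash over the <tt>VALUES</tt> list when it is large:<br />
<br />
 SELECT *<br />
 FROM orders<br />
 WHERE customer_id IN (VALUES (1), (17), (42));  -- often thousands of entries<br />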
<br />
{{TodoItem<br />
|Allow single batch hash joins to preserve outer pathkeys<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php Re: Potential Join Performance Issue]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]<br />
}}<br />
<br />
{{TodoItem<br />
|"lazy" hash tables - look up only the tuples that are actually requested<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid building the same hash table more than once during the same query<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid hashing for distinct and then re-hashing for hash join<br />
* [http://archives.postgresql.org/message-id/4136ffa0902191346g62081081v8607f0b92c206f0a@mail.gmail.com Re: Fixing Grittner's planner issues]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Background Writer ==<br />
<br />
{{TodoItem<br />
|Consider having the background writer update the transaction status hint bits before writing out the page<br />
|Implementing this requires the background writer to have access to system catalogs and the transaction status log.}}<br />
<br />
{{TodoItem<br />
|Consider adding buffers the background writer finds reusable to the free list <br />
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php <nowiki>Background LRU Writer/free list</nowiki>]<br />
* [http://archives.postgresql.org/message-id/CA+U5nMKtvyDcV4zTr7bq7t6cA2nBfLxCJ8tQgVBnc5ddRPO+Bg@mail.gmail.com our buffer replacement strategy is kind of lame]<br />
* [http://www.postgresql.org/message-id/CAOeZVic4HikhmzVD%3DZP4JY9g8PgpyiQQOXOELWP%3DkR+%3DH1Frgg@mail.gmail.com Page replacement algorithm in buffer cache]<br />
* [http://www.postgresql.org/message-id/002f01ce50a8$e057c7a0$a10756e0$@kapila@huawei.com Move unused buffers to freelist]<br />
}}<br />
<br />
{{TodoItem<br />
|Automatically tune bgwriter_delay based on activity rather than using a fixed interval<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php <nowiki>Background LRU Writer/free list</nowiki>]<br />
* [http://archives.postgresql.org/message-id/CA+U5nMKtvyDcV4zTr7bq7t6cA2nBfLxCJ8tQgVBnc5ddRPO+Bg@mail.gmail.com our buffer replacement strategy is kind of lame]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider whether increasing BM_MAX_USAGE_COUNT improves performance<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg01007.php <nowiki>Bgwriter LRU cleaning: we've been going at this all wrong</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Test to see if calling PreallocXlogFiles() from the background writer will help with WAL segment creation latency<br />
* [http://archives.postgresql.org/pgsql-patches/2007-06/msg00340.php <nowiki>Re: Load Distributed Checkpoints, final patch</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add auto-tuning of work_mem<br />
* [http://www.postgresql.org/message-id/20131009143046.GT22450@momjian.us Auto-tuning work_mem and maintenance_work_mem]<br />
}}<br />
<br />
== Concurrent Use of Resources ==<br />
<br />
{{TodoItem<br />
|Do async I/O for faster random read-ahead of data<br />
|Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously.<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php <nowiki>Asynchronous I/O Support</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-performance/2007-09/msg00255.php <nowiki>Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00027.php <nowiki>There's random access and then there's random access</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-01/msg00170.php <nowiki>Bitmap index scan preread using posix_fadvise (Was: There's random access and then there's random access)</nowiki>]<br />
The above patch is already applied as of 8.4, but it still remains to figure out how to handle plain indexscans effectively.<br />
* [http://archives.postgresql.org//pgsql-hackers/2009-01/msg00806.php Problems with the patch submitted for posix_fadvise in index scans]<br />
}}<br />
<br />
{{TodoItem<br />
|Experiment with multi-threaded backend for better I/O utilization<br />
|This would allow a single query to make use of multiple I/O channels simultaneously. One idea is to create a background reader that can pre-fetch sequential and index scan pages needed by other backends. This could be expanded to allow concurrent reads from multiple devices in a partitioned table.<br />
* http://archives.postgresql.org/pgsql-performance/2011-02/msg00123.php<br />
* http://archives.postgresql.org/pgsql-hackers/2012-10/msg01139.php<br />
}}<br />
<br />
{{TodoItem<br />
|Experiment with multi-threaded backend for better CPU utilization<br />
|This would allow several CPUs to be used for a single query, such as for sorting or query execution.<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00945.php <nowiki>Multi CPU Queries - Feedback and/or suggestions wanted!</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|SMP scalability improvements<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php <nowiki>Straightforward changes for increased SMP scalability</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00206.php <nowiki>Re: Reducing Transaction Start/End Contention</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php <nowiki>Re: Reducing Transaction Start/End Contention</nowiki>]<br />
}}<br />
<br />
== TOAST ==<br />
<br />
{{TodoItem<br />
|Allow user configuration of TOAST thresholds<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php <nowiki>Re: Proposed adjustments in MaxTupleSize and toastthresholds</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php <nowiki>pg_lzcompress strategy parameters</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce unnecessary cases of deTOASTing<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00895.php <nowiki>Re: [PATCHES] Eliminate more detoast copies for packed varlenas</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce costs of repeat de-TOASTing of values<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg01096.php <nowiki>WIP patch: reducing overhead for repeat de-TOASTing</nowiki>]<br />
}}<br />
<br />
== Monitoring ==<br />
<br />
{{TodoItem<br />
|Expand pg_stat_activity for easier integration with monitoring tools<br />
* http://archives.postgresql.org/message-id/4DFA13A5.2060200@2ndQuadrant.com<br />
}}<br />
<br />
{{TodoItem<br />
|Add column to pg_stat_activity that shows the progress of long-running commands like CREATE INDEX and VACUUM<br />
* [http://archives.postgresql.org/pgsql-patches/2008-04/msg00203.php <nowiki>EXPLAIN progress info</nowiki>]<br />
* The CLUSTER/VACUUM FULL implementation would also be useful to track this way<br />
}}<br />
<br />
{{TodoItem<br />
|Have pg_stat_activity display query strings in the correct client encoding<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00131.php <nowiki>pg_stats queries versus per-database encodings</nowiki>]<br />
}}<br />
<br />
{{TodoItemEasy<br />
|Expose pg_controldata via an SQL interface<br />
|Helpful for monitoring replicated databases<br />
* http://archives.postgresql.org/message-id/4B901D73.8030003@agliodbs.com<br />
* [http://archives.postgresql.org/message-id/4B959D7A.6010907@joeconway.com initial patch]<br />
}}<br />
<br />
{{TodoItem<br />
| Add entry creation timestamp column to pg_stat_replication<br />
* http://archives.postgresql.org/pgsql-hackers/2011-08/msg00694.php<br />
}}<br />
<br />
{{TodoItem<br />
| Allow reporting of stalls due to wal_buffer wrap-around<br />
* http://archives.postgresql.org/pgsql-hackers/2012-02/msg00826.php<br />
}}<br />
<br />
{{TodoItem<br />
| Restructure pg_stat_database columns tup_returned and tup_fetched to return meaningful values<br />
* http://www.postgresql.org/message-id/20121012060345.GA29214@toroid.org<br />
}}<br />
<br />
== Miscellaneous Performance ==<br />
<br />
{{TodoItem<br />
|Use mmap() rather than shared memory for shared buffers?<br />
|This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead. We could also consider mmap() for writing WAL.<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00750.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00756.php<br />
* http://www.postgresql.org/message-id/20140115114909.GI4963@suse.de<br />
}}<br />
<br />
{{TodoItem<br />
|Rather than mmap()ing individual 8k pages, consider mmap()ing entire files into a backend?<br />
|Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping. Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings. Another problem is that there is no way to ''prevent'' I/O to disk from the dirty shared buffers, so changes could hit disk before WAL is written.<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01239.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider ways of storing rows more compactly on disk:<br />
* Reduce the row header size?<br />
* Consider reducing on-disk varlena length from four bytes to two because a heap row cannot be more than 64k in length}}<br />
<br />
{{TodoItem<br />
|Consider transaction start/end performance improvements<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php <nowiki>Reducing Transaction Start/End Contention</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php <nowiki>Re: Reducing Transaction Start/End Contention</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow configuration of backend priorities via the operating system<br />
|Though backend priorities make priority inversion during lock waits possible, research shows that this is not a huge problem.<br />
* [http://archives.postgresql.org/pgsql-general/2007-02/msg00493.php <nowiki>Priorities for users or queries?</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider increasing the minimum allowed number of shared buffers<br />
* [http://archives.postgresql.org/pgsql-bugs/2008-02/msg00157.php <nowiki>Re: [PATCH] Don't bail with legitimate -N/-B options</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider if CommandCounterIncrement() can avoid its AcceptInvalidationMessages() call<br />
* [http://archives.postgresql.org/pgsql-committers/2007-11/msg00585.php <nowiki>pgsql: Avoid incrementing the CommandCounter when</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation<br />
* [http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php <nowiki>Re: TB-sized databases</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider not storing a NULL bitmap on disk if all the NULLs are trailing<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00624.php <nowiki>Proposal for Null Bitmap Optimization(for Trailing NULLs)</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2007-12/msg00109.php <nowiki>Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Sort large UPDATE/DELETEs so it is done in heap order<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg01119.php <nowiki>Possible future performance improvement: sort updates/deletes by ctid</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider decreasing the I/O caused by updating tuple hint bits<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00847.php <nowiki>Hint Bits and Write I/O</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00199.php <nowiki>Re: [HACKERS] Hint Bits and Write I/O</nowiki>]<br />
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00695.php<br />
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00792.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg01063.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01408.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01453.php<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid the requirement of freezing pages that are infrequently modified <br />
|If all rows on a page are visible, it is possible to set a bit in the visibility map (once the visibility map is 100% reliable) and not need to freeze the page, avoiding a page rewrite<br />
* http://archives.postgresql.org/message-id/4BF701CF.2090205@agliodbs.com<br />
* http://archives.postgresql.org/pgsql-hackers/2010-06/msg00082.php<br />
* http://www.postgresql.org/message-id/20130523175148.GA29374@alap2.anarazel.de<br />
* http://www.postgresql.org/message-id/CA+TgmoaEmnoLZmVbb8gvY69NA8zw9BWpiZ9+TLz-LnaBOZi7JA@mail.gmail.com<br />
* http://www.postgresql.org/message-id/51A7553E.5070601@vmware.com<br />
}}<br />
<br />
{{TodoItem<br />
|Avoid reading in b-tree pages when replaying vacuum records in hot standby mode<br />
* [http://archives.postgresql.org/message-id/1272571938.4161.14739.camel@ebony <nowiki>Hot Standby tuning for btree_xlog_vacuum()</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Restructure truncation logic to be more resistant to failure<br />
|This also involves not writing dirty buffers for a truncated or dropped relation<br />
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01032.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider adding logic to increase large tables by more than 8k<br />
|This would reduce file system fragmentation<br />
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00337.php<br />
}}<br />
<br />
== Miscellaneous Other ==<br />
<br />
{{TodoItem<br />
|Deal with encoding issues for filenames in the server filesystem<br />
* {{MessageLink|20090413184335.39BE.52131E4D@oss.ntt.co.jp|a proposed patch here}}<br />
* {{MessageLink|8484.1244655656@sss.pgh.pa.us|some issues about it here}}<br />
* {{MessageLink|20100107103740.97A5.52131E4D@oss.ntt.co.jp|Windows-specific patch here}}<br />
}}<br />
<br />
{{TodoItem<br />
|Deal with encoding issues in the output of localeconv()<br />
* [http://archives.postgresql.org/message-id/40c6d9160904210658y590377cfw6dbbecb53d2b8be0@mail.gmail.com bug report]<br />
* [http://archives.postgresql.org/message-id/49EF8DA0.90008@tpf.co.jp draft patch]<br />
* [http://archives.postgresql.org/message-id/21710.1243620986@sss.pgh.pa.us review of patch]<br />
}}<br />
<br />
{{TodoItem<br />
|Provide schema name and other fields available from SQL GET DIAGNOSTICS in error reports<br />
* [http://archives.postgresql.org/message-id/dcc563d10810211907n3c59a920ia9eb7cd2a6d5ea58@mail.gmail.com <nowiki>How to get schema name which violates fk constraint</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg00846.php <nowiki>patch - Report the schema along table name in a referential failure error message</nowiki>]<br />
* {{MessageLink|3191.1263306359@sss.pgh.pa.us|Re: NOT NULL violation and error-message}}<br />
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg00213.php <nowiki>the case for machine-readable error fields</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add 64-bit support to /contrib/pgbench<br />
* http://archives.postgresql.org/pgsql-hackers/2010-07/msg00153.php<br />
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00705.php<br />
}}<br />
<br />
{{TodoItem<br />
|Use sa_mask to close race conditions between signal handlers<br />
* http://www.postgresql.org/message-id/20130911013107.GA225735@tornado.leadboat.com<br />
}}<br />
<br />
== Source Code ==<br />
<br />
{{TodoItemEasy<br />
|Remove warnings created by -Wcast-align}}<br />
<br />
{{TodoItem<br />
|Move platform-specific ps status display info from ps_status.c to ports}}<br />
<br />
{{TodoItem<br />
|Consider a faster CRC32 algorithm<br />
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php<br />
}}<br />
<br />
{{TodoItem<br />
|Allow cross-compiling by generating the zic database on the target system}}<br />
<br />
{{TodoItem<br />
|Improve NLS maintenance of libpgport messages linked onto applications}}<br />
<br />
{{TodoItem<br />
|Use UTF8 encoding for NLS messages so all server encodings can read them properly}}<br />
<br />
{{TodoItem<br />
|Allow creation of universal binaries for Darwin<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00884.php <nowiki>Getting to universal binaries for Darwin</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider GnuTLS if OpenSSL license becomes a problem<br />
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00892.php<br />
* [http://archives.postgresql.org/pgsql-patches/2006-05/msg00040.php <nowiki>[PATCH] Add support for GnuTLS</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg01213.php <nowiki>TODO: GNU TLS</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider making NAMEDATALEN more configurable in future releases}}<br />
<br />
{{TodoItem<br />
|Research use of signals and sleep wake ups<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php <nowiki>Restartable signals 'n all that</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow C++ code to more easily access backend code<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00302.php <nowiki>Mostly Harmless: Welcoming our C++ friends</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Consider simplifying how memory context resets handle child contexts<br />
* [http://archives.postgresql.org/pgsql-patches/2007-08/msg00067.php <nowiki>Re: Memory leak in nodeAgg</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Create three versions of libpgport to simplify client code<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00154.php <nowiki>8.4 TODO item: make src/port support libpq and ecpg directly</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve detection of shared memory segments being used by others by checking the SysV shared memory field 'nattch'<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php <nowiki>postgresql in FreeBSD jails: proposal</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00673.php <nowiki>Re: postgresql in FreeBSD jails: proposal</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Implement the non-threaded Avahi service discovery protocol<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00939.php <nowiki>Re: [PATCHES] Avahi support for Postgresql</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-02/msg00097.php <nowiki>Re: Avahi support for Postgresql</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg01211.php <nowiki>Re: [PATCHES] Avahi support for Postgresql</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-04/msg00001.php <nowiki>Re: [HACKERS] Avahi support for Postgresql</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce data row alignment requirements on some 64-bit systems<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00369.php <nowiki>[WIP] Reduce alignment requirements on 64-bit systems.</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Restructure TOAST internal storage format for greater flexibility<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00049.php <nowiki>Re: PG_PAGE_LAYOUT_VERSION 5 - time for change</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
| Add regression tests for pg_dump/restore<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01967.php <nowiki>"make install-check-pg_dump" target in src/regress</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
| Research different memory allocation methods for lists<br />
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01467.php <br />
}}<br />
<br />
{{TodoItem<br />
| Consider removing the attribute options cache<br />
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00039.php<br />
}}<br />
<br />
{{TodoItem<br />
| Restructure /contrib section<br />
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00705.php<br />
}}<br />
<br />
{{TodoItem<br />
| Consider adding explicit huge page support<br />
* http://archives.postgresql.org/pgsql-hackers/2012-07/msg00123.php<br />
}}<br />
<br />
=== /contrib/pg_upgrade ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Handle large object comments<br />
|This is difficult to do because the large object doesn't exist when --schema-only is loaded.<br />
}}<br />
<br />
{{TodoItem<br />
|Consider using pg_depend for checking object usage in version.c<br />
}}<br />
<br />
{{TodoItem<br />
|If reindex is necessary, allow it to be done in parallel with pg_dump custom format<br />
}}<br />
<br />
{{TodoItem<br />
|Migrate pg_statistic by dumping it out as a flat file, so analyze is not necessary<br />
|pg_class.oid is not preserved so schema.tablename must be used.<br />
* [http://archives.postgresql.org/message-id/CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com pg_upgrade and statistics]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve testing, perhaps using the buildfarm<br />
|The buildfarm has access to multiple versions of PostgreSQL.<br />
}}<br />
<br />
{{TodoItem<br />
|Create machine-readable output of pg_controldata<br />
|This would avoid parsing its output. The problem is we need pg_controldata output from both the old and new clusters so we would need to support both formats.<br />
}}<br />
<br />
{{TodoItem<br />
|Find cleaner way to start/stop dedicated servers for upgrades<br />
* http://archives.postgresql.org/pgsql-hackers/2012-08/msg00275.php<br />
}}<br />
<br />
{{TodoItem<br />
|Consider a way to run pg_upgrade on standby servers<br />
* http://archives.postgresql.org/pgsql-hackers/2012-07/msg00453.php<br />
* http://archives.postgresql.org/pgsql-hackers/2012-09/msg00056.php<br />
}}<br />
<br />
{{TodoItem<br />
|Desired changes that would prevent upgrades with pg_upgrade<br />
* 32-bit page checksums<br />
* Add metapage to GiST indexes<br />
* Clean up hstore's internal representation<br />
* Remove tuple infomask bit HEAP_MOVED_OFF and HEAP_MOVED_IN<br />
* [http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV%3DzA@mail.gmail.com fix char() index trailing space handling]<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Windows ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Remove configure.in check for link failure when cause is found}}<br />
<br />
{{TodoItem<br />
|Remove readdir() errno patch when runtime/mingwex/dirent.c rev 1.4 is released}}<br />
<br />
{{TodoItem<br />
|Allow psql to use readline once non-US code pages work with backslashes}}<br />
<br />
{{TodoItem<br />
|Fix problem with shared memory on the Win32 Terminal Server}}<br />
<br />
{{TodoItem<br />
|Improve signal handling<br />
* [http://archives.postgresql.org/pgsql-patches/2005-06/msg00027.php <nowiki>Simplify Win32 Signaling code</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Convert MSVC build system to remove most batch files<br />
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00961.php <nowiki>MSVC build system</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Support pgxs when using MSVC}}<br />
<br />
{{TodoItem<br />
|Fix MSVC NLS support, like for to_char()<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00485.php <nowiki>NLS on MSVC strikes back!</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-patches/2008-02/msg00038.php <nowiki>Fix for 8.3 MSVC locale (Was [HACKERS] NLS on MSVC strikes back!)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Find a correct rint() substitute on Windows<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00808.php <nowiki>Minor bug in src/port/rint.c</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Fix global namespace issues when using multiple terminal server sessions<br />
* [http://archives.postgresql.org/message-id/48F3BFCC.8030107@dunslane.net problems with Windows global namespace]}}<br />
<br />
{{TodoItem<br />
|Change from the current autoconf/gmake build system to cmake<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01869.php <nowiki>About CMake (was Re: [COMMITTERS] pgsql: Append major version number and for libraries soname major)</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve consistency of path separator usage<br />
* http://archives.postgresql.org/message-id/49C0BDC5.4010002@hagander.net<br />
}}<br />
<br />
{{TodoItem<br />
|Fix cross-compiling on Windows<br />
* http://archives.postgresql.org/pgsql-bugs/2010-10/msg00110.php<br />
}}<br />
<br />
{{TodoItem<br />
|Reduce file statistics overhead on directory reads<br />
* http://www.postgresql.org/message-id/1338325561.82125.YahooMailNeo@web39304.mail.mud.yahoo.com<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
=== Wire Protocol Changes ===<br />
{{TodoSubsection}}<br />
<br />
{{TodoItem<br />
|Allow dynamic character set handling}}<br />
<br />
{{TodoItem<br />
|Let the client indicate character encoding of database names, user names, and passwords<br />
* http://www.postgresql.org/message-id/16160.1360540050@sss.pgh.pa.us<br />
* http://www.postgresql.org/message-id/20131220030725.GA1411150@tornado.leadboat.com}}<br />
<br />
{{TodoItem<br />
|Add decoded type, length, precision}}<br />
<br />
{{TodoItem<br />
|Mark result columns as known-not-null when possible<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-11/msg01029.php <nowiki>Adding nullable indicator to Describe</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Provide more control over planner treatment of statements being prepared}}<br />
<br />
{{TodoItem<br />
|Use compression<br />
|If SSL is used, this would hopefully avoid the overhead of key negotiation and encryption<br />
* http://archives.postgresql.org/pgsql-hackers/2012-06/msg00793.php<br />
}}<br />
<br />
{{TodoItem<br />
|Update clients to use data types, typmod, schema.table.column names of result sets using new statement protocol}}<br />
<br />
{{TodoItem<br />
|Set protocol for wire format negotiation<br />
* [http://archives.postgresql.org/message-id/CACMqXCKkGrGXxQhjHCKCe0B8hn6sTt-1sdgHZOSGQMxrusOsQA@mail.gmail.com GUC_REPORT for protocol tunables]<br />
}}<br />
<br />
{{TodoItem<br />
|Make sure upgrading to a 4.1 protocol version will actually work smoothly<br />
* [http://archives.postgresql.org/message-id/28307.1318255008@sss.pgh.pa.us Re: libpq, PQdescribePrepared -> PQftype, PQfmod, no PQnullable]<br />
}}<br />
<br />
{{TodoItem<br />
|Allow multi-state authentication<br />
* http://www.postgresql.org/message-id/51A44185.5060306@2ndquadrant.com<br />
}}<br />
<br />
{{TodoItem<br />
|Identify the affected object in CommandComplete message?<br />
* http://www.postgresql.org/message-id/CAAfz9KNGVoyM+z_2tnPKTDXG_RdR9a33Y5s+zQ9LdwTTsqqZng@mail.gmail.com<br />
}}<br />
<br />
{{TodoEndSubsection}}<br />
<br />
== Documentation ==<br />
<br />
{{TodoItemEasy <br />
| Add contrib functions to the index<br />
* Add the functions and GUCs in the contrib modules to [http://www.postgresql.org/docs/current/static/sql-createindex.html the documentation index]: [http://archives.postgresql.org/message-id/50A2E173.6030404@2ndQuadrant.com per list discussion]<br />
}}<br />
<br />
{{TodoItem<br />
|Convert single quotes to apostrophes in the PDF documentation<br />
* [http://archives.postgresql.org/pgsql-docs/2007-12/msg00059.php <nowiki>SGML docs and pdf single-quotes</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Provide a manpage for postgresql.conf<br />
* {{messageLink|20080819194311.GH4428@alvh.no-ip.org|A smaller default postgresql.conf}}<br />
* {{messageLink|200808211910.37524.peter_e@gmx.net|A smaller default postgresql.conf}}<br />
}}<br />
<br />
{{TodoItem<br />
|Change the manpage-generating toolchain to use the new XML-based docbook2x tools<br />
* {{messageLink|200808211910.37524.peter_e@gmx.net|A smaller default postgresql.conf}}<br />
}}<br />
<br />
{{TodoItem<br />
|Consider changing documentation format from SGML to XML<br />
* [http://archives.postgresql.org/pgsql-docs/2006-12/msg00152.php <nowiki>Re: Authoring Tools WAS: Switching to XML</nowiki>]<br />
* http://archives.postgresql.org/pgsql-docs/2011-04/msg00020.php<br />
* http://wiki.postgresql.org/wiki/Switching_PostgreSQL_documentation_from_SGML_to_XML<br />
}}<br />
<br />
{{TodoItem<br />
|Document support for N<nowiki>'...'</nowiki> national character string literals, if it matches the SQL standard<br />
* http://archives.postgresql.org/message-id/1275895438.1849.1.camel@fsopti579.F-Secure.com<br />
}}<br />
<br />
{{TodoItem<br />
|Add diagrams to the documentation<br />
* http://archives.postgresql.org/pgsql-docs/2010-07/msg00001.php<br />
}}<br />
<br />
== Exotic Features ==<br />
<br />
{{TodoItem<br />
|Add pre-parsing phase that converts non-ISO syntax to supported syntax<br />
|This could allow SQL written for other databases to run without modification.}}<br />
<br />
{{TodoItem<br />
|Allow plug-in modules to emulate features from other databases}}<br />
<br />
{{TodoItem<br />
|Add features of Oracle-style packages<br />
|A package would be a schema with session-local variables, public/private functions, and initialization functions. It is also possible to implement these capabilities in any schema and not use a separate &quot;packages&quot; syntax at all (see the sketch below).<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-08/msg00384.php <nowiki>proposal for PL packages for 8.3.</nowiki>]<br />
}}<br />
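<br />
As a rough sketch of the schema-based approach (all names are illustrative; session-local state is kept in custom settings via <tt>set_config()</tt>/<tt>current_setting()</tt>):<br />
<br />
 -- mypkg and its "counter" variable are illustrative names only<br />
 CREATE SCHEMA mypkg;<br />
 <br />
 -- "Package variable": session-local state held in a custom setting<br />
 CREATE FUNCTION mypkg.set_counter(val int) RETURNS void AS $$<br />
 BEGIN<br />
     PERFORM set_config('mypkg.counter', val::text, false);<br />
 END;<br />
 $$ LANGUAGE plpgsql;<br />
 <br />
 -- Errors if mypkg.set_counter() has not been called in this session<br />
 CREATE FUNCTION mypkg.get_counter() RETURNS int AS $$<br />
     SELECT current_setting('mypkg.counter')::int;<br />
 $$ LANGUAGE sql;<br />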
<br />
{{TodoItem<br />
|Consider allowing control of upper/lower case folding of unquoted identifiers<br />
* [http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php <nowiki>Bringing PostgreSQL torwards the standard regarding case folding</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php <nowiki>Re: [SQL] Case Preservation disregarding case sensitivity?</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php <nowiki>TODO Item: Consider allowing control of upper/lower case folding of unquoted, identifiers</nowiki>]<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00415.php <nowiki>Identifier case folding notes</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Add autonomous transactions<br />
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php <nowiki>autonomous transactions</nowiki>]<br />
}}<br />
<br />
{{TodoItem<br />
|Give query progress indication<br />
* [[Query progress indication]]<br />
}}<br />
<br />
{{TodoItem<br />
|Rethink our type system<br />
* [[Rethinking datatypes]]<br />
}}<br />
<br />
{{TodoItem<br />
|Improve push-down of joins, aggregates, and sorts to foreign data wrappers<br />
* [http://www.postgresql.org/message-id/20131121150515.GC23976@momjian.us Status of FDW pushdowns]<br />
}}<br />
<br />
== Features We Do ''Not'' Want ==<br />
<br />
The following features have been discussed ad nauseam on the PostgreSQL mailing lists and the consensus has been that the project is not interested in them. As such, if you are going to bring them up as potential features, you will want to be familiar with all of the arguments against these features which have been previously made over the years. If you decide to work on such features anyway, you should be aware that you face a higher-than-normal barrier to get the Project to accept them.<br />
<br />
{{TodoItem<br />
|All backends running as threads in a single process (not wanted)<br />
|This eliminates the process protection we get from the current setup. Thread creation is usually the same overhead as process creation on modern systems, so it seems unwise to use a pure threaded model, and MySQL and DB2 have demonstrated that threads introduce as many issues as they solve. Threading specific operations such as I/O, seq scans, and connection management has been discussed and will probably be implemented to enable specific performance features. Moving to a threaded engine would also require halting all other work on PostgreSQL for one to two years.}}<br />
<br />
{{TodoItem<br />
|"Oracle-style" optimizer hints (not wanted)<br />
|Optimizer hints, as implemented in Oracle and other RDBMSes, are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have such problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. See [[OptimizerHintsDiscussion|Optimizer Hints Discussion]] for further information.}}<br />
<br />
{{TodoItem<br />
|Embedded server (not wanted)<br />
|While PostgreSQL clients run fine in limited-resource environments, the server requires multiple processes and a stable pool of resources to run reliably and efficiently. Stripping down the PostgreSQL server to run in the same process address space as the client application would add too much complexity and too many failure cases. Besides, there are several very mature embedded SQL databases already available.<br />
<br />
{{TodoItem<br />
|Obfuscated function source code (not wanted)<br />
|Obfuscating function source code has minimal protective benefits because anyone with super-user access can find a way to view the code. At the same time, it would greatly complicate backups and other administrative tasks. To prevent non-super-users from viewing function source code, remove SELECT permission on pg_proc, as shown below.<br />
* [http://archives.postgresql.org/pgsql-general/2008-09/msg00668.php <nowiki>Obfuscated stored procedures (was Re: Oracle and Postgresql)</nowiki>]<br />
}}<br />
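<br />
The workaround mentioned above, run as a superuser (note that this also breaks clients such as <tt>psql</tt>'s <tt>\df</tt> for non-superusers):<br />
<br />
 REVOKE SELECT ON pg_catalog.pg_proc FROM PUBLIC;<br />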
<br />
{{TodoItem<br />
|Indeterminate behavior for the GROUP BY clause (not wanted)<br />
|At least one other database product allows the GROUP BY clause to name only a subset of the result columns that would be needed to provide predictable results; the server is then free to return any value from the group for the remaining columns. This is not viewed as a desirable feature. PostgreSQL 9.1 allows result columns that are not referenced by GROUP BY if a primary key for the same table is referenced in GROUP BY (see the example below).<br />
* [http://archives.postgresql.org/pgsql-hackers/2010-03/msg00297.php <nowiki>Re: SQL compatibility reminder: MySQL vs PostgreSQL</nowiki>]<br />
}}<br />
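<br />
To illustrate the PostgreSQL 9.1 behaviour mentioned above (table names are hypothetical): <tt>t.name</tt> may appear unaggregated because it is functionally dependent on the grouped primary key <tt>t.id</tt>:<br />
<br />
 SELECT t.id, t.name, count(*)<br />
 FROM t JOIN orders o ON o.t_id = t.id<br />
 GROUP BY t.id;<br />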
<br />
</div><br />
<br />
[[Category:Todo]]</div>Amshttps://wiki.postgresql.org/index.php?title=BDR_User_Guide&diff=22340BDR User Guide2014-05-13T08:29:18Z<p>Ams: Add bdr_apply_pause/resume</p>
<hr />
<div>----<br />
This page is the users and administrators guide for BDR. If you're looking for technical details on the project plan and implementation, see [[BDR Project]].<br />
----<br />
<br />
= BDR User Guide =<br />
<br />
BDR (Bi-Directional Replication) is a feature being developed for inclusion in PostgreSQL core that provides greatly enhanced replication capabilities.<br />
<br />
BDR allows users to create a geographically distributed multi-master database using Logical Log Streaming Replication (LLSR) transport. It is designed to provide both high availability and geographically distributed disaster recovery capabilities. <br />
<br />
BDR is not “clustering” as some vendors use the term, in that it doesn't have a distributed lock manager, global transaction co-ordinator, etc. Each member server is separate yet connected, with design choices that allow separation between nodes that would not be possible with global transaction coordination.<br />
<br />
Guidance on getting a testing setup established is in [[#Initial setup]]. Please read the full documentation if you intend to put BDR into production.<br />
<br />
== Logical Log Streaming Replication ==<br />
<br />
Logical log streaming replication (LLSR) allows one PostgreSQL master (the "upstream master") to stream a sequence of changes to another read/write PostgreSQL server (the "downstream master"). Data is sent in one direction only over a normal <tt>libpq</tt> connection.<br />
<br />
Multiple LLSR connections can be used to set up bi-directional replication as discussed later in this guide.<br />
<br />
=== Overview of logical replication ===<br />
<br />
In some ways LLSR is similar to "streaming replication" i.e. physical log streaming replication (PLSR) from a user perspective; both replicate changes from one server to another. However, in LLSR the receiving server is also a full master database that can make changes, unlike the read-only replicas offered by PLSR hot standby. Additionally, LLSR is per-database, whereas PLSR is per-cluster and replicates all databases at once. There are many more differences discussed in the relevant sections of this document.<br />
<br />
In LLSR the data that is replicated is change data in a special format that allows the changes to be logically reconstructed on the downstream master. The changes are generated by reading transaction log (WAL) data, making change capture on the upstream master much more efficient than trigger-based replication, hence the name "logical log replication". Changes are passed from upstream to downstream using the <tt>libpq</tt> protocol, just like with physical log streaming replication.<br />
<br />
One connection is required for each PostgreSQL database that is replicated. If two servers are connected, each of which has 50 databases, then 50 connections are required to send changes in one direction, from upstream to downstream. Each database to replicate must be explicitly specified, so it is possible to filter out unwanted databases simply by not configuring replication for them.<br />
<br />
Setting up replication for new databases is not (yet?) automatic, so additional configuration steps are required after <tt>CREATE DATABASE</tt>. A restart of the downstream master is also required. The upstream master only needs restarting if the <tt>max_replication_slots</tt> parameter is too low to allow a new replica to be added. Adding replication for databases that do not exist yet will cause an ERROR, as will dropping a database that is being replicated. Setup is discussed in more detail below.<br />
<br />
Changes are processed by the downstream master using <tt>bdr</tt> plug-ins. This allows flexible handling of replication input, including:<br />
<br />
* BDR apply process - applies logical changes to the downstream master. The apply process makes changes directly rather than generating SQL text and then parse/plan/executing SQL.<br />
* Textual output plugin - a demo plugin that generates SQL text (but doesn't apply changes)<br />
* <tt>pg_xlogdump</tt> - examines physical WAL records and produces textual debugging output. This server program is included in PostgreSQL 9.3.<br />
<br />
=== Replication of DML changes ===<br />
<br />
All changes are replicated: <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> and <tt>TRUNCATE</tt>. <br />
<br />
(TRUNCATE is not yet implemented, but will be implemented before the feature goes to final release).<br />
<br />
Actions that generate WAL data but don't represent logical changes do not result in data transfer, e.g. full page writes, VACUUMs, hint bit setting. LLSR avoids much of the overhead of physical WAL, though it has its own overheads, so it doesn't always use less bandwidth than PLSR.<br />
<br />
Locks taken by <tt>LOCK</tt> and <tt>SELECT ... FOR UPDATE/SHARE</tt> on the upstream master are not replicated to downstream masters. Locks taken automatically by <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> or <tt>TRUNCATE</tt> ''are'' taken on the downstream master and may delay replication apply or concurrent transactions - see [[#Lock Conflicts|Lock Conflicts]].<br />
<br />
<tt>TEMPORARY</tt> and <tt>UNLOGGED</tt> tables are not replicated. In contrast to physical standby servers, downstream masters can use temporary and unlogged tables. However, temporary tables remain specific to a particular session so creating a temporary table on the upstream master does not create a similar table on the downstream master.<br />
<br />
<tt>DELETE</tt> and <tt>UPDATE</tt> statements that affect multiple rows on the upstream master will cause a series of row changes on the downstream master. These are likely to proceed at the same speed as on the origin, as long as an index is defined on the Primary Key of the table on the downstream master. <tt>UPDATE</tt>s and <tt>DELETE</tt>s require some form of unique constraint, either <tt>PRIMARY KEY</tt> or <tt>UNIQUE NOT NULL</tt>. A warning is issued in the downstream master's logs if the expected constraint is absent. <tt>INSERT</tt>s on the upstream master do not require a unique constraint in order to replicate correctly, though the lack of one prevents conflict detection between multiple masters, if that is considered important.<br />
<br />
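As a minimal illustration (table name hypothetical), tables intended for replication should therefore be declared with a suitable key:<br />
<br />
 -- The primary key lets UPDATEs and DELETEs be applied downstream<br />
 CREATE TABLE replicated_tbl (<br />
     id   serial PRIMARY KEY,<br />
     data text NOT NULL<br />
 );<br />
<br />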
<tt>UPDATE</tt>s that change the value of the Primary Key of a table will be replicated correctly.<br />
<br />
The values applied are the final values from the <tt>UPDATE</tt> on the upstream master, including any modifications from before-row triggers, rules or functions. Any reflexive conditions, such as <tt>N = N + 1</tt>, are resolved to their final value. Volatile or stable functions are evaluated on the master side and the resulting values are replicated. Consequently any function side-effects (writing files, network socket activity, updating internal PostgreSQL variables, etc) will not occur on the replicas, as the functions are not run again on the replica.<br />
<br />
All columns are replicated on each table. Large column values that would be placed in TOAST tables are replicated without problem, avoiding de-compression and re-compression. If we update a row but do not change a TOASTed column value, then that data is not sent downstream.<br />
<br />
All data types are handled, not just the built-in datatypes of PostgreSQL core. The only requirement is that user-defined types are installed identically in both upstream and downstream master (see "Limitations").<br />
<br />
The LLSR plugin uses the binary <tt>libpq</tt> protocol where the upstream and downstream masters are binary-compatible, i.e. they have the same PostgreSQL major version, same processor architecture and compatible compilation options. Where the upstream and downstream masters are not binary compatible, replication will fall back to the text protocol normally used for PostgreSQL client/server communication. In case of version differences it may also be necessary to upgrade the <tt>bdr</tt> extension on the older server to match the newer server.<br />
<br />
Sets of changes are accumulated in memory (spilling to disk where required) and then sent to the downstream server at commit time. Aborted transactions are never sent. Application of changes on downstream master is currently single-threaded, though this process is efficiently implemented. Parallel apply is a possible future feature, especially for changes made while holding <tt>AccessExclusiveLock</tt>.<br />
<br />
Changes are applied to the downstream master in the sequence in which they were committed on the upstream master. This is a known-good serialized ordering of changes, so replication serialization failures are not theoretically possible. Such failures are common in systems that use statement based replication (e.g. MySQL) or trigger based replication (e.g. Slony version 2.0). Users should note that this means the original order of locking of tables is not maintained. Although lock order is provably not an issue for the set of locks held on upstream master, additional locking on downstream side could cause lock waits or deadlocking in some cases. (Discussed in further detail later).<br />
<br />
Larger transactions spill to disk on the upstream master once they reach a certain size. Currently, large transactions can cause increased latency. A future enhancement will be to stream changes to the downstream master once they fill the upstream memory buffer, though this is likely to be implemented in 9.5.<br />
<br />
<tt>SET</tt> statements and parameter settings are not replicated. This has no effect on replication since we only replicate actual changes, not anything at SQL statement level. We always update the correct tables, whatever the setting of <tt>search_path</tt>. Values are replicated correctly irrespective of the values of <tt>bytea_output</tt>, <tt>TimeZone</tt>, <tt>DateStyle</tt>, etc.<br />
<br />
<tt>NOTIFY</tt> is not supported across log based replication, either physical or logical. <tt>NOTIFY</tt> and <tt>LISTEN</tt> will work fine on the upstream master but an upstream <tt>NOTIFY</tt> will not trigger a downstream <tt>LISTEN</tt>er.<br />
<br />
In some cases, additional deadlocks can occur on apply. This causes an automatic retry of the apply of the replaying transaction and is only an issue if the deadlock recurs repeatedly, delaying replication.<br />
<br />
From a performance and concurrency perspective the BDR apply process is similar to a normal backend. Frequent conflicts with locks from other transactions when replaying changes can slow things down and thus increase replication delay, so reducing the frequency of such conflicts can be a good way to speed things up. Any lock held by another transaction on the downstream master - <tt>LOCK</tt> statements, <tt>SELECT ... FOR UPDATE/FOR SHARE</tt>, or <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> row locks - can delay replication if the replication apply process needs to change the locked table/row.<br />
<br />
=== Table definitions and DDL replication ===<br />
<br />
DML changes are replicated between tables with matching <tt>"Schemaname"."Tablename"</tt> on both upstream and downstream masters, e.g. changes from upstream's <tt>public.mytable</tt> will go to downstream's <tt>public.mytable</tt> while changes to the upstream <tt>myschema.mytable</tt> will go to the downstream <tt>myschema.mytable</tt>. This works even when no schema is specified in the original SQL since we identify the changed table from its internal OIDs in WAL records and then map that to whatever internal identifier is used on the downstream node.<br />
<br />
This requires careful synchronization of table definitions on each node otherwise <tt>ERROR</tt>s will be generated by the replication apply process. In general, tables must be an exact match between upstream and downstream masters. <br />
<br />
There are no plans to implement working replication between dissimilar table definitions.<br />
<br />
Tables must meet the following requirements to be compatible for purposes of LLSR. The best way to ensure an exact match is to define the table on one node and allow DDL replication to copy its definition to the other nodes, or to use <tt>init_replica</tt> to copy the definitions when bringing up a new BDR node. If you don't define a table / type / etc manually on multiple nodes then you won't have to worry, BDR takes care of ensuring compatibility for you.<br />
<br />
The requirements for compatibility are:<br />
<br />
* The downstream master must only have constraints (<tt>CHECK</tt>, <tt>UNIQUE</tt>, <tt>EXCLUSION</tt>, <tt>FOREIGN KEY</tt>, etc) that are also present on the upstream master. Replication may initially work with mismatched constraints but is likely to fail as soon as the downstream master rejects a row the upstream master accepted.<br />
* The table referenced by a FOREIGN KEY on a downstream master must have all the keys present in the upstream master version of the same table.<br />
* Storage parameters must match except for as allowed below<br />
* Inheritance must be the same<br />
* Dropped columns on master must be present on replicas<br />
* Custom types and enum definitions must match exactly<br />
* Composite types and enums must have the same oids on master and replication target<br />
* Extensions defining types used in replicated tables must be of the same version or fully SQL-level compatible and the oids of the types they define must match.<br />
<br />
The following differences are permissible between tables on different nodes:<br />
<br />
* The table's <tt>pg_class</tt> oid, the oid of its associated TOAST table, and the oid of the table's rowtype in <tt>pg_type</tt> may differ;<br />
* Extra or missing non-<tt>UNIQUE</tt> indexes<br />
* Extra keys in downstream lookup tables for <tt>FOREIGN KEY</tt> references that are not present on the upstream master<br />
* The table-level storage parameters for fillfactor and autovacuum<br />
* Triggers and rules may differ (they are not executed by replication apply)<br />
<br />
Replication of DDL changes between nodes is performed using event triggers, with partial support integrated in <tt>bdr-next</tt> (see [[#LLSR Limitations|LLSR Limitations]]).<br />
<br />
Triggers and Rules are NOT executed by apply on the downstream side, equivalent to an enforced setting of <tt>session_replication_role = replica</tt>.<br />
<br />
In future it is expected that composite types and enums with non-identical oids will be converted using text output and input functions. This feature is not yet implemented.<br />
<br />
=== LLSR limitations ===<br />
<br />
The current LLSR implementation is subject to some limitations, which are being progressively removed as work progresses.<br />
<br />
==== Data definition compatibility ====<br />
<br />
Table definitions, types, extensions, etc must be near identical between upstream and downstream masters. See [[#Table definitions and DDL replication|Table definitions and DDL replication]].<br />
<br />
==== DDL Replication ====<br />
<br />
DDL replication is not yet fully supported. As of release 0.5 (tag <tt>bdr/0.5</tt>), BDR can replicate <tt>CREATE TABLE</tt>, <tt>CREATE SEQUENCE</tt> and <tt>CREATE INDEX</tt>, but not other DDL.<br />
<br />
The pending release after 0.5 adds <tt>CREATE TABLE</tt>, <tt>DROP TABLE</tt>, partial <tt>ALTER TABLE</tt> and a large number of other statements, allowing most DDL to be run on one node and replicated automatically to all others. <br />
<br />
===== In bdr-0.5 (older) =====<br />
<br />
This only applies to <tt>bdr/0.5</tt>; <tt>bdr-next</tt> has much stronger DDL replication support.<br />
<br />
<tt>CREATE TABLE</tt> will work without problems and will be automatically replicated to downstream nodes.<br />
<br />
Any <tt>ALTER TABLE</tt> may cause the definitions of tables on either end of a link to go out of sync, causing replication to fail.<br />
<br />
<tt>DROP TABLE</tt> of a table on a downstream master or BDR member may cause replication to halt as pending rows for that table cannot be applied.<br />
<br />
Additionally, the <tt>PRIMARY KEY</tt> of a table may not be dropped as BDR requires it for normal operation.<br />
<br />
Other indexes may be added and removed freely as they do not affect replication.<br />
<br />
==== TRUNCATE is not replicated ====<br />
<br />
<tt>TRUNCATE</tt> is not yet supported.<br />
<br />
The safest option is to define a user-level BEFORE trigger on each table that RAISEs an ERROR when TRUNCATE is attempted.<br />
<br />
A simple truncate-blocking trigger is:<br />
<br />
CREATE OR REPLACE FUNCTION deny_truncate() RETURNS trigger AS $$<br />
BEGIN<br />
IF tg_op = 'TRUNCATE' THEN<br />
RAISE EXCEPTION 'TRUNCATE is not supported on this table. Please use DELETE FROM.';<br />
ELSE<br />
RAISE EXCEPTION 'This trigger only supports TRUNCATE';<br />
END IF;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
It can be applied to a table with:<br />
<br />
CREATE TRIGGER deny_truncate_on_<tablename> BEFORE TRUNCATE ON <tablename><br />
FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate();<br />
<br />
A PL/PgSQL DO block that queries <tt>pg_class</tt> and loops over it to <tt>EXECUTE</tt> a dynamic SQL <tt>CREATE TRIGGER</tt> command for each table that does not already have the trigger can be used to apply the trigger to all tables.<br />
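<br />
A minimal sketch of such a block, assuming the <tt>deny_truncate()</tt> function above and using the fixed trigger name <tt>deny_truncate</tt> for simplicity:<br />
<br />
 DO $$<br />
 DECLARE<br />
     t regclass;<br />
 BEGIN<br />
     FOR t IN<br />
         SELECT c.oid::regclass<br />
         FROM pg_class c<br />
         JOIN pg_namespace n ON n.oid = c.relnamespace<br />
         WHERE c.relkind = 'r'<br />
           AND n.nspname NOT IN ('pg_catalog', 'information_schema')<br />
           AND NOT EXISTS (SELECT 1 FROM pg_trigger tg<br />
                           WHERE tg.tgrelid = c.oid AND tg.tgname = 'deny_truncate')<br />
     LOOP<br />
         -- %s renders the regclass with any quoting/qualification needed<br />
         EXECUTE format('CREATE TRIGGER deny_truncate BEFORE TRUNCATE ON %s FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate()', t);<br />
     END LOOP;<br />
 END;<br />
 $$;<br />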
<br />
Alternatively, there will be a <tt>ProcessUtility_hook</tt> available in the BDR extension to automatically prevent unsupported operations like <tt>TRUNCATE</tt>.<br />
<br />
=== Initial setup ===<br />
<br />
To set up LLSR or BDR you first need a patched PostgreSQL that can support LLSR/BDR, then you need to create one or more LLSR/BDR senders and one or more LLSR/BDR receivers.<br />
<br />
==== Installing the patched PostgreSQL binaries ====<br />
<br />
Currently BDR is only available in builds of the 'bdr' branch on Andres Freund's git repo on git.postgresql.org. PostgreSQL 9.3 and below do not support BDR, and 9.4 requires patches, so this guide will not work for you if you are trying to use a normal install of PostgreSQL.<br />
<br />
First you need to clone, configure, compile and install like normal. Clone the sources from <tt>git://git.postgresql.org/git/2ndquadrant_bdr.git</tt> and checkout the <tt>bdr</tt> branch.<br />
<br />
If you have an existing local PostgreSQL git tree specify it as <tt>--reference /path/to/existing/tree</tt> to greatly speed your git clone.<br />
<br />
Example:<br />
<br />
mkdir -p $HOME/bdr<br />
 cd $HOME/bdr<br />
git clone -b bdr git://git.postgresql.org/git/2ndquadrant_bdr.git $HOME/bdr/postgres-bdr-src<br />
cd postgres-bdr-src<br />
./configure --prefix=$HOME/bdr/postgres-bdr-bin<br />
make install<br />
(cd contrib/btree_gist && make install)<br />
(cd contrib/bdr && make install)<br />
<br />
This will put everything in <tt>$HOME/bdr</tt>, with the source code and build tree in <tt>$HOME/bdr/postgres-bdr-src</tt> and the installed PostgreSQL in <tt>$HOME/bdr/postgres-bdr-bin</tt>. This is a convenient setup for testing and development because it doesn't require you to set up new users, wrangle permissions, run anything as root, etc, but it isn't recommended that you deploy this way in production.<br />
<br />
To actually use these new binaries you will need to:<br />
<br />
export PATH=$HOME/bdr/postgres-bdr-bin/bin:$PATH<br />
<br />
before running <tt>initdb</tt>, <tt>postgres</tt>, etc. You don't have to use the <tt>psql</tt> or <tt>libpq</tt> you compiled but you're likely to get version mismatch warnings if you don't.<br />
<br />
=== Parameter Reference ===<br />
<br />
The following parameters are new or have been changed in PostgreSQL's new logical streaming replication.<br />
<br />
==== <tt>shared_preload_libraries = 'bdr'</tt> ====<br />
<br />
To load support for receiving changes on a downstream master, the <tt>bdr</tt> library must be added to the existing <tt>shared_preload_libraries</tt> parameter. This loads the bdr library during postmaster start-up and allows it to create the required background worker(s).<br />
<br />
Upstream masters don't need to load the bdr library unless they're also operating as a downstream master, as is the case in a BDR configuration.<br />
<br />
==== <tt>bdr.connections</tt> ====<br />
<br />
A comma-separated list of upstream master connection names is specified in <tt>bdr.connections</tt>. These names must be simple alphanumeric strings. They are used when naming the connection in error messages, configuration options and logs, but are otherwise of no special meaning.<br />
<br />
A typical two-upstream-master setting might be:<br />
<br />
bdr.connections = 'upstream1, upstream2'<br />
<br />
==== <tt>bdr.&lt;connection_name&gt;_dsn</tt> ====<br />
<br />
Each connection name must have at least a data source name specified using the <tt>bdr.&lt;connection_name&gt;_dsn</tt> parameter. The DSN syntax is the same as that used by libpq so it is not discussed in further detail here. A <tt>dbname</tt> for the database to connect to must be specified; all other parts of the DSN are optional.<br />
<br />
The local (downstream) database name is assumed to be the same as the name of the upstream database being connected to, though future versions will make this configurable.<br />
<br />
For the above two-master setting for <tt>bdr.connections</tt> the DSNs might look like:<br />
<br />
bdr.upstream1_dsn = 'host=10.1.1.2 user=postgres dbname=replicated_db'<br />
bdr.upstream2_dsn = 'host=10.1.1.3 user=postgres dbname=replicated_db'<br />
<br />
==== <tt>bdr.synchronous_commit</tt> ====<br />
<br />
This boolean option controls the <tt>synchronous_commit</tt> setting for BDR apply workers. It defaults to <tt>on</tt>.<br />
<br />
If set to <tt>off</tt>, BDR apply workers will perform async commits, allowing PostgreSQL to considerably improve throughput. It is safe unless you intend to run BDR with synchronous replication, in which case <tt>bdr.synchronous_commit</tt> must be left <tt>on</tt>.<br />
<br />
==== <tt>bdr.default_apply_delay</tt> ====<br />
<br />
Sets the default for <tt>bdr.&lt;connection_name&gt;_apply_delay</tt> for all configured connections.<br />
<br />
Primarily useful for debugging.<br />
<br />
Added after 0.5.<br />
<br />
==== <tt>bdr.<connection_name>_apply_delay</tt> ====<br />
<br />
This parameter, which defaults to zero, causes the application of received transactions to be delayed by the specified number of milliseconds.<br />
<br />
It is mostly useful for testing and debugging purposes, but may also be used to provide a replica of the database at a known point in the recent past.<br />
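<br />
For example, to delay apply of changes received over the <tt>upstream1</tt> connection by five seconds:<br />
<br />
bdr.upstream1_apply_delay = 5000<br />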
<br />
==== <tt>bdr.log_conflicts_to_table</tt> ====<br />
<br />
Boolean, controls whether detected BDR conflicts get logged to a <tt>bdr.bdr_conflict_history</tt> table.<br />
<br />
Added after 0.5, subject to change.<br />
<br />
==== <tt>bdr.<connection_name>_init_replica</tt> ====<br />
<br />
Added after BDR 0.5.<br />
<br />
This parameter defaults to <tt>off</tt>. If set to <tt>on</tt>, it will cause BDR to dump the database pointed to by this connection's <tt>bdr.&lt;connection_name&gt;_dsn</tt> before starting replication, and apply the dump to the local database, which must be empty.<br />
<br />
The dump is guaranteed to be consistent with the start point for replication.<br />
<br />
The following parameters become required for this connection if and only if <tt>init_replica</tt> is enabled.<br />
<br />
* <tt>bdr.<connection_name>_replica_local_dsn</tt><br />
<br />
==== <tt>bdr.<connection_name>_replica_local_dsn</tt> ====<br />
<br />
Added after BDR 0.5. Ignored unless <tt>bdr.&lt;connection_name&gt;_init_replica</tt> is <tt>on</tt> for this connection; required if it is.<br />
<br />
A connection string that is passed to the script at <tt>bdr.<connection_name>_replica_script_path</tt>, telling it which local database to connect to in order to apply the dump of the remote DB.<br />
<br />
The connection string is only visible to superusers, and should specify a superuser connection. You may include a password in the connection string if required, or put it in the separate <tt>.pgpass</tt> file for the <tt>postgres</tt> user.<br />
<br />
==== <tt>bdr.temp_dump_directory</tt> ====<br />
<br />
Added after BDR 0.5. Has no effect unless <tt>bdr.&lt;connection_name&gt;_init_replica=on</tt> is set for one or more connections.<br />
<br />
Specifies the path to a temporary storage location, writable by the <tt>postgres</tt> user, that has enough storage space to contain a complete dump of the database at <tt>bdr.<connection_name>_dsn</tt> for each configured connection with <tt>init_replica</tt> enabled.<br />
<br />
Only used during initial bringup.<br />
<br />
==== <tt>bdr.max_workers</tt> ====<br />
<br />
Allocates shared memory space for BDR worker configuration information. You can ignore this parameter at the moment.<br />
<br />
This parameter is auto-calculated from the number of <tt>bdr.connections</tt>, on the assumption that each connection is to a separate database and thus needs two workers. This wastes a small amount of shared memory, but the impact is minimal. It isn't otherwise useful; it'll become important when BDR is enhanced to allow new connections to be added at runtime, but isn't currently worth paying attention to.<br />
<br />
Added after BDR 0.5.<br />
<br />
==== <tt>max_replication_slots</tt> ====<br />
<br />
The new parameter <tt>max_replication_slots</tt> has been added for use on both upstream and downstream masters. This parameter controls the maximum number of logical replication slots - upstream or downstream - that this cluster may have at a time. It must be set at postmaster start time.<br />
<br />
As logical replication slots are persistent, slots are consumed even by replicas that are not currently connected. Slot management is discussed in Starting, Stopping and Managing Replication.<br />
<br />
<tt>max_replication_slots</tt> should be set to the sum of the number of logical replication upstream masters this server will have, plus the number of logical replication downstream masters that will connect to it.<br />
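<br />
For example, a node that connects to one upstream master and serves two downstream masters would need:<br />
<br />
max_replication_slots = 3<br />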
<br />
==== <tt>wal_level = 'logical'</tt> ====<br />
<br />
A new setting, <tt>'logical'</tt>, has been added for the existing <tt>wal_level</tt> parameter. <tt>'logical'</tt> includes everything that the existing <tt>hot_standby</tt> setting does and adds additional details required for logical changeset decoding to the write-ahead logs.<br />
<br />
This additional information is consumed by the upstream-master-side xlog decoding worker. Downstream masters that do not also act as upstream masters do not require <tt>wal_level</tt> to be increased above the default <tt>'minimal'</tt>.<br />
<br />
<tt>wal_level</tt>, except for the new <tt>'logical'</tt> setting, is [http://www.postgresql.org/docs/current/static/runtime-config-wal.html documented in the main PostgreSQL manual].<br />
<br />
==== <tt>max_wal_senders</tt> ====<br />
<br />
Logical replication hasn't altered the <tt>max_wal_senders</tt> parameter, but it is important in upstream masters for logical replication and BDR because every logical sender consumes a <tt>max_wal_senders</tt> entry.<br />
<br />
You should configure <tt>max_wal_senders</tt> to the sum of the number of physical and logical replicas you want to allow an upstream master to serve. If you intend to use <tt>pg_basebackup</tt> you should add at least two more senders to allow for its use.<br />
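<br />
For example, an upstream master serving two logical downstream masters and one physical standby, with headroom for <tt>pg_basebackup</tt>, might use:<br />
<br />
max_wal_senders = 5   # 2 logical + 1 physical + 2 for pg_basebackup<br />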
<br />
Like <tt>max_replication_slots</tt>, <tt>max_wal_senders</tt> entries don't cost a large amount of memory, so you can overestimate fairly safely.<br />
<br />
<tt>max_wal_senders</tt> is documented in [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL documentation].<br />
<br />
==== <tt>track_commit_timestamp</tt> ====<br />
<br />
Setting this parameter to <tt>on</tt> enables commit timestamp tracking, which is used to implement last-UPDATE-wins conflict resolution.<br />
<br />
It is also required for use of the <tt>pg_get_transaction_committime</tt> function.<br />
<br />
=== Function reference ===<br />
<br />
BDR / LLSR adds a number of functions. Some of them have been integrated into PostgreSQL 9.4 and [http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-REPLICATION can be found in the 9.4 documentation]. Others are pending integration; those are listed here.<br />
<br />
==== <tt>pg_get_transaction_committime</tt> ====<br />
<br />
<tt>pg_get_transaction_committime(txid integer)</tt>: Get the timestamp at which the specified transaction, as identified by transaction ID, committed. This function can be useful when monitoring replication lag.<br />
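<br />
A minimal usage example, with a hypothetical transaction ID:<br />
<br />
SELECT pg_get_transaction_committime(1827);<br />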
<br />
This function is added by [https://commitfest.postgresql.org/action/patch_view?id=1265 the commit timestamp patch set] and is included in the <tt>bdr</tt> branch.<br />
<br />
==== <tt>pg_xlog_wait_remote_apply</tt> ====<br />
<br />
The <tt>pg_xlog_wait_remote_apply(lsn text, pid integer)</tt> function allows you to wait on an upstream master until all downstream masters' replication has caught up to a certain point.<br />
<br />
The <tt>lsn</tt> argument is a Log Sequence Number, an identifier for the WAL (Write-Ahead Log) record you want to make sure has been applied on all nodes. The most useful value to wait for is the result of <tt>pg_current_xlog_location()</tt>, as discussed in [http://www.postgresql.org/docs/current/static/functions-admin.html PostgreSQL Admin Functions] in the manual.<br />
<br />
The <tt>pid</tt> argument specifies the process ID of a walsender to wait for. It may be set to zero to wait until the receivers associated with all walsenders on this upstream master have caught up to the specified <tt>lsn</tt>, or to a process ID obtained from <tt>pg_stat_replication.pid</tt> to wait for just one downstream to catch up.<br />
<br />
The most common use is:<br />
<br />
select pg_xlog_wait_remote_apply(pg_current_xlog_location(), 0)<br />
<br />
which will wait until all downstream masters have applied changes up to the time on the upstream master at which <tt>pg_xlog_wait_remote_apply</tt> was called. <br />
<br />
<tt>pg_current_xlog_location</tt> is not transactional, so unlike things like <tt>current_timestamp</tt> it'll always return the very latest status server-wide, irrespective of how long the current transaction has been running for and when it started.<br />
<br />
==== <tt>pg_xlog_wait_remote_receive</tt> ====<br />
<br />
<tt>pg_xlog_wait_remote_receive</tt> is the same as <tt>pg_xlog_wait_remote_apply</tt>, except that it only waits until the remote node has confirmed that it's received the given LSN, not until it has actually applied it after receiving it.<br />
<br />
==== <tt>bdr_apply_pause</tt> ====<br />
<br />
The <tt>bdr_apply_pause()</tt> function allows you to temporarily stop the application of changes from upstream masters. Running:<br />
<br />
select bdr_apply_pause()<br />
<br />
will pause the apply of changes until the <tt>bdr_apply_resume()</tt> function is executed.<br />
<br />
==== <tt>bdr_apply_resume</tt> ====<br />
<br />
The <tt>bdr_apply_resume()</tt> function resumes application of changes after it has been paused by <tt>bdr_apply_pause()</tt>.<br />
<br />
=== Catalog changes ===<br />
<br />
BDR has a number of its own catalogs for metadata and state. It also introduces a few changes to core system catalogs.<br />
<br />
==== <tt>pg_catalog.pg_seqam</tt> ====<br />
<br />
This is a new core system catalog added in the BDR patchset, in the sequence access methods patch. It is due to be submitted for inclusion in PostgreSQL core in 9.5 but is currently maintained as part of BDR.<br />
<br />
To support [[#Distributed_Sequences|distributed sequences]], BDR adds an access method abstraction for sequences. It serves a similar purpose to index access methods - it abstracts the implementation of sequence storage from usage of sequences, so the client doesn't need to care whether it's using a distributed sequence, a local sequence, or something else entirely.<br />
<br />
This access method is described by the <tt>pg_seqam</tt> table. Two entries are defined:<br />
<br />
postgres=# select * from pg_seqam ;<br />
seqamname | seqamalloc | seqamsetval | seqamoptions <br />
-----------+----------------------+-----------------------+------------------------<br />
local | sequence_local_alloc | sequence_local_setval | sequence_local_options<br />
bdr | bdr_sequence_alloc | bdr_sequence_setval | bdr_sequence_options<br />
(2 rows)<br />
<br />
<tt>local</tt> is the traditional local-only sequence access method.<br />
<br />
<tt>bdr</tt> is for distributed sequences. For more information, see the [[#Distributed_Sequences|distributed sequences]] section.<br />
<br />
==== <tt>bdr.bdr_conflict_handlers</tt> ====<br />
<br />
The <tt>bdr_conflict_handlers</tt> table contains user defined conflict handlers ("conflict triggers") that can be used to implement application-specific conflict resolution.<br />
<br />
See "Conflict Resolution by user-defined handlers", below.<br />
<br />
==== <tt>bdr.bdr_conflict_history</tt> ====<br />
<br />
The <tt>bdr_conflict_history</tt> table is a log table that records detected conflicts. See "Monitoring" for details.<br />
<br />
==== <tt>bdr.bdr_nodes</tt> ====<br />
<br />
<tt>bdr_nodes</tt> is a global state table that tracks all known members of the BDR group, online or not.<br />
<br />
If BDR is not set up in a star topology, each node still needs information about the existence and state of the other nodes; it cannot rely on the locally configured connections or slots. <tt>bdr_nodes</tt> maintains that information.<br />
<br />
In general you don't need to work directly with <tt>bdr.bdr_nodes</tt>. See the source code for details on its use.<br />
<br />
==== <tt>bdr.bdr_queued_commands</tt> and <tt>bdr.bdr_queued_drops</tt> ====<br />
<br />
<tt>bdr_queued_commands</tt> and <tt>bdr.bdr_queued_drops</tt> are an implementation detail that is generally not of concern to users.<br />
<br />
Rows are inserted into <tt>bdr_queued_commands</tt> when an event trigger detects a DDL command that can be replicated. The rows get replicated to other nodes, which detect the special case of an insertion into this table and execute the DDL command described in the row.<br />
<br />
A similar principle applies for <tt>bdr.bdr_queued_drops</tt>.<br />
<br />
You don't need to work directly with these tables. See the source code for details on their use.<br />
<br />
==== <tt>bdr.bdr_sequence_elections</tt>, <tt>bdr.bdr_sequence_values</tt> and <tt>bdr.bdr_votes</tt> ====<br />
<br />
These tables are implementation details for global sequences.<br />
<br />
You don't need to work directly with these tables. See the source code for details on their use.<br />
<br />
=== Distributed Sequences ===<br />
<br />
Distributed sequences, or global sequences, are sequences that are synchronized across all the nodes in a BDR cohort. A distributed sequence is more expensive to access than a purely local sequence, but it produces values that are guaranteed unique across the entire cohort.<br />
<br />
Using distributed sequences allows you to avoid the problem of insert conflicts. If you define a <tt>PRIMARY KEY</tt> or <tt>UNIQUE</tt> column with a <tt>DEFAULT nextval(...)</tt> expression that refers to a global sequence shared across all nodes in a BDR cohort, it is not possible for any node to ever get the same value as any other node. When BDR synchronizes inserts between the nodes, they can never conflict.<br />
<br />
There is no need to use a distributed sequence if:<br />
<br />
* You are ensuring global uniqueness using another method such as:<br />
** Local sequences with an offset and increment (see the sketch after this list);<br />
** UUIDs;<br />
** An externally co-ordinated natural key<br />
<br />
* You are using the data in a <tt>TEMPORARY</tt> or <tt>UNLOGGED</tt> table, as these are never visible outside the current node.<br />
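<br />
As an illustration, the offset-and-increment approach with plain local sequences looks like this in a hypothetical two-node cohort:<br />
<br />
-- On node 1: produces 1, 3, 5, ...<br />
CREATE SEQUENCE partitioned_seq INCREMENT BY 2 START WITH 1;<br />
-- On node 2: produces 2, 4, 6, ...<br />
CREATE SEQUENCE partitioned_seq INCREMENT BY 2 START WITH 2;<br />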
<br />
(All of the following is subject to change and requires periodic review).<br />
<br />
You can get a listing of distributed sequences defined in a database with:<br />
<br />
SELECT *<br />
FROM pg_class <br />
INNER JOIN pg_seqam ON (pg_class.relam = pg_seqam.oid) <br />
WHERE pg_seqam.seqamname = 'bdr' AND relkind = 'S';<br />
<br />
(See <tt>[[#pg_seqam|pg_seqam]]</tt> for information on the new <tt>pg_seqam</tt> catalog table).<br />
<br />
New distributed sequences may be created with the <tt>USING</tt> clause to <tt>CREATE SEQUENCE</tt>:<br />
<br />
CREATE SEQUENCE test_seq USING bdr;<br />
<br />
Once you've created a distributed sequence you may use it with <tt>nextval</tt> like any other sequence.<br />
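<br />
For example (a sketch; <tt>test_tab</tt> is a hypothetical table):<br />
<br />
CREATE TABLE test_tab (<br />
    id bigint NOT NULL DEFAULT nextval('test_seq') PRIMARY KEY,<br />
    payload text<br />
);<br />
INSERT INTO test_tab (payload) VALUES ('first row');<br />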
<br />
A few limitations and caveats apply to global sequences at time of writing:<br />
<br />
* Only an <tt>INCREMENT</tt> of 1 (the default) is supported. Client applications that expect a different increment must be configured to handle increment 1. An extended variant of <tt>nextval</tt> that takes the number of values to obtain as an argument and returns a set of values is planned as an extension to aid in porting.<br />
<br />
* <tt>MINVALUE</tt> and <tt>MAXVALUE</tt> are locked at their defaults and may not be changed.<br />
<br />
* <tt>START WITH</tt> may not be specified; however, <tt>setval</tt> may be used to set the start value after the sequence is created.<br />
<br />
* The <tt>CACHE</tt> directive is not supported.<br />
<br />
* Sequence values are handed out in chunks, so if three different nodes all call <tt>nextval</tt> at the same time they might get values 50, 150 and 250. Thus, at time 't' <tt>nextval</tt> on one node may return a value higher than a <tt>nextval</tt> call at time 't+1' on another node. Within a single node the usual rules for <tt>nextval</tt> still apply.<br />
<br />
The details used by BDR to manage global sequences are in the <tt>bdr.bdr_sequence_values</tt>, <tt>bdr.bdr_sequence_elections</tt> and <tt>bdr.bdr_votes</tt> tables, though these details are subject to change.<br />
<br />
=== Configuration ===<br />
<br />
Details on individual parameters are described in the [[#Parameter_Reference|parameter reference]] section.<br />
<br />
The following configuration is an example of a simple one-way LLSR replication setup - a single upstream master to a single downstream master.<br />
<br />
The upstream master (sender)'s <tt>postgresql.conf</tt> should contain settings like:<br />
<br />
wal_level = 'logical' # Include enough info for logical replication<br />
max_replication_slots = X # Number of LLSR senders + any receivers<br />
max_wal_senders = Y # Y = max_replication_slots plus any physical <br />
# streaming requirements<br />
track_commit_timestamp = on # Not strictly required for LLSR, only for BDR<br />
# conflict resolution.<br />
<br />
Downstream (receiver) <tt>postgresql.conf</tt>:<br />
<br />
shared_preload_libraries = 'bdr'<br />
<br />
bdr.connections="name_of_upstream_master" # list of upstream master nodenames<br />
bdr.<nodename>_dsn = 'dbname=postgres' # connection string for connection<br />
# from downstream to upstream master<br />
bdr.<nodename>_local_dbname = 'xxx' # optional parameter to cover the case <br />
# where the databasename on upstream <br />
# and downstream master differ. <br />
# (Not yet implemented)<br />
bdr.<nodename>_apply_delay # optional parameter to delay apply of<br />
# transactions, time in milliseconds <br />
bdr.synchronous_commit = off # optional parameter to set the<br />
# synchronous_commit parameter the<br />
# apply processes will be using.<br />
# Safe to set to 'off' unless you're<br />
# doing synchronous replication.<br />
max_replication_slots = X # set to the number of remotes<br />
track_commit_timestamp = on # Not strictly required for LLSR,<br />
# only for BDR conflict resolution.<br />
<br />
Note that a server can be both sender and receiver, whether as two servers replicating to each other or in more complex configurations like replication chains/trees.<br />
<br />
The upstream (sender) <tt>pg_hba.conf</tt> must be configured to allow the downstream master to connect for replication. Otherwise you'll see errors like the following on the downstream master:<br />
<br />
FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "[local]", user "postgres"<br />
<br />
A suitable <tt>pg_hba.conf</tt> entry for a replication connection from the replica server 10.1.4.8 might be:<br />
<br />
host replication postgres 10.1.4.8/32 trust<br />
<br />
(The user name should match the one configured in the downstream master's DSN. <tt>md5</tt> password authentication is also supported.)<br />
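<br />
For example, to require password authentication instead of <tt>trust</tt>:<br />
<br />
host replication postgres 10.1.4.8/32 md5<br />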
<br />
For more details on these parameters, see [[#Parameter Reference|Parameter Reference]].<br />
<br />
=== Troubleshooting ===<br />
<br />
==== Could not access file "bdr": No such file or directory ====<br />
<br />
If you see the error:<br />
<br />
FATAL: could not access file "bdr": No such file or directory<br />
<br />
when starting a database set up to receive BDR replication, you probably forgot to install <tt>contrib/bdr</tt>. See above.<br />
<br />
==== Invalid value for parameter ====<br />
<br />
An error like:<br />
<br />
LOG: invalid value for parameter ...<br />
<br />
when setting one of these parameters means your server doesn't support logical replication and will need to be patched or updated.<br />
<br />
==== Couldn't find logical slot ====<br />
<br />
An error like:<br />
<br />
ERROR: couldn't find logical slot "bdr: 16384:5873181566046043070-1-24596:"<br />
<br />
on the upstream master suggests that a downstream master is trying to connect to a logical replication slot that no longer exists. The slot cannot be re-created, so it is necessary to re-seed the downstream replica database.<br />
<br />
=== Operational Issues and Debugging ===<br />
<br />
In LLSR there are no user-level (i.e. SQL-visible) ERRORs that have special meaning. Any ERRORs generated are likely to be serious problems of some kind, apart from apply deadlocks, which are automatically retried.<br />
<br />
=== Monitoring ===<br />
<br />
The following tables and views are available for monitoring replication activity:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE pg_stat_replication]</tt><br />
* <tt>[http://www.postgresql.org/docs/devel/static/catalog-pg-replication-slots.html pg_replication_slots]</tt><br />
* <tt>pg_stat_bdr</tt> (described below)<br />
* <tt>bdr_nodes</tt><br />
<br />
The following configuration and logging parameters are useful for monitoring replication:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt><br />
<br />
==== <tt>pg_replication_slots</tt> ====<br />
<br />
The <tt>pg_replication_slots</tt> view is specific to logical replication. It was incorporated into PostgreSQL 9.4 after the release of BDR 0.5 so the documentation for it has been removed from here; see <tt>[http://www.postgresql.org/docs/devel/static/catalog-pg-replication-slots.html pg_replication_slots]</tt> in the PostgreSQL manual.<br />
<br />
==== <tt>bdr.pg_stat_bdr</tt> ====<br />
<br />
The <tt>bdr.pg_stat_bdr</tt> view is supplied by the <tt>bdr</tt> extension. It provides information on a server's connection(s) to its upstream master(s).<br />
<br />
The primary purpose of this view is to report statistics on the progress of LLSR apply on a per-upstream master connection basis.<br />
<br />
View structure:<br />
<br />
View "public.pg_stat_bdr"<br />
Column | Type | Modifiers <br />
--------------------+--------+-----------<br />
rep_node_id | oid | <br />
riremotesysid | name | <br />
riremotedb | oid | <br />
rilocaldb | oid | <br />
nr_commit | bigint | <br />
nr_rollback | bigint | <br />
nr_insert | bigint | <br />
nr_insert_conflict | bigint | <br />
nr_update | bigint | <br />
nr_update_conflict | bigint | <br />
nr_delete | bigint | <br />
nr_delete_conflict | bigint | <br />
nr_disconnect | bigint | <br />
<br />
Fields:<br />
<br />
* <tt>rep_node_id</tt>: An internal identifier for the replication slot.<br />
<br />
* <tt>riremotesysid</tt>: The remote database system identifier, as reported by the <tt>Database system identifier</tt> line of <tt>pg_controldata /path/to/datadir</tt><br />
<br />
* <tt>riremotedb</tt>: The remote database OID, ie the <tt>oid</tt> column of the remote server's <tt>pg_catalog.pg_database</tt> entry for the replicated database. You can get the database name with <tt>select datname from pg_database where oid = 12345</tt> (where '12345' is the <tt>riremotedb</tt> oid).<br />
<br />
* <tt>rilocaldb </tt>: The local database OID, with the same meaning as <tt>riremotedb</tt> but with oids from the local system.<br />
<br />
''The remaining columns are statistics about this upstream master slot'':<br />
<br />
* <tt>nr_commit</tt>: Number of commits applied to date from this master<br />
<br />
* <tt>nr_rollback</tt>: Number of rollbacks performed by this apply process due to recoverable errors (deadlock retries, lost races, etc) or unrecoverable errors like mismatched constraint errors.<br />
<br />
* <tt>nr_insert</tt>: Number of <tt>INSERT</tt>s performed<br />
<br />
* <tt>nr_insert_conflict</tt>: Number of <tt>INSERT</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_update</tt>: Number of <tt>UPDATE</tt>s performed<br />
<br />
* <tt>nr_update_conflict</tt>: Number of <tt>UPDATE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_delete</tt>: Number of deletes performed<br />
<br />
* <tt>nr_delete_conflict</tt>: Number of deletes that resulted in conflicts.<br />
<br />
* <tt>nr_disconnect</tt>: Number of times this apply process has lost its connection to the upstream master since it was started.<br />
<br />
<br />
This view does not contain any information about how far behind the upstream master this downstream master is. The upstream master's <tt>pg_stat_logical_decoding</tt> and <tt>pg_stat_replication</tt> views must be queried to determine replication lag.<br />
<br />
==== Monitoring uses of <tt>bdr.bdr_nodes</tt> ====<br />
<br />
While generally not intended for end-user access, <tt>bdr.bdr_nodes</tt> may be queried to see the current state of any node initialization.<br />
<br />
A node writes a row for its <tt>(sysid, dbname, timelineid)</tt> in <tt>bdr.bdr_nodes</tt> when it is joining the BDR group. The only field of interest to users is <tt>status</tt>, which may have the values:<br />
<br />
* <tt>i</tt>: The node is doing initial slot creation or an initial dump and load (see <tt>init_replica</tt>, above).<br />
<br />
* <tt>c</tt>: The node is catching up to its <tt>init_replica</tt> target node and is not yet ready to participate fully in BDR.<br />
<br />
* <tt>r</tt>: The node is fully ready. Slots may be created on this node and it may participate fully in BDR.<br />
<br />
Note that the status doesn't indicate whether the node is actually up right now. A node may be shut down, isolated from the network, or crashed and still appear as <tt>r</tt> in <tt>bdr.bdr_nodes</tt> because it's still conceptually part of the BDR group.<br />
<br />
At this time there are no SQL-level functions for adding/removing nodes. <b>Do not directly modify <tt>bdr.bdr_nodes</tt>.</b><br />
<br />
==== Monitoring <tt>bdr.bdr_conflict_history</tt> ====<br />
<br />
<tt>bdr.bdr_conflict_history</tt> tracks conflicts that arise in replication. To learn more about conflicts see [[#Conflict Detection & Resolution]].<br />
<br />
At this time only detected conflicts are recorded in <tt>bdr.bdr_conflict_history</tt>. In future, logging of unhandled conflicts (those not detected until an apply statement fails with an <tt>ERROR</tt>) will be added.<br />
<br />
Unlike most tables, the contents of <tt>bdr.bdr_conflict_history</tt> are ''not'' replicated between nodes. Each node has a local copy of the table with distinct data. This is a technical limitation that may be lifted in a future release, but it also saves on unnecessary replication overhead.<br />
<br />
You can use the conflict history table to determine how rapidly your application creates conflicts and where those conflicts occur, allowing you to improve the application to reduce conflict rates. It also helps detect cases where conflict resolutions may not have produced the desired results, allowing you to identify places where a user defined conflict trigger or an application design change may be desirable.<br />
<br />
Row values may optionally be logged for row conflicts. This is controlled by the global database-wide option <tt>bdr.log_conflicts_to_table</tt>. There is no per-table control over row value logging at this time. Nor is there any limit applied on the number of fields a row may have, number of elements dumped in arrays, length of fields, etc, so it may not be wise to enable this if you regularly work with multi-megabyte rows that may trigger conflicts.<br />
<br />
Because the conflict history table contains data on every table in the database, each row's schema might be different; therefore, if row values are logged, they are stored as <tt>json</tt> fields. The json is created with <tt>[http://www.postgresql.org/docs/current/static/functions-json.html row_to_json]</tt>, just as if you'd called it on the row yourself from SQL. There is no corresponding <tt>json_to_row</tt> function in PostgreSQL at this time, so you'll need table-specific code (pl/pgsql, pl/python, pl/perl, etc.) if you want to reconstruct a composite-typed tuple from the logged json.<br />
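<br />
Individual fields can still be pulled out of the logged json with the json operators. For example, assuming the conflicting table has an <tt>id</tt> column:<br />
<br />
SELECT conflict_id, conflict_type,<br />
       local_tuple->>'id'  AS local_id,<br />
       remote_tuple->>'id' AS remote_id<br />
FROM bdr.bdr_conflict_history<br />
WHERE object_name = 'my_table';<br />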
<br />
The structure of <tt>bdr_conflict_history</tt> is:<br />
<br />
Column | Type |<br />
--------------------------+-----------------------------+<br />
conflict_id | bigint |<br />
local_node_sysid | text |<br />
local_conflict_xid | xid |<br />
local_conflict_lsn | pg_lsn |<br />
local_conflict_time | timestamp with time zone |<br />
object_schema | text |<br />
object_name | text |<br />
remote_node_sysid | text |<br />
remote_txid | xid |<br />
remote_commit_time | timestamp with time zone |<br />
remote_commit_lsn | pg_lsn |<br />
conflict_type | bdr.bdr_conflict_type |<br />
conflict_resolution | bdr.bdr_conflict_resolution |<br />
local_tuple | json |<br />
remote_tuple | json |<br />
local_tuple_xmin | xid |<br />
local_tuple_origin_sysid | text |<br />
error_message | text |<br />
error_sqlstate | text |<br />
error_querystring | text |<br />
error_cursorpos | integer |<br />
error_detail | text |<br />
error_hint | text |<br />
error_context | text |<br />
error_columnname | text |<br />
error_typename | text |<br />
error_constraintname | text |<br />
error_filename | text |<br />
error_lineno | integer |<br />
error_funcname | text |<br />
<br />
<br />
The primary key is <tt>(conflict_id, local_node_sysid)</tt>, where the conflict ID is auto-generated. The composite key allows replication between nodes to be added later without conflicts arising.<br />
<br />
Fields:<br />
<br />
* <tt>conflict_id</tt>: A locally unique key identifying each conflict.<br />
* <tt>local_node_sysid</tt>: The unique system identifier of the node that was applying the change and encountered the conflict. This is the receiving end of the slot.<br />
* <tt>local_conflict_xid</tt>: The transaction identifier of the apply transaction that detected the conflict.<br />
* <tt>local_conflict_lsn</tt>: The transaction log position the applying server was at when the conflict was detected. This can be used to order conflicts in time series.<br />
* <tt>local_conflict_time</tt>: The wall-clock time at which the conflict was detected on the local machine. This is ''not'' the time the conflict was originally created by a user SQL command.<br />
* <tt>object_schema</tt>: If this conflict applies to a particular database object (usually a table), the schema that object is in.<br />
* <tt>object_name</tt>: If this conflict applies to a particular database object, the name of that object, e.g. "my_table".<br />
* <tt>remote_node_sysid</tt>: The unique system identifier of the node that the change was received from. Unless catchup mode is active (where one node relays for another), this is the node that actually created the conflicting change.<br />
* <tt>remote_txid</tt>: The transaction ID on the remote node that created the conflicting change.<br />
* <tt>remote_commit_time</tt>: The wall-clock time that <tt>remote_txid</tt> committed on the remote node. This is read from the remote node's clock.<br />
* <tt>remote_commit_lsn</tt>: The transaction log position of the commit that ended the transaction this conflict was created in.<br />
* <tt>conflict_type</tt>: The type of conflict detected, e.g. <tt>insert_insert</tt>. For details on conflict types see the documentation on apply conflicts linked above. Possible values are:<br />
** <tt>insert_insert</tt>: Two <tt>INSERT</tt>s created the same key.<br />
** <tt>update_update</tt>: Two <tt>UPDATE</tt>s tried to modify the same row version.<br />
** <tt>update_delete</tt>: An <tt>UPDATE</tt> tried to modify a row that was concurrently <tt>DELETE</tt>d on another node. This conflict is only detected on the side that executed the conflicting <tt>DELETE</tt>; the side that did the <tt>UPDATE</tt> first doesn't see any conflict.<br />
** <tt>unhandled_tx_abort</tt>.<br />
* <tt>conflict_resolution</tt>: How BDR resolved this conflict:<br />
** <tt>conflict_trigger_skip_change</tt>: A user defined conflict handler decided to ignore this change completely, so this change was discarded.<br />
** <tt>conflict_trigger_returned_tuple</tt>: A user defined conflict handler decided to generate a replacement row instead of applying the local or remote rows. The replacement row was applied.<br />
** <tt>last_update_wins_keep_local</tt>: Timestamps were used to resolve the change in favour of the most recent change, which was the row already present on the local node, so this change was discarded.<br />
** <tt>last_update_wins_keep_remote</tt>: Timestamps were used to resolve the change in favour of the most recent change, which was the row sent by the remote node, so this change was applied.<br />
** <tt>unhandled_tx_abort</tt>: BDR did not have any way to resolve this conflict, a conflict trigger threw an exception, or another unhandled (possibly transient) error occurred. Examine the <tt>error_</tt> fields for details. Many non-error fields will be unset in this case.<br />
* <tt>local_tuple</tt>: If tuple logging is enabled, a json representation of the conflicting local tuple already present on this node, if any.<br />
* <tt>remote_tuple</tt>: If tuple logging is enabled, a json representation of the conflicting remote tuple received, if any.<br />
* <tt>local_tuple_xmin</tt>: The transaction ID that created the most recent version of the conflicting local tuple, if known.<br />
* <tt>local_tuple_origin_sysid</tt>: If the local tuple was replicated from a remote node using BDR, the system identifier of the real origin node. Null if the tuple was created directly on the local node.<br />
* <tt>error_message</tt>: For unhandled errors, the main error message<br />
* <tt>error_sqlstate</tt>: For unhandled errors, the SQLSTATE; see [http://www.postgresql.org/docs/current/static/errcodes-appendix.html error codes].<br />
* <tt>error_querystring</tt>: For unhandled errors, the text of the query that failed if available. (Currently never populated).<br />
* <tt>error_cursorpos</tt>: For unhandled errors, the position of the error within the query, if supplied by the server. (Currently never populated).<br />
* <tt>error_detail</tt>: For unhandled errors, any additional <tt>DETAIL</tt> section included in the error message.<br />
* <tt>error_hint</tt>: For unhandled errors, any additional <tt>HINT</tt> section included in the error message.<br />
* <tt>error_context</tt>: For unhandled errors, a call context such as a PL/pgSQL stack trace.<br />
* <tt>error_columnname</tt>: For unhandled errors, the column of the target table if a specific column was affected. The table schema and name are in <tt>object_schema</tt> and <tt>object_name</tt>.<br />
* <tt>error_typename</tt>: For unhandled errors applying to a particular data type (like cast / conversion errors), the type name.<br />
* <tt>error_constraintname</tt>: For unhandled errors applying to a particular table constraint, the name of the constraint that was violated.<br />
* <tt>error_filename</tt>: For unhandled errors, the PostgreSQL source file name of the location that raised the error.<br />
* <tt>error_lineno</tt>: For unhandled errors, the line number within <tt>error_filename</tt> that raised the error.<br />
* <tt>error_funcname</tt>: For unhandled errors, the name of the PostgreSQL C-level function that raised the error (if known).<br />
<br />
At this time none of the <tt>error_</tt> fields are used. In future they will be populated with the fields from an unhandled error, matching those error fields output by <tt>psql</tt> like <tt>HINT</tt>, etc.<br />
<br />
==== Monitoring replication status and lag ====<br />
<br />
As with any replication setup, it is vital to monitor replication status on all BDR nodes to ensure no node is lagging severely behind the others or is stuck.<br />
<br />
In the case of BDR a stuck or crashed node will eventually cause disk space and table bloat problems on other masters so stuck nodes should be detected and removed or repaired in a reasonably timely manner. Exactly how urgent this is depends on the workload of the BDR group.<br />
<br />
The <tt>pg_stat_logical_decoding</tt> view described above may be used to verify that a downstream master is connected to its upstream master by querying it on the upstream side - the <tt>active</tt> boolean column is <tt>t</tt> if there's a downstream master connected to this upstream.<br />
<br />
The <tt>xmin</tt> column provides an indication of whether replication is advancing; it should increase as replication progresses. You can turn this into the time the transaction was committed on the master by running <tt>pg_get_transaction_committime(xmin)</tt> ''on the upstream master''. Since txids differ between upstream and downstream masters, running it on a downstream master with a txid from the upstream master as input would produce an error or an incorrect result.<br />
<br />
Example:<br />
<br />
postgres=# select slot_name, plugin, database, active, xmin,<br />
pg_get_transaction_committime(xmin)<br />
FROM pg_stat_logical_decoding ;<br />
-[ RECORD 1 ]-----------------+----------------------------------------<br />
slot_name | bdr: 12910:5882534759278050995-1-12910:<br />
plugin | bdr_output<br />
database | 12910<br />
active | f<br />
xmin | 1827<br />
pg_get_transaction_committime | 2013-05-27 06:14:36.851423+00<br />
<br />
=== Table and index usage statistics ===<br />
<br />
Statistics on table and index usage are updated normally by the downstream master. This is essential for correct function of auto-vacuum. If there are no local writes on the downstream master and stats have not been reset these two views should show matching results between upstream and downstream:<br />
<br />
* <tt>pg_stat_user_tables</tt><br />
* <tt>pg_statio_user_tables</tt><br />
<br />
Since indexes are used to apply changes, the identifying indexes on the downstream side may appear more heavily used than non-identifying indexes under workloads that perform <tt>UPDATE</tt>s and <tt>DELETE</tt>s.<br />
<br />
The built-in index monitoring views are:<br />
<br />
* <tt>pg_stat_user_indexes</tt><br />
* <tt>pg_statio_user_indexes</tt><br />
<br />
All these views are discussed in [http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE the PostgreSQL documentation on the statistics views].<br />
<br />
=== Starting, stopping and managing replication ===<br />
<br />
Replication is managed with the <tt>postgresql.conf</tt> settings described in "Parameter Reference" and "Configuration" above, and using the <tt>pg_receivellog</tt> utility command.<br />
<br />
==== Starting a new LLSR connection ====<br />
<br />
Logical replication is started automatically when a database is configured as a downstream master in <tt>postgresql.conf</tt> (see [[#Configuration|Configuration]]) and the postmaster is started. No explicit action is required to start replication, but replication will not actually work unless the upstream and downstream databases are identical within the requirements set by LLSR in the [[#Table definitions and DDL replication|Table definitions and DDL replication]] section.<br />
<br />
<tt>pg_dump</tt> and <tt>pg_restore</tt> may be used to set up the new replica's database.<br />
<br />
The development version in the <tt>bdr-next</tt> branch will automatically dump the upstream database and populate the local database if the <tt>bdr.&lt;nodename&gt;_init_replica</tt> setting is configured; see the parameter reference above.<br />
<br />
==== Viewing logical replication slots ====<br />
<br />
Examining the state of logical replication is discussed in [[#Monitoring|Monitoring]].<br />
<br />
==== Pausing and resuming logical replication ====<br />
<br />
You can execute the <tt>bdr_apply_pause()</tt> function to temporarily pause logical replication. Changes will once again be applied once you execute <tt>bdr_apply_resume()</tt>.<br />
<br />
==== Temporarily stopping an LLSR replica ====<br />
<br />
LLSR replicas can be temporarily stopped by shutting down the downstream master's postmaster.<br />
<br />
A stopped replica will still cause the upstream master to retain WAL for it, eventually causing the upstream master to run out of disk space in <tt>pg_xlog</tt>. Do not leave a replica shut down for too long - if it's going to be out of service for an extended period, consider dropping the upstream slot and retiring the replica, then creating a new one later.<br />
<br />
Once you remove an upstream slot you cannot simply rejoin a replica and have it catch up. It must be rebuilt.<br />
<br />
==== Removing an LLSR replica permanently ====<br />
<br />
To remove a replication connection permanently, remove its entries from the downstream master's <tt>postgresql.conf</tt>, restart the downstream master, then remove its slot from the upstream master with <tt>SELECT pg_drop_replication_slot('slotname')</tt> as a superuser. See [http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-REPLICATION the main PostgreSQL documentation].<br />
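<br />
For example, using a slot name of the form shown below:<br />
<br />
SELECT pg_drop_replication_slot('bdr: 16384:5873181566046043070-1-16384:');<br />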
<br />
Alternately, you can use <tt>pg_receivellog</tt>:<br />
<br />
pg_receivellog -p 5434 -h master-hostname -d dbname \<br />
--slot='bdr: 16384:5873181566046043070-1-16384:' --stop<br />
<br />
It is important to remove the replication slot from the upstream master(s) to prevent xid wrap-around problems and issues with table bloat caused by delayed vacuum, and to prevent the upstream master from retaining WAL for the dead replica until it runs out of <tt>pg_xlog</tt> space.<br />
<br />
== Bi-Directional Replication ==<br />
<br />
Bi-Directional replication is built directly on LLSR by configuring two or more servers as both upstream ''and'' downstream masters of each other.<br />
<br />
All of the Logical Log Streaming Replication documentation applies to BDR and should be read before moving on to reading about and configuring BDR.<br />
<br />
=== Bi-Directional Replication Use Cases ===<br />
<br />
Bi-Directional Replication is designed to allow a very wide range of server connection topologies. The simplest to understand is two servers each sending their changes to the other, which is produced by making each server the downstream master of the other, using two connections for each database.<br />
<br />
Logical and physical streaming replication are designed to work side-by-side. This means that a master can be replicating using physical streaming replication to a local standby server while at the same time replicating logical changes to a remote downstream master. Logical replication also works alongside cascading replication, so a physical standby can feed changes to a downstream master, allowing a chain of upstream master to physical standby to downstream master.<br />
<br />
==== Simple multi-master pair ====<br />
<br />
A simple multi-master "HA Cluster" with two servers:<br />
<br />
* Server "Alpha" - Master<br />
* Server "Bravo" - Master<br />
<br />
===== Configuration =====<br />
<br />
Alpha:<br />
<br />
wal_level = 'logical'<br />
max_replication_slots = 3<br />
max_wal_senders = 4<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="bravo"<br />
bdr.bravo_dsn = 'dbname=dbtoreplicate'<br />
track_commit_timestamp = on<br />
<br />
Bravo:<br />
<br />
wal_level = 'logical'<br />
max_replication_slots = 3<br />
max_wal_senders = 4<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="alpha"<br />
bdr.alpha_dsn = 'dbname=dbtoreplicate'<br />
track_commit_timestamp = on<br />
<br />
See [[#Configuration|Configuration]] for an explanation of these parameters.<br />
<br />
==== HA and Logical Standby ====<br />
Downstream masters allow users to create temporary tables, so they can be used as reporting servers.<br />
<br />
"HA Cluster":<br />
<br />
* Server "Alpha" - Current Master<br />
* Server "Bravo" - Physical Standby - unused, apart from as failover target for Alpha - potentially specified in synchronous_standby_names<br />
* Server "Charlie" - "Logical Standby" - downstream master<br />
<br />
==== Very High Availability Multi-Master ====<br />
A typical configuration for remote multi-master would then be:<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha using logical streaming<br />
<br />
Bandwidth between Site 1 and Site 2 is minimised.<br />
<br />
==== 3-remote site simple Multi-Master Plex ====<br />
<br />
BDR supports "all to all" connections, so the latency for any change being applied on other masters is minimised. (Note that early designs of multi-master were arranged for circular replication, which has latency issues with larger numbers of nodes)<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Alpha, Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha, Charlie using logical streaming replication<br />
<br />
===== Configuration =====<br />
<br />
If you wanted to test this configuration locally you could run three PostgreSQL instances on different ports. Such a configuration would look like the following if the port numbers were used as node names for the sake of notational clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441,node_5442'<br />
bdr.node_5441_dsn='port=5441 dbname=postgres'<br />
bdr.node_5442_dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5440,node_5442'<br />
bdr.node_5440_dsn='port=5440 dbname=postgres'<br />
bdr.node_5442_dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440,node_5441'<br />
bdr.node_5440_dsn='port=5440 dbname=postgres'<br />
bdr.node_5441_dsn='port=5441 dbname=postgres'<br />
<br />
In a typical real-world configuration each server would be on the same port on a different host instead.<br />
<br />
==== 3-remote site simple Multi-Master Circular Replication ====<br />
<br />
A simpler configuration uses "circular replication". This needs fewer connections, but results in higher latency for changes as the number of nodes increases. It's also less resilient to network disruptions and node faults.<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie using logical streaming replication<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha using logical streaming replication<br />
<br />
TODO: Regrettably this doesn't actually work yet because we don't cascade logical changes (yet).<br />
<br />
===== Configuration =====<br />
<br />
Again using node names that match port numbers for clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441'<br />
bdr.node_5441_dsn='port=5441 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5442'<br />
bdr.node_5442_dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440'<br />
bdr.node_5440_dsn='port=5440 dbname=postgres'<br />
<br />
This would usually be done in the real world with databases on different hosts, all running on the same port.<br />
<br />
==== 3-remote site Max Availability Multi-Master Plex ====<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha, Echo using logical streaming<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Foxtrot using physical streaming with sync replication<br />
** Server "Foxtrot" - Physical Standby - feeds changes to Alpha, Charlie using logical streaming<br />
<br />
Bandwidth and latency between sites is minimised.<br />
<br />
Config left as an exercise for the reader.<br />
<br />
==== N-site symmetric cluster replication ====<br />
<br />
A symmetric cluster is one in which all masters are connected to each other.<br />
<br />
N=19 has been tested and works fine.<br />
<br />
Each of the N masters requires N-1 connections to the other masters, so the practical limit is under 100 servers, or fewer if you have many separate databases.<br />
<br />
The amount of work caused by each change is O(N), so there is a much lower practical limit based upon resource constraints. A planned future option to filter rows/tables for replication will become essential with larger or more heavily updated databases.<br />
<br />
==== Complex/Asymmetric Replication ====<br />
<br />
A variety of options is possible.<br />
<br />
=== Conflict Avoidance ===<br />
<br />
==== Distributed Locking ====<br />
<br />
Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close but cannot support geographically distributed applications as very low latency is critical for acceptable performance.<br />
<br />
Distributed locking is essentially a pessimistic approach, whereas BDR advocates an optimistic approach: avoid conflicts where possible but allow some types of conflict to occur, and resolve them when they arise.<br />
<br />
==== Global Sequences ====<br />
<br />
Many applications require unique values be assigned to database entries. Some applications use GUIDs generated by external programs, some use database-supplied values. This is important with optimistic conflict resolution schemes because uniqueness violations are "divergent errors" and are not easily resolvable.<br />
<br />
The SQL standard requires Sequence objects which provide unique values, though these are isolated to a single node. They can then be used to supply default values via <tt>DEFAULT nextval('mysequence')</tt>, as with PostgreSQL's <tt>SERIAL</tt> pseudo-type.<br />
<br />
BDR requires sequences to work together across multiple nodes. This is implemented as a new <tt>SequenceAccessMethod</tt> API (SeqAM), which allows plugins that provide get/set functions for sequences. Global Sequences are then implemented as a plugin which implements the SeqAM API and communicates across nodes to allow new ranges of values to be stored for each sequence.<br />
<br />
=== Conflict Detection & Resolution ===<br />
<br />
Because local writes can occur on a master, conflict detection and avoidance is a concern for basic LLSR setups as well as full BDR configurations.<br />
<br />
==== Lock Conflicts ====<br />
<br />
Changes from the upstream master are applied on the downstream master by a single apply process. That process needs to take a <tt>RowExclusiveLock</tt> on the changing table and be able to write-lock the changing tuple(s). Concurrent activity will prevent those changes from being immediately applied because of lock waits. Use the <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt> facility to look for issues with apply blocking on locks.<br />
<br />
Concurrent activity on a row includes:<br />
<br />
* explicit row level locking (<tt>SELECT ... FOR UPDATE/FOR SHARE</tt>)<br />
* locking from foreign keys<br />
* implicit locking because of row <tt>UPDATE</tt>s, <tt>INSERT</tt>s or <tt>DELETE</tt>s, either from local activity or apply from other servers<br />
<br />
==== Data Conflicts ====<br />
<br />
Concurrent inserts, updates and deletes may also cause data-level conflicts to occur, which then require conflict resolution. It is important that these conflicts are resolved in a consistent and idempotent manner so that all servers end up with identical results.<br />
<br />
Concurrent <tt>UPDATE</tt>s are detected by BDR and resolved using a last-update-wins strategy based on timestamps. Should the timestamps be identical, the tie is broken using the system identifier from <tt>pg_control</tt>, though this may change in a future release.<br />
<br />
<tt>INSERT</tt>s may cause uniqueness violation errors because of primary keys when applied at remote nodes. This conflict is detected by BDR. Concurrent inserts to the same key are resolved on a last-update-wins basis.<br />
<br />
Additionally, <tt>UPDATE</tt>s and <tt>INSERT</tt>s may cause violations of exclusion constraints or unique indexes when they are applied. These are not easily detectable or resolvable and represent severe application errors that cause the database contents of multiple servers to diverge from each other. Hence these are known as "divergent conflicts". Currently, replication stops should a divergent conflict occur. The errors causing the conflict can be seen in the error log of the downstream master with the problem.<br />
<br />
Updates which cannot locate a row are presumed to be <tt>DELETE</tt>/<tt>UPDATE</tt> conflicts. These are accepted as successful operations but in the case of <tt>UPDATE</tt> the data in the <tt>UPDATE</tt> is discarded. (Future improvements may allow replay to be forced until all nodes are caught up past the conflicting change so it can be definitively identified as a conflict, not just asynchronous changes). <br />
<br />
All conflicts are resolved at row level. Concurrent updates that touch completely separate columns can result in "false conflicts", where there is no conflict in terms of the data, only in terms of the row update. Such conflicts will result in just one of those changes being applied and the other discarded according to last-update-wins. It is not practical to automatically decide at the database level when a row should be merged and when a last-update-wins strategy should be used. User-defined conflict resolution functions (see below) may be used where this is required.<br />
<br />
Changing unlogged and logged tables in the same transaction can result in apparently strange outcomes since the unlogged tables aren't replicated.<br />
<br />
==== Examples ====<br />
<br />
As an example, let's say we have two tables, Activity and Customer. There is a foreign key from Activity to Customer, constraining us to only record activity rows that have a matching customer row.<br />
<br />
* We update a row in the Customer table on NodeA. The change from NodeA is applied to NodeB just as we are inserting an activity on NodeB. The inserted activity causes a FK check....<br />
<br />
<br />
=== Conflict Resolution by user-defined handlers (aka conflict handlers) ===<br />
<br />
For various conflicts the ability to resolve them with user-defined handlers exists. Conflict handlers are user-defined functions written in, for example, [http://www.postgresql.org/docs/current/static/plpgsql.html PL/pgSQL], and they must follow a specific API. Each handler function has to have this signature:<br />
<br />
handler_fun(local_row tbltype, remote_row tbltype, command_tag text, rel regclass, event bdr.bdr_handler_types, OUT resolution_row tbltype, OUT resolution bdr.bdr_conflict_handler_action) RETURNS RECORD<br />
<br />
where the parameters are:<br />
<br />
* <code>local_row</code> and <code>remote_row</code>: the conflicting rows. Either one can be <code>NULL</code>, depending on <tt>event</tt> (see below). Their type is always the row-type of the table the conflict trigger applies to.<br />
* <code>command_tag</code>: Contains the executed command name, e.g. <code>UPDATE</code> or <code>INSERT</code>.<br />
* <code>rel</code>: The oid of the relation the conflict appears in (matching <tt>pg_class.oid</tt>).<br />
* <code>event</code>: The conflict event type (e.g. <code>UPDATE_VS_UPDATE</code>)<br />
* ''OUT'' <code>resolution_row</code>: A row chosen or created by the trigger to resolve the conflict with, if called for by <tt>resolution</tt> below, or NULL if not required.<br />
* ''OUT'' <code>resolution</code>: The trigger's decision about the resolution of the conflict. It may contain the following values:<br />
<br />
** <code>IGNORE</code>: ignore this handler's result and continue to the next one. In this case the <code>resolution_row</code> is ignored and may be <code>NULL</code>.<br />
** <code>ROW</code>: take the <code>resolution_row</code> to resolve this conflict, using this row to replace the row the conflict occurred for. <code>resolution_row</code> may not be <code>NULL</code>.<br />
** <code>SKIP</code>: simply ignore this conflict and don't apply anything. In this case <code>resolution_row</code> is ignored and may be <code>NULL</code>.<br />
<br />
Currently the following conflicts are supported:<br />
<br />
* <code>UPDATE</code> vs <code>UPDATE</code><br />
* <code>UPDATE</code> vs <code>DELETE</code><br />
<br />
The following conflict types are defined by <code>bdr.bdr_handler_types</code>:<br />
<br />
* <code>UPDATE_VS_UPDATE</code><br />
* <code>UPDATE_VS_DELETE</code><br />
* <code>INSERT_VS_INSERT</code><br />
* <code>INSERT_VS_UPDATE</code><br />
<br />
The conflict handler function has to use the exact row type of the table it applies to. As a consequence it is not possible to write one conflict handler that applies to multiple different table types at this time.<br />
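<br />
As an illustration, here is a minimal sketch of an <tt>UPDATE</tt> vs <tt>UPDATE</tt> handler for a hypothetical table <tt>mytable</tt> with an integer column <tt>counter</tt> (both names are placeholders, not part of BDR):<br />
<br />
CREATE OR REPLACE FUNCTION mytable_update_handler(<br />
    local_row mytable, remote_row mytable,<br />
    command_tag text, rel regclass, event bdr.bdr_handler_types,<br />
    OUT resolution_row mytable, OUT resolution bdr.bdr_conflict_handler_action)<br />
RETURNS RECORD AS $$<br />
BEGIN<br />
    -- Keep whichever row has the larger counter value; defer to the<br />
    -- default last-update-wins resolution when the values are equal.<br />
    IF local_row.counter > remote_row.counter THEN<br />
        resolution_row := local_row;<br />
        resolution := 'ROW';<br />
    ELSIF remote_row.counter > local_row.counter THEN<br />
        resolution_row := remote_row;<br />
        resolution := 'ROW';<br />
    ELSE<br />
        resolution := 'IGNORE';  -- fall through to the next handler, if any<br />
    END IF;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />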
<br />
==== Registering Conflict Handlers ====<br />
<br />
Conflict handlers are registered by a function provided by bdr:<br />
<br />
bdr.bdr_create_conflict_handler(ch_rel REGCLASS, ch_name NAME, ch_proc REGPROCEDURE, ch_type bdr.bdr_handler_types, ch_timeframe INTERVAL DEFAULT NULL) RETURNS VOID<br />
<br />
<code>ch_rel REGCLASS</code> defines the relation the conflict handler is responsible for, <code>ch_name NAME</code> defines the handler name for identification, <code>ch_proc REGPROCEDURE</code> defines the conflict handler procedure and <code>ch_type bdr.bdr_handler_types</code> defines the handler type (e.g. <code>UPDATE_VS_UPDATE</code>). The last parameter, <code>ch_timeframe INTERVAL</code>, is optional and defaults to <code>NULL</code>, meaning no time restriction. It defines the timeframe within which the conflict handler is called: if the conflicting row was last changed 10 seconds ago and the timeframe is 10ms, the handler is not called; if it was changed 1ms ago, the handler is called since that falls within the 10ms timeframe.<br />
<br />
The conflict handler name has to be unique per table. Creating a conflict handler adds a dependency on the target relation. There can be more than one conflict handler per table and conflict type. In this case the handlers get called in alphabetical order, stopping at and returning the first resolution different from <code>IGNORE</code>.<br />
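<br />
For example, the handler sketched earlier could be registered for <tt>UPDATE</tt> vs <tt>UPDATE</tt> conflicts on <tt>mytable</tt> like this (the table and function names are from that hypothetical sketch):<br />
<br />
SELECT bdr.bdr_create_conflict_handler(<br />
    ch_rel  := 'mytable',<br />
    ch_name := 'mytable_update_handler',<br />
    ch_proc := 'mytable_update_handler(mytable, mytable, text, regclass, bdr.bdr_handler_types)'::regprocedure,<br />
    ch_type := 'UPDATE_VS_UPDATE');<br />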
<br />
==== Removing a Conflict Handler ====<br />
<br />
Conflict handlers can be removed by a function provided by bdr:<br />
<br />
bdr.bdr_drop_conflict_handler(ch_rel REGCLASS, ch_name NAME) RETURNS VOID<br />
<br />
The <code>ch_rel REGCLASS</code> parameter defines the relation the conflict handler is responsible for, <code>ch_name NAME</code> defines the name of the conflict handler to drop.<br />
<br />
Dropping the conflict handler removes the dependency again.<br />
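<br />
For example, continuing with the hypothetical names used above:<br />
<br />
SELECT bdr.bdr_drop_conflict_handler('mytable', 'mytable_update_handler');<br />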
<br />
==== Listing Conflict Handlers ====<br />
<br />
A list of the registered conflict handlers is available as a view:<br />
<br />
CREATE VIEW bdr_list_conflict_handlers(ch_name, ch_type, ch_reloid, ch_fun, ch_timeframe)<br />
<br />
<code>ch_name TEXT</code> is the name of the handler, <code>ch_type bdr.bdr_handler_types</code> is the conflict handler type, <code>ch_reloid OID</code> is the relation Oid, <code>ch_fun REGPROCEDURE</code> is the conflict handler function and <code>ch_timeframe INTERVAL</code> is the timeframe the handler is valid in.<br />
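<br />
For example, the registered handlers can be inspected with a query such as (a sketch based on the columns described above):<br />
<br />
SELECT ch_name, ch_type, ch_reloid::regclass, ch_fun, ch_timeframe<br />
  FROM bdr_list_conflict_handlers;<br />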
<br />
==== Parameter Definition by Conflict Type ====<br />
<br />
The parameters and return values for conflict handlers differ for the different conflict types:<br />
<br />
===== <code>UPDATE</code> vs <code>UPDATE</code> =====<br />
<br />
In this case all resolutions as defined above are valid. <code>local_row</code> and <code>remote_row</code> are both non-<code>NULL</code>.<br />
<br />
===== <code>UPDATE</code> vs <code>DELETE</code> =====<br />
<br />
In this case only <code>SKIP</code> and <code>IGNORE</code> resolutions are valid. <code>local_row</code> is <code>NULL</code>, <code>remote_row</code> contains the conflicting remote row. This case mainly exists to be able to fatally error out via <code>RAISE EXCEPTION</code>.<br />
<br />
<br />
[[Category:Replication]]<br />
<br />
----<br />
This page is the users and administrators guide for BDR. If you're looking for technical details on the project plan and implementation, see [[BDR Project]].<br />
----<br />
<br />
= BDR User Guide =<br />
<br />
BDR (Bi-Directional Replication) is a feature being developed for inclusion in PostgreSQL core that provides greatly enhanced replication capabilities.<br />
<br />
BDR allows users to create a geographically distributed multi-master database using Logical Log Streaming Replication (LLSR) transport. It is designed to provide both high availability and geographically distributed disaster recovery capabilities. <br />
<br />
BDR is not “clustering” as some vendors use the term, in that it doesn't have a distributed lock manager, global transaction co-ordinator, etc. Each member server is separate yet connected, with design choices that allow separation between nodes that would not be possible with global transaction coordination.<br />
<br />
Guidance on getting a testing setup established is in [[#Initial setup]]. Please read the full documentation if you intend to put BDR into production.<br />
<br />
== Logical Log Streaming Replication ==<br />
<br />
Logical log streaming replication (LLSR) allows one PostgreSQL master (the "upstream master") to stream a sequence of changes to another read/write PostgreSQL server (the "downstream master"). Data is sent in one direction only over a normal <tt>libpq</tt> connection.<br />
<br />
Multiple LLSR connections can be used to set up bi-directional replication as discussed later in this guide.<br />
<br />
=== Overview of logical replication ===<br />
<br />
In some ways LLSR is similar to "streaming replication" i.e. physical log streaming replication (PLSR) from a user perspective; both replicate changes from one server to another. However, in LLSR the receiving server is also a full master database that can make changes, unlike the read-only replicas offered by PLSR hot standby. Additionally, LLSR is per-database, whereas PLSR is per-cluster and replicates all databases at once. There are many more differences discussed in the relevant sections of this document.<br />
<br />
In LLSR the data that is replicated is change data in a special format that allows the changes to be logically reconstructed on the downstream master. The changes are generated by reading transaction log (WAL) data, making change capture on the upstream master much more efficient than trigger based replication, hence why we call this "logical log replication". Changes are passed from upstream to downstream using the <tt>libpq</tt> protocol, just like with physical log streaming replication.<br />
<br />
One connection is required for each PostgreSQL database that is replicated. If two servers are connected and each has 50 databases, then 50 connections are required to send changes in one direction, from upstream to downstream. Each database to replicate must be explicitly specified, so it is possible to filter out unwanted databases simply by not configuring replication for them.<br />
<br />
Setting up replication for new databases is not (yet?) automatic, so additional configuration steps are required after <tt>CREATE DATABASE</tt>. A restart of the downstream master is also required. The upstream master only needs restarting if the <tt>max_replication_slots</tt> parameter is too low to allow a new replica to be added. Adding replication for databases that do not exist yet will cause an ERROR, as will dropping a database that is being replicated. Setup is discussed in more detail below.<br />
<br />
Changes are processed by the downstream master using <tt>bdr</tt> plug-ins. This allows flexible handling of replication input, including:<br />
<br />
* BDR apply process - applies logical changes to the downstream master. The apply process makes changes directly rather than generating SQL text and then parse/plan/executing SQL.<br />
* Textual output plugin - a demo plugin that generates SQL text (but doesn't apply changes)<br />
* <tt>pg_xlogdump</tt> - examines physical WAL records and produces textual debugging output. This server program is included in PostgreSQL 9.3.<br />
<br />
=== Replication of DML changes ===<br />
<br />
All changes are replicated: <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> and <tt>TRUNCATE</tt>. <br />
<br />
(TRUNCATE is not yet implemented, but will be implemented before the feature goes to final release).<br />
<br />
Actions that generate WAL data but don't represent logical changes do not result in data transfer, e.g. full page writes, VACUUMs, hint bit setting. LLSR avoids much of the overhead from physical WAL, though it has overheads that mean that it doesn't always use less bandwidth than PLSR.<br />
<br />
Locks taken by <tt>LOCK</tt> and <tt>SELECT ... FOR UPDATE/SHARE</tt> on the upstream master are not replicated to downstream masters. Locks taken automatically by <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> or <tt>TRUNCATE</tt> *are* taken on the downstream master and may delay replication apply or concurrent transactions - see [[#Lock Conflicts|Lock Conflicts]].<br />
<br />
<tt>TEMPORARY</tt> and <tt>UNLOGGED</tt> tables are not replicated. In contrast to physical standby servers, downstream masters can use temporary and unlogged tables. However, temporary tables remain specific to a particular session so creating a temporary table on the upstream master does not create a similar table on the downstream master.<br />
<br />
<tt>DELETE</tt> and <tt>UPDATE</tt> statements that affect multiple rows on the upstream master will cause a series of row changes on the downstream master. These are likely to proceed at the same speed as on the origin, as long as an index is defined on the primary key of the table on the downstream master. <tt>UPDATE</tt>s and <tt>DELETE</tt>s require some form of unique constraint, either <tt>PRIMARY KEY</tt> or <tt>UNIQUE NOT NULL</tt>. A warning is issued in the downstream master's logs if the expected constraint is absent. <tt>INSERT</tt>s on the upstream master do not require a unique constraint in order to replicate correctly, though such usage would prevent conflict detection between multiple masters, if that is considered important.<br />
<br />
<tt>UPDATE</tt>s that change the value of the Primary Key of a table will be replicated correctly.<br />
<br />
The values applied are the final values from the <tt>UPDATE</tt> on the upstream master, including any modifications from before-row triggers, rules or functions. Any reflexive conditions, such as N = N + 1, are resolved to their final value. Volatile or stable functions are evaluated on the master side and the resulting values are replicated. Consequently any function side-effects (writing files, network socket activity, updating internal PostgreSQL variables, etc) will not occur on the replicas as the functions are not run again on the replica.<br />
<br />
All columns are replicated on each table. Large column values that would be placed in TOAST tables are replicated without problem, avoiding de-compression and re-compression. If we update a row but do not change a TOASTed column value, then that data is not sent downstream.<br />
<br />
All data types are handled, not just the built-in datatypes of PostgreSQL core. The only requirement is that user-defined types are installed identically in both upstream and downstream master (see "Limitations").<br />
<br />
The LLSR plugin uses the binary <tt>libpq</tt> protocol where the upstream and downstream masters are binary-compatible, i.e. they have the same PostgreSQL major version, same processor architecture and compatible compilation options. Where the upstream and downstream masters are not binary compatible, replication will fall back to the text protocol normally used for PostgreSQL client/server communication. In case of version differences it may also be necessary to upgrade the <tt>bdr</tt> extension on the older server to match the newer server.<br />
<br />
Sets of changes are accumulated in memory (spilling to disk where required) and then sent to the downstream server at commit time. Aborted transactions are never sent. Application of changes on downstream master is currently single-threaded, though this process is efficiently implemented. Parallel apply is a possible future feature, especially for changes made while holding <tt>AccessExclusiveLock</tt>.<br />
<br />
Changes are applied to the downstream master in the sequence in which they were committed on the upstream master. This is a known-good serialized ordering of changes, so replication serialization failures are not theoretically possible. Such failures are common in systems that use statement based replication (e.g. MySQL) or trigger based replication (e.g. Slony version 2.0). Users should note that this means the original order of locking of tables is not maintained. Although lock order is provably not an issue for the set of locks held on upstream master, additional locking on downstream side could cause lock waits or deadlocking in some cases. (Discussed in further detail later).<br />
<br />
Larger transactions spill to disk on the upstream master once they reach a certain size. Currently, large transactions can cause increased latency. A future enhancement will be to stream changes to the downstream master once they fill the upstream memory buffer; this is likely to be implemented in 9.5.<br />
<br />
<tt>SET</tt> statements and parameter settings are not replicated. This has no effect on replication since we only replicate actual changes, not anything at SQL statement level. We always update the correct tables, whatever the setting of <tt>search_path</tt>. Values are replicated correctly irrespective of the values of <tt>bytea_output</tt>, <tt>TimeZone</tt>, <tt>DateStyle</tt>, etc.<br />
<br />
<tt>NOTIFY</tt> is not supported across log based replication, either physical or logical. <tt>NOTIFY</tt> and <tt>LISTEN</tt> will work fine on the upstream master but an upstream <tt>NOTIFY</tt> will not trigger a downstream <tt>LISTEN</tt>er.<br />
<br />
In some cases, additional deadlocks can occur on apply. This causes an automatic retry of the apply of the replaying transaction and is only an issue if the deadlock recurs repeatedly, delaying replication.<br />
<br />
From a performance and concurrency perspective the BDR apply process is similar to a normal backend. Frequent conflicts with locks from other transactions when replaying changes can slow things down and thus increase replication delay, so reducing the frequency of such conflicts can be a good way to speed things up. Any lock held by another transaction on the downstream master - <tt>LOCK</tt> statements, <tt>SELECT ... FOR UPDATE/FOR SHARE</tt>, or <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> row locks - can delay replication if the replication apply process needs to change the locked table/row.<br />
<br />
=== Table definitions and DDL replication ===<br />
<br />
DML changes are replicated between tables with matching <tt>"Schemaname"."Tablename"</tt> on both upstream and downstream masters. e.g. changes from upstream's <tt>public.mytable</tt> will go to downstream's <tt>public.mytable</tt>, while changes to the upstream <tt>myschema.mytable</tt> will go to the downstream <tt>myschema.mytable</tt>. This works even when no schema is specified in the original SQL, since we identify the changed table from its internal OIDs in WAL records and then map that to whatever internal identifier is used on the downstream node.<br />
<br />
This requires careful synchronization of table definitions on each node otherwise <tt>ERROR</tt>s will be generated by the replication apply process. In general, tables must be an exact match between upstream and downstream masters. <br />
<br />
There are no plans to implement working replication between dissimilar table definitions.<br />
<br />
Tables must meet the following requirements to be compatible for purposes of LLSR. The best way to ensure an exact match is to define the table on one node and allow DDL replication to copy its definition to the other nodes, or to use <tt>init_replica</tt> to copy the definitions when bringing up a new BDR node. If you don't define a table / type / etc manually on multiple nodes then you won't have to worry, BDR takes care of ensuring compatibility for you.<br />
<br />
The requirements for compatibility are:<br />
<br />
* The downstream master must only have constraints (<tt>CHECK</tt>, <tt>UNIQUE</tt>, <tt>EXCLUSION</tt>, <tt>FOREIGN KEY</tt>, etc) that are also present on the upstream master. Replication may initially work with mismatched constraints but is likely to fail as soon as the downstream master rejects a row the upstream master accepted.<br />
* The table referenced by a FOREIGN KEY on a downstream master must have all the keys present in the upstream master version of the same table.<br />
* Storage parameters must match, except as allowed below<br />
* Inheritance must be the same<br />
* Dropped columns on master must be present on replicas<br />
* Custom types and enum definitions must match exactly<br />
* Composite types and enums must have the same oids on master and replication target<br />
* Extensions defining types used in replicated tables must be of the same version or fully SQL-level compatible and the oids of the types they define must match.<br />
<br />
The following differences are permissible between tables on different nodes:<br />
<br />
* The table's <tt>pg_class</tt> oid, the oid of its associated TOAST table, and the oid of the table's rowtype in <tt>pg_type</tt> may differ;<br />
* Extra or missing non-<tt>UNIQUE</tt> indexes<br />
* Extra keys in downstream lookup tables for <tt>FOREIGN KEY</tt> references that are not present on the upstream master<br />
* The table-level storage parameters for fillfactor and autovacuum<br />
* Triggers and rules may differ (they are not executed by replication apply)<br />
<br />
Replication of DDL changes between nodes is performed using event triggers, with partial support integrated in <tt>bdr-next</tt> (see [[#LLSR Limitations|LLSR Limitations]]).<br />
<br />
Triggers and Rules are NOT executed by apply on downstream side, equivalent to an enforced setting of <tt>session_replication_role = origin</tt>.<br />
<br />
In future it is expected that composite types and enums with non-identical oids will be converted using text output and input functions. This feature is not yet implemented.<br />
<br />
=== LLSR limitations ===<br />
<br />
The current LLSR implementation is subject to some limitations, which are being progressively removed as work progresses.<br />
<br />
==== Data definition compatibility ====<br />
<br />
Table definitions, types, extensions, etc must be near identical between upstream and downstream masters. See [[#Table definitions and DDL replication|Table definitions and DDL replication]].<br />
<br />
==== DDL Replication ====<br />
<br />
DDL replication is not yet fully supported. As of release 0.5 (tag <tt>bdr/0.5</tt>), BDR can replicate <tt>CREATE TABLE</tt>, <tt>CREATE SEQUENCE</tt> and <tt>CREATE INDEX</tt>, but not other DDL.<br />
<br />
The pending release after 0.5 adds <tt>CREATE TABLE</tt>, <tt>DROP TABLE</tt>, partial <tt>ALTER TABLE</tt> and a large number of other statements, allowing most DDL to be run on one node and replicated automatically to all others. <br />
<br />
===== In bdr-0.5 (older) =====<br />
<br />
This only applies to <tt>bdr/0.5</tt>; <tt>bdr-next</tt> has much stronger DDL replication support.<br />
<br />
<tt>CREATE TABLE</tt> will work without problems and will be automatically replicated to downstream nodes.<br />
<br />
Any <tt>ALTER TABLE</tt> may cause the definitions of tables on either end of a link to go out of sync, causing replication to fail.<br />
<br />
<tt>DROP TABLE</tt> of a table on a downstream master or BDR member may cause replication to halt as pending rows for that table cannot be applied.<br />
<br />
Additionally, the <tt>PRIMARY KEY</tt> of a table may not be dropped as BDR requires it for normal operation.<br />
<br />
Other indexes may be added and removed freely as they do not affect replication.<br />
<br />
==== TRUNCATE is not replicated ====<br />
<br />
<tt>TRUNCATE</tt> is not yet supported.<br />
<br />
The safest option is to define a user-level BEFORE trigger on each table that RAISEs an ERROR when TRUNCATE is attempted.<br />
<br />
A simple truncate-blocking trigger is:<br />
<br />
CREATE OR REPLACE FUNCTION deny_truncate() RETURNS trigger AS $$<br />
BEGIN<br />
    IF tg_op = 'TRUNCATE' THEN<br />
        RAISE EXCEPTION 'TRUNCATE is not supported on this table. Please use DELETE FROM.';<br />
    ELSE<br />
        RAISE EXCEPTION 'This trigger only supports TRUNCATE';<br />
    END IF;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
It can be applied to a table with:<br />
<br />
CREATE TRIGGER deny_truncate_on_<tablename> BEFORE TRUNCATE ON <tablename><br />
FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate();<br />
<br />
A PL/PgSQL DO block that queries <tt>pg_class</tt> and loops over it to <tt>EXECUTE</tt> a dynamic SQL <tt>CREATE TRIGGER</tt> command for each table that does not already have the trigger can be used to apply the trigger to all tables.<br />
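<br />
A sketch of such a DO block follows. It applies a single uniformly-named trigger (<tt>deny_truncate_trigger</tt>, a placeholder name) to every ordinary table outside the system schemas that doesn't already have it:<br />
<br />
DO $$<br />
DECLARE<br />
    t regclass;<br />
BEGIN<br />
    FOR t IN<br />
        SELECT c.oid::regclass<br />
          FROM pg_class c<br />
          JOIN pg_namespace n ON n.oid = c.relnamespace<br />
         WHERE c.relkind = 'r'<br />
           AND n.nspname NOT IN ('pg_catalog', 'information_schema')<br />
           AND NOT EXISTS (SELECT 1 FROM pg_trigger tg<br />
                            WHERE tg.tgrelid = c.oid<br />
                              AND tg.tgname = 'deny_truncate_trigger')<br />
    LOOP<br />
        EXECUTE format('CREATE TRIGGER deny_truncate_trigger '<br />
                       'BEFORE TRUNCATE ON %s '<br />
                       'FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate()', t);<br />
    END LOOP;<br />
END;<br />
$$;<br />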
<br />
Alternately, there will be a <tt>ProcessUtility_hook</tt> available in the BDR extension to automatically prevent unsupported operations like <tt>TRUNCATE</tt>.<br />
<br />
=== Initial setup ===<br />
<br />
To set up LLSR or BDR you first need a patched PostgreSQL that can support LLSR/BDR, then you need to create one or more LLSR/BDR senders and one or more LLSR/BDR receivers.<br />
<br />
==== Installing the patched PostgreSQL binaries ====<br />
<br />
Currently BDR is only available in builds of the 'bdr' branch on Andres Freund's git repo on git.postgresql.org. PostgreSQL 9.3 and below do not support BDR, and 9.4 requires patches, so this guide will not work for you if you are trying to use a normal install of PostgreSQL.<br />
<br />
First you need to clone, configure, compile and install like normal. Clone the sources from <tt>git://git.postgresql.org/git/2ndquadrant_bdr.git</tt> and checkout the <tt>bdr</tt> branch.<br />
<br />
If you have an existing local PostgreSQL git tree specify it as <tt>--reference /path/to/existing/tree</tt> to greatly speed your git clone.<br />
<br />
Example:<br />
<br />
mkdir -p $HOME/bdr<br />
cd $HOME/bdr<br />
git clone -b bdr git://git.postgresql.org/git/2ndquadrant_bdr.git $HOME/bdr/postgres-bdr-src<br />
cd postgres-bdr-src<br />
./configure --prefix=$HOME/bdr/postgres-bdr-bin<br />
make install<br />
(cd contrib/btree_gist && make install)<br />
(cd contrib/bdr && make install)<br />
<br />
This will put everything in <tt>$HOME/bdr</tt>, with the source code and build tree in <tt>$HOME/bdr/postgres-bdr-src</tt> and the installed PostgreSQL in <tt>$HOME/bdr/postgres-bdr-bin</tt>. This is a convenient setup for testing and development because it doesn't require you to set up new users, wrangle permissions, run anything as root, etc, but it isn't recommended that you deploy this way in production.<br />
<br />
To actually use these new binaries you will need to:<br />
<br />
export PATH=$HOME/bdr/postgres-bdr-bin/bin:$PATH<br />
<br />
before running <tt>initdb</tt>, <tt>postgres</tt>, etc. You don't have to use the <tt>psql</tt> or <tt>libpq</tt> you compiled but you're likely to get version mismatch warnings if you don't.<br />
<br />
=== Parameter Reference ===<br />
<br />
The following parameters are new or have been changed in PostgreSQL's new logical streaming replication.<br />
<br />
==== <tt>shared_preload_libraries = 'bdr'</tt> ====<br />
<br />
To load support for receiving changes on a downstream master, the <tt>bdr</tt> library must be added to the existing <tt>shared_preload_libraries</tt> parameter. This loads the bdr library during postmaster start-up and allows it to create the required background worker(s).<br />
<br />
Upstream masters don't need to load the bdr library unless they're also operating as a downstream master as is the case in a BDR configuration.<br />
<br />
==== <tt>bdr.connections</tt> ====<br />
<br />
A comma-separated list of upstream master connection names is specified in <tt>bdr.connections</tt>. These names must be simple alphanumeric strings. They are used when naming the connection in error messages, configuration options and logs, but are otherwise of no special meaning.<br />
<br />
A typical two-upstream-master setting might be:<br />
<br />
bdr.connections = 'upstream1, upstream2'<br />
<br />
==== <tt>bdr.&lt;connection_name&gt;_dsn</tt> ====<br />
<br />
Each connection name must have at least a data source name specified using the <tt>bdr.&lt;connection_name&gt;_dsn</tt> parameter. The DSN syntax is the same as that used by libpq so it is not discussed in further detail here. A <tt>dbname</tt> for the database to connect to must be specified; all other parts of the DSN are optional.<br />
<br />
The local (downstream) database name is assumed to be the same as the name of the upstream database being connected to, though future versions will make this configurable.<br />
<br />
For the above two-master setting for <tt>bdr.connections</tt> the DSNs might look like:<br />
<br />
bdr.upstream1_dsn = 'host=10.1.1.2 user=postgres dbname=replicated_db'<br />
bdr.upstream2_dsn = 'host=10.1.1.3 user=postgres dbname=replicated_db'<br />
<br />
==== <tt>bdr.synchronous_commit</tt> ====<br />
<br />
This boolean option controls the <tt>synchronous_commit</tt> setting for BDR apply workers. It defaults to <tt>on</tt>.<br />
<br />
If set to <tt>off</tt>, BDR apply workers will perform async commits, allowing PostgreSQL to considerably improve throughput. It is safe unless you intend to run BDR with synchronous replication, in which case <tt>bdr.synchronous_commit</tt> must be left <tt>on</tt>.<br />
<br />
==== <tt>bdr.default_apply_delay</tt> ====<br />
<br />
Sets the default for <tt>bdr.<conname>_apply_delay</tt> for all configured connections.<br />
<br />
Primarily useful for debugging.<br />
<br />
Added after 0.5.<br />
<br />
==== <tt>bdr.<connection_name>_apply_delay</tt> ====<br />
<br />
This parameter, which defaults to zero, causes the application of received transactions to be delayed by the specified number of milliseconds.<br />
<br />
It is mostly useful for testing and debugging purposes, but may also be used to provide a replica of the database at a known point in the recent past.<br />
<br />
==== <tt>bdr.log_conflicts_to_table</tt> ====<br />
<br />
Boolean, controls whether detected BDR conflicts get logged to a <tt>bdr.bdr_conflict_history</tt> table.<br />
<br />
Added after 0.5, subject to change.<br />
<br />
==== <tt>bdr.<connection_name>_init_replica</tt> ====<br />
<br />
Added after BDR 0.5.<br />
<br />
This parameter defaults to <tt>off</tt>. If set to <tt>on</tt>, it will cause BDR to dump the database pointed to by this connection's <tt>bdr.<connection_name>_dsn</tt> before starting replication, and apply the dump to the local database, which must be empty.<br />
<br />
The dump is guaranteed to be consistent with the start point for replication.<br />
<br />
The following parameters become required for this connection if and only if <tt>init_replica</tt> is enabled.<br />
<br />
* <tt>bdr.<connection_name>_replica_local_dsn</tt><br />
<br />
==== <tt>bdr.<connection_name>_replica_local_dsn</tt> ====<br />
<br />
Added after BDR 0.5. Ignored unless <tt>_init_replica</tt> is <tt>on</tt> for this connection, required if it is <tt>on</tt>.<br />
<br />
A connection string that is passed to the script at <tt>bdr.<connection_name>_replica_script_path</tt>, telling it which local database to connect to in order to apply the dump of the remote DB.<br />
<br />
The connection string is only visible to superusers, and should specify a superuser connection. You may include a password in the connection string if required, or put it in the separate <tt>.pgpass</tt> file for the <tt>postgres</tt> user.<br />
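<br />
Putting these together, a sketch of a connection configured for initial dump-and-load might look like the following (it reuses the <tt>upstream1</tt> connection name from the earlier examples; the DSN values and dump directory are placeholders, and <tt>bdr.temp_dump_directory</tt> is described next):<br />
<br />
bdr.upstream1_init_replica = on<br />
bdr.upstream1_replica_local_dsn = 'host=127.0.0.1 user=postgres dbname=replicated_db'<br />
bdr.temp_dump_directory = '/var/tmp/bdr-dump'<br />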
<br />
==== <tt>bdr.temp_dump_directory</tt> ====<br />
<br />
Added after BDR 0.5. Has no effect without <tt>bdr.<conname>_init_replica=on</tt> for one or more connections.<br />
<br />
Specifies the path to a temporary storage location, writable by the <tt>postgres</tt> user, that has enough storage space to contain a complete dump of the database at <tt>bdr.<connection_name>_dsn</tt> for each configured connection with <tt>init_replica</tt> enabled.<br />
<br />
Only used during initial bringup.<br />
<br />
==== <tt>bdr.max_workers</tt> ====<br />
<br />
Allocates shared memory space for BDR worker configuration information. You can ignore this parameter at the moment.<br />
<br />
This parameter is auto-calculated from the number of <tt>bdr.connections</tt>, with the assumption that each connection is to a separate database and thus needs two workers. This wastes a small amount of shared memory, but the impact is minimal. It isn't otherwise useful - it will become important when BDR is enhanced to allow new connections to be added at runtime, but isn't currently worth paying attention to.<br />
<br />
Added after BDR 0.5.<br />
<br />
==== <tt>max_replication_slots</tt> ====<br />
<br />
The new parameter <tt>max_replication_slots</tt> has been added for use on both upstream and downstream masters. This parameter controls the maximum number of logical replication slots - upstream or downstream - that this cluster may have at a time. It must be set at postmaster start time.<br />
<br />
As logical replication slots are persistent, slots are consumed even by replicas that are not currently connected. Slot management is discussed in Starting, Stopping and Managing Replication.<br />
<br />
<tt>max_replication_slots</tt> should be set to the sum of the number of logical replication upstream masters this server will have, plus the number of logical replication downstream masters that will connect to it. For example, a node with two upstream masters and three downstream masters connecting to it needs <tt>max_replication_slots</tt> of at least 5.<br />
<br />
==== <tt>wal_level = 'logical'</tt> ====<br />
<br />
A new setting, <tt>'logical'</tt>, has been added for the existing <tt>wal_level</tt> parameter. <tt>'logical'</tt> includes everything that the existing <tt>hot_standby</tt> setting does and adds additional details required for logical changeset decoding to the write-ahead logs.<br />
<br />
This additional information is consumed by the upstream-master-side xlog decoding worker. Downstream masters that do not also act as upstream masters do not require <tt>wal_level</tt> to be increased above the default <tt>'minimal'</tt>.<br />
<br />
<tt>wal_level</tt>, except for the new <tt>'logical'</tt> setting, is [http://www.postgresql.org/docs/current/static/runtime-config-wal.html documented in the main PostgreSQL manual].<br />
<br />
==== <tt>max_wal_senders</tt> ====<br />
<br />
Logical replication hasn't altered the <tt>max_wal_senders</tt> parameter, but it is important in upstream masters for logical replication and BDR because every logical sender consumes a <tt>max_wal_senders</tt> entry.<br />
<br />
You should configure <tt>max_wal_senders</tt> to the sum of the number of physical and logical replicas you want to allow an upstream master to serve. If you intend to use <tt>pg_basebackup</tt> you should add at least two more senders to allow for its use.<br />
<br />
Like <tt>max_replication_slots</tt>, <tt>max_wal_senders</tt> entries don't cost a large amount of memory, so you can overestimate fairly safely.<br />
<br />
<tt>max_wal_senders</tt> is documented in [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL documentation].<br />
<br />
==== <tt>track_commit_timestamp</tt> ====<br />
<br />
Setting this parameter to "on" enables commit timestamp tracking, which is used to implement last-UPDATE-wins conflict resolution.<br />
<br />
It is also required for use of the <tt>pg_get_transaction_committime</tt> function.<br />
<br />
=== Function reference ===<br />
<br />
BDR / LLSR adds a number of functions. Some of them have been integrated into PostgreSQL 9.4 and [http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-REPLICATION can be found in the 9.4 documentation]. Others are pending integration; those are listed here.<br />
<br />
==== <tt>pg_get_transaction_committime</tt> ====<br />
<br />
<tt>pg_get_transaction_committime(txid integer)</tt>: Get the timestamp at which the specified transaction, as identified by transaction ID, committed. This function can be useful when monitoring replication lag.<br />
<br />
This function is added by [https://commitfest.postgresql.org/action/patch_view?id=1265 the commit timestamp patch set] and is included in the <tt>bdr</tt> branch.<br />
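<br />
For example, to see how long ago a given transaction committed (the transaction ID here is a placeholder):<br />
<br />
SELECT now() - pg_get_transaction_committime(1234) AS time_since_commit;<br />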
<br />
==== <tt>pg_xlog_wait_remote_apply</tt> ====<br />
<br />
The <tt>pg_xlog_wait_remote_apply(lsn text, pid integer)</tt> function allows you to wait on an upstream master until all downstream masters' replication has caught up to a certain point.<br />
<br />
The <tt>lsn</tt> argument is a Log Sequence Number, an identifier for the WAL (Write-Ahead Log) record you want to make sure has been applied on all nodes. The most useful record you will want to wait for is <tt>pg_current_xlog_location()</tt>, as discussed in [http://www.postgresql.org/docs/current/static/functions-admin.html PostgreSQL Admin Functions] in the manual.<br />
<br />
The <tt>pid</tt> argument specifies the process ID of a walsender to wait for. It may be set to zero to wait until the receivers associated with all walsenders on this upstream master have caught up to the specified <tt>lsn</tt>, or to a process ID obtained from <tt>pg_stat_replication.pid</tt> to wait for just one downstream to catch up.<br />
<br />
The most common use is:<br />
<br />
select pg_xlog_wait_remote_apply(pg_current_xlog_location(), 0)<br />
<br />
which will wait until all downstream masters have applied changes up to the time on the upstream master at which <tt>pg_xlog_wait_remote_apply</tt> was called. <br />
<br />
<tt>pg_current_xlog_location</tt> is not transactional, so unlike things like <tt>current_timestamp</tt> it'll always return the very latest status server-wide, irrespective of how long the current transaction has been running for and when it started.<br />
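<br />
To wait for a single downstream only, a walsender pid from <tt>pg_stat_replication</tt> can be passed instead of zero. For example (filtering on <tt>application_name</tt> is just one hypothetical way of picking out the walsender of interest):<br />
<br />
SELECT pg_xlog_wait_remote_apply(pg_current_xlog_location(), pid)<br />
  FROM pg_stat_replication<br />
 WHERE application_name = 'downstream1';<br />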
<br />
==== <tt>pg_xlog_wait_remote_receive</tt> ====<br />
<br />
<tt>pg_xlog_wait_remote_receive</tt> is the same as <tt>pg_xlog_wait_remote_apply</tt>, except that it only waits until the remote node has confirmed that it's received the given LSN, not until it has actually applied it after receiving it.<br />
<br />
=== Catalog changes ===<br />
<br />
BDR has a number of its own catalogs for metadata and state. It also introduces a few changes to core system catalogs.<br />
<br />
==== <tt>pg_catalog.pg_seqam</tt> ====<br />
<br />
This is a new core system catalog added in the BDR patchset, in the sequence access methods patch. It is due to be submitted for inclusion in PostgreSQL core in 9.5 but is currently maintained as part of BDR.<br />
<br />
To support [[#Distributed_Sequences|distributed sequences]], BDR adds an access method abstraction for sequences. It serves a similar purpose to index access methods - it abstracts the implementation of sequence storage from usage of sequences, so the client doesn't need to care whether it's using a distributed sequence, a local sequence, or something else entirely.<br />
<br />
This access method is described by the <tt>pg_seqam</tt> table. Two entries are defined:<br />
<br />
postgres=# select * from pg_seqam ;<br />
seqamname | seqamalloc | seqamsetval | seqamoptions <br />
-----------+----------------------+-----------------------+------------------------<br />
local | sequence_local_alloc | sequence_local_setval | sequence_local_options<br />
bdr | bdr_sequence_alloc | bdr_sequence_setval | bdr_sequence_options<br />
(2 rows)<br />
<br />
<tt>local</tt> is the traditional local-only sequence access method.<br />
<br />
<tt>bdr</tt> is for distributed sequences. For more information, see the [[#Distributed_Sequences|distributed sequences]] section.<br />
<br />
==== <tt>bdr.bdr_conflict_handlers</tt> ====<br />
<br />
The <tt>bdr_conflict_handlers</tt> table contains user defined conflict handlers ("conflict triggers") that can be used to implement application-specific conflict resolution.<br />
<br />
See "Conflict Resolution by user-defined handlers", below.<br />
<br />
==== <tt>bdr.bdr_conflict_history</tt> ====<br />
<br />
The <tt>bdr_conflict_history</tt> table is a log table that records detected conflicts. See "Monitoring" for details.<br />
<br />
==== <tt>bdr.bdr_nodes</tt> ====<br />
<br />
<tt>bdr_nodes</tt> is a global state table that tracks all known members of the BDR group, online or not.<br />
<br />
If BDR is not set up in a star topology then each node still needs information about the existence and state of the other nodes. It cannot rely on the locally configured connections or slots. <tt>bdr_nodes</tt> maintains that information.<br />
<br />
In general you don't need to work directly with <tt>bdr.bdr_nodes</tt>. See the source code for details on its use.<br />
<br />
==== <tt>bdr.bdr_queued_commands</tt> and <tt>bdr.bdr_queued_drops</tt> ====<br />
<br />
<tt>bdr_queued_commands</tt> and <tt>bdr.bdr_queued_drops</tt> are an implementation detail that is generally not of concern to users.<br />
<br />
Rows are inserted into <tt>bdr_queued_commands</tt> when an event trigger detects a DDL command that can be replicated. The rows get replicated to other nodes, which detect the special case of an insertion into this table and execute the DDL command described in the row.<br />
<br />
A similar principle applies for <tt>bdr.bdr_queued_drops</tt>.<br />
<br />
You don't need to work directly with these tables. See the source code for details on their use.<br />
<br />
==== <tt>bdr.bdr_sequence_elections</tt>, <tt>bdr.bdr_sequence_values</tt> and <tt>bdr.bdr_votes</tt> ====<br />
<br />
These tables are implementation detail for global sequences. <br />
<br />
You don't need to work directly with these tables. See the source code for details on their use.<br />
<br />
=== Distributed Sequences ===<br />
<br />
Distributed sequences, or global sequences, are sequences that are synchronized across all the nodes in a BDR cohort. A distributed sequence is more expensive to access than a purely local sequence, but it produces values that are guaranteed unique across the entire cohort.<br />
<br />
Using distributed sequences allows you to avoid the problems with insert conflicts. If you define a <tt>PRIMARY KEY</tt> or <tt>UNIQUE</tt> column with a <tt>DEFAULT nextval(...)</tt> expression that refers to a global sequence shared across all nodes in a BDR cohort, it is not possible for any node to ever get the same value as any other node. When BDR synchronizes inserts between the nodes, they can never conflict.<br />
<br />
There is no need to use a distributed sequence if:<br />
<br />
* You are ensuring global uniqueness using another method such as:<br />
** Local sequences with an offset and increment (see the sketch after this list);<br />
** UUIDs;<br />
** An externally co-ordinated natural key<br />
<br />
* You are using the data in a <tt>TEMPORARY</tt> or <tt>UNLOGGED</tt> table, as these are never visible outside the current node.<br />
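<br />
For reference, the offset-and-increment approach with ordinary local sequences looks like this (a sketch for a two-node setup; <tt>myseq</tt> is a placeholder name):<br />
<br />
-- On node 1: generates 1, 3, 5, ...<br />
CREATE SEQUENCE myseq INCREMENT 2 START 1;<br />
-- On node 2: generates 2, 4, 6, ...<br />
CREATE SEQUENCE myseq INCREMENT 2 START 2;<br />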
<br />
(All of the following is subject to change and requires periodic review).<br />
<br />
You can get a listing of distributed sequences defined in a database with:<br />
<br />
SELECT *<br />
FROM pg_class <br />
INNER JOIN pg_seqam ON (pg_class.relam = pg_seqam.oid) <br />
WHERE pg_seqam.seqamname = 'bdr' AND relkind = 'S';<br />
<br />
(See <tt>[[#pg_seqam|pg_seqam]]</tt> for information on the new <tt>pg_seqam</tt> catalog table).<br />
<br />
New distributed sequences may be created with the <tt>USING</tt> clause to <tt>CREATE SEQUENCE</tt>:<br />
<br />
CREATE SEQUENCE test_seq USING bdr;<br />
<br />
Once you've created a distributed sequence you may use it with <tt>nextval</tt> like any other sequence.<br />
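<br />
A distributed sequence is typically used as the default for a key column. A minimal sketch (the table and sequence names are placeholders):<br />
<br />
CREATE SEQUENCE mytable_id_seq USING bdr;<br />
CREATE TABLE mytable (<br />
    id bigint PRIMARY KEY DEFAULT nextval('mytable_id_seq'),<br />
    payload text<br />
);<br />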
<br />
A few limitations and caveats apply to global sequences at time of writing:<br />
<br />
* Only an <tt>INCREMENT</tt> of 1 (the default) is supported. Client applications that expect a different increment must be configured to handle increment 1. An extended variant of <tt>nextval</tt> that takes the number of values to obtain as an argument and returns a set of values is planned as an extension to aid in porting.<br />
<br />
* <tt>MINVALUE</tt> and <tt>MAXVALUE</tt> are locked at their defaults and may not be changed.<br />
<br />
* <tt>START WITH</tt> may not be specified; however, <tt>setval</tt> may be used to set the start value after the sequence is created.<br />
<br />
* The <tt>CACHE</tt> directive is not supported.<br />
<br />
* Sequence values are handed out in chunks, so if three different nodes all call <tt>nextval</tt> at the same time they might get values 50, 150 and 250. Thus, at time 't' <tt>nextval</tt> on one node may return a value higher than a <tt>nextval</tt> call at time 't+1' on another node. Within a single node the usual rules for <tt>nextval</tt> still apply.<br />
<br />
The details used by BDR to manage global sequences are in the <tt>bdr_sequence_values</tt>, <tt>bdr_sequence_elections</tt> and <tt>bdr_votes</tt> tables in the <tt>public</tt> schema, though these details are subject to change.<br />
<br />
=== Configuration ===<br />
<br />
Details on individual parameters are described in the [[#Parameter_Reference|parameter reference]] section.<br />
<br />
The following configuration is an example of a simple one-way LLSR replication setup - a single upstream master to a single downstream master.<br />
<br />
The upstream master (sender)'s <tt>postgresql.conf</tt> should contain settings like:<br />
<br />
wal_level = 'logical' # Include enough info for logical replication<br />
max_replication_slots = X # Number of LLSR senders + any receivers<br />
max_wal_senders = Y # Y = max_replication_slots plus any physical <br />
# streaming requirements<br />
track_commit_timestamp = on # Not strictly required for LLSR, only for BDR<br />
# conflict resolution.<br />
<br />
Downstream (receiver) <tt>postgresql.conf</tt>:<br />
<br />
shared_preload_libraries = 'bdr'<br />
<br />
bdr.connections="name_of_upstream_master" # list of upstream master nodenames<br />
bdr.<nodename>_dsn = 'dbname=postgres' # connection string for connection<br />
# from downstream to upstream master<br />
bdr.<nodename>_local_dbname = 'xxx' # optional parameter to cover the case <br />
# where the databasename on upstream <br />
# and downstream master differ. <br />
# (Not yet implemented)<br />
bdr.<nodename>_apply_delay # optional parameter to delay apply of<br />
# transactions, time in milliseconds <br />
bdr.synchronous_commit = off  # optional parameter to set the<br />
# synchronous_commit parameter the<br />
# apply processes will be using.<br />
# Safe to set to 'off' unless you're<br />
# doing synchronous replication.<br />
max_replication_slots = X # set to the number of remotes<br />
track_commit_timestamp = on # Not strictly required for LLSR,<br />
# only for BDR conflict resolution.<br />
<br />
Note that a server can be both sender and receiver, either two servers to each other or more complex configurations like replication chains/trees.<br />
<br />
The upstream (sender) <tt>pg_hba.conf</tt> must be configured to allow the downstream master to connect for replication. Otherwise you'll see errors like the following on the downstream master:<br />
<br />
FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "[local]", user "postgres"<br />
<br />
A suitable <tt>pg_hba.conf</tt> entry for a replication connection from the replica server 10.1.4.8 might be:<br />
<br />
host replication postgres 10.1.4.8/32 trust<br />
<br />
(The user name should match the user name configured in the downstream master's DSN; md5 password authentication is also supported.)<br />
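<br />
If password authentication is preferred, the equivalent <tt>md5</tt> entry would be:<br />
<br />
host replication postgres 10.1.4.8/32 md5<br />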
<br />
For more details on these parameters, see [[#Parameter Reference|Parameter Reference]].<br />
<br />
=== Troubleshooting ===<br />
<br />
==== Could not access file "bdr": No such file or directory ====<br />
<br />
If you see the error:<br />
<br />
FATAL: could not access file "bdr": No such file or directory<br />
<br />
when starting a database set up to receive BDR replication, you probably forgot to install <tt>contrib/bdr</tt>. See above.<br />
<br />
==== Invalid value for parameter ====<br />
<br />
An error like:<br />
<br />
LOG: invalid value for parameter ...<br />
<br />
when setting one of these parameters means your server doesn't support logical replication and will need to be patched or updated.<br />
<br />
==== Couldn't find logical slot ====<br />
<br />
An error like:<br />
<br />
ERROR: couldn't find logical slot "bdr: 16384:5873181566046043070-1-24596:"<br />
<br />
on the upstream master suggests that a downstream master is trying to connect to a logical replication slot that no longer exists. The slot cannot be re-created, so it is necessary to re-seed the downstream replica database.<br />
<br />
=== Operational Issues and Debugging ===<br />
<br />
In LLSR there are no user-level (i.e. SQL-visible) ERRORs that have special meaning. Any ERRORs generated are likely to be serious problems of some kind, apart from apply deadlocks, which are automatically retried.<br />
<br />
=== Monitoring ===<br />
<br />
The following tables and views are available for monitoring replication activity:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE pg_stat_replication]</tt><br />
* <tt>[http://www.postgresql.org/docs/devel/static/catalog-pg-replication-slots.html pg_replication_slots]</tt><br />
* <tt>pg_stat_bdr</tt> (described below)<br />
* <tt>bdr_nodes</tt><br />
<br />
The following configuration and logging parameters are useful for monitoring replication:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt><br />
<br />
==== <tt>pg_replication_slots</tt> ====<br />
<br />
The <tt>pg_replication_slots</tt> view is specific to logical replication. It was incorporated into PostgreSQL 9.4 after the release of BDR 0.5 so the documentation for it has been removed from here; see <tt>[http://www.postgresql.org/docs/devel/static/catalog-pg-replication-slots.html pg_replication_slots]</tt> in the PostgreSQL manual.<br />
<br />
==== <tt>bdr.pg_stat_bdr</tt> ====<br />
<br />
The <tt>bdr.pg_stat_bdr</tt> view is supplied by the <tt>bdr</tt> extension. It provides information on a server's connection(s) to its upstream master(s).<br />
<br />
The primary purpose of this view is to report statistics on the progress of LLSR apply on a per-upstream master connection basis.<br />
<br />
View structure:<br />
<br />
View "public.pg_stat_bdr"<br />
Column | Type | Modifiers <br />
--------------------+--------+-----------<br />
rep_node_id | oid | <br />
riremotesysid | name | <br />
riremotedb | oid | <br />
rilocaldb | oid | <br />
nr_commit | bigint | <br />
nr_rollback | bigint | <br />
nr_insert | bigint | <br />
nr_insert_conflict | bigint | <br />
nr_update | bigint | <br />
nr_update_conflict | bigint | <br />
nr_delete | bigint | <br />
nr_delete_conflict | bigint | <br />
nr_disconnect | bigint | <br />
<br />
Fields:<br />
<br />
* <tt>rep_node_id</tt>: An internal identifier for the replication slot.<br />
<br />
* <tt>riremotesysid</tt>: The remote database system identifier, as reported by the <tt>Database system identifier</tt> line of <tt>pg_controldata /path/to/datadir</tt><br />
<br />
* <tt>riremotedb</tt>: The remote database OID, ie the <tt>oid</tt> column of the remote server's <tt>pg_catalog.pg_database</tt> entry for the replicated database. You can get the database name with <tt>select datname from pg_database where oid = 12345</tt> (where '12345' is the <tt>riremotedb</tt> oid).<br />
<br />
* <tt>rilocaldb </tt>: The local database OID, with the same meaning as <tt>riremotedb</tt> but with oids from the local system.<br />
<br />
''The rest of the rows are statistics about this upstream master slot'':<br />
<br />
* <tt>nr_commit</tt>: Number of commits applied to date from this master<br />
<br />
* <tt>nr_rollback</tt>: Number of rollbacks performed by this apply process due to recoverable errors (deadlock retries, lost races, etc) or unrecoverable errors like mismatched constraint errors.<br />
<br />
* <tt>nr_insert</tt>: Number of <tt>INSERT</tt>s performed<br />
<br />
* <tt>nr_insert_conflict</tt>: Number of <tt>INSERT</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_update</tt>: Number of <tt>UPDATE</tt>s performed<br />
<br />
* <tt>nr_update_conflict</tt>: Number of <tt>UPDATE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_delete</tt>: Number of deletes performed<br />
<br />
* <tt>nr_delete_conflict</tt>: Number of deletes that resulted in conflicts.<br />
<br />
* <tt>nr_disconnect</tt>: Number of times this apply process has lost its connection to the upstream master since it was started.<br />
<br />
<br />
This view does not contain any information about how far behind the upstream master this downstream master is. The upstream master's <tt>pg_stat_logical_decoding</tt> and <tt>pg_stat_replication</tt> views must be queried to determine replication lag.<br />
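<br />
For example, a rough per-walsender measure of replication lag in bytes can be obtained on the upstream master with a query like this (a sketch using the standard 9.4 <tt>pg_stat_replication</tt> columns):<br />
<br />
SELECT pid, application_name,<br />
       pg_xlog_location_diff(pg_current_xlog_location(), flush_location) AS flush_lag_bytes<br />
  FROM pg_stat_replication;<br />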
<br />
==== Monitoring uses of <tt>bdr.bdr_nodes</tt> ====<br />
<br />
While generally not intended for end-user access, <tt>bdr.bdr_nodes</tt> may be queried to see the current state of any node initialization.<br />
<br />
A node writes a row for its <tt>(sysid, dbname, timelineid)</tt> in <tt>bdr.bdr_nodes</tt> when it is joining the BDR group. The only field of interest to users is <tt>status</tt>, which may have the values:<br />
<br />
* <tt>i</tt>: The node is doing initial slot creation or an initial dump and load (see <tt>init_replica</tt>, above).<br />
<br />
* <tt>c</tt>: The node is catching up to its <tt>init_replica</tt> target node and is not yet ready to participate fully in BDR.<br />
<br />
* <tt>r</tt>: The node is fully ready. Slots may be created on this node and it may participate fully in BDR.<br />
<br />
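The current state of all known nodes can thus be checked with a simple query:<br />
<br />
SELECT * FROM bdr.bdr_nodes;<br />
<br />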
Note that the status doesn't indicate whether the node is actually up right now. A node may be shut down, isolated from the network, or crashed and still appear as <tt>r</tt> in <tt>bdr.bdr_nodes</tt> because it's still conceptually part of the BDR group.<br />
<br />
At this time there are no SQL-level functions for adding/removing nodes. <b>Do not directly modify <tt>bdr.bdr_nodes</tt>.</b><br />
<br />
==== Monitoring <tt>bdr.bdr_conflict_history</tt> ====<br />
<br />
<tt>bdr.bdr_conflict_history</tt> tracks conflicts that arise in replication. To learn more about conflicts see [[#Conflict Detection & Resolution]].<br />
<br />
At this time only detected conflicts are recorded in <tt>bdr.bdr_conflict_history</tt>. In the future, logging of unhandled conflicts (where the conflict isn't detected until an apply statement fails with an <tt>ERROR</tt>) will also be added.<br />
<br />
Unlike most tables, the contents of <tt>bdr.bdr_conflict_history</tt> are ''not'' replicated between nodes. Each node has a local copy of the table with distinct data. This is a technical limitation that may be lifted in a future release, but it also saves on unnecessary replication overhead.<br />
<br />
You can use the conflict history table to determine how rapidly your application creates conflicts and where those conflicts occur, allowing you to improve the application to reduce conflict rates. It also helps detect cases where conflict resolutions may not have produced the desired results, allowing you to identify places where a user defined conflict trigger or an application design change may be desirable.<br />
<br />
Row values may optionally be logged for row conflicts. This is controlled by the global database-wide option <tt>bdr.log_conflicts_to_table</tt>. There is no per-table control over row value logging at this time. Nor is there any limit applied on the number of fields a row may have, number of elements dumped in arrays, length of fields, etc, so it may not be wise to enable this if you regularly work with multi-megabyte rows that may trigger conflicts.<br />
<br />
Because the conflict history table contains data on every table in the database, each row's schema might be different; if row values are logged, they are therefore stored as <tt>json</tt> fields. The json is created with <tt>[http://www.postgresql.org/docs/current/static/functions-json.html row_to_json]</tt>, just as if you'd called it on the row yourself from SQL. To reconstruct a composite-typed tuple from the logged json you can use <tt>json_populate_record</tt> (available since PostgreSQL 9.3), or table-specific code (pl/pgsql, pl/python, pl/perl, whatever) for more complex cases.<br />
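<br />
For example (a sketch; <tt>mytable</tt> and the json payload are placeholders):<br />
<br />
SELECT *<br />
  FROM json_populate_record(NULL::mytable, '{"id": 42, "note": "example"}'::json);<br />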
<br />
The structure of <tt>bdr_conflict_history</tt> is:<br />
<br />
Column | Type |<br />
--------------------------+-----------------------------+<br />
conflict_id | bigint |<br />
local_node_sysid | text |<br />
local_conflict_xid | xid |<br />
local_conflict_lsn | pg_lsn |<br />
local_conflict_time | timestamp with time zone |<br />
object_schema | text |<br />
object_name | text |<br />
remote_node_sysid | text |<br />
remote_txid | xid |<br />
remote_commit_time | timestamp with time zone |<br />
remote_commit_lsn | pg_lsn |<br />
conflict_type | bdr.bdr_conflict_type |<br />
conflict_resolution | bdr.bdr_conflict_resolution |<br />
local_tuple | json |<br />
remote_tuple | json |<br />
local_tuple_xmin | xid |<br />
local_tuple_origin_sysid | text |<br />
error_message | text |<br />
error_sqlstate | text |<br />
error_querystring | text |<br />
error_cursorpos | integer |<br />
error_detail | text |<br />
error_hint | text |<br />
error_context | text |<br />
error_columnname | text |<br />
error_typename | text |<br />
error_constraintname | text |<br />
error_filename | text |<br />
error_lineno | integer |<br />
error_funcname | text |<br />
<br />
<br />
The primary key is <tt>(conflict_id, local_node_sysid)</tt>, where the conflict ID is auto-generated. The composite key allows replication between nodes to be added later without conflicts arising.<br />
<br />
Fields:<br />
<br />
* <tt>conflict_id</tt>: A locally unique key identifying each conflict.<br />
* <tt>local_node_sysid</tt>: The unique system identifier of the node that was applying the change and encountered the conflict. This is the receiving end of the slot.<br />
* <tt>local_conflict_xid</tt>: The transaction identifier of the apply transaction that detected the conflict.<br />
* <tt>local_conflict_lsn</tt>: The transaction log position the applying server was at when the conflict was detected. This can be used to order conflicts in time series.<br />
* <tt>local_conflict_time</tt>: The wall-clock time at which the conflict was detected on the local machine. This is ''not'' the time the conflict was originally created by a user SQL command.<br />
* <tt>object_schema</tt>: If this conflict applies to a particular database object (usually a table), the schema that object is in.<br />
* <tt>object_name</tt>: If this conflict applies to a particular database object, the name of that object, e.g. "my_table".<br />
* <tt>remote_node_sysid</tt>: The unique system identifier of the node that the change was received from. Unless catchup mode is active (where one node relays for another), this is the node that actually created the conflicting change.<br />
* <tt>remote_txid</tt>: The transaction ID on the remote node that created the conflicting change.<br />
* <tt>remote_commit_time</tt>: The wall-clock time that <tt>remote_txid</tt> committed on the remote node. This is read from the remote node's clock.<br />
* <tt>remote_commit_lsn</tt>: The transaction log position of the commit that ended the transaction this conflict was created in.<br />
* <tt>conflict_type</tt>: The type of conflict detected, e.g. <tt>insert_insert</tt>. For details on conflict types see the documentation on apply conflicts linked above. Possible values are:<br />
** <tt>insert_insert</tt>: Two <tt>INSERT</tt>s created the same key.<br />
** <tt>update_update</tt>: Two <tt>UPDATE</tt>s tried to modify the same row version.<br />
** <tt>update_delete</tt>: An <tt>UPDATE</tt> tried to modify a row that was concurrently <tt>DELETE</tt>d on another node. This conflict is only detected on the side that executed the conflicting <tt>DELETE</tt>; the side that did the <tt>UPDATE</tt> first doesn't see any conflict.<br />
** <tt>unhandled_tx_abort</tt>: The change could not be applied because of an unhandled conflict or error, and the apply transaction was aborted; see the <tt>conflict_resolution</tt> value of the same name below.<br />
* <tt>conflict_resolution</tt>: How BDR resolved this conflict:<br />
** <tt>conflict_trigger_skip_change</tt>: A user defined conflict handler decided to ignore this change completely, so this change was discarded.<br />
** <tt>conflict_trigger_returned_tuple</tt>: A user defined conflict handler decided to generate a replacement row instead of applying the local or remote rows. The replacement row was applied.<br />
** <tt>last_update_wins_keep_local</tt>: Timestamps were used to resolve the change in favour of the most recent change, which was the row already present on the local node, so this change was discarded.<br />
** <tt>last_update_wins_keep_remote</tt>: Timestamps were used to resolve the change in favour of the most recent change, which was the row sent by the remote node, so this change was applied.<br />
** <tt>unhandled_tx_abort</tt>: BDR did not have any way to resolve this conflict, a conflict trigger threw an exception, or another unhandled (possibly transient) error occurred. Examine the <tt>error_</tt> fields for details. Many non-error fields will be unset in this case.<br />
* <tt>local_tuple</tt>: If tuple logging is enabled, a json representation of the conflicting local tuple already present on this node, if any.<br />
* <tt>remote_tuple</tt>: If tuple logging is enabled, a json representation of the conflicting remote tuple received, if any.<br />
* <tt>local_tuple_xmin</tt>: The transaction ID that created the most recent version of the conflicting local tuple, if known.<br />
* <tt>local_tuple_origin_sysid</tt>: If the local tuple was replicated from a remote node using BDR, the system identifier of the real origin node. Null if the tuple was created directly on the local node.<br />
* <tt>error_message</tt>: For unhandled errors, the main error message.<br />
* <tt>error_sqlstate</tt>: For unhandled errors, the SQLSTATE; see [http://www.postgresql.org/docs/current/static/errcodes-appendix.html error codes].<br />
* <tt>error_querystring</tt>: For unhandled errors, the text of the query that failed if available. (Currently never populated).<br />
* <tt>error_cursorpos</tt>: For unhandled errors, the position of the error within the query, if supplied by the server. (Currently never populated).<br />
* <tt>error_detail</tt>: For unhandled errors, any additional <tt>DETAIL</tt> section included in the error message.<br />
* <tt>error_hint</tt>: For unhandled errors, any additional <tt>HINT</tt> section included in the error message.<br />
* <tt>error_context</tt>: For unhandled errors, a call context like a pl/pgsql stack, containing function, etc.<br />
* <tt>error_columnname</tt>: For unhandled errors, the column of the target table if a specific column was affected. The table schema and name are in <tt>object_schema</tt> and <tt>object_name</tt>.<br />
* <tt>error_typename</tt>: For unhandled errors applying to a particular data type (like cast / conversion errors), the type name.<br />
* <tt>error_constraintname</tt>: For unhandled errors applying to a particular table constraint, the name of the constraint that was violated.<br />
* <tt>error_filename</tt>: For unhandled errors, the PostgreSQL source file name of the location that raised the error.<br />
* <tt>error_lineno</tt>: For unhandled errors, the line number within <tt>error_filename</tt> that raised the error.<br />
* <tt>error_funcname</tt>: For unhandled errors, the name of the PostgreSQL C-level function that raised the error (if known).<br />
<br />
At this time none of the <tt>error_</tt> fields are used. In a future release they will be populated from unhandled errors, matching the error fields that <tt>psql</tt> outputs, such as <tt>HINT</tt>.<br />
<br />
==== Monitoring replication status and lag ====<br />
<br />
As with any replication setup, it is vital to monitor replication status on all BDR nodes to ensure no node is lagging severely behind the others or is stuck.<br />
<br />
In the case of BDR, a stuck or crashed node will eventually cause disk space and table bloat problems on other masters, so stuck nodes should be detected and removed or repaired in a reasonably timely manner. Exactly how urgent this is depends on the workload of the BDR group.<br />
<br />
The <tt>pg_stat_logical_decoding</tt> view described above may be used to verify that a downstream master is connected to its upstream master by querying it on the upstream side: the <tt>active</tt> boolean column is <tt>t</tt> if a downstream master is connected to this upstream.<br />
<br />
The <tt>xmin</tt> column provides an indication of whether replication is advancing; it should increase as replication progresses. You can turn this into the time the transaction was committed on the master by running <tt>pg_get_transaction_committime(xmin)</tt> ''on the upstream master''. Since txids differ between upstream and downstream masters, running it on a downstream master with a txid from the upstream master as input would produce an error or an incorrect result.<br />
<br />
Example:<br />
<br />
postgres=# select slot_name, plugin, database, active, xmin,<br />
pg_get_transaction_committime(xmin)<br />
FROM pg_stat_logical_decoding ;<br />
-[ RECORD 1 ]-----------------+----------------------------------------<br />
slot_name | bdr: 12910:5882534759278050995-1-12910:<br />
plugin | bdr_output<br />
database | 12910<br />
active | f<br />
xmin | 1827<br />
pg_get_transaction_committime | 2013-05-27 06:14:36.851423+00<br />
<br />
=== Table and index usage statistics ===<br />
<br />
Statistics on table and index usage are updated normally by the downstream master. This is essential for the correct functioning of auto-vacuum. If there are no local writes on the downstream master and stats have not been reset, these two views should show matching results between upstream and downstream:<br />
<br />
* <tt>pg_stat_user_tables</tt><br />
* <tt>pg_statio_user_tables</tt><br />
<br />
Since indexes are used to apply changes, with workloads that perform <tt>UPDATE</tt>s and <tt>DELETE</tt>s the identifying indexes on the downstream side may appear more heavily used than non-identifying indexes.<br />
<br />
The built-in index monitoring views are:<br />
<br />
* <tt>pg_stat_user_indexes</tt><br />
* <tt>pg_statio_user_indexes</tt><br />
<br />
All these views are discussed in [http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE the PostgreSQL documentation on the statistics views].<br />
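<br />
For example, running a query like this on both the upstream and the downstream master and comparing the output is a simple consistency check; this is a sketch, not a complete monitoring solution:<br />
<br />
 SELECT relname, n_tup_ins, n_tup_upd, n_tup_del<br />
 FROM pg_stat_user_tables<br />
 ORDER BY relname;<br />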
<br />
=== Starting, stopping and managing replication ===<br />
<br />
Replication is managed with the <tt>postgresql.conf</tt> settings described in "Parameter Reference" and "Configuration" above, and using the <tt>pg_receivellog</tt> utility command.<br />
<br />
==== Starting a new LLSR connection ====<br />
<br />
Logical replication is started automatically when a database is configured as a downstream master in <tt>postgresql.conf</tt> (see [[#Configuration|Configuration]]) and the postmaster is started. No explicit action is required to start replication, but replication will not actually work unless the upstream and downstream databases are identical within the requirements set by LLSR in the [[#Table definitions and DDL replication|Table definitions and DDL replication]] section.<br />
<br />
<tt>pg_dump</tt> and <tt>pg_restore</tt> may be used to set up the new replica's database.<br />
<br />
The development version in the <tt>bdr-next</tt> branch will automatically dump the upstream database and populate the local database if the <tt>bdr.<nodename>.init_replica</tt> setting is configured; see the parameter reference above.<br />
<br />
==== Viewing logical replication slots ====<br />
<br />
Examining the state of logical replication is discussed in [[#Monitoring|Monitoring]].<br />
<br />
==== Pausing and resuming logical replication ====<br />
<br />
You can execute the <tt>bdr_apply_pause()</tt> function to temporarily pause logical replication. Changes will once again be applied once you execute <tt>bdr_apply_resume()</tt>.<br />
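<br />
A minimal sketch, using the function names as given above; depending on how the extension is installed they may need to be schema-qualified (e.g. <tt>bdr.bdr_apply_pause()</tt>):<br />
<br />
 SELECT bdr_apply_pause();<br />
 -- apply is now paused; changes queue up on the upstream master<br />
 SELECT bdr_apply_resume();<br />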
<br />
==== Temporarily stopping an LLSR replica ====<br />
<br />
LLSR replicas can be temporarily stopped by shutting down the downstream master's postmaster.<br />
<br />
A stopped replica will still cause the upstream master to retain WAL for it, eventually causing the upstream master to run out of disk space in <tt>pg_xlog</tt>. Do not leave a replica shut down for too long - if it's going to be out of service for an extended period, consider dropping the upstream slot and retiring the replica, then creating a new one later.<br />
<br />
Once you remove an upstream slot you cannot simply rejoin a replica and have it catch up. It must be rebuilt.<br />
<br />
==== Removing an LLSR replica permanently ====<br />
<br />
To remove a replication connection permanently, remove its entries from the downstream master's <tt>postgresql.conf</tt>, restart the downstream master, then remove its slot from the upstream master with <tt>SELECT pg_drop_replication_slot('slotname')</tt> as a superuser. See [http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-REPLICATION the main PostgreSQL documentation].<br />
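<br />
For example, on the upstream master (the slot name here is illustrative, matching the format shown elsewhere in this guide):<br />
<br />
 SELECT slot_name, active FROM pg_stat_logical_decoding;<br />
 SELECT pg_drop_replication_slot('bdr: 16384:5873181566046043070-1-16384:');<br />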
<br />
Alternately, you can use <tt>pg_receivellog</tt>:<br />
<br />
pg_receivellog -p 5434 -h master-hostname -d dbname \<br />
--slot='bdr: 16384:5873181566046043070-1-16384:' --stop<br />
<br />
It is important to remove the replication slot from the upstream master(s) to prevent xid wrap-around problems and issues with table bloat caused by delayed vacuum, and to prevent the upstream master from retaining WAL for the dead replica until it runs out of <tt>pg_xlog</tt> space.<br />
<br />
== Bi-Directional Replication ==<br />
<br />
Bi-Directional replication is built directly on LLSR by configuring two or more servers as both upstream ''and'' downstream masters of each other.<br />
<br />
All of the Logical Log Streaming Replication documentation applies to BDR and should be read before moving on to reading about and configuring BDR.<br />
<br />
=== Bi-Directional Replication Use Cases ===<br />
<br />
Bi-Directional Replication is designed to allow a very wide range of server connection topologies. The simplest to understand is two servers, each sending its changes to the other; this is produced by making each server the downstream master of the other, using two connections for each database.<br />
<br />
Logical and physical streaming replication are designed to work side-by-side. This means that a master can be replicating using physical streaming replication to a local standby server while at the same time replicating logical changes to a remote downstream master. Logical replication also works alongside cascading replication, so a physical standby can feed changes to a downstream master, giving a chain of upstream master to physical standby to downstream master.<br />
<br />
==== Simple multi-master pair ====<br />
<br />
A simple multi-master "HA Cluster" with two servers:<br />
<br />
* Server "Alpha" - Master<br />
* Server "Bravo" - Master<br />
<br />
===== Configuration =====<br />
<br />
Alpha:<br />
<br />
wal_level = 'logical'<br />
max_replication_slots = 3<br />
max_wal_senders = 4<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="bravo"<br />
bdr.bravo_dsn = 'dbname=dbtoreplicate'<br />
track_commit_timestamp = on<br />
<br />
Bravo:<br />
<br />
wal_level = 'logical'<br />
max_replication_slots = 3<br />
max_wal_senders = 4<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="alpha"<br />
bdr.alpha_dsn = 'dbname=dbtoreplicate'<br />
track_commit_timestamp = on<br />
<br />
See [[#Configuration|Configuration]] for an explanation of these parameters.<br />
<br />
==== HA and Logical Standby ====<br />
Downstream masters allow users to create temporary tables, so they can be used as reporting servers.<br />
<br />
"HA Cluster":<br />
<br />
* Server "Alpha" - Current Master<br />
* Server "Bravo" - Physical Standby - unused, apart from as failover target for Alpha - potentially specified in synchronous_standby_names<br />
* Server "Charlie" - "Logical Standby" - downstream master<br />
<br />
==== Very High Availability Multi-Master ====<br />
A typical configuration for remote multi-master would then be:<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha using logical streaming<br />
<br />
Bandwidth between Site 1 and Site 2 is minimised.<br />
<br />
==== 3-remote site simple Multi-Master Plex ====<br />
<br />
BDR supports "all to all" connections, so the latency for any change being applied on other masters is minimised. (Note that early designs of multi-master were arranged for circular replication, which has latency issues with larger numbers of nodes.)<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Alpha, Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha, Charlie using logical streaming replication<br />
<br />
===== Configuration =====<br />
<br />
If you wanted to test this configuration locally you could run three PostgreSQL instances on different ports. Such a configuration would look like the following if the port numbers were used as node names for the sake of notational clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441,node_5442'<br />
bdr.node_5441_dsn='port=5441 dbname=postgres'<br />
bdr.node_5442_dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5440,node_5442'<br />
bdr.node_5440_dsn='port=5440 dbname=postgres'<br />
bdr.node_5442_dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440,node_5441'<br />
bdr.node_5440_dsn='port=5440 dbname=postgres'<br />
bdr.node_5441_dsn='port=5441 dbname=postgres'<br />
<br />
In a typical real-world configuration each server would be on the same port on a different host instead.<br />
<br />
==== 3-remote site simple Multi-Master Circular Replication ====<br />
<br />
A simpler configuration uses "circular replication". This requires less setup but results in higher latency for changes as the number of nodes increases. It's also less resilient to network disruptions and node faults.<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie using logical streaming replication<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha using logical streaming replication<br />
<br />
TODO: Regrettably this doesn't actually work yet because we don't cascade logical changes (yet).<br />
<br />
===== Configuration =====<br />
<br />
Using node names that match port numbers for clarity, as before:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441'<br />
bdr.node_5441_dsn='port=5441 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5442'<br />
bdr.node_5442_dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440'<br />
bdr.node_5440_dsn='port=5440 dbname=postgres'<br />
<br />
This would usually be done in the real world with databases on different hosts, all running on the same port.<br />
<br />
==== 3-remote site Max Availability Multi-Master Plex ====<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha, Echo using logical streaming<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Foxtrot using physical streaming with sync replication<br />
** Server "Foxtrot" - Physical Standby - feeds changes to Alpha, Charlie using logical streaming<br />
<br />
Bandwidth and latency between sites is minimised.<br />
<br />
Config left as an exercise for the reader.<br />
<br />
==== N-site symmetric cluster replication ====<br />
<br />
A symmetric cluster is one in which all masters are connected to each other.<br />
<br />
N=19 has been tested and works fine.<br />
<br />
Each of N masters requires N-1 connections to the other masters, so the practical limit is under 100 servers, or fewer if you replicate many separate databases.<br />
<br />
The amount of work caused by each change is O(N), so there is a much lower practical limit imposed by resource consumption. A planned future option to filter rows/tables for replication becomes essential with larger or more heavily updated databases.<br />
<br />
==== Complex/Asymmetric Replication ====<br />
<br />
A variety of options is possible.<br />
<br />
=== Conflict Avoidance ===<br />
<br />
==== Distributed Locking ====<br />
<br />
Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close but cannot support geographically distributed applications as very low latency is critical for acceptable performance.<br />
<br />
Distributed locking is essentially a pessimistic approach, whereas BDR advocates an optimistic approach: avoid conflicts where possible but allow some types of conflict to occur and resolve them when they arise.<br />
<br />
==== Global Sequences ====<br />
<br />
Many applications require unique values be assigned to database entries. Some applications use GUIDs generated by external programs, some use database-supplied values. This is important with optimistic conflict resolution schemes because uniqueness violations are "divergent errors" and are not easily resolvable.<br />
<br />
The SQL standard requires Sequence objects which provide unique values, though these are isolated to a single node. These can then be used to supply default values using <tt>DEFAULT nextval('mysequence')</tt>, as with PostgreSQL's <tt>SERIAL</tt> pseudo-type.<br />
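<br />
For example, an ordinary single-node sequence used this way, shown only for contrast with the Global Sequences described next:<br />
<br />
 CREATE SEQUENCE mysequence;<br />
 CREATE TABLE mytable (<br />
     id   bigint PRIMARY KEY DEFAULT nextval('mysequence'),<br />
     info text<br />
 );<br />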
<br />
BDR requires sequences to work together across multiple nodes. This is implemented as a new <tt>SequenceAccessMethod</tt> API (SeqAM), which allows plugins that provide get/set functions for sequences. Global Sequences are then implemented as a plugin which implements the SeqAM API and communicates across nodes to allow new ranges of values to be stored for each sequence.<br />
<br />
=== Conflict Detection & Resolution ===<br />
<br />
Because local writes can occur on a downstream master, conflict detection and avoidance are concerns for basic LLSR setups as well as full BDR configurations.<br />
<br />
==== Lock Conflicts ====<br />
<br />
Changes from the upstream master are applied on the downstream master by a single apply process. That process needs to take <tt>RowExclusiveLock</tt> on the changing table and must be able to write-lock the changing tuple(s). Concurrent activity will prevent those changes from being immediately applied because of lock waits. Use the <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt> facility to look for issues with apply blocking on locks.<br />
<br />
Concurrent activity on a row includes:<br />
<br />
* explicit row level locking (<tt>SELECT ... FOR UPDATE/FOR SHARE</tt>)<br />
* locking from foreign keys<br />
* implicit locking because of row <tt>UPDATE</tt>s, <tt>INSERT</tt>s or <tt>DELETE</tt>s, either from local activity or apply from other servers<br />
<br />
==== Data Conflicts ====<br />
<br />
Concurrent inserts, updates and deletes may also cause data-level conflicts to occur, which then require conflict resolution. It is important that these conflicts are resolved in a consistent and idempotent manner so that all servers end up with identical results.<br />
<br />
Concurrent <tt>UPDATE</tt>s are detected by BDR and resolved using a last-update-wins strategy based on commit timestamps. Should the timestamps be identical, the tie is broken using the system identifier from <tt>pg_control</tt>, though this may change in a future release.<br />
<br />
<tt>INSERT</tt>s may cause uniqueness violation errors because of primary keys when applied at remote nodes. This conflict is detected by BDR. Concurrent inserts to the same key are resolved on a last-update-wins basis.<br />
<br />
Additionally, <tt>UPDATE</tt>s and <tt>INSERT</tt>s may cause violations of exclusion constraints or unique indexes when they are applied. These are not easily detectable or resolvable and represent severe application errors that cause the database contents of multiple servers to diverge from each other. Hence these are known as "divergent conflicts". Currently, replication stops should a divergent conflict occur. The errors causing the conflict can be seen in the error log of the downstream master with the problem.<br />
<br />
Updates which cannot locate a row are presumed to be <tt>DELETE</tt>/<tt>UPDATE</tt> conflicts. These are accepted as successful operations but in the case of <tt>UPDATE</tt> the data in the <tt>UPDATE</tt> is discarded. (Future improvements may allow replay to be forced until all nodes are caught up past the conflicting change so it can be definitively identified as a conflict, not just asynchronous changes). <br />
<br />
All conflicts are resolved at row level. Concurrent updates that touch completely separate columns can result in "false conflicts", where there is no conflict in terms of the data, just in terms of the row update. Such conflicts will result in just one of those changes being kept, the other discarded according to last-update-wins. It is not practical to automatically decide at the database level when a row should be merged and when a last-update-wins strategy should be used. User-defined conflict resolution functions (see below) may be used where this is required.<br />
<br />
Changing unlogged and logged tables in the same transaction can result in apparently strange outcomes since the unlogged tables aren't replicated.<br />
<br />
==== Examples ====<br />
<br />
As an example, let's say we have two tables, Activity and Customer. There is a Foreign Key from Activity to Customer, constraining us to only record activity rows that have a matching customer row.<br />
<br />
* We update a row in the Customer table on NodeA. The change from NodeA is applied to NodeB just as we are inserting an activity on NodeB. The inserted activity causes a FK check....<br />
<br />
<br />
=== Conflict Resolution by user-defined handlers (aka conflict handlers) ===<br />
<br />
For various conflict types the ability to resolve conflicts with handlers exists. Conflict handlers are user-defined functions written in, for example, [http://www.postgresql.org/docs/current/static/plpgsql.html PL/pgSQL]. They follow a specific API; each handler function has to follow this signature:<br />
<br />
 handler_fun(local_row tbltype,<br />
             remote_row tbltype,<br />
             command_tag text,<br />
             rel regclass,<br />
             event bdr.bdr_handler_types,<br />
             OUT resolution_row tbltype,<br />
             OUT resolution bdr.bdr_conflict_handler_action)<br />
 RETURNS RECORD<br />
<br />
where the parameters are:<br />
<br />
* <code>local_row</code> and <code>remote_row</code>: the conflicting rows. Either one can be <code>NULL</code>, depending on <tt>event</tt> (see below). Their type is always the row-type of the table the conflict trigger applies to.<br />
* <code>command_tag</code>: Contains the executed command name, e.g. <code>UPDATE</code> or <code>INSERT</code>.<br />
* <code>rel</code>: The oid of the relation the conflict appears in (matching <tt>pg_class.oid</tt>).<br />
* <code>event</code>: The conflict event type (e.g. <code>UPDATE_VS_UPDATE</code>)<br />
* ''OUT'' <code>resolution_row</code>: A row chosen or created by the trigger to resolve the conflict with, if called for by <tt>resolution</tt> below, or NULL if not required.<br />
* ''OUT'' <code>resolution</code>: The trigger's decision about the resolution of the conflict. It may contain the following values:<br />
<br />
** <code>IGNORE</code>: ignore this handler's result and continue to the next one. In this case the <code>resolution_row</code> is ignored and may be <code>NULL</code>.<br />
** <code>ROW</code>: take the <code>resolution_row</code> to resolve this conflict, using this row to replace the row the conflict occurred for. <tt>resolution_row</tt> may not be <tt>NULL</tt>.<br />
** <code>SKIP</code>: simply ignore this conflict and don't apply anything. In this case <code>resolution_row</code> is ignored and may be <code>NULL</code>.<br />
<br />
Currently the following conflicts are supported:<br />
<br />
* <code>UPDATE</code> vs <code>UPDATE</code><br />
* <code>UPDATE</code> vs <code>DELETE</code><br />
<br />
The following conflict types are defined by <code>bdr.bdr_handler_types</code>:<br />
<br />
* <code>UPDATE_VS_UPDATE</code><br />
* <code>UPDATE_VS_DELETE</code><br />
* <code>INSERT_VS_INSERT</code><br />
* <code>INSERT_VS_UPDATE</code><br />
<br />
The conflict handler function has to use the exact row type of the table it applies to. As a consequence it is not possible to write one conflict handler that applies to multiple different table types at this time.<br />
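<br />
As an illustration, here is a minimal <code>UPDATE</code> vs <code>UPDATE</code> handler sketch for a hypothetical table <code>mytable(id integer, counter integer)</code>; the resolution rule (keep whichever row has the larger <code>counter</code>) is purely illustrative:<br />
<br />
 CREATE OR REPLACE FUNCTION mytable_update_handler(<br />
     local_row mytable, remote_row mytable,<br />
     command_tag text, rel regclass, event bdr.bdr_handler_types,<br />
     OUT resolution_row mytable,<br />
     OUT resolution bdr.bdr_conflict_handler_action)<br />
 RETURNS RECORD AS $$<br />
 BEGIN<br />
     -- keep whichever row has the larger counter value<br />
     IF local_row.counter >= remote_row.counter THEN<br />
         resolution_row := local_row;<br />
     ELSE<br />
         resolution_row := remote_row;<br />
     END IF;<br />
     resolution := 'ROW';<br />
 END;<br />
 $$ LANGUAGE plpgsql;<br />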
<br />
==== Registering Conflict Handlers ====<br />
<br />
Conflict handlers are registered by a function provided by bdr:<br />
<br />
 bdr.bdr_create_conflict_handler(ch_rel REGCLASS,<br />
                                 ch_name NAME,<br />
                                 ch_proc REGPROCEDURE,<br />
                                 ch_type bdr.bdr_handler_types,<br />
                                 ch_timeframe INTERVAL DEFAULT NULL)<br />
 RETURNS VOID<br />
<br />
<code>ch_rel REGCLASS</code> defines the relation the conflict handler is responsible for, <code>ch_name NAME</code> defines the handler name for identification, <code>ch_proc REGPROCEDURE</code> defines the conflict handler procedure and <code>ch_type bdr.bdr_handler_types</code> defines the handler type (e.g. <code>UPDATE_VS_UPDATE</code>). The last parameter, <code>ch_timeframe INTERVAL</code>, is optional and defaults to NULL, meaning no time restriction. It defines the timeframe within which the conflict handler is called: e.g. if the conflicting row was changed 10 seconds in the past and the timeframe is 10ms, the handler isn't called; if the conflicting row was changed 1ms ago, it is called, since that falls within the 10ms timeframe.<br />
<br />
The conflict handler name has to be unique per table. Creating a conflict handler adds a dependency on the target relation. There can be more than one conflict handler per table and conflict type. In this case the handlers get called in alphabetical order, stopping at and returning the first resolution different from <code>IGNORE</code>.<br />
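<br />
Continuing the sketch above, the illustrative handler could be registered like this (all names are hypothetical):<br />
<br />
 SELECT bdr.bdr_create_conflict_handler(<br />
     ch_rel  := 'mytable',<br />
     ch_name := 'mytable_counter_handler',<br />
     ch_proc := 'mytable_update_handler(mytable, mytable, text, regclass, bdr.bdr_handler_types)'::regprocedure,<br />
     ch_type := 'UPDATE_VS_UPDATE');<br />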
<br />
==== Removing a Conflict Handler ====<br />
<br />
Conflict handlers can be removed by a function provided by bdr:<br />
<br />
bdr.bdr_drop_conflict_handler(ch_rel REGCLASS, ch_name NAME) RETURNS VOID<br />
<br />
The <code>ch_rel REGCLASS</code> parameter defines the relation the conflict handler is responsible for, <code>ch_name NAME</code> defines the name of the conflict handler to drop.<br />
<br />
Dropping the conflict handler removes the dependency again.<br />
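<br />
For example, to drop the illustrative handler registered above:<br />
<br />
 SELECT bdr.bdr_drop_conflict_handler('mytable', 'mytable_counter_handler');<br />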
<br />
==== Listing Conflict Handlers ====<br />
<br />
There's also a list of registered conflict handlers available as a view:<br />
<br />
CREATE VIEW bdr_list_conflict_handlers(ch_name, ch_type, ch_reloid, ch_fun, ch_timeframe)<br />
<br />
<code>ch_name TEXT</code> is the name of the handler, <code>ch_type bdr.bdr_handler_types</code> is the conflict handler type, <code>ch_reloid OID</code> is the relation Oid, <code>ch_fun REGPROCEDURE</code> is the conflict handler function and <code>ch_timeframe INTERVAL</code> is the timeframe the handler is valid in.<br />
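<br />
For example, assuming the view is installed in the <tt>bdr</tt> schema like the functions above:<br />
<br />
 SELECT ch_name, ch_type, ch_reloid::regclass, ch_fun, ch_timeframe<br />
 FROM bdr.bdr_list_conflict_handlers;<br />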
<br />
==== Parameter Definition by Conflict Type ====<br />
<br />
The parameters and return values for conflict handlers differ for the different conflict types:<br />
<br />
===== <code>UPDATE</code> vs <code>UPDATE</code> =====<br />
<br />
In this case all resolutions as defined above are valid. <code>local_row</code> and <code>remote_row</code> are both non-<code>NULL</code>.<br />
<br />
===== <code>UPDATE</code> vs <code>DELETE</code> =====<br />
<br />
In this case only <code>SKIP</code> and <code>IGNORE</code> resolutions are valid. <code>local_row</code> is <code>NULL</code>, <code>remote_row</code> contains the conflicting remote row. This case mainly exists to be able to fatally error out via <code>RAISE EXCEPTION</code>.<br />
<br />
<br />
[[Category:Replication]]</div>Amshttps://wiki.postgresql.org/index.php?title=BDR_User_Guide&diff=19754BDR User Guide2013-05-15T13:46:31Z<p>Ams: Set track_commit_timestamp to on</p>
<hr />
<div>----<br />
This page is the users and administrators guide for BDR. If you're looking for technical details on the project plan and implementation, see [[BDR Project]].<br />
----<br />
<br />
= BDR User Guide =<br />
<br />
BDR (BiDirectional Replication) is a feature being developed for inclusion in PostgreSQL core that provides greatly enhanced replication capabilities.<br />
<br />
BDR allows users to create a geographically distributed multi-master database using Logical Log Streaming Replication (LLSR) transport.<br />
BDR is designed to provide both high availability and geographically distributed disaster recovery capabilities. <br />
<br />
BDR is not “clustering” as some vendors use the term, in that it doesn't have a distributed lock manager, global transaction co-ordinator, etc. Each member server is separate yet connected, with design choices that allow separation between nodes that would not be possible with global transaction coordination.<br />
<br />
Guidance on getting a testing setup established is in [[#Initial setup]]. Please read the full documentation if you intend to put BDR into production.<br />
<br />
== Logical Log Streaming Replication ==<br />
<br />
Logical log streaming replication (LLSR) allows one PostgreSQL master (the "upstream master") to stream a sequence of changes to another read/write PostgreSQL server (the "downstream master"). Data is sent in one direction only over a normal libpq connection.<br />
<br />
Multiple LLSR connections can be used to set up bi-directional replication as discussed later in this guide.<br />
<br />
=== Overview of logical replication ===<br />
<br />
In some ways LLSR is similar to "streaming replication" i.e. physical log streaming replication (PLSR) from a user perspective; both replicate changes from one server to another. However, in LLSR the receiving server is also a full master database that can make changes, unlike the read-only replicas offered by PLSR hot standby. Additionally, LLSR is per-database, whereas PLSR is per-cluster and replicates all databases at once. There are many more differences discussed in the relevant sections of this document.<br />
<br />
In LLSR the data that is replicated is change data in a special format that allows the changes to be logically reconstructed on the downstream master. The changes are generated by reading transaction log (WAL) data, making change capture on the upstream master much more efficient than trigger based replication, hence why we call this "logical log replication". Changes are passed from upstream to downstream using the libpq protocol, just as with physical log streaming replication.<br />
<br />
One connection is required for each PostgreSQL database that is replicated. If two servers are connected, each of which has 50 databases, then 50 connections are needed to send changes in one direction, from upstream to downstream. Each database connection must be specified, so it is possible to filter out unwanted databases simply by avoiding configuring replication for those databases.<br />
<br />
Setting up replication for new databases is not (yet?) automatic, so additional configuration steps are required after <tt>CREATE DATABASE</tt>. A restart of the downstream master is also required. The upstream master only needs restarting if the <tt>max_logical_slots</tt> parameter is too low to allow a new replica to be added. Adding replication for databases that do not exist yet will cause an ERROR, as will dropping a database that is being replicated. Setup is discussed in more detail below.<br />
<br />
Changes are processed by the downstream master using <tt>bdr</tt> plug-ins. This allows flexible handling of replication input, including:<br />
<br />
* BDR apply process - applies logical changes to the downstream master. The apply process makes changes directly rather than generating SQL text and then parse/plan/executing SQL.<br />
* Textual output plugin - a demo plugin that generates SQL text (but doesn't apply changes)<br />
* <tt>pg_xlogdump</tt> - examines physical WAL records and produces textual debugging output. This server program is included in PostgreSQL 9.3.<br />
<br />
=== Replication of DML changes ===<br />
<br />
All changes are replicated: <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> and <tt>TRUNCATE</tt>. <br />
<br />
(TRUNCATE is not yet implemented, but will be implemented before the feature goes to final release).<br />
<br />
Actions that generate WAL data but don't represent logical changes do not result in data transfer, e.g. full page writes, VACUUMs, hint bit setting. LLSR avoids much of the overhead from physical WAL, though it has overheads that mean that it doesn't always use less bandwidth than PLSR.<br />
<br />
Locks taken by <tt>LOCK</tt> and <tt>SELECT ... FOR UPDATE/SHARE</tt> on the upstream master are not replicated to downstream masters. Locks taken automatically by <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> or <tt>TRUNCATE</tt> ''are'' taken on the downstream master and may delay replication apply or concurrent transactions - see [[#Lock Conflicts|Lock Conflicts]].<br />
<br />
<tt>TEMPORARY</tt> and <tt>UNLOGGED</tt> tables are not replicated. In contrast to physical standby servers, downstream masters can use temporary and unlogged tables.<br />
<br />
<tt>DELETE</tt> and <tt>UPDATE</tt> statements that affect multiple rows on the upstream master will cause a series of row changes on the downstream master. These are likely to be applied at the same speed as on the origin, as long as an index is defined on the Primary Key of the table on the downstream master. <tt>INSERT</tt>s on the upstream master do not require a unique constraint in order to replicate correctly. <tt>UPDATE</tt>s and <tt>DELETE</tt>s require some form of unique constraint, either <tt>PRIMARY KEY</tt> or <tt>UNIQUE NOT NULL</tt>. A warning is issued in the downstream master's logs if the expected constraint is absent.<br />
<br />
<tt>UPDATE</tt>s that change the value of the Primary Key of a table will be replicated correctly.<br />
<br />
The values applied are the final values from the <tt>UPDATE</tt> on the upstream master, including any modifications from before-row triggers, rules or functions. Any reflexive conditions, such as N = N + 1, are resolved to their final value. Volatile or stable functions are evaluated on the master side and the resulting values are replicated. Consequently any function side-effects (writing files, network socket activity, updating internal PostgreSQL variables, etc.) will not occur on the replicas, as the functions are not run again on the replica.<br />
<br />
All columns are replicated on each table. Large column values that would be placed in TOAST tables are replicated without problem, avoiding de-compression and re-compression. If we update a row but do not change a TOASTed column value, then that data is not sent downstream.<br />
<br />
All data types are handled, not just the built-in datatypes of PostgreSQL core. The only requirement is that user-defined types are installed identically in both upstream and downstream master (see "Limitations").<br />
<br />
The current LLSR plugin implementation uses the binary libpq protocol, so it requires that the upstream and downstream masters use the same CPU architecture and word length, i.e. "identical servers", as with physical replication. A textual output option will be added later for passing data between non-identical servers, e.g. laptops or mobile devices communicating with a central server.<br />
<br />
Changes are accumulated in memory (spilling to disk where required) and then sent to the downstream server at commit time. Aborted transactions are never sent. Application of changes on downstream master is currently single-threaded, though this process is efficiently implemented. Parallel apply is a possible future feature, especially for changes made while holding <tt>AccessExclusiveLock</tt>.<br />
<br />
Changes are applied to the downstream master in the sequence in which they were committed on the upstream master. This is a known-good serialization ordering of changes, so no replication failures are possible, as can happen with statement based replication (e.g. MySQL) or trigger based replication (e.g. Slony version 2.0). Users should note that this means the original order of locking of tables is not maintained. Although lock order is provably not an issue for the set of locks held on the upstream master, additional locking on the downstream side could cause lock waits or deadlocking in some cases. (Discussed in further detail later.)<br />
<br />
Larger transactions spill to disk on the upstream master once they reach a certain size. Currently, large transactions can cause increased latency. A future enhancement will be to stream changes to the downstream master once they fill the upstream memory buffer, though this is likely to be implemented in 9.5.<br />
<br />
<tt>SET</tt> statements and parameter settings are not replicated. This has no effect on replication since we only replicate actual changes, not anything at SQL statement level. We always update the correct tables, whatever the setting of <tt>search_path</tt>. Values are replicated correctly irrespective of the values of <tt>bytea_output</tt>, <tt>TimeZone</tt>, <tt>DateStyle</tt>, etc.<br />
<br />
<tt>NOTIFY</tt> is not supported across log based replication, either physical or logical. <tt>NOTIFY</tt> and <tt>LISTEN</tt> will work fine on the upstream master but an upstream <tt>NOTIFY</tt> will not trigger a downstream <tt>LISTEN</tt>er.<br />
<br />
In some cases, additional deadlocks can occur on apply. This causes an automatic retry of the apply of the replaying transaction and is only an issue if the deadlock recurs repeatedly, delaying replication.<br />
<br />
From a performance and concurrency perspective the BDR apply process is similar to a normal backend. Frequent conflicts with locks from other transactions when replaying changes can slow things down and thus increase replication delay, so reducing the frequency of such conflicts can be a good way to speed things up. Any lock held by another transaction on the downstream master - <tt>LOCK</tt> statements, <tt>SELECT ... FOR UPDATE/FOR SHARE</tt>, or <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> row locks - can delay replication if the replication apply process needs to change the locked table/row.<br />
<br />
=== Table definitions and DDL replication ===<br />
<br />
DML changes are replicated between tables with matching <tt>"Schemaname"."Tablename"</tt> on both upstream and downstream masters. e.g. changes from upstream's <tt>public.mytable</tt> will go to downstream's <tt>public.mytable</tt> while changes to the upstream <tt>myschema.mytable</tt> will go to the downstream <tt>myschema.mytable</tt>. This works even when no schema is specified in the original SQL, since we identify the changed table from its internal OIDs in WAL records and then map that to whatever internal identifier is used on the downstream node.<br />
<br />
This requires careful synchronization of table definitions on each node otherwise <tt>ERROR</tt>s will be generated by the replication apply process. In general, tables must be an exact match between upstream and downstream masters. <br />
<br />
There are no plans to implement working replication between dissimilar table definitions.<br />
<br />
Tables must meet the following requirements to be compatible for purposes of LLSR:<br />
<br />
* The downstream master must only have constraints (<tt>CHECK</tt>, <tt>UNIQUE</tt>, <tt>EXCLUSION</tt>, <tt>FOREIGN KEY</tt>, etc) that are also present on the upstream master. Replication may initially work with mismatched constraints but is likely to fail as soon as the downstream master rejects a row the upstream master accepted.<br />
* The table referenced by a FOREIGN KEY on a downstream master must have all the keys present in the upstream master version of the same table.<br />
* Storage parameters must match, except as allowed below<br />
* Inheritance must be the same<br />
* Dropped columns on master must be present on replicas<br />
* Custom types and enum definitions must match exactly<br />
* Composite types and enums must have the same oids on master and replication target<br />
* Extensions defining types used in replicated tables must be of the same version or fully SQL-level compatible and the oids of the types they define must match.<br />
<br />
The following differences are permissible between tables on different nodes:<br />
<br />
* The table's <tt>pg_class</tt> oid, the oid of its associated TOAST table, and the oid of the table's rowtype in <tt>pg_type</tt> may differ;<br />
* Extra or missing non-<tt>UNIQUE</tt> indexes<br />
* Extra keys in downstream lookup tables for <tt>FOREIGN KEY</tt> references that are not present on the upstream master<br />
* The table-level storage parameters for fillfactor and autovacuum<br />
* Triggers and rules may differ (they are not executed by replication apply)<br />
<br />
Replication of DDL changes between nodes will be possible using event triggers, but is not yet integrated with LLSR (see [[#LLSR Limitations|LLSR Limitations]]).<br />
<br />
Triggers and Rules are NOT executed by apply on the downstream side, equivalent to an enforced setting of <tt>session_replication_role = replica</tt>.<br />
<br />
In future it is expected that composite types and enums with non-identical oids will be converted using text output and input functions. This feature is not yet implemented.<br />
<br />
=== LLSR limitations ===<br />
<br />
The current LLSR implementation is subject to some limitations, which are being progressively removed as work progresses.<br />
<br />
==== Data definition compatibility ====<br />
<br />
Table definitions, types, extensions, etc must be near identical between upstream and downstream masters. See [[#Table definitions and DDL replication|Table definitions and DDL replication]].<br />
<br />
==== DDL Replication ====<br />
<br />
DDL replication is not yet supported.<br />
<br />
==== Upstream feedback ====<br />
<br />
No feedback from downstream masters to the upstream master is implemented for asynchronous LLSR, so upstream masters must be configured to keep enough WAL. See [[#Configuration|Configuration]].<br />
<br />
==== TRUNCATE is not replicated ====<br />
<br />
TRUNCATE is not yet supported, however workarounds with user-level triggers are possible and a ProcessUtility hook is planned to implement a similar approach globally.<br />
<br />
The safest option is to define a user-level BEFORE trigger on each table that RAISEs an ERROR when TRUNCATE is attempted.<br />
<br />
A simple truncate-blocking trigger is:<br />
<br />
 CREATE OR REPLACE FUNCTION deny_truncate() RETURNS trigger AS $$<br />
 BEGIN<br />
     IF tg_op = 'TRUNCATE' THEN<br />
         RAISE EXCEPTION 'TRUNCATE is not supported on this table. Please use DELETE FROM.';<br />
     ELSE<br />
         RAISE EXCEPTION 'This trigger only supports TRUNCATE';<br />
     END IF;<br />
 END;<br />
 $$ LANGUAGE plpgsql;<br />
<br />
It can be applied to a table with:<br />
<br />
CREATE TRIGGER deny_truncate_on_<tablename> BEFORE TRUNCATE ON <tablename><br />
FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate();<br />
<br />
A PL/PgSQL DO block that queries <tt>pg_class</tt> and loops over it to <tt>EXECUTE</tt> a dynamic SQL <tt>CREATE TRIGGER</tt> command for each table that does not already have the trigger can be used to apply the trigger to all tables.<br />
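<br />
A sketch of such a DO block follows; the restriction to the <tt>public</tt> schema is an illustrative choice, and tables that already have a trigger calling <tt>deny_truncate()</tt> are skipped:<br />
<br />
 DO $$<br />
 DECLARE<br />
     t regclass;<br />
 BEGIN<br />
     FOR t IN<br />
         SELECT c.oid::regclass<br />
         FROM pg_class c<br />
         JOIN pg_namespace n ON n.oid = c.relnamespace<br />
         WHERE c.relkind = 'r'<br />
           AND n.nspname = 'public'<br />
           AND NOT EXISTS (SELECT 1 FROM pg_trigger tg<br />
                           WHERE tg.tgrelid = c.oid<br />
                             AND tg.tgfoid = 'deny_truncate'::regproc)<br />
     LOOP<br />
         EXECUTE format('CREATE TRIGGER deny_truncate BEFORE TRUNCATE ON %s'<br />
                        || ' FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate()', t);<br />
     END LOOP;<br />
 END;<br />
 $$;<br />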
<br />
=== Initial setup ===<br />
<br />
To set up LLSR or BDR you first need a patched PostgreSQL that can support LLSR/BDR, then you need to create one or more LLSR/BDR senders and one or more LLSR/BDR receivers.<br />
<br />
==== Installing the patched PostgreSQL binaries ====<br />
<br />
Currently BDR is only available in builds of the 'bdr' branch on Andres Freund's git repo on git.postgresql.org. PostgreSQL 9.2 and below do not support BDR, and 9.3 requires patches, so this guide will not work for you if you are trying to use a normal install of PostgreSQL.<br />
<br />
First you need to clone, configure, compile and install like normal. Clone the sources from <tt>git://git.postgresql.org/git/users/andresfreund/postgres.git</tt> and checkout the <tt>bdr</tt> branch.<br />
<br />
If you have an existing local PostgreSQL git tree specify it as <tt>--reference /path/to/existing/tree</tt> to greatly speed your git clone.<br />
<br />
Example:<br />
<br />
mkdir -p $HOME/bdr<br />
cd $HOME/bdr<br />
git clone git://git.postgresql.org/git/users/andresfreund/postgres.git $HOME/bdr/postgres-bdr-src<br />
cd postgres-bdr-src<br />
./configure --prefix=$HOME/bdr/postgres-bdr-bin<br />
make install<br />
cd contrib/bdr<br />
make install<br />
<br />
This will put everything in <tt>$HOME/bdr</tt>, with the source code and build tree in <tt>$HOME/bdr/postgres-bdr-src</tt> and the installed PostgreSQL in <tt>$HOME/bdr/postgres-bdr-bin</tt>. This is a convenient setup for testing and development because it doesn't require you to set up new users, wrangle permissions, run anything as root, etc, but it isn't recommended that you deploy this way in production.<br />
<br />
To actually use these new binaries you will need to:<br />
<br />
export PATH=$HOME/bdr/postgres-bdr-bin/bin:$PATH<br />
<br />
before running <tt>initdb</tt>, <tt>postgres</tt>, etc. You don't have to use the <tt>psql</tt> or <tt>libpq</tt> you compiled but you're likely to get version mismatch warnings if you don't.<br />
<br />
=== Parameter Reference ===<br />
<br />
The following parameters are new or have been changed in PostgreSQL's new logical streaming replication.<br />
<br />
==== <tt>shared_preload_libraries = ‘bdr’</tt> ====<br />
<br />
To load support for receiving changes on a downstream master, the <tt>bdr</tt> library must be added to the existing ‘shared_preload_libraries’ parameter. This loads the bdr library during postmaster start-up and allows it to create the required background worker(s).<br />
<br />
Upstream masters don't need to load the bdr library unless they're also operating as a downstream master as is the case in a BDR configuration.<br />
<br />
==== <tt>bdr.connections</tt> ====<br />
<br />
A comma-separated list of upstream master connection names is specified in <tt>bdr.connections</tt>. These names must be simple alphanumeric strings. They are used when naming the connection in error messages, configuration options and logs, but are otherwise of no special meaning.<br />
<br />
A typical two-upstream-master setting might be:<br />
<br />
bdr.connections = ‘upstream1, upstream2’<br />
<br />
==== <tt>bdr.&lt;connection_name&gt;.dsn</tt> ====<br />
<br />
Each connection name must have at least a data source name specified using the <tt>bdr.&lt;connection_name&gt;.dsn</tt> parameter. The DSN syntax is the same as that used by libpq so it is not discussed in further detail here. A <tt>dbname</tt> for the database to connect to must be specified; all other parts of the DSN are optional.<br />
<br />
The local (downstream) database name is assumed to be the same as the name of the upstream database being connected to, though future versions will make this configurable.<br />
<br />
For the above two-master setting for <tt>bdr.connections</tt> the DSNs might look like:<br />
<br />
bdr.upstream1.dsn = 'host=10.1.1.2 user=postgres dbname=replicated_db'<br />
bdr.upstream2.dsn = 'host=10.1.1.3 user=postgres dbname=replicated_db'<br />
<br />
==== <tt>max_logical_slots</tt> ====<br />
<br />
The new parameter <tt>max_logical_slots</tt> has been added for use on both upstream and downstream masters. This parameter controls the maximum number of logical replication slots - upstream or downstream - that this cluster may have at a time. It must be set at postmaster start time.<br />
<br />
As logical replication slots are persistent, slots are consumed even by replicas that are not currently connected. Slot management is discussed in Starting, Stopping and Managing Replication.<br />
<br />
<tt>max_logical_slots</tt> should be set to the sum of the number of logical replication upstream masters this server has plus the number of logical replication downstream masters that will connect to it. For example, a node with two upstream masters that also serves three downstream masters needs <tt>max_logical_slots</tt> of at least 5.<br />
<br />
==== <tt>wal_level = 'logical'</tt> ====<br />
<br />
A new setting, <tt>'logical'</tt>, has been added for the existing <tt>wal_level</tt> parameter. <tt>‘logical’</tt> includes everything that the existing <tt>hot_standby</tt> setting does and adds additional details required for logical changeset decoding to the write-ahead logs. <br />
<br />
This additional information is consumed by the upstream-master-side xlog decoding worker. Downstream masters that do not also act as upstream masters do not require <tt>wal_level</tt> to be increased above the default <tt>'minimal'</tt>.<br />
<br />
<tt>wal_level</tt>, except for the new <tt>'logical'</tt> setting, is [http://www.postgresql.org/docs/current/static/runtime-config-wal.html documented in the main PostgreSQL manual].<br />
<br />
==== <tt>max_wal_senders</tt> ====<br />
<br />
Logical replication hasn't altered the <tt>max_wal_senders</tt> parameter, but it is important in upstream masters for logical replication and BDR because every logical sender consumes a <tt>max_wal_senders</tt> entry.<br />
<br />
You should configure <tt>max_wal_senders</tt> to the sum of the number of physical and logical replicas you want to allow an upstream master to serve. If you intend to use <tt>pg_basebackup</tt> you should add at least two more senders to allow for its use.<br />
<br />
Like <tt>max_logical_slots</tt>, <tt>max_wal_senders</tt> entries don't cost a large amount of memory, so you can overestimate fairly safely.<br />
<br />
<tt>max_wal_senders</tt> is documented in [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL documentation].<br />
<br />
==== <tt>wal_keep_segments</tt> ====<br />
<br />
Like <tt>max_wal_senders</tt>, the <tt>wal_keep_segments</tt> parameter isn't directly changed by logical replication but is still important for upstream masters. It is not required on downstream-only masters.<br />
<br />
<tt>wal_keep_segments</tt> should be set to a value that allows for some downtime or unreachable periods for downstream masters and for heavy bursts of write activity on the upstream master. <br />
<br />
Keep in mind that enough disk space must be available for the WAL segments, each of which is 16MB. If you run out of disk space the server will halt until disk space is freed and it may be quite difficult to free space when you can no longer start the server.<br />
<br />
If you exceed the required <tt>wal_keep_segments</tt>, an "Insufficient WAL segments retained" error will be reported. See [[#Troubleshooting|Troubleshooting]].<br />
<br />
<tt>wal_keep_segments</tt> is documented in the [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL manual].<br />
<br />
==== <tt>track_commit_timestamp</tt> ====<br />
<br />
Setting this parameter to "on" enables commit timestamp tracking, which is used to implement last-UPDATE-wins conflict resolution.<br />
<br />
=== Configuration ===<br />
<br />
Details on individual parameters are described in the [[#Parameter Reference|Parameter Reference]] section.<br />
<br />
The following configuration is an example of a simple one-way LLSR replication setup - a single upstream master to a single downstream master.<br />
<br />
The upstream master (sender)'s <tt>postgresql.conf</tt> should contain settings like:<br />
<br />
wal_level = 'logical' # Include enough info for logical replication<br />
max_logical_slots = X # Number of LLSR senders + any receivers<br />
max_wal_senders = Y # Y = max_logical_slots plus any physical <br />
# streaming requirements<br />
wal_keep_segments = 5000 # Master must retain enough WAL segments to let <br />
# replicas catch up. Correct value depends on<br />
# rate of writes on master, max replica downtime<br />
# allowable. 5000 segments requires 78GB<br />
# in pg_xlog<br />
<br />
Downstream (receiver) <tt>postgresql.conf</tt>:<br />
<br />
shared_preload_libraries = 'bdr'<br />
<br />
bdr.connections="name_of_upstream_master" # list of upstream master nodenames<br />
bdr.<nodename>.dsn = 'dbname=postgres' # connection string for connection<br />
# from downstream to upstream master<br />
bdr.<nodename>.local_dbname = 'xxx' # optional parameter to cover the case <br />
# where the databasename on upstream <br />
# and downstream master differ. <br />
# (Not yet implemented)<br />
bdr.<nodename>.apply_delay # optional parameter to delay apply of<br />
# transactions, time in milliseconds <br />
bdr.synchronous_commit = ...; # optional parameter to set the<br />
# synchronous_commit parameter the<br />
# apply processes will be using<br />
max_logical_slots = X # set to the number of remotes<br />
<br />
Note that a server can be both a sender and a receiver, whether as two servers replicating to each other or in more complex configurations such as replication chains/trees.<br />
<br />
The upstream (sender) <tt>pg_hba.conf</tt> must be configured to allow the downstream master to connect for replication. Otherwise you'll see errors like the following on the downstream master:<br />
<br />
FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "[local]", user "postgres"<br />
<br />
A suitable <tt>pg_hba.conf</tt> entry for a replication connection from the replica server 10.1.4.8 might be:<br />
<br />
host replication postgres 10.1.4.8/32 trust<br />
<br />
(The user name should match the user name configured in the downstream master's dsn; md5 password authentication is also supported.)<br />
<br />
For more details on these parameters, see [[#Parameter Reference|Parameter Reference]].<br />
<br />
=== Troubleshooting ===<br />
<br />
==== Could not access file "bdr": No such file or directory ====<br />
<br />
If you see the error:<br />
<br />
FATAL: could not access file "bdr": No such file or directory<br />
<br />
when starting a database set up to receive BDR replication, you probably forgot to install <tt>contrib/bdr</tt>. See above.<br />
<br />
==== Invalid value for parameter ====<br />
<br />
An error like:<br />
<br />
LOG: invalid value for parameter ...<br />
<br />
when setting one of these parameters means that your server doesn't support logical replication and will need to be patched or updated.<br />
<br />
==== Insufficient WAL segments retained ("requested WAL segment ... has already been removed") ====<br />
<br />
If <tt>wal_keep_segments</tt> is insufficient to meet the requirements of a replica that has fallen far behind, the master will report errors like:<br />
<br />
ERROR: requested WAL segment 00000001000000010000002D has already been removed<br />
<br />
Currently the replica errors look like:<br />
<br />
WARNING: Starting logical replication<br />
LOG: data stream ended<br />
LOG: worker process: master (PID 23812) exited with exit code 0<br />
LOG: starting background worker process "master"<br />
LOG: master initialized on master, remote dbname=master port=5434 replication=true fallback_application_name=bdr<br />
LOG: local sysid 5873181566046043070, remote: 5873181102189050714<br />
LOG: found valid replication identifier 1<br />
LOG: starting up replication at 1 from 1/2D9CA220<br />
<br />
but a more explicit error message for this condition is planned.<br />
<br />
The only way to recover from this fault is to re-seed the replica database.<br />
<br />
This fault could be prevented with feedback from the replica to the master, but this feature is not planned for the first release of BDR. Another alternative considered for future releases is making wal_keep_segments a dynamic parameter that is sized on demand.<br />
<br />
Monitoring of maximum replica lag and appropriate adjustment of wal_keep_segments will prevent this fault from arising.<br />
<br />
==== Couldn't find logical slot ====<br />
<br />
An error like:<br />
<br />
ERROR: couldn't find logical slot "bdr: 16384:5873181566046043070-1-24596:"<br />
<br />
on the upstream master suggests that a downstream master is trying to connect to a logical replication slot that no longer exists. The slot cannot be re-created, so it is necessary to re-seed the downstream replica database.<br />
<br />
=== Operational Issues and Debugging ===<br />
<br />
In LLSR there are no user-level (i.e. SQL-visible) ERRORs that have special meaning. Any ERRORs generated are likely to be serious problems of some kind, apart from apply deadlocks, which are automatically re-tried.<br />
<br />
=== Monitoring ===<br />
<br />
The following views are available for monitoring replication activity:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE pg_stat_replication]</tt><br />
* <tt>pg_stat_logical_replication</tt> (described below)<br />
* <tt>pg_stat_bdr</tt> (described below)<br />
<br />
The following configuration and logging parameters are useful for monitoring replication:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt><br />
<br />
==== pg_stat_logical_replication ====<br />
<br />
The new <tt>pg_stat_logical_replication</tt> view is specific to logical replication. It is based on the underlying <tt>pg_stat_get_logical_replication_slots</tt> function and has the following structure:<br />
<br />
View "pg_catalog.pg_stat_logical_replication"<br />
Column | Type | Modifiers <br />
--------------------------+---------+-----------<br />
slot_name | text | <br />
plugin | text | <br />
database | oid | <br />
active | boolean | <br />
xmin | xid | <br />
last_required_checkpoint | text | <br />
<br />
It contains one row for every connection from a downstream master to the server being queried (the upstream master). On a standalone PostgreSQL server or a downstream-only master this view will contain no rows.<br />
<br />
* <tt>slot_name</tt>: An internal name for a given logical replication slot (a connection from a downstream master to this upstream master). This slot name is used by the downstream master to uniquely identify itself and is used with the <tt>pg_receivellog</tt> command when managing logical replication slots. The slot name is composed of the decoding plugin name, the upstream database oid, the downstream system identifier (from <tt>pg_control</tt>), the downstream slot number, and the downstream database oid.<br />
<br />
* <tt>plugin</tt>: The logical replication plugin being used to decode WAL archives. You'll generally only see <tt>bdr_output</tt> here.<br />
<br />
* <tt>database</tt>: The oid of the database being replicated by this slot. You can get the database name by joining on <tt>pg_database.oid</tt>, as in the query after this list.<br />
<br />
* <tt>active</tt>: Whether this slot currently has an active connection.<br />
<br />
* <tt>xmin</tt>: The lowest transaction ID this replication slot can "see", like the xmin of a transaction or prepared transaction. xmin should keep on advancing as replication continues.<br />
<br />
* <tt>last_required_checkpoint</tt>: The checkpoint identifying the oldest WAL record required to bring this slot up to date with the upstream master. (This column is likely to be removed in a future version).<br />
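<br />
For instance, the following query (a sketch using the view columns shown above) resolves the database oid to a name:<br />
<br />
 SELECT r.slot_name, d.datname, r.active, r.xmin<br />
   FROM pg_stat_logical_replication r<br />
   JOIN pg_database d ON d.oid = r.database;<br />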
<br />
==== pg_stat_bdr ====<br />
<br />
The <tt>pg_stat_bdr</tt> view is supplied by the <tt>bdr</tt> extension. It provides information on a server's connection(s) to its upstream master(s). It is not present on upstream-only masters.<br />
<br />
The primary purpose of this view is to report statistics on the progress of LLSR apply on a per-upstream master connection basis.<br />
<br />
View structure:<br />
<br />
View "public.pg_stat_bdr"<br />
Column | Type | Modifiers <br />
--------------------+--------+-----------<br />
rep_node_id | oid | <br />
riremotesysid | name | <br />
riremotedb | oid | <br />
rilocaldb | oid | <br />
nr_commit | bigint | <br />
nr_rollback | bigint | <br />
nr_insert | bigint | <br />
nr_insert_conflict | bigint | <br />
nr_update | bigint | <br />
nr_update_conflict | bigint | <br />
nr_delete | bigint | <br />
nr_delete_conflict | bigint | <br />
nr_disconnect | bigint | <br />
<br />
Fields:<br />
<br />
* <tt>rep_node_id</tt>: An internal identifier for the replication slot.<br />
<br />
* <tt>riremotesysid</tt>: The remote database system identifier, as reported by the <tt>Database system identifier</tt> line of <tt>pg_controldata /path/to/datadir</tt><br />
<br />
* <tt>riremotedb</tt>: The remote database OID, i.e. the <tt>oid</tt> column of the remote server's <tt>pg_catalog.pg_database</tt> entry for the replicated database. You can get the database name with <tt>select datname from pg_database where oid = 12345</tt> (where '12345' is the <tt>riremotedb</tt> oid).<br />
<br />
* <tt>rilocaldb </tt>: The local database OID, with the same meaning as <tt>riremotedb</tt> but with oids from the local system.<br />
<br />
''The remaining fields are statistics about this upstream master slot'':<br />
<br />
* <tt>nr_commit</tt>: Number of commits applied to date from this master<br />
<br />
* <tt>nr_rollback</tt>: Number of rollbacks performed by this apply process due to recoverable errors (deadlock retries, lost races, etc) or unrecoverable errors like mismatched constraint errors.<br />
<br />
* <tt>nr_insert</tt>: Number of <tt>INSERT</tt>s performed<br />
<br />
* <tt>nr_insert_conflict</tt>: Number of <tt>INSERT</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_update</tt>: Number of <tt>UPDATE</tt>s performed<br />
<br />
* <tt>nr_update_conflict</tt>: Number of <tt>UPDATE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_delete</tt>: Number of deletes performed<br />
<br />
* <tt>nr_delete_conflict</tt>: Number of deletes that resulted in conflicts.<br />
<br />
* <tt>nr_disconnect</tt>: Number of times this apply process has lost its connection to the upstream master since it was started.<br />
<br />
<br />
This view does not contain any information about how far behind the upstream master this downstream master is. The upstream master's <tt>pg_stat_logical_replication</tt> and <tt>pg_stat_replication</tt> views must be queried to determine replication lag.<br />
<br />
==== Monitoring replication status and lag ====<br />
<br />
As with any replication setup, it is vital to monitor replication status on all BDR nodes to ensure no node is lagging severely behind the others or is stuck.<br />
<br />
In the case of BDR a stuck or crashed node will eventually cause disk space and table bloat problems on other masters so stuck nodes should be detected and removed or repaired in a reasonably timely manner. Exactly how urgent this is depends on the workload of the BDR group.<br />
<br />
The <tt>pg_stat_logical_replication</tt> view described above may be used to verify that a downstream master is connected to its upstream master - the <tt>active</tt> boolean column is <tt>t</tt> if there's a downstream master connected.<br />
<br />
The <tt>xmin</tt> column provides an indication of whether replication is advancing; it should increase as replication progresses. There is no simple way to turn <tt>xmin</tt> into the time the last applied transaction was committed on the master, so it doesn't provide an indication of wall-clock lag.<br />
<br />
To determine wall-clock replication lag an application-level ticker may be used to periodically update a timestamp in a replicated table. The difference between this timestamp on the upstream and downstream masters provides the wall-clock replication lag. For BDR one row may be added to the table for each BDR master, giving an indication of how much lag each master has relative to each other master.<br />
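<br />
A minimal sketch of such a ticker follows; the table and node names are illustrative, not part of BDR:<br />
<br />
 -- One row per master; run the setup once on one node.<br />
 -- The PRIMARY KEY also satisfies the unique-constraint requirement<br />
 -- for replicated UPDATEs.<br />
 CREATE TABLE lag_ticker (<br />
     node_name text PRIMARY KEY,<br />
     last_tick timestamptz NOT NULL DEFAULT now()<br />
 );<br />
 INSERT INTO lag_ticker (node_name) VALUES ('alpha');<br />
 <br />
 -- Run periodically (e.g. from cron) on master 'alpha':<br />
 UPDATE lag_ticker SET last_tick = now() WHERE node_name = 'alpha';<br />
 <br />
 -- On any node, the approximate wall-clock lag relative to each master:<br />
 SELECT node_name, now() - last_tick AS approx_lag FROM lag_ticker;<br />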
<br />
=== Table and index usage statistics ===<br />
<br />
Statistics on table and index usage are updated normally by the downstream master. This is essential for the correct functioning of auto-vacuum. If there are no local writes on the downstream master and stats have not been reset, these two views should show matching results between upstream and downstream (a comparison query follows the list):<br />
<br />
* <tt>pg_stat_user_tables</tt><br />
* <tt>pg_statio_user_tables</tt><br />
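<br />
A simple way to compare the table statistics (a sketch; run it on both the upstream and downstream master and diff the output):<br />
<br />
 SELECT relname, n_tup_ins, n_tup_upd, n_tup_del<br />
   FROM pg_stat_user_tables<br />
  ORDER BY relname;<br />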
<br />
Since indexes are used to apply changes, the identifying indexes on the downstream side may appear more heavily used than non-identifying indexes under workloads that perform <tt>UPDATE</tt>s and <tt>DELETE</tt>s.<br />
<br />
The built-in index monitoring views are:<br />
<br />
* <tt>pg_stat_user_indexes</tt><br />
* <tt>pg_statio_user_indexes</tt><br />
<br />
All these views are discussed in [http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE the PostgreSQL documentation on the statistics views].<br />
<br />
=== Starting, stopping and managing replication ===<br />
<br />
Replication is managed with the <tt>postgresql.conf</tt> settings described in "Parameter Reference" and "Configuration" above, and using the <tt>pg_receivellog</tt> utility command.<br />
<br />
==== Starting a new LLSR connection ====<br />
<br />
Logical replication is started automatically when a database is configured as a downstream master in <tt>postgresql.conf</tt> (see [[#Configuration|Configuration]]) and the postmaster is started. No explicit action is required to start replication, but replication will not actually work unless the upstream and downstream databases are identical within the requirements set by LLSR in the [[#Table definitions and DDL replication|Table definitions and DDL replication]] section.<br />
<br />
<tt>pg_dump</tt> and <tt>pg_restore</tt> may be used to set up the new replica's database.<br />
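<br />
A minimal sketch, assuming the replicated database is named <tt>replicated_db</tt> and the host names are placeholders:<br />
<br />
 pg_dump -h upstream-host -Fc -f replicated_db.dump replicated_db<br />
 createdb -h downstream-host replicated_db<br />
 pg_restore -h downstream-host -d replicated_db replicated_db.dump<br />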
<br />
==== Viewing logical replication slots ====<br />
<br />
Examining the state of logical replication is discussed in [[#Monitoring|Monitoring]].<br />
<br />
==== Temporarily stopping an LLSR replica ====<br />
<br />
LLSR replicas can be temporarily stopped by shutting down the downstream master's postmaster.<br />
<br />
If the replica is not started back up before the upstream master discards the oldest WAL segment required for the downstream master to resume replay (as identified by the <tt>last_required_checkpoint</tt> column of <tt>pg_catalog.pg_stat_logical_replication</tt>), the replica will not resume replay. The error [[#Insufficient_WAL_segments_retained_.28.22requested_WAL_segment_..._has_already_been_removed.22.29|Insufficient WAL segments retained]] will be reported in the upstream master's logs. The replica must be re-created for replication to continue.<br />
<br />
==== Removing an LLSR replica permanently ====<br />
<br />
To remove a replication connection permanently, remove its entries from the downstream master's <tt>postgresql.conf</tt>, restart the downstream master, then use <tt>pg_receivellog</tt> to remove the replication slot on the upstream master.<br />
<br />
It is important to remove the replication slot from the upstream master(s) to prevent xid wrap-around problems and issues with table bloat caused by delayed vacuum.<br />
<br />
==== Cleaning up abandoned replication slots ====<br />
<br />
To remove a replication slot that was used for a now-defunct replica, find its slot name from the <tt>[[#pg_stat_logical_replication|pg_stat_logical_replication]]</tt> view on the upstream master then run:<br />
<br />
pg_receivellog -p 5434 -h master-hostname -d dbname \<br />
--slot='bdr: 16384:5873181566046043070-1-16384:' --stop<br />
<br />
where the argument to '--slot' is the slot name you found from the view.<br />
<br />
You may need to do this if you've created and then deleted several replicas so <tt>max_logical_slots</tt> has filled up with entries for replicas that no longer exist.<br />
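<br />
Inactive slots can be listed with a query like the following sketch; verify that each slot really is defunct before stopping it, since a temporarily disconnected replica also shows as inactive:<br />
<br />
 SELECT slot_name<br />
   FROM pg_stat_logical_replication<br />
  WHERE active = false;<br />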
<br />
== Bi-Directional Replication ==<br />
<br />
Bi-Directional replication is built directly on LLSR by configuring two or more servers as both upstream ''and'' downstream masters of each other.<br />
<br />
All of the Logical Log Streaming Replication documentation applies to BDR and should be read before moving on to reading about and configuring BDR.<br />
<br />
=== Bi-Directional Replication Use Cases ===<br />
<br />
Bi-Directional Replication is designed to allow a very wide range of server connection topologies. The simplest to understand is two servers, each sending its changes to the other; this is produced by making each server the downstream master of the other, using two connections for each database.<br />
<br />
Logical and physical streaming replication are designed to work side-by-side. This means that a master can be replicating using physical streaming replication to a local standby server while, at the same time, replicating logical changes to a remote downstream master. Logical replication also works alongside cascading replication, so a physical standby can feed changes to a downstream master, allowing an upstream master to send to a physical standby which in turn sends to a downstream master.<br />
<br />
==== Simple multi-master pair ====<br />
<br />
A simple multi-master "HA Cluster" with two servers:<br />
<br />
* Server "Alpha" - Master<br />
* Server "Bravo" - Master<br />
<br />
===== Configuration =====<br />
<br />
Alpha:<br />
<br />
wal_level = 'logical'<br />
max_logical_slots = 3<br />
max_wal_senders = 4<br />
wal_keep_segments = 5000<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="bravo"<br />
bdr.bravo.dsn = 'dbname=dbtoreplicate'<br />
track_commit_timestamp = on<br />
<br />
Bravo:<br />
<br />
wal_level = 'logical'<br />
max_logical_slots = 3<br />
max_wal_senders = 4<br />
wal_keep_segments = 5000<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="alpha"<br />
bdr.alpha.dsn = 'dbname=dbtoreplicate'<br />
track_commit_timestamp = on<br />
<br />
See [[#Configuration|Configuration]] for an explanation of these parameters.<br />
<br />
==== HA and Logical Standby ====<br />
Downstream masters allow users to create temporary tables, so they can be used as reporting servers.<br />
<br />
"HA Cluster":<br />
<br />
* Server "Alpha" - Current Master<br />
* Server "Bravo" - Physical Standby - unused, apart from as failover target for Alpha - potentially specified in synchronous_standby_names<br />
* Server "Charlie" - "Logical Standby" - downstream master<br />
<br />
==== Very High Availability Multi-Master ====<br />
A typical configuration for remote multi-master would then be:<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha using logical streaming<br />
<br />
Bandwidth between Site 1 and Site 2 is minimised.<br />
<br />
==== 3-remote site simple Multi-Master Plex ====<br />
<br />
BDR supports "all to all" connections, so the latency for any change being applied on other masters is minimised. (Note that early designs of multi-master were arranged for circular replication, which has latency issues with larger numbers of nodes)<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Alpha, Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha, Charlie using logical streaming replication<br />
<br />
===== Configuration =====<br />
<br />
If you wanted to test this configuration locally you could run three PostgreSQL instances on different ports. Such a configuration would look like the following if the port numbers were used as node names for the sake of notational clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441,node_5442'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5440,node_5442'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440,node_5441'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
<br />
In a typical real-world configuration each server would be on the same port on a different host instead.<br />
<br />
==== 3-remote site simple Multi-Master Circular Replication ====<br />
<br />
A simpler configuration uses "circular replication". This requires fewer connections but results in higher latency for changes as the number of nodes increases. It is also less resilient to network disruptions and node faults.<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie using logical streaming replication<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha using logical streaming replication<br />
<br />
TODO: Regrettably this doesn't actually work yet because we don't cascade logical changes (yet).<br />
<br />
===== Configuration =====<br />
<br />
Again using node names that match port numbers for clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5442'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
<br />
This would usually be done in the real world with databases on different hosts, all running on the same port.<br />
<br />
==== 3-remote site Max Availability Multi-Master Plex ====<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha, Echo using logical streaming<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Foxtrot using physical streaming with sync replication<br />
** Server "Foxtrot" - Physical Standby - feeds changes to Alpha, Charlie using logical streaming<br />
<br />
Bandwidth and latency between sites is minimised.<br />
<br />
Config left as an exercise for the reader.<br />
<br />
==== N-site symmetric cluster replication ====<br />
<br />
A symmetric cluster is one in which all masters are connected to each other.<br />
<br />
N=19 has been tested and works fine.<br />
<br />
Each of the N masters requires N-1 connections to the other masters, so the practical limit is somewhere under 100 servers, or fewer if you have many separate databases.<br />
<br />
The amount of work caused by each change is O(N), so there is a much lower practical limit based upon resource consumption. A planned future option to filter rows/tables for replication becomes essential with larger or more heavily updated databases.<br />
<br />
==== Complex/Asymmetric Replication ====<br />
<br />
A variety of options is possible.<br />
<br />
=== Conflict Avoidance ===<br />
<br />
==== Distributed Locking ====<br />
<br />
Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close together, but they cannot support geographically distributed applications, since very low inter-node latency is critical for acceptable performance.<br />
<br />
Distributed locking is essentially a pessimistic approach, whereas BDR advocates an optimistic approach: avoid conflicts where possible but allow some types of conflict to occur and resolve them when they arise.<br />
<br />
==== Global Sequences ====<br />
<br />
Many applications require unique values be assigned to database entries. Some applications use GUIDs generated by external programs, some use database-supplied values. This is important with optimistic conflict resolution schemes because uniqueness violations are "divergent errors" and are not easily resolvable.<br />
<br />
The SQL standard requires Sequence objects which provide unique values, though these are isolated to a single node. These can then be used to supply default values using <tt>DEFAULT nextval('mysequence')</tt>, as with PostgreSQL's <tt>SERIAL</tt> pseudo-type.<br />
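<br />
For example, using a node-local sequence to supply default key values (the table itself is illustrative):<br />
<br />
 CREATE SEQUENCE mysequence;<br />
 CREATE TABLE item (<br />
     id   bigint PRIMARY KEY DEFAULT nextval('mysequence'),<br />
     name text<br />
 );<br />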
<br />
BDR requires sequences to work together across multiple nodes. This is implemented as a new <tt>SequenceAccessMethod</tt> API (SeqAM), which allows plugins that provide get/set functions for sequences. Global Sequences are then implemented as a plugin which implements the SeqAM API and communicates across nodes to allow new ranges of values to be stored for each sequence.<br />
<br />
=== Conflict Detection & Resolution ===<br />
<br />
Because local writes can occur on a master, conflict detection and avoidance is a concern for basic LLSR setups as well as full BDR configurations.<br />
<br />
==== Lock Conflicts ====<br />
<br />
Changes from the upstream master are applied on the downstream master by a single apply process. That process needs to take a <tt>RowExclusiveLock</tt> on the table being changed and must be able to write-lock the tuple(s) being changed. Concurrent activity will prevent those changes from being immediately applied because of lock waits. Use the <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt> facility to look for issues with apply blocking on locks.<br />
<br />
By concurrent activity on a row, we mean the following (a sample logging configuration follows the list):<br />
<br />
* explicit row level locking (<tt>SELECT ... FOR UPDATE/FOR SHARE</tt>)<br />
* locking from foreign keys<br />
* implicit locking because of row <tt>UPDATE</tt>s, <tt>INSERT</tt>s or <tt>DELETE</tt>s, either from local activity or apply from other servers<br />
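<br />
A sample <tt>postgresql.conf</tt> logging setup for spotting apply lock waits (the threshold shown is illustrative):<br />
<br />
 log_lock_waits = on     # log any wait that exceeds deadlock_timeout<br />
 deadlock_timeout = 1s   # also the reporting threshold for log_lock_waits<br />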
<br />
==== Data Conflicts ====<br />
<br />
Concurrent updates and deletes may also cause data-level conflicts to occur, which then require conflict resolution. It is important that these conflicts are resolved in a consistent and idempotent manner so that all servers end up with identical results.<br />
<br />
Concurrent updates are resolved using a last-update-wins strategy based on timestamps. Should the timestamps be identical, the tie is broken using the system identifier from <tt>pg_control</tt>, though this may change in a future release.<br />
<br />
<tt>UPDATE</tt>s and <tt>INSERT</tt>s may cause uniqueness violation errors because of primary keys, unique indexes and exclusion constraints when changes are applied at remote nodes. These are not easily resolvable and represent severe application errors that cause the database contents of multiple servers to diverge from each other. Hence these are known as "divergent conflicts". Currently, replication stops should a divergent conflict occur. The errors causing the conflict can be seen in the error log of the downstream master with the problem.<br />
<br />
Updates which cannot locate a row are presumed to be <tt>DELETE</tt>/<tt>UPDATE</tt> conflicts. These are accepted as successful operations but in the case of <tt>UPDATE</tt> the data in the <tt>UPDATE</tt> is discarded.<br />
<br />
All conflicts are resolved at row level. Concurrent updates that touch completely separate columns can result in "false conflicts", where there is no conflict in terms of the data, only in terms of the row update. Such conflicts result in just one of those changes being made, the other discarded according to last-update-wins. It is not practical to decide at the database level when a row should be merged and when a last-update-wins strategy should be used; such decision making would require support for application-specific conflict resolution plugins.<br />
<br />
Changing unlogged and logged tables in the same transaction can result in apparently strange outcomes since the unlogged tables aren't replicated.<br />
<br />
==== Examples ====<br />
<br />
As an example, let's say we have two tables, Activity and Customer. There is a foreign key from Activity to Customer, constraining us to record only activity rows that have a matching customer row.<br />
<br />
* We update a row in the Customer table on NodeA. The change from NodeA is applied to NodeB just as we are inserting an activity on NodeB. The inserted activity causes a FK check....<br />
<br />
<br />
<br />
[[Category:Replication]]</div>Amshttps://wiki.postgresql.org/index.php?title=BDR_User_Guide&diff=19753BDR User Guide2013-05-15T13:45:13Z<p>Ams: Add passing mention of track_commit_timestamp</p>
<hr />
<div>----<br />
This page is the user and administrator guide for BDR. If you're looking for technical details on the project plan and implementation, see [[BDR Project]].<br />
----<br />
<br />
= BDR User Guide =<br />
<br />
BDR (BiDirectional Replication) is a feature being developed for inclusion in PostgreSQL core that provides greatly enhanced replication capabilities.<br />
<br />
BDR allows users to create a geographically distributed multi-master database using Logical Log Streaming Replication (LLSR) transport.<br />
BDR is designed to provide both high availability and geographically distributed disaster recovery capabilities. <br />
<br />
BDR is not “clustering” as some vendors use the term, in that it doesn't have a distributed lock manager, global transaction co-ordinator, etc. Each member server is separate yet connected, with design choices that allow separation between nodes that would not be possible with global transaction coordination.<br />
<br />
Guidance on getting a testing setup established is in [[#Initial setup]]. Please read the full documentation if you intend to put BDR into production.<br />
<br />
== Logical Log Streaming Replication ==<br />
<br />
Logical log streaming replication (LLSR) allows one PostgreSQL master (the "upstream master") to stream a sequence of changes to another read/write PostgreSQL server (the "downstream master"). Data is sent in one direction only over a normal libpq connection.<br />
<br />
Multiple LLSR connections can be used to set up bi-directional replication as discussed later in this guide.<br />
<br />
=== Overview of logical replication ===<br />
<br />
In some ways LLSR is similar to "streaming replication" i.e. physical log streaming replication (PLSR) from a user perspective; both replicate changes from one server to another. However, in LLSR the receiving server is also a full master database that can make changes, unlike the read-only replicas offered by PLSR hot standby. Additionally, LLSR is per-database, whereas PLSR is per-cluster and replicates all databases at once. There are many more differences discussed in the relevant sections of this document.<br />
<br />
In LLSR the data that is replicated is change data in a special format that allows the changes to be logically reconstructed on the downstream master. The changes are generated by reading transaction log (WAL) data, making change capture on the upstream master much more efficient than trigger-based replication, hence the name "logical log replication". Changes are passed from upstream to downstream using the libpq protocol, just as with physical log streaming replication.<br />
<br />
One connection is required for each PostgreSQL database that is replicated. If two servers are connected, each of which has 50 databases, then 50 connections would be required to send changes in one direction, from upstream to downstream. Each database connection must be specified, so it is possible to filter out unwanted databases simply by avoiding configuring replication for those databases.<br />
<br />
Setting up replication for new databases is not (yet?) automatic, so additional configuration steps are required after <tt>CREATE DATABASE</tt>. A restart of the downstream master is also required. The upstream master only needs restarting if the <tt>max_logical_slots</tt> parameter is too low to allow a new replica to be added. Adding replication for databases that do not exist yet will cause an ERROR, as will dropping a database that is being replicated. Setup is discussed in more detail below.<br />
<br />
Changes are processed by the downstream master using <tt>bdr</tt> plug-ins. This allows flexible handling of replication input, including:<br />
<br />
* BDR apply process - applies logical changes to the downstream master. The apply process makes changes directly rather than generating SQL text and then parse/plan/executing SQL.<br />
* Textual output plugin - a demo plugin that generates SQL text (but doesn't apply changes)<br />
* <tt>pg_xlogdump</tt> - examines physical WAL records and produces textual debugging output. This server program is included in PostgreSQL 9.3.<br />
<br />
=== Replication of DML changes ===<br />
<br />
All changes are replicated: <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> and <tt>TRUNCATE</tt>. <br />
<br />
(TRUNCATE is not yet implemented, but will be implemented before the feature goes to final release).<br />
<br />
Actions that generate WAL data but don't represent logical changes do not result in data transfer, e.g. full page writes, VACUUMs, hint bit setting. LLSR avoids much of the overhead from physical WAL, though it has overheads that mean that it doesn't always use less bandwidth than PLSR.<br />
<br />
Locks taken by <tt>LOCK</tt> and <tt>SELECT ... FOR UPDATE/SHARE</tt> on the upstream master are not replicated to downstream masters. Locks taken automatically by <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> or <tt>TRUNCATE</tt> ''are'' taken on the downstream master and may delay replication apply or concurrent transactions - see [[#Lock Conflicts|Lock Conflicts]].<br />
<br />
<tt>TEMPORARY</tt> and <tt>UNLOGGED</tt> tables are not replicated. In contrast to physical standby servers, downstream masters can use temporary and unlogged tables.<br />
<br />
<tt>DELETE</tt> and <tt>UPDATE</tt> statements that affect multiple rows on the upstream master will cause a series of row changes on the downstream master. These are likely to go at the same speed as on the origin, as long as an index is defined on the Primary Key of the table on the downstream master. <tt>INSERT</tt>s on the upstream master do not require a unique constraint in order to replicate correctly. <tt>UPDATE</tt>s and <tt>DELETE</tt>s require some form of unique constraint, either <tt>PRIMARY KEY</tt> or <tt>UNIQUE NOT NULL</tt>. A warning is issued in the downstream master's logs if the expected constraint is absent.<br />
<br />
<tt>UPDATE</tt>s that change the value of the Primary Key of a table will be replicated correctly.<br />
<br />
The values applied are the final values from the <tt>UPDATE</tt> on the upstream master, including any modifications from before-row triggers, rules or functions. Any reflexive conditions, such as N = N + 1, are resolved to their final value. Volatile or stable functions are evaluated on the master side and the resulting values are replicated. Consequently any function side-effects (writing files, network socket activity, updating internal PostgreSQL variables, etc) will not occur on the replicas as the functions are not run again on the replica.<br />
<br />
All columns are replicated on each table. Large column values that would be placed in TOAST tables are replicated without problem, avoiding de-compression and re-compression. If we update a row but do not change a TOASTed column value, then that data is not sent downstream.<br />
<br />
All data types are handled, not just the built-in datatypes of PostgreSQL core. The only requirement is that user-defined types are installed identically in both upstream and downstream master (see "Limitations").<br />
<br />
The current LLSR plugin implementation uses the binary libpq protocol, so it requires that the upstream and downstream master use same CPU architecture and word-length, i.e. "identical servers", as with physical replication. A textual output option will be added later for passing data between non-identical servers, e.g. laptops or mobile devices communicating with a central server.<br />
<br />
Changes are accumulated in memory (spilling to disk where required) and then sent to the downstream server at commit time. Aborted transactions are never sent. Application of changes on downstream master is currently single-threaded, though this process is efficiently implemented. Parallel apply is a possible future feature, especially for changes made while holding <tt>AccessExclusiveLock</tt>.<br />
<br />
Changes are applied to the downstream master in the sequence in which they were committed on the upstream master. This is a known-good serialization ordering of changes, so replication failures of the kind possible with statement-based replication (e.g. MySQL) or trigger-based replication (e.g. Slony version 2.0) cannot occur. Users should note that this means the original order of locking of tables is not maintained. Although lock order is provably not an issue for the set of locks held on the upstream master, additional locking on the downstream side could cause lock waits or deadlocking in some cases. (Discussed in further detail later.)<br />
<br />
Larger transactions spill to disk on the upstream master once they reach a certain size. Currently, large transactions can cause increased latency. A future enhancement will be to stream changes to the downstream master once they fill the upstream memory buffer, though this is likely to be implemented in 9.5.<br />
<br />
<tt>SET</tt> statements and parameter settings are not replicated. This has no effect on replication since we only replicate actual changes, not anything at SQL statement level. We always update the correct tables, whatever the setting of <tt>search_path</tt>. Values are replicated correctly irrespective of the values of <tt>bytea_output</tt>, <tt>TimeZone</tt>, <tt>DateStyle</tt>, etc.<br />
<br />
<tt>NOTIFY</tt> is not supported across log based replication, either physical or logical. <tt>NOTIFY</tt> and <tt>LISTEN</tt> will work fine on the upstream master but an upstream <tt>NOTIFY</tt> will not trigger a downstream <tt>LISTEN</tt>er.<br />
<br />
In some cases, additional deadlocks can occur on apply. This causes an automatic retry of the apply of the replaying transaction and is only an issue if the deadlock recurs repeatedly, delaying replication.<br />
<br />
From a performance and concurrency perspective the BDR apply process is similar to a normal backend. Frequent conflicts with locks from other transactions when replaying changes can slow things down and thus increase replication delay, so reducing the frequency of such conflicts can be a good way to speed things up. Any lock held by another transaction on the downstream master - <tt>LOCK</tt> statements, <tt>SELECT ... FOR UPDATE/FOR SHARE</tt>, or <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> row locks - can delay replication if the replication apply process needs to change the locked table/row.<br />
<br />
=== Table definitions and DDL replication ===<br />
<br />
DML changes are replicated between tables with matching <tt>"Schemaname"."Tablename"</tt> on both upstream and downstream masters. e.g. changes from upstream's <tt>public.mytable</tt> will go to downstream's <tt>public.mytable</tt> while changes to the upstream <tt>myschema.mytable</tt> will go to the downstream <tt>myschema.mytable</tt>. This works even when no schema is specified in the original SQL since we identify the changed table from its internal OIDs in WAL records and then map that to whatever internal identifier is used on the downstream node.<br />
<br />
This requires careful synchronization of table definitions on each node otherwise <tt>ERROR</tt>s will be generated by the replication apply process. In general, tables must be an exact match between upstream and downstream masters. <br />
<br />
There are no plans to implement working replication between dissimilar table definitions.<br />
<br />
Tables must meet the following requirements to be compatible for purposes of LLSR:<br />
<br />
* The downstream master must only have constraints (<tt>CHECK</tt>, <tt>UNIQUE</tt>, <tt>EXCLUSION</tt>, <tt>FOREIGN KEY</tt>, etc) that are also present on the upstream master. Replication may initially work with mismatched constraints but is likely to fail as soon as the downstream master rejects a row the upstream master accepted.<br />
* The table referenced by a FOREIGN KEY on a downstream master must have all the keys present in the upstream master version of the same table.<br />
* Storage parameters must match, except as allowed below<br />
* Inheritance must be the same<br />
* Dropped columns on master must be present on replicas<br />
* Custom types and enum definitions must match exactly<br />
* Composite types and enums must have the same oids on master and replication target<br />
* Extensions defining types used in replicated tables must be of the same version or fully SQL-level compatible and the oids of the types they define must match.<br />
<br />
The following differences are permissible between tables on different nodes:<br />
<br />
* The table's <tt>pg_class</tt> oid, the oid of its associated TOAST table, and the oid of the table's rowtype in <tt>pg_type</tt> may differ;<br />
* Extra or missing non-<tt>UNIQUE</tt> indexes<br />
* Extra keys in downstream lookup tables for <tt>FOREIGN KEY</tt> references that are not present on the upstream master<br />
* The table-level storage parameters for fillfactor and autovacuum<br />
* Triggers and rules may differ (they are not executed by replication apply)<br />
<br />
Replication of DDL changes between nodes will be possible using event triggers, but is not yet integrated with LLSR (see [[#LLSR Limitations|LLSR Limitations]]).<br />
<br />
Triggers and Rules are NOT executed by apply on the downstream side, equivalent to an enforced setting of <tt>session_replication_role = origin</tt>.<br />
<br />
In future it is expected that composite types and enums with non-identical oids will be converted using text output and input functions. This feature is not yet implemented.<br />
<br />
=== LLSR limitations ===<br />
<br />
The current LLSR implementation is subject to some limitations, which are being progressively removed as work progresses.<br />
<br />
==== Data definition compatibility ====<br />
<br />
Table definitions, types, extensions, etc must be near identical between upstream and downstream masters. See [[#Table definitions and DDL replication|Table definitions and DDL replication]].<br />
<br />
==== DDL Replication ====<br />
<br />
DDL replication is not yet supported.<br />
<br />
==== Upstream feedback ====<br />
<br />
No feedback from downstream masters to the upstream master is implemented for asynchronous LLSR, so upstream masters must be configured to keep enough WAL. See [[#Configuration|Configuration]].<br />
<br />
==== TRUNCATE is not replicated ====<br />
<br />
TRUNCATE is not yet supported; however, workarounds with user-level triggers are possible, and a ProcessUtility hook is planned to implement a similar approach globally.<br />
<br />
The safest option is to define a user-level BEFORE trigger on each table that RAISEs an ERROR when TRUNCATE is attempted.<br />
<br />
A simple truncate-blocking trigger is:<br />
<br />
CREATE OR REPLACE FUNCTION deny_truncate() RETURNS trigger AS $$<br />
BEGIN<br />
IF tg_op = 'TRUNCATE' THEN<br />
RAISE EXCEPTION 'TRUNCATE is not supported on this table. Please use DELETE FROM.';<br />
ELSE<br />
RAISE EXCEPTION 'This trigger only supports TRUNCATE';<br />
END IF;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
It can be applied to a table with:<br />
<br />
CREATE TRIGGER deny_truncate_on_<tablename> BEFORE TRUNCATE ON <tablename><br />
FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate();<br />
<br />
A PL/PgSQL DO block that queries <tt>pg_class</tt> and loops over it to <tt>EXECUTE</tt> a dynamic SQL <tt>CREATE TRIGGER</tt> command for each table that does not already have the trigger can be used to apply the trigger to all tables.<br />
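<br />
A sketch of such a DO block follows; the trigger-name convention matches the example above, and the schema filter is illustrative:<br />
<br />
 DO $$<br />
 DECLARE<br />
     r record;<br />
 BEGIN<br />
     -- Every ordinary user table that lacks the deny_truncate trigger:<br />
     FOR r IN<br />
         SELECT c.oid::regclass AS tbl, c.relname<br />
           FROM pg_class c<br />
           JOIN pg_namespace n ON n.oid = c.relnamespace<br />
          WHERE c.relkind = 'r'<br />
            AND n.nspname NOT IN ('pg_catalog', 'information_schema')<br />
            AND NOT EXISTS (SELECT 1 FROM pg_trigger t<br />
                             WHERE t.tgrelid = c.oid<br />
                               AND t.tgname = 'deny_truncate_on_' || c.relname)<br />
     LOOP<br />
         EXECUTE format('CREATE TRIGGER %I BEFORE TRUNCATE ON %s FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate()',<br />
                        'deny_truncate_on_' || r.relname, r.tbl);<br />
     END LOOP;<br />
 END;<br />
 $$;<br />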
<br />
=== Initial setup ===<br />
<br />
To set up LLSR or BDR you first need a patched PostgreSQL that can support LLSR/BDR, then you need to create one or more LLSR/BDR senders and one or more LLSR/BDR receivers.<br />
<br />
==== Installing the patched PostgreSQL binaries ====<br />
<br />
Currently BDR is only available in builds of the 'bdr' branch on Andres Freund's git repo on git.postgresql.org. PostgreSQL 9.2 and below do not support BDR, and 9.3 requires patches, so this guide will not work for you if you are trying to use a normal install of PostgreSQL.<br />
<br />
First you need to clone, configure, compile and install like normal. Clone the sources from <tt>git://git.postgresql.org/git/users/andresfreund/postgres.git</tt> and checkout the <tt>bdr</tt> branch.<br />
<br />
If you have an existing local PostgreSQL git tree specify it as <tt>--reference /path/to/existing/tree</tt> to greatly speed your git clone.<br />
<br />
Example:<br />
<br />
mkdir -p $HOME/bdr<br />
 cd $HOME/bdr<br />
git clone git://git.postgresql.org/git/users/andresfreund/postgres.git $HOME/bdr/postgres-bdr-src<br />
cd postgres-bdr-src<br />
./configure --prefix=$HOME/bdr/postgres-bdr-bin<br />
make install<br />
cd contrib/bdr<br />
make install<br />
<br />
This will put everything in <tt>$HOME/bdr</tt>, with the source code and build tree in <tt>$HOME/bdr/postgres-bdr-src</tt> and the installed PostgreSQL in <tt>$HOME/bdr/postgres-bdr-bin</tt>. This is a convenient setup for testing and development because it doesn't require you to set up new users, wrangle permissions, run anything as root, etc, but it isn't recommended that you deploy this way in production.<br />
<br />
To actually use these new binaries you will need to:<br />
<br />
export PATH=$HOME/bdr/postgres-bdr-bin/bin:$PATH<br />
<br />
before running <tt>initdb</tt>, <tt>postgres</tt>, etc. You don't have to use the <tt>psql</tt> or <tt>libpq</tt> you compiled but you're likely to get version mismatch warnings if you don't.<br />
<br />
=== Parameter Reference ===<br />
<br />
The following parameters are new or have been changed in PostgreSQL's new logical streaming replication.<br />
<br />
==== <tt>shared_preload_libraries = 'bdr'</tt> ====<br />
<br />
To load support for receiving changes on a downstream master, the <tt>bdr</tt> library must be added to the existing <tt>shared_preload_libraries</tt> parameter. This loads the bdr library during postmaster start-up and allows it to create the required background worker(s).<br />
<br />
Upstream masters don't need to load the bdr library unless they're also operating as a downstream master, as is the case in a BDR configuration.<br />
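<br />
For example, if <tt>pg_stat_statements</tt> were already loaded (purely illustrative), the parameter would become:<br />
<br />
 shared_preload_libraries = 'pg_stat_statements,bdr'<br />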
<br />
==== <tt>bdr.connections</tt> ====<br />
<br />
A comma-separated list of upstream master connection names is specified in <tt>bdr.connections</tt>. These names must be simple alphanumeric strings. They are used when naming the connection in error messages, configuration options and logs, but are otherwise of no special meaning.<br />
<br />
A typical two-upstream-master setting might be:<br />
<br />
 bdr.connections = 'upstream1, upstream2'<br />
<br />
==== <tt>bdr.&lt;connection_name&gt;.dsn</tt> ====<br />
<br />
Each connection name must have at least a data source name specified using the <tt>bdr.&lt;connection_name&gt;.dsn</tt> parameter. The DSN syntax is the same as that used by libpq so it is not discussed in further detail here. A <tt>dbname</tt> for the database to connect to must be specified; all other parts of the DSN are optional.<br />
<br />
The local (downstream) database name is assumed to be the same as the name of the upstream database being connected to, though future versions will make this configurable.<br />
<br />
For the above two-master setting for <tt>bdr.connections</tt> the DSNs might look like:<br />
<br />
bdr.upstream1.dsn = 'host=10.1.1.2 user=postgres dbname=replicated_db'<br />
bdr.upstream2.dsn = 'host=10.1.1.3 user=postgres dbname=replicated_db'<br />
<br />
==== <tt>max_logical_slots</tt> ====<br />
<br />
The new parameter <tt>max_logical_slots</tt> has been added for use on both upstream and downstream masters. This parameter controls the maximum number of logical replication slots - upstream or downstream - that this cluster may have at a time. It must be set at postmaster start time.<br />
<br />
As logical replication slots are persistent, slots are consumed even by replicas that are not currently connected. Slot management is discussed in Starting, Stopping and Managing Replication.<br />
<br />
<tt>max_logical_slots</tt> should be set to the sum of the number of logical replication upstream masters this server will have, plus the number of logical replication downstream masters that will connect to it.<br />
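<br />
For example, under the illustrative assumption that this node replicates from two upstream masters and serves three downstream masters:<br />
<br />
 # 2 upstream connections + 3 downstream replicas<br />
 max_logical_slots = 5<br />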
<br />
==== <tt>wal_level = 'logical'</tt> ====<br />
<br />
A new setting, <tt>'logical'</tt>, has been added for the existing <tt>wal_level</tt> parameter. <tt>'logical'</tt> includes everything that the existing <tt>hot_standby</tt> setting does and adds additional details required for logical changeset decoding to the write-ahead logs.<br />
<br />
This additional information is consumed by the upstream-master-side xlog decoding worker. Downstream masters that do not also act as upstream masters do not require <tt>wal_level</tt> to be increased above the default <tt>'minimal'</tt>.<br />
<br />
<tt>wal_level</tt>, except for the new <tt>'logical'</tt> setting, is [http://www.postgresql.org/docs/current/static/runtime-config-wal.html documented in the main PostgreSQL manual].<br />
<br />
==== <tt>max_wal_senders</tt> ====<br />
<br />
Logical replication hasn't altered the <tt>max_wal_senders</tt> parameter, but it is important in upstream masters for logical replication and BDR because every logical sender consumes a <tt>max_wal_senders</tt> entry.<br />
<br />
You should configure <tt>max_wal_senders</tt> to the sum of the number of physical and logical replicas you want to allow an upstream master to serve. If you intend to use <tt>pg_basebackup</tt> you should add at least two more senders to allow for its use.<br />
<br />
Like <tt>max_logical_slots</tt>, <tt>max_wal_senders</tt> entries don't cost a large amount of memory, so you can overestimate fairly safely.<br />
<br />
<tt>max_wal_senders</tt> is documented in [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL documentation].<br />
<br />
==== <tt>wal_keep_segments</tt> ====<br />
<br />
Like <tt>max_wal_senders</tt>, the <tt>wal_keep_segments</tt> parameter isn't directly changed by logical replication but is still important for upstream masters. It is not required on downstream-only masters.<br />
<br />
<tt>wal_keep_segments</tt> should be set to a value that allows for some downtime or unreachable periods for downstream masters and for heavy bursts of write activity on the upstream master. <br />
<br />
Keep in mind that enough disk space must be available for the WAL segments, each of which is 16MB. If you run out of disk space the server will halt until disk space is freed and it may be quite difficult to free space when you can no longer start the server.<br />
<br />
If you exceed the required <tt>wal_keep_segments</tt>, an "Insufficient WAL segments retained" error will be reported. See [[#Troubleshooting|Troubleshooting]].<br />
<br />
<tt>wal_keep_segments</tt> is documented in the [http://www.postgresql.org/docs/current/static/runtime-config-replication.html main PostgreSQL manual].<br />
<br />
==== <tt>track_commit_timestamp</tt> ====<br />
<br />
Setting this parameter to "on" enables commit timestamp tracking, which is used to implement last-UPDATE-wins conflict resolution.<br />
<br />
=== Configuration ===<br />
<br />
Details on individual parameters are described in the [[#Parameter Reference|Parameter Reference]] section.<br />
<br />
The following configuration is an example of a simple one-way LLSR replication setup - a single upstream master to a single downstream master.<br />
<br />
The upstream master (sender)'s <tt>postgresql.conf</tt> should contain settings like:<br />
<br />
wal_level = 'logical' # Include enough info for logical replication<br />
max_logical_slots = X # Number of LLSR senders + any receivers<br />
max_wal_senders = Y # Y = max_logical_slots plus any physical <br />
# streaming requirements<br />
wal_keep_segments = 5000 # Master must retain enough WAL segments to let <br />
# replicas catch up. Correct value depends on<br />
# rate of writes on master, max replica downtime<br />
# allowable. 5000 segments requires 78GB<br />
# in pg_xlog<br />
<br />
Downstream (receiver) <tt>postgresql.conf</tt>:<br />
<br />
shared_preload_libraries = 'bdr'<br />
<br />
bdr.connections="name_of_upstream_master" # list of upstream master nodenames<br />
bdr.<nodename>.dsn = 'dbname=postgres' # connection string for connection<br />
# from downstream to upstream master<br />
bdr.<nodename>.local_dbname = 'xxx' # optional parameter to cover the case <br />
# where the databasename on upstream <br />
# and downstream master differ. <br />
# (Not yet implemented)<br />
bdr.<nodename>.apply_delay # optional parameter to delay apply of<br />
# transactions, time in milliseconds <br />
bdr.synchronous_commit = ...; # optional parameter to set the<br />
# synchronous_commit parameter the<br />
# apply processes will be using<br />
max_logical_slots = X # set to the number of remotes<br />
<br />
Note that a server can be both a sender and a receiver, whether as two servers replicating to each other or in more complex configurations such as replication chains/trees.<br />
<br />
The upstream (sender) <tt>pg_hba.conf</tt> must be configured to allow the downstream master to connect for replication. Otherwise you'll see errors like the following on the downstream master:<br />
<br />
FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "[local]", user "postgres"<br />
<br />
A suitable <tt>pg_hba.conf</tt> entry for a replication connection from the replica server 10.1.4.8 might be:<br />
<br />
host replication postgres 10.1.4.8/32 trust<br />
<br />
(The user name should match the user name configured in the downstream master's dsn; md5 password authentication is also supported.)<br />
<br />
For more details on these parameters, see [[#Parameter Reference|Parameter Reference]].<br />
<br />
=== Troubleshooting ===<br />
<br />
==== Could not access file "bdr": No such file or directory ====<br />
<br />
If you see the error:<br />
<br />
FATAL: could not access file "bdr": No such file or directory<br />
<br />
when starting a database set up to receive BDR replication, you probably forgot to install <tt>contrib/bdr</tt>. See above.<br />
<br />
==== Invalid value for parameter ====<br />
<br />
An error like:<br />
<br />
LOG: invalid value for parameter ...<br />
<br />
when setting one of these parameters means that your server doesn't support logical replication and will need to be patched or updated.<br />
<br />
==== Insufficient WAL segments retained ("requested WAL segment ... has already been removed") ====<br />
<br />
If <tt>wal_keep_segments</tt> is insufficient to meet the requirements of a replica that has fallen far behind, the master will report errors like:<br />
<br />
ERROR: requested WAL segment 00000001000000010000002D has already been removed<br />
<br />
Currently the replica errors look like:<br />
<br />
WARNING: Starting logical replication<br />
LOG: data stream ended<br />
LOG: worker process: master (PID 23812) exited with exit code 0<br />
LOG: starting background worker process "master"<br />
LOG: master initialized on master, remote dbname=master port=5434 replication=true fallback_application_name=bdr<br />
LOG: local sysid 5873181566046043070, remote: 5873181102189050714<br />
LOG: found valid replication identifier 1<br />
LOG: starting up replication at 1 from 1/2D9CA220<br />
<br />
but a more explicit error message for this condition is planned.<br />
<br />
The only way to recover from this fault is to re-seed the replica database.<br />
<br />
This fault could be prevented with feedback from the replica to the master, but this feature is not planned for the first release of BDR. Another alternative considered for future releases is making wal_keep_segments a dynamic parameter that is sized on demand.<br />
<br />
Monitoring of maximum replica lag and appropriate adjustment of wal_keep_segments will prevent this fault from arising.<br />
<br />
==== Couldn't find logical slot ====<br />
<br />
An error like:<br />
<br />
ERROR: couldn't find logical slot "bdr: 16384:5873181566046043070-1-24596:"<br />
<br />
on the upstream master suggests that a downstream master is trying to connect to a logical replication slot that no longer exists. The slot cannot be re-created, so it is necessary to re-seed the downstream replica database.<br />
<br />
=== Operational Issues and Debugging ===<br />
<br />
In LLSR there are no user-level (i.e. SQL-visible) ERRORs that have special meaning. Any ERRORs generated are likely to be serious problems of some kind, apart from apply deadlocks, which are automatically retried.<br />
<br />
=== Monitoring ===<br />
<br />
The following views are available for monitoring replication activity:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE pg_stat_replication]</tt><br />
* <tt>pg_stat_logical_replication</tt> (described below)<br />
* <tt>pg_stat_bdr</tt> (described below)<br />
<br />
The following configuration and logging parameters are useful for monitoring replication:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt><br />
<br />
==== pg_stat_logical_replication ====<br />
<br />
The new <tt>pg_stat_logical_replication</tt> view is specific to logical replication. It is based on the underlying <tt>pg_stat_get_logical_replication_slots</tt> function and has the following structure:<br />
<br />
View "pg_catalog.pg_stat_logical_replication"<br />
Column | Type | Modifiers <br />
--------------------------+---------+-----------<br />
slot_name | text | <br />
plugin | text | <br />
database | oid | <br />
active | boolean | <br />
xmin | xid | <br />
last_required_checkpoint | text | <br />
<br />
It contains one row for every connection from a downstream master to the server being queried (the upstream master). On a standalone PostgreSQL server or a downstream-only master this view will contain no rows.<br />
<br />
* <tt>slot_name</tt>: An internal name for a given logical replication slot (a connection from a downstream master to this upstream master). This slot name is used by the downstream master to uniquely identify itself and is used with the <tt>pg_receivellog</tt> command when managing logical replication slots. The slot name is composed of the decoding plugin name, the upstream database oid, the downstream system identifier (from <tt>pg_control</tt>), the downstream slot number, and the downstream database oid.<br />
<br />
* <tt>plugin</tt>: The logical replication plugin being used to decode WAL archives. You'll generally only see <tt>bdr_output</tt> here.<br />
<br />
* <tt>database</tt>: The oid of the database being replicated by this slot. You can get the database name by joining on <tt>pg_database.oid</tt>.<br />
<br />
* <tt>active</tt>: Whether this slot currently has an active connection.<br />
<br />
* <tt>xmin</tt>: The lowest transaction ID this replication slot can "see", like the xmin of a transaction or prepared transaction. xmin should keep on advancing as replication continues.<br />
<br />
* <tt>last_required_checkpoint</tt>: The checkpoint identifying the oldest WAL record required to bring this slot up to date with the upstream master. (This column is likely to be removed in a future version).<br />
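<br />
As a quick status check, the view can be joined to <tt>pg_database</tt> as described above; a sketch:<br />
<br />
 SELECT lr.slot_name, lr.plugin, d.datname, lr.active, lr.xmin<br />
 FROM pg_stat_logical_replication lr<br />
 JOIN pg_database d ON d.oid = lr.database;<br />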
<br />
==== pg_stat_bdr ====<br />
<br />
The <tt>pg_stat_bdr</tt> view is supplied by the <tt>bdr</tt> extension. It provides information on a server's connection(s) to its upstream master(s). It is not present on upstream-only masters.<br />
<br />
The primary purpose of this view is to report statistics on the progress of LLSR apply on a per-upstream master connection basis.<br />
<br />
View structure:<br />
<br />
View "public.pg_stat_bdr"<br />
Column | Type | Modifiers <br />
--------------------+--------+-----------<br />
rep_node_id | oid | <br />
riremotesysid | name | <br />
riremotedb | oid | <br />
rilocaldb | oid | <br />
nr_commit | bigint | <br />
nr_rollback | bigint | <br />
nr_insert | bigint | <br />
nr_insert_conflict | bigint | <br />
nr_update | bigint | <br />
nr_update_conflict | bigint | <br />
nr_delete | bigint | <br />
nr_delete_conflict | bigint | <br />
nr_disconnect | bigint | <br />
<br />
Fields:<br />
<br />
* <tt>rep_node_id</tt>: An internal identifier for the replication slot.<br />
<br />
* <tt>riremotesysid</tt>: The remote database system identifier, as reported by the <tt>Database system identifier</tt> line of <tt>pg_controldata /path/to/datadir</tt>.<br />
<br />
* <tt>riremotedb</tt>: The remote database OID, ie the <tt>oid</tt> column of the remote server's <tt>pg_catalog.pg_database</tt> entry for the replicated database. You can get the database name with <tt>select datname from pg_database where oid = 12345</tt> (where '12345' is the <tt>riremotedb</tt> oid).<br />
<br />
* <tt>rilocaldb</tt>: The local database OID, with the same meaning as <tt>riremotedb</tt> but with oids from the local system.<br />
<br />
''The remaining columns are statistics about this upstream master slot'':<br />
<br />
* <tt>nr_commit</tt>: Number of commits applied to date from this master<br />
<br />
* <tt>nr_rollback</tt>: Number of rollbacks performed by this apply process due to recoverable errors (deadlock retries, lost races, etc) or unrecoverable errors like mismatched constraint errors.<br />
<br />
* <tt>nr_insert</tt>: Number of <tt>INSERT</tt>s performed<br />
<br />
* <tt>nr_insert_conflict</tt>: Number of <tt>INSERT</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_update</tt>: Number of <tt>UPDATE</tt>s performed<br />
<br />
* <tt>nr_update_conflict</tt>: Number of <tt>UPDATE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_delete</tt>: Number of <tt>DELETE</tt>s performed<br />
<br />
* <tt>nr_delete_conflict</tt>: Number of <tt>DELETE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_disconnect</tt>: Number of times this apply process has lost its connection to the upstream master since it was started.<br />
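<br />
For example, a summary query over this view (a sketch; it resolves <tt>rilocaldb</tt> to a database name and totals the conflict counters) might look like:<br />
<br />
 SELECT d.datname AS local_db, s.riremotesysid, s.nr_commit,<br />
        s.nr_insert_conflict + s.nr_update_conflict<br />
          + s.nr_delete_conflict AS total_conflicts,<br />
        s.nr_disconnect<br />
 FROM pg_stat_bdr s<br />
 JOIN pg_database d ON d.oid = s.rilocaldb;<br />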
<br />
<br />
This view does not contain any information about how far behind the upstream master this downstream master is. The upstream master's <tt>pg_stat_logical_replication</tt> and <tt>pg_stat_replication</tt> views must be queried to determine replication lag.<br />
<br />
==== Monitoring replication status and lag ====<br />
<br />
As with any replication setup, it is vital to monitor replication status on all BDR nodes to ensure no node is lagging severely behind the others or is stuck.<br />
<br />
In the case of BDR a stuck or crashed node will eventually cause disk space and table bloat problems on other masters so stuck nodes should be detected and removed or repaired in a reasonably timely manner. Exactly how urgent this is depends on the workload of the BDR group.<br />
<br />
The <tt>pg_stat_logical_replication</tt> view described above may be used to verify that a downstream master is connected to its upstream master - the <tt>active</tt> boolean column is <tt>t</tt> if there's a downstream master connected.<br />
<br />
The <tt>xmin</tt> column provides an indication of whether replication is advancing; it should increase as replication progresses. There is no simple way to turn <tt>xmin</tt> into the time the last applied transaction was committed on the master, so it doesn't provide an indication of wall-clock lag.<br />
<br />
To determine wall-clock replication lag an application-level ticker may be used to periodically update a timestamp in a replicated table. The difference between this timestamp on the upstream and downstream masters provides the wall-clock replication lag. For BDR one row may be added to the table for each BDR master, giving an indication of how much lag each master has relative to each other master.<br />
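<br />
A minimal sketch of such a ticker follows; the table and node name are hypothetical, and the <tt>UPDATE</tt> would be run periodically (e.g. from cron) on each master:<br />
<br />
 -- Once, on one master (the table replicates to the others):<br />
 CREATE TABLE bdr_ticker (<br />
     node_name text PRIMARY KEY,<br />
     last_tick timestamptz NOT NULL<br />
 );<br />
 INSERT INTO bdr_ticker VALUES ('alpha', current_timestamp);<br />
 <br />
 -- Periodically, on node 'alpha':<br />
 UPDATE bdr_ticker SET last_tick = current_timestamp WHERE node_name = 'alpha';<br />
 <br />
 -- On any other node, the approximate apply lag relative to 'alpha' is:<br />
 SELECT current_timestamp - last_tick AS lag FROM bdr_ticker WHERE node_name = 'alpha';<br />
<br />
Note that this measures lag only to the resolution of the tick interval and is subject to clock skew between the nodes.<br />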
<br />
=== Table and index usage statistics ===<br />
<br />
Statistics on table and index usage are updated normally by the downstream master. This is essential for the correct functioning of autovacuum. If there are no local writes on the downstream master and stats have not been reset, these two views should show matching results between upstream and downstream:<br />
<br />
* <tt>pg_stat_user_tables</tt><br />
* <tt>pg_statio_user_tables</tt><br />
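<br />
For example, a query like the following (a sketch) can be run on both the upstream and downstream masters and the output compared:<br />
<br />
 SELECT relname, n_tup_ins, n_tup_upd, n_tup_del<br />
 FROM pg_stat_user_tables ORDER BY relname;<br />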
<br />
Since indexes are used to locate the rows to change when applying updates and deletes, under workloads that perform <tt>UPDATE</tt>s and <tt>DELETE</tt>s the identifying indexes on the downstream side may appear more heavily used than non-identifying indexes.<br />
<br />
The built-in index monitoring views are:<br />
<br />
* <tt>pg_stat_user_indexes</tt><br />
* <tt>pg_statio_user_indexes</tt><br />
<br />
All these views are discussed in [http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE the PostgreSQL documentation on the statistics views].<br />
<br />
=== Starting, stopping and managing replication ===<br />
<br />
Replication is managed with the <tt>postgresql.conf</tt> settings described in "Parameter Reference" and "Configuration" above, and using the <tt>pg_receivellog</tt> utility command.<br />
<br />
==== Starting a new LLSR connection ====<br />
<br />
Logical replication is started automatically when a database is configured as a downstream master in <tt>postgresql.conf</tt> (see [[#Configuration|Configuration]]) and the postmaster is started. No explicit action is required to start replication, but replication will not actually work unless the upstream and downstream databases are identical within the requirements set by LLSR in the [[#Table definitions and DDL replication|Table definitions and DDL replication]] section.<br />
<br />
<tt>pg_dump</tt> and <tt>pg_restore</tt> may be used to set up the new replica's database.<br />
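<br />
For example, assuming hypothetical hosts <tt>upstream-host</tt> and <tt>replica-host</tt> and a database <tt>replicated_db</tt> that already exists on the replica, seeding might look like:<br />
<br />
 pg_dump -Fc -h upstream-host replicated_db | pg_restore -h replica-host -d replicated_db<br />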
<br />
==== Viewing logical replication slots ====<br />
<br />
Examining the state of logical replication is discussed in [[#Monitoring|Monitoring]].<br />
<br />
==== Temporarily stopping an LLSR replica ====<br />
<br />
LLSR replicas can be temporarily stopped by shutting down the downstream master's postmaster.<br />
<br />
If the replica is not started back up before the upstream master discards the oldest WAL segment required for the downstream master to resume replay (as identified by the <tt>last_required_checkpoint</tt> column of <tt>pg_catalog.pg_stat_logical_replication</tt>), the replica will not resume replay. The error [[#Insufficient_WAL_segments_retained_.28.22requested_WAL_segment_..._has_already_been_removed.22.29|Insufficient WAL segments retained]] will be reported in the upstream master's logs, and the replica must be re-created for replication to continue.<br />
<br />
==== Removing an LLSR replica permanently ====<br />
<br />
To remove a replication connection permanently, remove its entries from the downstream master's <tt>postgresql.conf</tt>, restart the downstream master, then use <tt>pg_receivellog</tt> to remove the replication slot on the upstream master.<br />
<br />
It is important to remove the replication slot from the upstream master(s) to prevent xid wrap-around problems and issues with table bloat caused by delayed vacuum.<br />
<br />
==== Cleaning up abandoned replication slots ====<br />
<br />
To remove a replication slot that was used for a now-defunct replica, find its slot name from the <tt>[[#pg_stat_logical_replication|pg_stat_logical_replication]]</tt> view on the upstream master then run:<br />
<br />
pg_receivellog -p 5434 -h master-hostname -d dbname \<br />
--slot='bdr: 16384:5873181566046043070-1-16384:' --stop<br />
<br />
where the argument to '--slot' is the slot name you found from the view.<br />
<br />
You may need to do this if you've created and then deleted several replicas, leaving <tt>max_logical_slots</tt> filled with entries for replicas that no longer exist.<br />
<br />
== Bi-Directional Replication ==<br />
<br />
Bi-Directional replication is built directly on LLSR by configuring two or more servers as both upstream ''and'' downstream masters of each other.<br />
<br />
All of the Logical Log Streaming Replication documentation applies to BDR and should be read before moving on to reading about and configuring BDR.<br />
<br />
=== Bi-Directional Replication Use Cases ===<br />
<br />
Bi-Directional Replication is designed to allow a very wide range of server connection topologies. The simplest to understand is two servers each sending their changes to the other, achieved by making each server a downstream master of the other and thus using two connections for each database.<br />
<br />
Logical and physical streaming replication are designed to work side-by-side. This means that a master can be replicating using physical streaming replication to a local standby server while at the same time replicating logical changes to a remote downstream master. Logical replication also works alongside cascading replication, so a physical standby can feed changes to a downstream master, allowing an upstream master to send to a physical standby which in turn sends to a downstream master.<br />
<br />
==== Simple multi-master pair ====<br />
<br />
A simple multi-master "HA Cluster" with two servers:<br />
<br />
* Server "Alpha" - Master<br />
* Server "Bravo" - Master<br />
<br />
===== Configuration =====<br />
<br />
Alpha:<br />
<br />
wal_level = 'logical'<br />
max_logical_slots = 3<br />
max_wal_senders = 4<br />
wal_keep_segments = 5000<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="bravo"<br />
bdr.bravo.dsn = 'dbname=dbtoreplicate'<br />
<br />
Bravo:<br />
<br />
wal_level = 'logical'<br />
max_logical_slots = 3<br />
max_wal_senders = 4<br />
wal_keep_segments = 5000<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="alpha"<br />
bdr.alpha.dsn = 'dbname=dbtoreplicate'<br />
<br />
See [[#Configuration|Configuration]] for an explanation of these parameters.<br />
<br />
==== HA and Logical Standby ====<br />
Downstream masters allow users to create temporary tables, so they can be used as reporting servers.<br />
<br />
"HA Cluster":<br />
<br />
* Server "Alpha" - Current Master<br />
* Server "Bravo" - Physical Standby - unused, apart from as failover target for Alpha - potentially specified in synchronous_standby_names<br />
* Server "Charlie" - "Logical Standby" - downstream master<br />
<br />
==== Very High Availability Multi-Master ====<br />
A typical configuration for remote multi-master would then be:<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha using logical streaming<br />
<br />
Bandwidth between Site 1 and Site 2 is minimised.<br />
<br />
==== 3-remote site simple Multi-Master Plex ====<br />
<br />
BDR supports "all to all" connections, so the latency for any change being applied on other masters is minimised. (Note that early designs of multi-master systems were arranged for circular replication, which has latency issues with larger numbers of nodes.)<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Alpha, Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha, Charlie using logical streaming replication<br />
<br />
===== Configuration =====<br />
<br />
If you wanted to test this configuration locally you could run three PostgreSQL instances on different ports. Such a configuration would look like the following if the port numbers were used as node names for the sake of notational clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441,node_5442'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5440,node_5442'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440,node_5441'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
<br />
In a typical real-world configuration each server would be on the same port on a different host instead.<br />
<br />
==== 3-remote site simple Multi-Master Circular Replication ====<br />
<br />
A simpler configuration uses "circular replication". This results in higher latency for changes as the number of nodes increases, and is less resilient to network disruptions and node faults.<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie using logical streaming replication<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha using logical streaming replication<br />
<br />
TODO: Regrettably this doesn't actually work yet, because logical changes are not yet cascaded.<br />
<br />
===== Configuration =====<br />
<br />
Again using node names that match port numbers for clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5442'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
<br />
This would usually be done in the real world with databases on different hosts, all running on the same port.<br />
<br />
==== 3-remote site Max Availability Multi-Master Plex ====<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha, Echo using logical streaming<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Foxtrot using physical streaming with sync replication<br />
** Server "Foxtrot" - Physical Standby - feeds changes to Alpha, Charlie using logical streaming<br />
<br />
Bandwidth and latency between sites is minimised.<br />
<br />
Config left as an exercise for the reader.<br />
<br />
==== N-site symmetric cluster replication ====<br />
<br />
A symmetric cluster is one in which every master is connected to every other master.<br />
<br />
N=19 has been tested and works fine.<br />
<br />
Each of the N masters requires N-1 connections to the other masters, so the practical limit is under 100 servers, or fewer if you have many separate databases.<br />
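<br />
For example, in the 19-node case above each master maintains 18 connections per replicated database, for 19 × 18 = 342 logical replication connections across the cluster.<br />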
<br />
The amount of work caused by each change is O(N), so resource limits impose a much lower practical ceiling. A planned future option to filter rows/tables for replication becomes essential with larger or more heavily updated databases.<br />
<br />
==== Complex/Asymmetric Replication ====<br />
<br />
A wide variety of options is possible.<br />
<br />
=== Conflict Avoidance ===<br />
<br />
==== Distributed Locking ====<br />
<br />
Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when the servers are very close together, but they cannot support geographically distributed applications, since very low inter-node latency is critical for acceptable performance.<br />
<br />
Distributed locking is essentially a pessimistic approach, whereas BDR advocates an optimistic approach: avoid conflicts where possible, but allow some types of conflict to occur and resolve them when they arise.<br />
<br />
==== Global Sequences ====<br />
<br />
Many applications require unique values to be assigned to database entries. Some applications use GUIDs generated by external programs, some use database-supplied values. This is important with optimistic conflict resolution schemes because uniqueness violations are "divergent conflicts" and are not easily resolvable.<br />
<br />
The SQL standard requires sequence objects, which provide unique values, though these are isolated to a single node. They can then be used to supply default values using <tt>DEFAULT nextval('mysequence')</tt>, as with PostgreSQL's <tt>SERIAL</tt> pseudo-type.<br />
<br />
BDR requires sequences to work together across multiple nodes. This is implemented as a new <tt>SequenceAccessMethod</tt> API (SeqAM), which allows plugins that provide get/set functions for sequences. Global Sequences are then implemented as a plugin which implements the SeqAM API and communicates across nodes to allow new ranges of values to be stored for each sequence.<br />
<br />
=== Conflict Detection & Resolution ===<br />
<br />
Because local writes can occur on a master, conflict detection and avoidance are a concern for basic LLSR setups as well as for full BDR configurations.<br />
<br />
==== Lock Conflicts ====<br />
<br />
Changes from the upstream master are applied on the downstream master by a single apply process. That process needs to take a <tt>RowExclusiveLock</tt> on the table being changed and must be able to write-lock the tuple(s) being changed. Concurrent activity will prevent those changes from being immediately applied because of lock waits. Use the <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt> facility to look for issues with apply blocking on locks; a configuration sketch follows the list below.<br />
<br />
Concurrent activity on a row includes:<br />
<br />
* explicit row level locking (<tt>SELECT ... FOR UPDATE/FOR SHARE</tt>)<br />
* locking from foreign keys<br />
* implicit locking because of row <tt>UPDATE</tt>s, <tt>INSERT</tt>s or <tt>DELETE</tt>s, either from local activity or apply from other servers<br />
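<br />
For example, in <tt>postgresql.conf</tt> on the downstream master:<br />
<br />
 log_lock_waits = on     # log a message whenever a session (including the<br />
                         # BDR apply process) waits longer than deadlock_timeout<br />
 deadlock_timeout = 1s   # the wait threshold, and the deadlock check interval<br />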
<br />
==== Data Conflicts ====<br />
<br />
Concurrent updates and deletes may also cause data-level conflicts to occur, which then require conflict resolution. It is important that these conflicts are resolved in a consistent and idempotent manner so that all servers end up with identical results.<br />
<br />
Concurrent updates are resolved using a last-update-wins strategy based on timestamps. Should the timestamps be identical, the tie is broken using the system identifier from <tt>pg_control</tt>, though this may change in a future release.<br />
<br />
<tt>UPDATE</tt>s and <tt>INSERT</tt>s may cause uniqueness violation errors because of primary keys, unique indexes and exclusion constraints when changes are applied at remote nodes. These are not easily resolvable and represent severe application errors that cause the database contents of multiple servers to diverge from each other. Hence these are known as "divergent conflicts". Currently, replication stops should a divergent conflict occur. The errors causing the conflict can be seen in the error log of the downstream master with the problem.<br />
<br />
Updates which cannot locate a row are presumed to be <tt>DELETE</tt>/<tt>UPDATE</tt> conflicts. These are accepted as successful operations, but in the case of an <tt>UPDATE</tt> the updated data is discarded.<br />
<br />
All conflicts are resolved at row level. Concurrent updates that touch completely separate columns can result in "false conflicts", where there is no conflict in terms of the data, just in terms of the row update. Such conflicts result in just one of the changes being kept and the other discarded, according to last-update-wins. It is not practical to decide at the database level when rows should be merged and when a last-update-wins strategy should be used; such decision making would require support for application-specific conflict resolution plugins.<br />
<br />
Changing unlogged and logged tables in the same transaction can result in apparently strange outcomes since the unlogged tables aren't replicated.<br />
<br />
==== Examples ====<br />
<br />
As an example, let's say we have two tables, Activity and Customer. There is a foreign key from Activity to Customer, constraining us to only record activity rows that have a matching customer row. <br />
<br />
* We update a row on Customer table on NodeA. The change from NodeA is applied to NodeB just as we are inserting an activity on NodeB. The inserted activity causes a FK check.... <br />
<br />
<br />
<br />
[[Category:Replication]]</div>Amshttps://wiki.postgresql.org/index.php?title=BDR_User_Guide&diff=19752BDR User Guide2013-05-15T13:43:40Z<p>Ams: s/LSLR/LLSR/g -- was a typo</p>
<hr />
<div>----<br />
This page is the users and administrators guide for BDR. If you're looking for technical details on the project plan and implementation, see [[BDR Project]].<br />
----<br />
<br />
= BDR User Guide =<br />
<br />
BDR (BiDirectional Replication) is a feature being developed for inclusion in PostgreSQL core that provides greatly enhanced replication capabilities.<br />
<br />
BDR allows users to create a geographically distributed multi-master database using Logical Log Streaming Replication (LLSR) transport.<br />
BDR is designed to provide both high availability and geographically distributed disaster recovery capabilities. <br />
<br />
BDR is not “clustering” as some vendors use the term, in that it doesn't have a distributed lock manager, global transaction co-ordinator, etc. Each member server is separate yet connected, with design choices that allow separation between nodes that would not be possible with global transaction coordination.<br />
<br />
Guidance on getting a testing setup established is in [[#Initial setup]]. Please read the full documentation if you intend to put BDR into production.<br />
<br />
== Logical Log Streaming Replication ==<br />
<br />
Logical log streaming replication (LLSR) allows one PostgreSQL master (the "upstream master") to stream a sequence of changes to another read/write PostgreSQL server (the "downstream master"). Data is sent in one direction only over a normal libpq connection.<br />
<br />
Multiple LLSR connections can be used to set up bi-directional replication as discussed later in this guide.<br />
<br />
=== Overview of logical replication ===<br />
<br />
In some ways LLSR is similar to "streaming replication", i.e. physical log streaming replication (PLSR), from a user perspective; both replicate changes from one server to another. However, in LLSR the receiving server is also a full master database that can make changes, unlike the read-only replicas offered by PLSR hot standby. Additionally, LLSR is per-database, whereas PLSR is per-cluster and replicates all databases at once. There are many more differences, discussed in the relevant sections of this document.<br />
<br />
In LLSR the data that is replicated is change data in a special format that allows the changes to be logically reconstructed on the downstream master. The changes are generated by reading transaction log (WAL) data, making change capture on the upstream master much more efficient than trigger-based replication, hence the name "logical log replication". Changes are passed from upstream to downstream using the libpq protocol, just as with physical log streaming replication.<br />
<br />
One connection is required for each PostgreSQL database that is replicated. If two servers are connected and each has 50 databases, 50 connections are required to send changes in one direction, from upstream to downstream. Each database connection must be specified, so it is possible to filter out unwanted databases simply by not configuring replication for those databases.<br />
<br />
Setting up replication for new databases is not (yet?) automatic, so additional configuration steps are required after <tt>CREATE DATABASE</tt>. A restart of the downstream master is also required. The upstream master only needs restarting if the <tt>max_logical_slots</tt> parameter is too low to allow a new replica to be added. Adding replication for databases that do not exist yet will cause an ERROR, as will dropping a database that is being replicated. Setup is discussed in more detail below.<br />
<br />
Changes are processed by the downstream master using <tt>bdr</tt> plugins. This allows flexible handling of replication input, including:<br />
<br />
* BDR apply process - applies logical changes to the downstream master. The apply process makes changes directly rather than generating SQL text which must then be parsed, planned and executed.<br />
* Textual output plugin - a demo plugin that generates SQL text (but doesn't apply changes)<br />
* <tt>pg_xlogdump</tt> - examines physical WAL records and produces textual debugging output. This server program is included in PostgreSQL 9.3.<br />
<br />
=== Replication of DML changes ===<br />
<br />
All changes are replicated: <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> and <tt>TRUNCATE</tt>. <br />
<br />
(TRUNCATE is not yet implemented, but will be implemented before the feature goes to final release).<br />
<br />
Actions that generate WAL data but don't represent logical changes do not result in data transfer, e.g. full page writes, VACUUMs, hint bit setting. LLSR avoids much of the overhead from physical WAL, though it has overheads that mean that it doesn't always use less bandwidth than PLSR.<br />
<br />
Locks taken by <tt>LOCK</tt> and <tt>SELECT ... FOR UPDATE/SHARE</tt> on the upstream master are not replicated to downstream masters. Locks taken automatically by <tt>INSERT</tt>, <tt>UPDATE</tt>, <tt>DELETE</tt> or <tt>TRUNCATE</tt> *are* taken on the downstream master and may delay replication apply or concurrent transactions - see [[#Lock Conflicts|Lock Conflicts]].<br />
<br />
<tt>TEMPORARY</tt> and <tt>UNLOGGED</tt> tables are not replicated. In contrast to physical standby servers, downstream masters can use temporary and unlogged tables.<br />
<br />
<tt>DELETE</tt> and <tt>UPDATE</tt> statements that affect multiple rows on the upstream master will cause a series of row changes on the downstream master. These are likely to be applied at the same speed as on the origin, as long as an index is defined on the Primary Key of the table on the downstream master. <tt>INSERT</tt>s on the upstream master do not require a unique constraint in order to replicate correctly. <tt>UPDATE</tt>s and <tt>DELETE</tt>s require some form of unique constraint, either <tt>PRIMARY KEY</tt> or <tt>UNIQUE NOT NULL</tt>. A warning is issued in the downstream master's logs if the expected constraint is absent.<br />
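<br />
A catalog query along these lines (a sketch; it checks only for <tt>PRIMARY KEY</tt> and <tt>UNIQUE</tt> constraints, not for <tt>NOT NULL</tt> on the unique columns) can help find tables that will generate this warning:<br />
<br />
 SELECT c.oid::regclass AS table_name<br />
 FROM pg_class c<br />
 JOIN pg_namespace n ON n.oid = c.relnamespace<br />
 WHERE c.relkind = 'r'<br />
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')<br />
   AND NOT EXISTS (SELECT 1 FROM pg_constraint con<br />
                   WHERE con.conrelid = c.oid AND con.contype IN ('p', 'u'));<br />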
<br />
<tt>UPDATE</tt>s that change the value of the Primary Key of a table will be replicated correctly.<br />
<br />
The values applied are the final values from the <tt>UPDATE</tt> on the upstream master, including any modifications from before-row triggers, rules or functions. Any reflexive conditions, such as N = N + 1, are resolved to their final value. Volatile or stable functions are evaluated on the master side and the resulting values are replicated. Consequently any function side-effects (writing files, network socket activity, updating internal PostgreSQL variables, etc) will not occur on the replicas, as the functions are not run again on the replica.<br />
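<br />
For example, with a hypothetical <tt>accounts</tt> table:<br />
<br />
 -- On the upstream master the new value is computed locally:<br />
 UPDATE accounts SET balance = balance + 100 WHERE id = 42;<br />
 -- The downstream master receives the resulting row values, not the<br />
 -- expression; volatile functions such as random() or clock_timestamp()<br />
 -- are likewise evaluated only on the upstream master.<br />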
<br />
All columns are replicated on each table. Large column values that would be placed in TOAST tables are replicated without problem, avoiding de-compression and re-compression. If we update a row but do not change a TOASTed column value, then that data is not sent downstream.<br />
<br />
All data types are handled, not just the built-in datatypes of PostgreSQL core. The only requirement is that user-defined types are installed identically in both upstream and downstream master (see "Limitations").<br />
<br />
The current LLSR plugin implementation uses the binary libpq protocol, so it requires that the upstream and downstream master use same CPU architecture and word-length, i.e. "identical servers", as with physical replication. A textual output option will be added later for passing data between non-identical servers, e.g. laptops or mobile devices communicating with a central server.<br />
<br />
Changes are accumulated in memory (spilling to disk where required) and then sent to the downstream server at commit time. Aborted transactions are never sent. Application of changes on downstream master is currently single-threaded, though this process is efficiently implemented. Parallel apply is a possible future feature, especially for changes made while holding <tt>AccessExclusiveLock</tt>.<br />
<br />
Changes are applied to the downstream master in the sequence in which they were committed on the upstream master. This is a known-good serialization ordering of changes, so the replication failures that are possible with statement-based replication (e.g. MySQL) or trigger-based replication (e.g. Slony version 2.0) cannot occur. Users should note that this means the original order in which tables were locked is not maintained. Although lock order is provably not an issue for the set of locks held on the upstream master, additional locking on the downstream side could cause lock waits or deadlocking in some cases. (Discussed in further detail later.)<br />
<br />
Larger transactions spill to disk on the upstream master once they reach a certain size. Currently, large transactions can cause increased latency. A future enhancement will be to stream changes to the downstream master once they fill the upstream memory buffer, though this is likely to be implemented in 9.5.<br />
<br />
<tt>SET</tt> statements and parameter settings are not replicated. This has no effect on replication since we only replicate actual changes, not anything at SQL statement level. We always update the correct tables, whatever the setting of <tt>search_path</tt>. Values are replicated correctly irrespective of the values of <tt>bytea_output</tt>, <tt>TimeZone</tt>, <tt>DateStyle</tt>, etc.<br />
<br />
<tt>NOTIFY</tt> is not supported across log based replication, either physical or logical. <tt>NOTIFY</tt> and <tt>LISTEN</tt> will work fine on the upstream master but an upstream <tt>NOTIFY</tt> will not trigger a downstream <tt>LISTEN</tt>er.<br />
<br />
In some cases, additional deadlocks can occur on apply. This causes an automatic retry of the apply of the replaying transaction and is only an issue if the deadlock recurs repeatedly, delaying replication.<br />
<br />
From a performance and concurrency perspective the BDR apply process is similar to a normal backend. Frequent conflicts with locks from other transactions when replaying changes can slow things down and thus increase replication delay, so reducing the frequency of such conflicts can be a good way to speed things up. Any lock held by another transaction on the downstream master - <tt>LOCK</tt> statements, <tt>SELECT ... FOR UPDATE/FOR SHARE</tt>, or <tt>INSERT</tt>/<tt>UPDATE</tt>/<tt>DELETE</tt> row locks - can delay replication if the replication apply process needs to change the locked table/row.<br />
<br />
=== Table definitions and DDL replication ===<br />
<br />
DML changes are replicated between tables with matching <tt>"Schemaname"."Tablename"</tt> on both upstream and downstream masters. e.g. changes from upstream's <tt>public.mytable</tt> will go to downstream's <tt>public.mytable</tt>, while changes to the upstream <tt>myschema.mytable</tt> will go to the downstream <tt>myschema.mytable</tt>. This works even when no schema is specified in the original SQL, since we identify the changed table from its internal OIDs in WAL records and then map that to whatever internal identifier is used on the downstream node.<br />
<br />
This requires careful synchronization of table definitions on each node otherwise <tt>ERROR</tt>s will be generated by the replication apply process. In general, tables must be an exact match between upstream and downstream masters. <br />
<br />
There are no plans to implement working replication between dissimilar table definitions.<br />
<br />
Tables must meet the following requirements to be compatible for purposes of LLSR:<br />
<br />
* The downstream master must only have constraints (<tt>CHECK</tt>, <tt>UNIQUE</tt>, <tt>EXCLUSION</tt>, <tt>FOREIGN KEY</tt>, etc) that are also present on the upstream master. Replication may initially work with mismatched constraints but is likely to fail as soon as the downstream master rejects a row the upstream master accepted.<br />
* The table referenced by a FOREIGN KEY on a downstream master must have all the keys present in the upstream master version of the same table.<br />
* Storage parameters must match, except as allowed below<br />
* Inheritance must be the same<br />
* Dropped columns on master must be present on replicas<br />
* Custom types and enum definitions must match exactly<br />
* Composite types and enums must have the same oids on master and replication target<br />
* Extensions defining types used in replicated tables must be of the same version or fully SQL-level compatible and the oids of the types they define must match.<br />
<br />
The following differences are permissible between tables on different nodes:<br />
<br />
* The table's <tt>pg_class</tt> oid, the oid of its associated TOAST table, and the oid of the table's rowtype in <tt>pg_type</tt> may differ;<br />
* Extra or missing non-<tt>UNIQUE</tt> indexes<br />
* Extra keys in downstream lookup tables for <tt>FOREIGN KEY</tt> references that are not present on the upstream master<br />
* The table-level storage parameters for fillfactor and autovacuum<br />
* Triggers and rules may differ (they are not executed by replication apply)<br />
<br />
Replication of DDL changes between nodes will be possible using event triggers, but is not yet integrated with LLSR (see [[#LLSR Limitations|LLSR Limitations]]).<br />
<br />
Triggers and Rules are NOT executed by apply on downstream side, equivalent to an enforced setting of <tt>session_replication_role = origin</tt>.<br />
<br />
In future it is expected that composite types and enums with non-identical oids will be converted using text output and input functions. This feature is not yet implemented.<br />
<br />
=== LLSR limitations ===<br />
<br />
The current LLSR implementation is subject to some limitations, which are being progressively removed as work progresses.<br />
<br />
==== Data definition compatibility ====<br />
<br />
Table definitions, types, extensions, etc must be near identical between upstream and downstream masters. See [[#Table definitions and DDL replication|Table definitions and DDL replication]].<br />
<br />
==== DDL Replication ====<br />
<br />
DDL replication is not yet supported.<br />
<br />
==== Upstream feedback ====<br />
<br />
No feedback from downstream masters to the upstream master is implemented for asynchronous LLSR, so upstream masters must be configured to keep enough WAL. See [[#Configuration|Configuration]].<br />
<br />
==== TRUNCATE is not replicated ====<br />
<br />
TRUNCATE is not yet supported, however workarounds with user-level triggers are possible and a ProcessUtility hook is planned to implement a similar approach globally.<br />
<br />
The safest option is to define a user-level BEFORE trigger on each table that RAISEs an ERROR when TRUNCATE is attempted.<br />
<br />
A simple truncate-blocking trigger is:<br />
<br />
CREATE OR REPLACE FUNCTION deny_truncate() RETURNS trigger AS $$<br />
BEGIN<br />
IF tg_op = 'TRUNCATE' THEN<br />
RAISE EXCEPTION 'TRUNCATE is not supported on this table. Please use DELETE FROM.';<br />
ELSE<br />
RAISE EXCEPTION 'This trigger only supports TRUNCATE';<br />
END IF;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
It can be applied to a table with:<br />
<br />
CREATE TRIGGER deny_truncate_on_<tablename> BEFORE TRUNCATE ON <tablename><br />
FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate();<br />
<br />
A PL/PgSQL DO block that queries <tt>pg_class</tt> and loops over it to <tt>EXECUTE</tt> a dynamic SQL <tt>CREATE TRIGGER</tt> command for each table that does not already have the trigger can be used to apply the trigger to all tables.<br />
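<br />
A sketch of such a block follows. It assumes the <tt>deny_truncate()</tt> function above, covers only ordinary tables in the <tt>public</tt> schema, and uses a single trigger name rather than the per-table names shown above:<br />
<br />
 DO $$<br />
 DECLARE<br />
     t regclass;<br />
 BEGIN<br />
     FOR t IN<br />
         SELECT c.oid::regclass<br />
         FROM pg_class c<br />
         JOIN pg_namespace n ON n.oid = c.relnamespace<br />
         WHERE c.relkind = 'r'<br />
           AND n.nspname = 'public'<br />
           AND NOT EXISTS (SELECT 1 FROM pg_trigger tg<br />
                           WHERE tg.tgrelid = c.oid<br />
                             AND tg.tgname = 'deny_truncate_trg')<br />
     LOOP<br />
         -- regclass output is quoted and schema-qualified as needed<br />
         EXECUTE format('CREATE TRIGGER deny_truncate_trg BEFORE TRUNCATE ON %s '<br />
                        || 'FOR EACH STATEMENT EXECUTE PROCEDURE deny_truncate()', t);<br />
     END LOOP;<br />
 END;<br />
 $$;<br />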
<br />
=== Initial setup ===<br />
<br />
To set up LLSR or BDR you first need a patched PostgreSQL that can support LLSR/BDR, then you need to create one or more LLSR/BDR senders and one or more LLSR/BDR receivers.<br />
<br />
==== Installing the patched PostgreSQL binaries ====<br />
<br />
Currently BDR is only available in builds of the 'bdr' branch on Andres Freund's git repo on git.postgresql.org. PostgreSQL 9.2 and below do not support BDR, and 9.3 requires patches, so this guide will not work for you if you are trying to use a normal install of PostgreSQL.<br />
<br />
First you need to clone, configure, compile and install like normal. Clone the sources from <tt>git://git.postgresql.org/git/users/andresfreund/postgres.git</tt> and checkout the <tt>bdr</tt> branch.<br />
<br />
If you have an existing local PostgreSQL git tree specify it as <tt>--reference /path/to/existing/tree</tt> to greatly speed your git clone.<br />
<br />
Example:<br />
<br />
mkdir -p $HOME/bdr<br />
 cd $HOME/bdr<br />
git clone git://git.postgresql.org/git/users/andresfreund/postgres.git $HOME/bdr/postgres-bdr-src<br />
cd postgres-bdr-src<br />
./configure --prefix=$HOME/bdr/postgres-bdr-bin<br />
make install<br />
cd contrib/bdr<br />
make install<br />
<br />
This will put everything in <tt>$HOME/bdr</tt>, with the source code and build tree in <tt>$HOME/bdr/postgres-bdr-src</tt> and the installed PostgreSQL in <tt>$HOME/bdr/postgres-bdr-bin</tt>. This is a convenient setup for testing and development because it doesn't require you to set up new users, wrangle permissions, run anything as root, etc, but it isn't recommended that you deploy this way in production.<br />
<br />
To actually use these new binaries you will need to:<br />
<br />
export PATH=$HOME/bdr/postgres-bdr-bin/bin:$PATH<br />
<br />
before running <tt>initdb</tt>, <tt>postgres</tt>, etc. You don't have to use the <tt>psql</tt> or <tt>libpq</tt> you compiled but you're likely to get version mismatch warnings if you don't.<br />
<br />
=== Parameter Reference ===<br />
<br />
The following parameters are new or have been changed in PostgreSQL's new logical streaming replication.<br />
<br />
==== <tt>shared_preload_libraries = 'bdr'</tt> ====<br />
<br />
To load support for receiving changes on a downstream master, the <tt>bdr</tt> library must be added to the existing <tt>shared_preload_libraries</tt> parameter. This loads the bdr library during postmaster start-up and allows it to create the required background worker(s).<br />
<br />
Upstream masters don't need to load the bdr library unless they're also operating as a downstream master as is the case in a BDR configuration.<br />
<br />
==== <tt>bdr.connections</tt> ====<br />
<br />
A comma-separated list of upstream master connection names is specified in <tt>bdr.connections</tt>. These names must be simple alphanumeric strings. They are used when naming the connection in error messages, configuration options and logs, but are otherwise of no special meaning.<br />
<br />
A typical two-upstream-master setting might be:<br />
<br />
 bdr.connections = 'upstream1, upstream2'<br />
<br />
==== <tt>bdr.&lt;connection_name&gt;.dsn</tt> ====<br />
<br />
Each connection name must have at least a data source name specified using the <tt>bdr.&lt;connection_name&gt;.dsn</tt> parameter. The DSN syntax is the same as that used by libpq so it is not discussed in further detail here. A <tt>dbname</tt> for the database to connect to must be specified; all other parts of the DSN are optional.<br />
<br />
The local (downstream) database name is assumed to be the same as the name of the upstream database being connected to, though future versions will make this configurable.<br />
<br />
For the above two-master setting for <tt>bdr.connections</tt> the DSNs might look like:<br />
<br />
bdr.upstream1.dsn = 'host=10.1.1.2 user=postgres dbname=replicated_db'<br />
bdr.upstream2.dsn = 'host=10.1.1.3 user=postgres dbname=replicated_db'<br />
<br />
==== <tt>max_logical_slots</tt> ====<br />
<br />
The new parameter <tt>max_logical_slots</tt> has been added for use on both upstream and downstream masters. This parameter controls the maximum number of logical replication slots - upstream or downstream - that this cluster may have at a time. It must be set at postmaster start time.<br />
<br />
As logical replication slots are persistent, slots are consumed even by replicas that are not currently connected. Slot management is discussed in Starting, Stopping and Managing Replication.<br />
<br />
<tt>max_logical_slots</tt> should be set to the sum of the number of logical replication upstream masters this server will have plus the number of logical replication downstream masters that will connect to it.<br />
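<br />
For example, a server that connects to two upstream masters and also has three downstream masters connecting to it needs <tt>max_logical_slots</tt> of at least 5.<br />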
<br />
==== <tt>wal_level = 'logical'</tt> ====<br />
<br />
A new setting, <tt>'logical'</tt>, has been added for the existing <tt>wal_level</tt> parameter. <tt>'logical'</tt> includes everything that the existing <tt>hot_standby</tt> setting does and adds additional details required for logical changeset decoding to the write-ahead logs.<br />
<br />
This additional information is consumed by the upstream-master-side xlog decoding worker. Downstream masters that do not also act as upstream masters do not require <tt>wal_level</tt> to be increased above the default <tt>'minimal'</tt>.<br />
<br />
<tt>wal_level</tt>, except for the new <tt>'logical'</tt> setting, is [http://www.postgresql.org/docs/current/static/runtime-config-wal.html documented in the main PostgreSQL manual].<br />
<br />
==== <tt>max_wal_senders</tt> ====<br />
<br />
Logical replication hasn't altered the <tt>max_wal_senders</tt> parameter, but it is important in upstream masters for logical replication and BDR because every logical sender consumes a <tt>max_wal_senders</tt> entry.<br />
<br />
You should configure <tt>max_wal_senders</tt> to the sum of the number of physical and logical replicas you want to allow an upstream master to serve. If you intend to use <tt>pg_basebackup</tt> you should add at least two more senders to allow for its use.<br />
<br />
Like <tt>max_logical_slots</tt>, <tt>max_wal_senders</tt> entries don't cost a large amount of memory, so you can overestimate fairly safely.<br />
<br />
<tt>max_wal_senders</tt> is documented in [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL documentation].<br />
<br />
==== <tt>wal_keep_segments</tt> ====<br />
<br />
Like <tt>max_wal_senders</tt>, the <tt>wal_keep_segments</tt> parameter isn't directly changed by logical replication but is still important for upstream masters. It is not required on downstream-only masters.<br />
<br />
<tt>wal_keep_segments</tt> should be set to a value that allows for some downtime or unreachable periods for downstream masters and for heavy bursts of write activity on the upstream master. <br />
<br />
Keep in mind that enough disk space must be available for the WAL segments, each of which is 16MB. If you run out of disk space the server will halt until disk space is freed and it may be quite difficult to free space when you can no longer start the server.<br />
<br />
If a replica falls further behind than <tt>wal_keep_segments</tt> allows for, an "Insufficient WAL segments retained" error will be reported. See [[#Troubleshooting|Troubleshooting]].<br />
<br />
<tt>wal_keep_segments</tt> is documented in [http://www.postgresql.org/docs/current/static/runtime-config-replication.html the main PostgreSQL manual].<br />
<br />
=== Configuration ===<br />
<br />
Details on individual parameters are described in the [[#Parameter Reference|Parameter Reference]] section.<br />
<br />
The following configuration is an example of a simple one-way LLSR replication setup - a single upstream master to a single downstream master.<br />
<br />
The upstream master (sender)'s <tt>postgresql.conf</tt> should contain settings like:<br />
<br />
wal_level = 'logical' # Include enough info for logical replication<br />
max_logical_slots = X # Number of LLSR senders + any receivers<br />
max_wal_senders = Y # Y = max_logical_slots plus any physical <br />
# streaming requirements<br />
wal_keep_segments = 5000 # Master must retain enough WAL segments to let <br />
# replicas catch up. Correct value depends on<br />
# rate of writes on master, max replica downtime<br />
# allowable. 5000 segments requires 78GB<br />
# in pg_xlog<br />
<br />
Downstream (receiver) <tt>postgresql.conf</tt>:<br />
<br />
shared_preload_libraries = 'bdr'<br />
<br />
bdr.connections="name_of_upstream_master" # list of upstream master nodenames<br />
bdr.<nodename>.dsn = 'dbname=postgres' # connection string for connection<br />
# from downstream to upstream master<br />
bdr.<nodename>.local_dbname = 'xxx' # optional parameter to cover the case <br />
# where the databasename on upstream <br />
# and downstream master differ. <br />
# (Not yet implemented)<br />
bdr.<nodename>.apply_delay # optional parameter to delay apply of<br />
# transactions, time in milliseconds <br />
 bdr.synchronous_commit = ... # optional parameter to set the<br />
# synchronous_commit parameter the<br />
# apply processes will be using<br />
max_logical_slots = X # set to the number of remotes<br />
<br />
Note that a server can be both a sender and a receiver, whether as one of a pair of servers replicating to each other or as part of more complex configurations such as replication chains or trees.<br />
<br />
The upstream (sender) <tt>pg_hba.conf</tt> must be configured to allow the downstream master to connect for replication. Otherwise you'll see errors like the following on the downstream master:<br />
<br />
FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "[local]", user "postgres"<br />
<br />
A suitable <tt>pg_hba.conf</tt> entry for a replication connection from the replica server 10.1.4.8 might be:<br />
<br />
host replication postgres 10.1.4.8/32 trust<br />
<br />
(The user name should match the one configured in the downstream master's dsn; md5 password authentication is also supported.)<br />
<br />
For more details on these parameters, see [[#Parameter Reference|Parameter Reference]].<br />
<br />
=== Troubleshooting ===<br />
<br />
==== Could not access file "bdr": No such file or directory ====<br />
<br />
If you see the error:<br />
<br />
FATAL: could not access file "bdr": No such file or directory<br />
<br />
when starting a database set up to receive BDR replication, you probably forgot to install <tt>contrib/bdr</tt>. See above.<br />
<br />
==== Invalid value for parameter ====<br />
<br />
An error like:<br />
<br />
LOG: invalid value for parameter ...<br />
<br />
when setting one of these parameters means your server doesn't support logical replication and will need to be patched or updated.<br />
<br />
==== Insufficient WAL segments retained ("requested WAL segment ... has already been removed") ====<br />
<br />
If <tt>wal_keep_segments</tt> is insufficient to meet the requirements of a replica that has fallen far behind, the master will report errors like:<br />
<br />
ERROR: requested WAL segment 00000001000000010000002D has already been removed<br />
<br />
Currently the replica errors look like:<br />
<br />
WARNING: Starting logical replication<br />
LOG: data stream ended<br />
LOG: worker process: master (PID 23812) exited with exit code 0<br />
LOG: starting background worker process "master"<br />
LOG: master initialized on master, remote dbname=master port=5434 replication=true fallback_application_name=bdr<br />
LOG: local sysid 5873181566046043070, remote: 5873181102189050714<br />
LOG: found valid replication identifier 1<br />
LOG: starting up replication at 1 from 1/2D9CA220<br />
<br />
but a more explicit error message for this condition is planned.<br />
<br />
The only way to recover from this fault is to re-seed the replica database.<br />
<br />
This fault could be prevented with feedback from the replica to the master, but this feature is not planned for the first release of BDR. Another alternative considered for future releases is making wal_keep_segments a dynamic parameter that is sized on demand.<br />
<br />
Monitoring of maximum replica lag and appropriate adjustment of wal_keep_segments will prevent this fault from arising.<br />
<br />
==== Couldn't find logical slot ====<br />
<br />
An error like:<br />
<br />
ERROR: couldn't find logical slot "bdr: 16384:5873181566046043070-1-24596:"<br />
<br />
on the upstream master suggests that a downstream master is trying to connect to a logical replication slot that no longer exists. The slot can not be re-created, so it is necessary to re-seed the downstream replica database.<br />
<br />
=== Operational Issues and Debugging ===<br />
<br />
In LLSR there are no user-level (ie SQL visible) ERRORs that have special meaning. Any ERRORs generated are likely to be serious problems of some kind, apart from apply deadlocks, which are automatically re-tried.<br />
<br />
=== Monitoring ===<br />
<br />
The following views are available for monitoring replication activity:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE pg_stat_replication]</tt><br />
* <tt>pg_stat_logical_replication</tt> (described below)<br />
* <tt>pg_stat_bdr</tt> (described below)<br />
<br />
The following configuration and logging parameters are useful for monitoring replication:<br />
<br />
* <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt><br />
<br />
==== pg_stat_logical_replication ====<br />
<br />
The new <tt>pg_stat_logical_replication</tt> view is specific to logical replication. It is based on the underlying <tt>pg_stat_get_logical_replication_slots</tt> function and has the following structure:<br />
<br />
View "pg_catalog.pg_stat_logical_replication"<br />
Column | Type | Modifiers <br />
--------------------------+---------+-----------<br />
slot_name | text | <br />
plugin | text | <br />
database | oid | <br />
active | boolean | <br />
xmin | xid | <br />
last_required_checkpoint | text | <br />
<br />
It contains one row for every connection from a downstream master to the server being queried (the upstream master). On a standalone PostgreSQL server or a downstream-only master this view will contain no rows.<br />
<br />
* <tt>slot_name</tt>: An internal name for a given logical replication slot (a connection from a downstream master to this upstream master). This slot name is used by the downstream master to uniquely identify its self and is used with the <tt>pg_receivellog</tt> command when managing logical replication slots. The slot name is composed of the decoding plugin name, the upstream database oid, the downstream system identifier (from <tt>pg_control</tt>), the downstream slot number, and the downstream database oid.<br />
<br />
* <tt>plugin</tt>: The logical replication plugin being used to decode WAL archives. You'll generally only see <tt>bdr_output</tt> here.<br />
<br />
* <tt>database</tt>: The oid of the database being replicated by this slot. You can get the database name by joining on <tt>pg_database.oid</tt>.<br />
<br />
* <tt>active</tt>: Whether this slot currently has an active connection.<br />
<br />
* <tt>xmin</tt>: The lowest transaction ID this replication slot can "see", like the xmin of a transaction or prepared transaction. xmin should keep on advancing as replication continues.<br />
<br />
* <tt>last_required_checkpoint</tt>: The checkpoint identifying the oldest WAL record required to bring this slot up to date with the upstream master. (This column is likely to be removed in a future version).<br />
<br />
==== pg_stat_bdr ====<br />
<br />
The <tt>pg_stat_bdr</tt> view is supplied by the <tt>bdr</tt> extension. It provides information on a server's connection(s) to its upstream master(s). It is not present on upstream-only masters.<br />
<br />
The primary purpose of this view is to report statistics on the progress of LLSR apply on a per-upstream master connection basis.<br />
<br />
View structure:<br />
<br />
View "public.pg_stat_bdr"<br />
Column | Type | Modifiers <br />
--------------------+--------+-----------<br />
rep_node_id | oid | <br />
riremotesysid | name | <br />
riremotedb | oid | <br />
rilocaldb | oid | <br />
nr_commit | bigint | <br />
nr_rollback | bigint | <br />
nr_insert | bigint | <br />
nr_insert_conflict | bigint | <br />
nr_update | bigint | <br />
nr_update_conflict | bigint | <br />
nr_delete | bigint | <br />
nr_delete_conflict | bigint | <br />
nr_disconnect | bigint | <br />
<br />
Fields:<br />
<br />
* <tt>rep_node_id</tt>: An internal identifier for the replication slot.<br />
<br />
* <tt>riremotesysid</tt>: The remote database system identifier, as reported by the <tt>Database system identifier</tt> line of <tt>pg_controldata /path/to/datadir</tt>.<br />
<br />
* <tt>riremotedb</tt>: The remote database OID, i.e. the <tt>oid</tt> column of the remote server's <tt>pg_catalog.pg_database</tt> entry for the replicated database. You can get the database name with <tt>select datname from pg_database where oid = 12345</tt> (where '12345' is the <tt>riremotedb</tt> oid).<br />
<br />
* <tt>rilocaldb</tt>: The local database OID, with the same meaning as <tt>riremotedb</tt> but with oids from the local system.<br />
<br />
''The remaining columns are statistics for this upstream master slot'':<br />
<br />
* <tt>nr_commit</tt>: Number of commits applied to date from this master<br />
<br />
* <tt>nr_rollback</tt>: Number of rollbacks performed by this apply process due to recoverable errors (deadlock retries, lost races, etc) or unrecoverable errors like mismatched constraint errors.<br />
<br />
* <tt>nr_insert</tt>: Number of <tt>INSERT</tt>s performed<br />
<br />
* <tt>nr_insert_conflict</tt>: Number of <tt>INSERT</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_update</tt>: Number of <tt>UPDATE</tt>s performed<br />
<br />
* <tt>nr_update_conflict</tt>: Number of <tt>UPDATE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_delete</tt>: Number of <tt>DELETE</tt>s performed<br />
<br />
* <tt>nr_delete_conflict</tt>: Number of <tt>DELETE</tt>s that resulted in conflicts.<br />
<br />
* <tt>nr_disconnect</tt>: Number of times this apply process has lost its connection to the upstream master since it was started.<br />
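<br />
A sketch that summarises the counters per upstream connection, using only the columns above:<br />
<br />
 SELECT rep_node_id, nr_commit, nr_rollback, nr_disconnect,<br />
        nr_insert_conflict + nr_update_conflict + nr_delete_conflict AS total_conflicts<br />
 FROM pg_stat_bdr;<br />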
<br />
<br />
This view does not contain any information about how far behind the upstream master this downstream master is. The upstream master's <tt>pg_stat_logical_replication</tt> and <tt>pg_stat_replication</tt> views must be queried to determine replication lag.<br />
<br />
==== Monitoring replication status and lag ====<br />
<br />
As with any replication setup, it is vital to monitor replication status on all BDR nodes to ensure no node is lagging severely behind the others or is stuck.<br />
<br />
In the case of BDR a stuck or crashed node will eventually cause disk space and table bloat problems on other masters, so stuck nodes should be detected and removed or repaired in a reasonably timely manner. Exactly how urgent this is depends on the workload of the BDR group.<br />
<br />
The <tt>pg_stat_logical_replication</tt> view described above may be used to verify that a downstream master is connected to its upstream master - the <tt>active</tt> boolean column is <tt>t</tt> if there's a downstream master connected.<br />
<br />
The <tt>xmin</tt> column provides an indication of whether replication is advancing; it should increase as replication progresses. There is no simple way to turn <tt>xmin</tt> into the time the last applied transaction was committed on the master, so it doesn't provide an indication of wall-clock lag.<br />
<br />
To determine wall-clock replication lag an application-level ticker may be used to periodically update a timestamp in a replicated table. The difference between this timestamp on the upstream and downstream masters provides the wall-clock replication lag. For BDR one row may be added to the table for each BDR master, giving an indication of how much lag each master has relative to each other master.<br />
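<br />
A minimal sketch of such a ticker, using a hypothetical table and node name (neither is part of LLSR/BDR itself):<br />
<br />
 -- Created once; replicated like any other table. Each master inserts its own row:<br />
 CREATE TABLE replication_ticker (node_name text PRIMARY KEY, last_tick timestamptz);<br />
 INSERT INTO replication_ticker VALUES ('alpha', now());<br />
 <br />
 -- Run periodically on each master by an application-level ticker:<br />
 UPDATE replication_ticker SET last_tick = now() WHERE node_name = 'alpha';<br />
 <br />
 -- Run on any node; approximate wall-clock lag relative to each other master,<br />
 -- accurate only to within the tick interval plus any clock skew:<br />
 SELECT node_name, now() - last_tick AS approx_lag FROM replication_ticker;<br />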
<br />
=== Table and index usage statistics ===<br />
<br />
Statistics on table and index usage are updated normally by the downstream master. This is essential for the correct functioning of autovacuum. If there are no local writes on the downstream master and stats have not been reset, these two views should show matching results between upstream and downstream (for example, by running the query sketched below on both nodes):<br />
<br />
* <tt>pg_stat_user_tables</tt><br />
* <tt>pg_statio_user_tables</tt><br />
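<br />
For example, the counters in <tt>pg_stat_user_tables</tt> can be compared by running the following on both the upstream and the downstream master:<br />
<br />
 SELECT relname, n_tup_ins, n_tup_upd, n_tup_del<br />
 FROM pg_stat_user_tables<br />
 ORDER BY relname;<br />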
<br />
Since indexes are used to apply changes, with workloads that perform <tt>UPDATE</tt>s and <tt>DELETE</tt>s the identifying indexes on the downstream side may appear more heavily used than non-identifying indexes.<br />
<br />
The built-in index monitoring views are:<br />
<br />
* <tt>pg_stat_user_indexes</tt><br />
* <tt>pg_statio_user_indexes</tt><br />
<br />
All these views are discussed in [http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE the PostgreSQL documentation on the statistics views].<br />
<br />
=== Starting, stopping and managing replication ===<br />
<br />
Replication is managed with the <tt>postgresql.conf</tt> settings described in "Parameter Reference" and "Configuration" above, and using the <tt>pg_receivellog</tt> utility command.<br />
<br />
==== Starting a new LLSR connection ====<br />
<br />
Logical replication is started automatically when a database is configured as a downstream master in <tt>postgresql.conf</tt> (see [[#Configuration|Configuration]]) and the postmaster is started. No explicit action is required to start replication, but replication will not actually work unless the upstream and downstream databases are identical within the requirements set by LLSR in the [[#Table definitions and DDL replication|Table definitions and DDL replication]] section.<br />
<br />
<tt>pg_dump</tt> and <tt>pg_restore</tt> may be used to set up the new replica's database.<br />
<br />
==== Viewing logical replication slots ====<br />
<br />
Examining the state of logical replication is discussed in [[#Monitoring|Monitoring]].<br />
<br />
==== Temporarily stopping an LLSR replica ====<br />
<br />
LLSR replicas can be temporarily stopped by shutting down the downstream master's postmaster.<br />
<br />
If the replica is not started back up before the upstream master discards the oldest WAL segment required for the downstream master to resume replay, as identified by the <tt>last_required_checkpoint</tt> column of <tt>pg_catalog.pg_stat_logical_replication</tt>, then the replica will not resume replay. The error [[#Insufficient_WAL_segments_retained_.28.22requested_WAL_segment_..._has_already_been_removed.22.29|Insufficient WAL segments retained]] will be reported in the upstream master's logs. The replica must be re-created for replication to continue.<br />
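<br />
To see which WAL segment each slot still needs, a sketch - assuming <tt>last_required_checkpoint</tt> holds a WAL location in the usual text format accepted by <tt>pg_xlogfile_name()</tt>:<br />
<br />
 SELECT slot_name, pg_xlogfile_name(last_required_checkpoint) AS oldest_segment_needed<br />
 FROM pg_stat_logical_replication;<br />
<br />
If the named segment is no longer present in <tt>pg_xlog</tt> on the upstream master, the replica can no longer resume.<br />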
<br />
==== Removing an LLSR replica permanently ====<br />
<br />
To remove a replication connection permanently, remove its entries from the downstream master's <tt>postgresql.conf</tt>, restart the downstream master, then use <tt>pg_receivellog</tt> to remove the replication slot on the upstream master.<br />
<br />
It is important to remove the replication slot from the upstream master(s) to prevent xid wrap-around problems and issues with table bloat caused by delayed vacuum.<br />
<br />
==== Cleaning up abandoned replication slots ====<br />
<br />
To remove a replication slot that was used for a now-defunct replica, find its slot name from the <tt>[[#pg_stat_logical_replication|pg_stat_logical_replication]]</tt> view on the upstream master then run:<br />
<br />
pg_receivellog -p 5434 -h master-hostname -d dbname \<br />
--slot='bdr: 16384:5873181566046043070-1-16384:' --stop<br />
<br />
where the argument to '--slot' is the slot name you found from the view.<br />
<br />
You may need to do this if you've created and then deleted several replicas, so that <tt>max_logical_slots</tt> has filled up with entries for replicas that no longer exist.<br />
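<br />
A sketch for finding candidate slots to clean up - note that an inactive slot may simply belong to a temporarily stopped replica, so verify before removing it:<br />
<br />
 SELECT slot_name, plugin, database<br />
 FROM pg_stat_logical_replication<br />
 WHERE NOT active;<br />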
<br />
== Bi-Directional Replication ==<br />
<br />
Bi-Directional replication is built directly on LLSR by configuring two or more servers as both upstream ''and'' downstream masters of each other.<br />
<br />
All of the Logical Log Streaming Replication documentation applies to BDR and should be read before moving on to reading about and configuring BDR.<br />
<br />
=== Bi-Directional Replication Use Cases ===<br />
<br />
Bi-Directional Replication is designed to allow a very wide range of server connection topologies. The simplest to understand is two servers each sending their changes to the other; this is achieved by making each server the downstream master of the other, using two connections per database.<br />
<br />
Logical and physical streaming replication are designed to work side-by-side. This means that a master can be replicating to a local standby server using physical streaming replication while at the same time replicating logical changes to a remote downstream master. Logical replication also works alongside cascading replication, so a physical standby can feed changes to a downstream master, allowing a chain of upstream master to physical standby to downstream master.<br />
<br />
==== Simple multi-master pair ====<br />
<br />
A simple multi-master "HA Cluster" with two servers:<br />
<br />
* Server "Alpha" - Master<br />
* Server "Bravo" - Master<br />
<br />
===== Configuration =====<br />
<br />
Alpha:<br />
<br />
wal_level = 'logical'<br />
max_logical_slots = 3<br />
max_wal_senders = 4<br />
wal_keep_segments = 5000<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="bravo"<br />
bdr.bravo.dsn = 'dbname=dbtoreplicate'<br />
<br />
Bravo:<br />
<br />
wal_level = 'logical'<br />
max_logical_slots = 3<br />
max_wal_senders = 4<br />
wal_keep_segments = 5000<br />
shared_preload_libraries = 'bdr'<br />
bdr.connections="alpha"<br />
bdr.alpha.dsn = 'dbname=dbtoreplicate'<br />
<br />
See [[#Configuration|Configuration]] for an explanation of these parameters.<br />
<br />
==== HA and Logical Standby ====<br />
Downstream masters allow users to create temporary tables, so they can be used as reporting servers.<br />
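<br />
For example, a reporting job on the logical standby might stage intermediate results in a temporary table (the table and column names here are purely illustrative):<br />
<br />
 -- On the downstream master only; temporary tables are local and never replicated:<br />
 CREATE TEMPORARY TABLE daily_totals AS<br />
     SELECT date_trunc('day', created_at) AS day, count(*) AS n<br />
     FROM orders<br />
     GROUP BY 1;<br />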
<br />
"HA Cluster":<br />
<br />
* Server "Alpha" - Current Master<br />
* Server "Bravo" - Physical Standby - unused, apart from as failover target for Alpha - potentially specified in synchronous_standby_names<br />
* Server "Charlie" - "Logical Standby" - downstream master<br />
<br />
==== Very High Availability Multi-Master ====<br />
A typical configuration for remote multi-master would then be:<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha using logical streaming<br />
<br />
Bandwidth between Site 1 and Site 2 is minimised.<br />
<br />
==== 3-remote site simple Multi-Master Plex ====<br />
<br />
BDR supports "all to all" connections, so the latency for any change being applied on other masters is minimised. (Note that early designs of multi-master were arranged for circular replication, which has latency issues with larger numbers of nodes.)<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Alpha, Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha, Charlie using logical streaming replication<br />
<br />
===== Configuration =====<br />
<br />
If you wanted to test this configuration locally you could run three PostgreSQL instances on different ports. Such a configuration would look like the following if the port numbers were used as node names for the sake of notational clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441,node_5442'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5440,node_5442'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440,node_5441'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
<br />
In a typical real-world configuration each server would be on the same port on a different host instead.<br />
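<br />
A sketch for standing up such a local test (paths are illustrative; each instance also needs the <tt>wal_level</tt>, <tt>max_logical_slots</tt>, <tt>max_wal_senders</tt>, <tt>wal_keep_segments</tt> and <tt>shared_preload_libraries</tt> settings shown earlier):<br />
<br />
 for port in 5440 5441 5442; do<br />
     initdb -D node_$port<br />
     # append "port = $port" and the bdr.* settings shown above<br />
     # to node_$port/postgresql.conf before starting<br />
     pg_ctl -D node_$port -l node_$port.log start<br />
 done<br />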
<br />
==== 3-remote site simple Multi-Master Circular Replication ====<br />
<br />
A simpler configuration uses "circular replication". This needs fewer connections but results in higher latency for changes as the number of nodes increases. It's also less resilient to network disruptions and node faults.<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Charlie using logical streaming replication<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Echo using logical streaming replication<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Alpha using logical streaming replication<br />
<br />
TODO: Regrettably this doesn't actually work yet, because logical changes are not (yet) cascaded.<br />
<br />
===== Configuration =====<br />
<br />
Again using node names that match port numbers, for clarity:<br />
<br />
Config for node_5440:<br />
<br />
port = 5440<br />
bdr.connections='node_5441'<br />
bdr.node_5441.dsn='port=5441 dbname=postgres'<br />
<br />
Config for node_5441:<br />
<br />
port = 5441<br />
bdr.connections='node_5442'<br />
bdr.node_5442.dsn='port=5442 dbname=postgres'<br />
<br />
Config for node_5442:<br />
<br />
port = 5442<br />
bdr.connections='node_5440'<br />
bdr.node_5440.dsn='port=5440 dbname=postgres'<br />
<br />
This would usually be done in the real world with databases on different hosts, all running on the same port.<br />
<br />
==== 3-remote site Max Availability Multi-Master Plex ====<br />
<br />
* Site 1<br />
** Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication<br />
** Server "Bravo" - Physical Standby - feeds changes to Charlie, Echo using logical streaming<br />
<br />
* Site 2<br />
** Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication<br />
** Server "Delta" - Physical Standby - feeds changes to Alpha, Echo using logical streaming<br />
<br />
* Site 3<br />
** Server "Echo" - Master - feeds changes to Foxtrot using physical streaming with sync replication<br />
** Server "Foxtrot" - Physical Standby - feeds changes to Alpha, Charlie using logical streaming<br />
<br />
Bandwidth and latency between sites is minimised.<br />
<br />
Config left as an exercise for the reader.<br />
<br />
==== N-site symmetric cluster replication ====<br />
<br />
A symmetric cluster is one in which every master is connected to every other master.<br />
<br />
N=19 has been tested and works fine.<br />
<br />
Each of N masters requires N-1 connections to the other masters (for example, each of 20 masters maintains 19 connections), so the practical limit is fewer than 100 servers, or fewer still if you have many separate databases.<br />
<br />
The amount of work caused by each change is O(N), so in practice resource limits impose a much lower ceiling. A planned future option to filter rows/tables for replication becomes essential with larger or more heavily updated databases.<br />
<br />
==== Complex/Asymmetric Replication ====<br />
<br />
A wide variety of options is possible.<br />
<br />
=== Conflict Avoidance ===<br />
<br />
==== Distributed Locking ====<br />
<br />
Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close but cannot support geographically distributed applications as very low latency is critical for acceptable performance.<br />
<br />
Distributed locking is essentially a pessimistic approach, whereas BDR advocates an optimistic approach: avoid conflicts where possible but allow some types of conflict to occur and resolve them when they arise.<br />
<br />
==== Global Sequences ====<br />
<br />
Many applications require unique values be assigned to database entries. Some applications use GUIDs generated by external programs, some use database-supplied values. This is important with optimistic conflict resolution schemes because uniqueness violations are "divergent errors" and are not easily resolvable.<br />
<br />
The SQL standard specifies Sequence objects, which provide unique values, though these are isolated to a single node. They can then be used to supply default values using <tt>DEFAULT nextval('mysequence')</tt>, as with PostgreSQL's <tt>SERIAL</tt> pseudo-type.<br />
<br />
BDR requires sequences to work together across multiple nodes. This is implemented as a new <tt>SequenceAccessMethod</tt> API (SeqAM), which allows plugins that provide get/set functions for sequences. Global Sequences are then implemented as a plugin which implements the SeqAM API and communicates across nodes to allow new ranges of values to be stored for each sequence.<br />
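<br />
A purely hypothetical sketch of how this might look at the SQL level - the <tt>USING</tt> clause belongs to the proposed SeqAM work and its final syntax may differ:<br />
<br />
 CREATE SEQUENCE order_id_seq USING bdr;<br />
 CREATE TABLE orders (<br />
     order_id bigint PRIMARY KEY DEFAULT nextval('order_id_seq'),<br />
     placed_at timestamptz NOT NULL DEFAULT now()<br />
 );<br />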
<br />
=== Conflict Detection & Resolution ===<br />
<br />
Because local writes can occur on a master, conflict detection and avoidance is a concern for basic LLSR setups as well as full BDR configurations.<br />
<br />
==== Lock Conflicts ====<br />
<br />
Changes from the upstream master are applied on the downstream master by a single apply process. That process needs to take a RowExclusiveLock on the table being changed and must be able to write-lock the tuple(s) being changed. Concurrent activity will prevent those changes from being immediately applied because of lock waits. Use the <tt>[http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-LOCK-WAITS log_lock_waits]</tt> facility to look for issues with apply blocking on locks.<br />
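<br />
For example, in <tt>postgresql.conf</tt> on the downstream master:<br />
<br />
 log_lock_waits = on<br />
 deadlock_timeout = 1s    # lock waits longer than this are logged<br />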
<br />
Concurrent activity on a row includes:<br />
<br />
* explicit row level locking (<tt>SELECT ... FOR UPDATE/FOR SHARE</tt>)<br />
* locking from foreign keys<br />
* implicit locking because of row <tt>UPDATE</tt>s, <tt>INSERT</tt>s or <tt>DELETE</tt>s, either from local activity or apply from other servers<br />
<br />
==== Data Conflicts ====<br />
<br />
Concurrent updates and deletes may also cause data-level conflicts to occur, which then require conflict resolution. It is important that these conflicts are resolved in a consistent and idempotent manner so that all servers end up with identical results.<br />
<br />
Concurrent updates are resolved using a last-update-wins strategy based on timestamps. Should the timestamps be identical, the tie is broken using the system identifier from <tt>pg_control</tt>, though this may change in a future release.<br />
<br />
<tt>UPDATE</tt>s and <tt>INSERT</tt>s may cause uniqueness violation errors because of primary keys, unique indexes and exclusion constraints when changes are applied at remote nodes. These are not easily resolvable and represent severe application errors that cause the database contents of multiple servers to diverge from each other. Hence these are known as "divergent conflicts". Currently, replication stops should a divergent conflict occur. The errors causing the conflict can be seen in the error log of the downstream master with the problem.<br />
<br />
Updates which cannot locate a row are presumed to be <tt>DELETE</tt>/<tt>UPDATE</tt> conflicts. These are accepted as successful operations, but in the case of an <tt>UPDATE</tt> the updated data is discarded.<br />
<br />
All conflicts are resolved at row level. Concurrent updates that touch completely separate columns can result in "false conflicts", where there is no conflict in terms of the data, just in terms of the row update. Such conflicts will result in just one of those changes being kept and the other discarded according to last-update-wins. It is not practical to decide at the database level when a row should be merged and when a last-update-wins strategy should be used; such decision making would require support for application-specific conflict resolution plugins.<br />
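<br />
A sketch illustrating such a false conflict (table and column names are illustrative):<br />
<br />
 -- On node A:<br />
 UPDATE customer SET email = 'new@example.com' WHERE id = 42;<br />
 -- Concurrently on node B:<br />
 UPDATE customer SET phone = '555-0100' WHERE id = 42;<br />
 -- After replication, every node keeps only the later of the two row versions;<br />
 -- the earlier change is discarded even though different columns were touched.<br />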
<br />
Changing unlogged and logged tables in the same transaction can result in apparently strange outcomes since the unlogged tables aren't replicated.<br />
<br />
==== Examples ====<br />
<br />
As an example, let's say we have two tables, Activity and Customer. There is a Foreign Key from Activity to Customer, constraining us to only record activity rows that have a matching customer row.<br />
<br />
* We update a row on the Customer table on NodeA. The change from NodeA is applied to NodeB just as we are inserting an activity on NodeB. The inserted activity causes an FK check....<br />
<br />
<br />
<br />
[[Category:Replication]]</div>Amshttps://wiki.postgresql.org/index.php?title=CommitFest_2008-09&diff=2188CommitFest 2008-092008-08-28T10:23:39Z<p>Ams: </p>
<hr />
<div>= CommitFest =<br />
<br />
This is the page for CommitFest starting 2008 September<br />
<br />
{{CommitFestOpen}}<br />
<br />
{{CommitFestSection|Pending patches}}<br />
<br />
{{patch|ded849dd0808171846g2c6c65adub942bd2510a6c94f@mail.gmail.com|GSoC Improved Hash Indexing|Xiao Meng}}<br />
{{comment|David Fetter|Git repository is [http://git.postgresql.org/git/~davidfetter/hash/.git here.]}}<br />
<br />
{{patch|e08cc0400807280325s69452bd3r721fda77a2ea20f9@mail.gmail.com|Windowing Functions|Hitoshi Harada}}<br />
{{comment|David Fetter|Git repository is [http://git.postgresql.org/git/~davidfetter/windows_functions/.git here.]}}<br />
<br />
{{patch|e739902b0808142300j6f19ac9difa8fab753e568b9b@mail.gmail.com|Unsigned Integer data types|Ryan Bradetich}}<br />
<br />
{{patch|20080803.114711.73374027.t-ishii@sraoss.co.jp|Common Table Expressions|Yoshiyuki Asaba|status=Pending Review}}<br />
{{comment|tgl|still hoping for an implementation README before starting serious review}}<br />
{{comment|David Fetter|README is {{messageLink|20080818.163852.105172778.t-ishii@sraoss.co.jp|here.}}}}<br />
{{comment|David Fetter|Git repository is [http://git.postgresql.org/git/~davidfetter/cte/.git here.]}}<br />
<br />
{{patch|e51f66da0805141329p3604a350mef9e997a4379b62f@mail.gmail.com|PL/Proxy|Marko Kreen|reviewers=Greg Stark}}<br />
{{comment|David Fetter|Next revision of the {{messageLink|e51f66da0806280636p1c76a953p37eeb72ecfb0b3a8@mail.gmail.com|patch here.}}}}<br />
{{comment|Zdeněk Kotala|{{messageLink|16790.1216669380@sss.pgh.pa.us|Discussion}} about integration into the core.}}<br />
<br />
{{patch|485A3F20.1080907@cybertec.at|posix fadvises|Zoltan Boszormenyi, Greg Stark|reviewers=Greg Smith, Abhijit Menon-Sen}}<br />
{{comment|Greg Smith|needs performance testing, {{messageLink|Pine.GSO.4.64.0807150045030.11155@westnet.com|here's the plan}}, will continue review throughout August}}<br />
<br />
<br />
{{patch|4863B3DA.8030209@kaigai.gr.jp|SE-PostgreSQL patches|Kaigai Kohei|reviewers=Peter Eisentraut, Abhijit Menon-Sen}}<br />
{{comment|Peter|checking with Solaris engineers about compatibility with Solaris TX; will continue review throughout August}}<br />
{{comment|KaiGai|{{messageLink|488F0C02.4020708@ak.jp.nec.com|latest patch versions}}}}<br />
<br />
{{patch|1215274649.4051.319.camel@ebony.site|psql help corrections|Simon Riggs}}<br />
<br />
{{patch|48325386.2010506@esilo.com|libpq object hooks|Merlin Moncure, Andrew Chernow|status=WIP}}<br />
<br />
{{patch|603c8f070807261444s133fb281sf34d069ab5b4c0b@mail.gmail.com|Add a separate TRUNCATE permission|Robert Haas}}<br />
<br />
{{patch|2FD031D4-7B94-4EEF-9C4D-34A77F3FE256@kineticode.com|Test citext casts|David Wheeler}}<br />
{{comment|David Wheeler|Next revision of the {{messageLink|F721EFF1-553C-4E25-A293-7BD08D6957F4@kineticode.com|patch here.}}}}<br />
{{patch|20080722055735.GW30869@sonic.net|pg_dumpall lock timeout|David Gould}}<br />
{{comment|David Gould|this was part of the pg_dump lock timeout patch that was committed in July, but apparently got missed.}}<br />
<br />
{{patch|20080805150620.A197.52131E4D@oss.ntt.co.jp|NDirectFileRead and Write|Takahiro Itagaki}}<br />
<br />
{{patch|20080812170932.8E65.52131E4D@oss.ntt.co.jp|Copy column storage parameters on CREATE TABLE LIKE/INHERITS|Takahiro Itagaki}}<br />
<br />
{{patch|20080819110134.C6FD.52131E4D@oss.ntt.co.jp|pgbench duration option|Takahiro Itagaki}}<br />
<br />
{{patch|d7df81620808130429l2a75c895g5dd6fe8ae64cc23e@mail.gmail.com|3 new functions into intarray and intagg|Dmitry Koterov}}<br />
<br />
{{patch|48A4954B.3040903@students.mimuw.edu.pl|operator restrictivity function for text search|Jan Urbański}}<br />
<br />
{{patch|20080819135214.C708.52131E4D@oss.ntt.co.jp|GUC flags to custom variables|Takahiro Itagaki}}<br />
<br />
{{patch|1218204150.4549.610.camel@ebony.2ndQuadrant|pg_stop_backup minor fixes|Simon Riggs}}<br />
<br />
{{patch|20080710031140.GA427@toroid.org|Extending grant insert on tables to sequences|Jaime Casanova}}<br />
<br />
{{patch|20080828111915.76FC.52131E4D@oss.ntt.co.jp|contrib/auto_explain|Takahiro Itagaki|status=WIP}}<br />
<br />
{{patch|20080731070754.GA1764@toroid.org|pg_get_functiondef() and psql \ef|Abhijit Menon-Sen}}<br />
<br />
{{patch|20080807080803.GA18573@toroid.org|Allow has_table_privilege(...,'usage') on sequences|Abhijit Menon-Sen}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Committed patches}}<br />
<br />
{{patch|1215263027.4051.299.camel@ebony.site|pgbench minor fixes|Simon Riggs|status=Committed 2008-08-22}}<br />
<br />
{{patch|1215502581.4051.674.camel@ebony.site|suset log temp files|Simon Riggs|status=Committed 2008-08-22}}<br />
<br />
{{patch|3073cc9b0808242315v6ce82077o75fba36b526a54dd@mail.gmail.com|Make some internal SRF functions use output parameters|Jaime Casanova|status=Committed 2008-08-25|reviewers=Magnus Hagander}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Returned with Feedback}}<br />
{{CommitFestEndSection}}</div>Amshttps://wiki.postgresql.org/index.php?title=CommitFest_2008-07&diff=1727CommitFest 2008-072008-07-11T12:12:44Z<p>Ams: status update</p>
<hr />
<div>= CommitFest =<br />
<br />
This is the page for CommitFest starting July 2008<br />
<br />
{{CommitFestCurrent}}<br />
<br />
{{CommitFestSection|Pending patches}}<br />
<br />
{{patch|20080511113047.GV5673@sonic.net|pg_dump lock timeout|Dave Gould|status=Waiting on response|reviewers=David Fetter, Stephen Frost}}<br />
{{review|20080703013346.GP31154@tamriel.snowman.net|Stephen Frost|Couple of minor changes requested, otherwise good}}<br />
{{comment|Stephen Frost|Dave Gould will make the changes agreed on and provide an update.}}<br />
<br />
{{patch|20080518.205129.86993930.t-ishii@sraoss.co.jp|WITH RECURSIVE|Yoshiyuki Asaba|status=Pending Review|reviewers=David Fetter}}<br />
{{review|20080518.205129.86993930.t-ishii@sraoss.co.jp|David Fetter|Updated with fixes from Yoshiyuki Asaba and Michael Meskes}}<br />
{{comment|alvherre|There's a long TODO list {{messageLink|20080527.101013.85412760.t-ishii@sraoss.co.jp|here}}}}<br />
{{comment|David Fetter|update patch {{messageLink|20080702231101.GH19610@fetter.org|here}}}}<br />
<br />
{{patch|482F955D.1050600@sun.com|DTrace probe additions|Robert Lor|reviewers=Zdeněk Kotala|status=Waiting on response}}<br />
{{comment|Theo Schlossnagle|another {{messageLink|BB9C10CC-EF5F-442F-949F-2D1E9C6F7A85@omniti.com|DTrace probe patch}}}}<br />
{{comment|Zdeněk Kotala|Robert Treat sent {{messageLink|200807022300.19538.xzilla@users.sourceforge.net|merge of DTrace probe patches}}}}<br />
{{comment|Zdeněk Kotala|Review is {{messageLink|486E35CE.4070309@sun.com|there}} and [http://reviewdemo.postgresql.org/r/25/ here]. Merged patch does not work.}}<br />
<br />
{{patch|e51f66da0805141329p3604a350mef9e997a4379b62f@mail.gmail.com|PL/Proxy|Marko Kreen|reviewers=Greg Stark}}<br />
{{comment|David Fetter|Next revision of the {{messageLink|e51f66da0806280636p1c76a953p37eeb72ecfb0b3a8@mail.gmail.com|patch here.}}}}<br />
<br />
{{patch|c2d9e70e0805232156p20dbe2b9j2a88d35f951d625e@mail.gmail.com|Extending permissions on tables to sequences|Jaime Casanova|reviewers=Alvaro Herrera, Abhijit Menon-Sen|status=Waiting for response}}<br />
{{comment|Abhijit Menon-Sen|The patch looks OK (modulo some {{messageLink|20080710031140.GA427@toroid.org|nits}}), but do we want the feature at all?}}<br />
<br />
{{patch|162867790806030403r221a6ae3s657f78ad2da9237f@mail.gmail.com|Table function support|Pavel Stěhule|reviewers=Marko Kreen}}<br />
{{review|e51f66da0807090405gd5ad4c3h892986181cf09d7a@mail.gmail.com|Marko Kreen|Minor issues}}<br />
{{comment|Pavel Stehule|update patch {{messageLink|162867790807100638n47c00eb4w304473a247a20aa7@mail.gmail.com|here}}}}<br />
<br />
{{patch|162867790806052206l7f6aae61ha29f6720c66caa6@mail.gmail.com|array_fill function|Pavel Stěhule|reviewers=Thomas Lee, Bruce Momjian}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|GIN fast insert()|Teodor Sigaev, Oleg Bartunov|reviewers=Thomas Lee, Bruce Momjian}} <br />
{{comment|teodor|updated patch {{messageLink|4849418C.6080909@sigaev.ru|here}}}}<br />
{{comment|thomas.lee|will likely need somebody else to assist with this particular review as I'm relatively new to postgres}}<br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|multicolumn GIN|Teodor Sigaev, Oleg Bartunov|reviewers=Neil Conway}} <br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}. I'd like to commit this patch before GIN fast insert one}}<br />
<br />
{{patch|2e78013d0806092232h6ca15ffejcbcd24e88401308f@mail.gmail.com|VACUUM Improvements - Avoiding second heap scan|Pavan Deolasee|status=WIP}}<br />
<br />
{{patch|36152.64.119.130.186.1213364119.squirrel@mail.mohawksoft.com|SSL configure patch|Mark Woodward|reviewers=Abhijit Menon-Sen|status=Waiting for response}}<br />
<br />
{{patch|48529B83.9010907@sun.com|page macros cleanup|Zdeněk Kotala|reviewers=Pavan Deolasee, Heikki Linnakangas}}<br />
{{comment|Zdeněk|updated patch {{messageLink|486E138F.4060900@sun.com|here}} - ver 04}}<br />
<br />
{{patch|485A3F20.1080907@cybertec.at|posix fadvises|Zoltan Boszormenyi, Greg Stark|reviewers=Greg Smith, Abhijit Menon-Sen|status=Waiting for response}}<br />
<br />
{{patch|162867790806230613w5719d25ejd2a03fac84792d18@mail.gmail.com|Custom variadic functions|Pavel Stěhule|reviewers=Jeff Davis}}<br />
{{comment|pavel|updated patch {{messageLink|162867790806240810jc61fd56le8563fe4f7d9a265@mail.gmail.com|here}}}}<br />
<br />
{{patch|4863B3DA.8030209@kaigai.gr.jp|SE-PostgreSQL patches|Kaigai Kohei|reviewers=Peter Eisentraut, Abhijit Menon-Sen}}<br />
<br />
{{patch|1210171029.4268.115.camel@ebony.site|pg_dump additional options for performance|Simon Riggs|reviewers=Stephen Frost}}<br />
<br />
{{patch|20080626214839.xbfsvm8740gsswos@webmail.cecs.pdx.edu|EXPLAIN in XML format|Tom Raney, Germán Poó Caamaño}}<br />
{{comment|David Fetter|updated patch {{messageLink|20080701214825.m2eczeym8404ss4g@webmail.cecs.pdx.edu|here}}}}<br />
{{comment|Simon Riggs|comments posted on-list - IMHO not ready for commit}}<br />
<br />
{{patch|4013F1AE-FE1B-427B-8C23-1A5681DA297E@kineticode.com|CITEXT: A case-insensitive TEXT type|David E. Wheeler|reviewers=Zdeněk Kotala}}<br />
{{comment|David Wheeler|v2 patch {{messageLink|ACEC459B-CB5B-4B71-87FA-55E6A649C17C@kineticode.com|here}}}}<br />
{{comment|David Wheeler|v3 patch {{messageLink|890EA230-DA04-4D65-996F-5E7107690BE8@kineticode.com|here}}}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|Relation forks|Heikki Linnakangas}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|FSM rewrite|Heikki Linnakangas}}<br />
{{comment|Simon Riggs|comments posted on-list with further ideas}}<br />
<br />
{{patch|87wsn82lda.fsf@oxford.xeocode.com|SIGINFO to EXPLAIN queries in progress|status=Proof of Concept|Greg Stark}}<br />
<br />
{{patch|1214865781.3845.525.camel@ebony.site|Hint Bits and Write I/O|Simon Riggs|reviewers=Pavan Deolasee}}<br />
<br />
{{patch|1214901548.3845.544.camel@ebony.site|pg_standby minor changes for Windows|Simon Riggs|reviewers=Martin Zaun}}<br />
<br />
{{patch|20080623150535.946E.52131E4D@oss.ntt.co.jp|executor_hook|Takahiro Itagaki|status=Ready to Commit|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|executor_hook.patch looks good. OK to commit}}<br />
{{comment|tgl|don't like exposing ExecutePlan, why not hook at ExecutorRun?}}<br />
<br />
{{patch|de5165440807010758h147d5fw1fc969bc66dcb300@mail.gmail.com|Collation at database level|status=WIP |Radek Strnad }}<br />
<br />
{{patch|48325386.2010506@esilo.com|libpq object hooks|Merlin Moncure, Andrew Chernow|status=WIP}}<br />
<br />
{{patch|BAY102-W42AF4FE40B20BE3C18EE73F2980@phx.gbl|Auto-Explain|Dean Rasheed|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|Code location needs review; change proposal made to -hackers}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Committed patches}}<br />
<br />
{{patch|20080512215757.GB22159@fetter.org|psql: \timing as a boolean arg|David Fetter|status=Committed 2008-06-11|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4851FC54.2020708@timbira.com|small typo in DTrace docs|Euler Taveira de Oliveira|status=Committed 2008-06-18|reviewers=Neil Conway}}<br />
{{comment|alvherre|Note another typo "definitons" in patch}}<br />
<br />
{{patch|486101C0.9090006@vector-seven.com|GUC variable to replace PGBE_ACTIVITY_SIZE|Thomas Lee|status=Committed 2008-06-30|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|482CBDB0.7020901@students.mimuw.edu.pl|extend VacAttrStats to allow stavalues of different types|Jan Urbański|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
{{comment|Jan Urbański|updated patch {{messageLink|484418B8.6060004@students.mimuw.edu.pl|here}}}}<br />
<br />
{{patch|20080612091033.40e02ebd@greg-laptop|Better formatting of functions in pg_dump|Greg Sabino Mullane|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4831BD6B.5070109@lelarge.info|multi-version support for psql \d commands|Guillaume Lelarge|status=Committed 2008-07-02|reviewers=Tom Lane}}<br />
{{comment|gleu|updated patch {{messageLink|48333337.4040105@lelarge.info|here}}}}<br />
<br />
{{patch|937d27e10805021354s70b24c0l29f7f18dc0ad0ec9@mail.gmail.com|pg_get_keywords()|Dave Page|status=Committed 2008-07-03|reviewers=Nikhil Sontakke, Tom Lane}} <br />
{{comment|dpage|updated patch {{messageLink|937d27e10805031344n440e5ea5mbff6f5cda9d548f9@mail.gmail.com|here}}}}<br />
{{comment|nikhils|review complete from my side. Updated patch along with comments posted back {{messageLink|d3c4af540807022352w1876fe8fu60c1533396f9eb2c@mail.gmail.com|here}}}}<br />
<br />
{{patch|1215258625.4051.281.camel@ebony.site|pg_standby keepfiles calc bug|Simon Riggs|status=Committed 2008-07-08|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|79636C5E324155408C7FA171@imhotep.credativ.de|Adding variables for segment_size, wal_segment_size and block sizes|Bernd Helmle|reviewers=Abhijit Menon-Sen, Tom Lane|status=Committed 2008-07-10}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Returned with Feedback}}<br />
<br />
{{patch|48649261.5040703@students.mimuw.edu.pl|text search selectivity and dllist enhancements|Jan Urbański|status=Returned for rework|reviewers=Tom Lane}}<br />
{{comment|tgl|review {{messageLink|6233.1215113144@sss.pgh.pa.us|here}} --- data structure could be improved and simplified}}<br />
<br />
{{patch|1214858056.3845.514.camel@ebony.site|Stats Hooks|Simon Riggs|status=Returned for rework|reviewers=Tom Lane}}<br />
{{comment|Josh Berkus|[http://archives.postgresql.org/pgsql-hackers/2008-06/msg00975.php discussion of this patch is here]}}<br />
{{comment|tgl|review {{messageLink|12400.1215716369@sss.pgh.pa.us|here}} --- won't work as submitted}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
== Round Robin Reviewers ==<br />
<br />
{| border="1" cellpadding="1" cellspacing="0" style="width: 60%; border-collapse: collapse; border: 1px solid #ccc; font-size: 90%;"<br />
|- style="background: #eee;"<br />
! width="30%" | Name<br />
!Status<br />
!Patch Count<br />
|-<br />
|Neil Conway || Available || 1<br />
|-<br />
|Jeff Davis || Available || 1<br />
|-<br />
|Pavan Deolasee || Available || 2<br />
|-<br />
|Andrew Dunstan || Vacation || 0<br />
|-<br />
|Stephen Frost || Available || 2<br />
|-<br />
|Álvaro Herrera || Available || 1<br />
|-<br />
|Zdeněk Kotala || Available || 1<br />
|-<br />
|Thomas Lee || Available || 2<br />
|-<br />
|Abhijit Menon-Sen || Available || 3<br />
|-<br />
|Bruce Momjian || Available || 2<br />
|-<br />
|Nikhil Sontakke || Available || 1<br />
|-<br />
|Greg Stark || Available || 1<br />
|-<br />
|Martin Zaun || Available || 1<br />
|}</div>Amshttps://wiki.postgresql.org/index.php?title=CommitFest_2008-07&diff=1725CommitFest 2008-072008-07-11T12:08:46Z<p>Ams: status update</p>
<hr />
<div>= CommitFest =<br />
<br />
This is the page for CommitFest starting July 2008<br />
<br />
{{CommitFestCurrent}}<br />
<br />
{{CommitFestSection|Pending patches}}<br />
<br />
{{patch|20080511113047.GV5673@sonic.net|pg_dump lock timeout|Dave Gould|status=Waiting on response|reviewers=David Fetter, Stephen Frost}}<br />
{{review|20080703013346.GP31154@tamriel.snowman.net|Stephen Frost|Couple of minor changes requested, otherwise good}}<br />
{{comment|Stephen Frost|Dave Gould will make the changes agreed on and provide an update.}}<br />
<br />
{{patch|20080518.205129.86993930.t-ishii@sraoss.co.jp|WITH RECURSIVE|Yoshiyuki Asaba|status=Pending Review|reviewers=David Fetter}}<br />
{{review|20080518.205129.86993930.t-ishii@sraoss.co.jp|David Fetter|Updated with fixes from Yoshiyuki Asaba and Michael Meskes}}<br />
{{comment|alvherre|There's a long TODO list {{messageLink|20080527.101013.85412760.t-ishii@sraoss.co.jp|here}}}}<br />
{{comment|David Fetter|update patch {{messageLink|20080702231101.GH19610@fetter.org|here}}}}<br />
<br />
{{patch|482F955D.1050600@sun.com|DTrace probe additions|Robert Lor|reviewers=Zdeněk Kotala|status=Waiting on response}}<br />
{{comment|Theo Schlossnagle|another {{messageLink|BB9C10CC-EF5F-442F-949F-2D1E9C6F7A85@omniti.com|DTrace probe patch}}}}<br />
{{comment|Zdeněk Kotala|Robert Treat sent {{messageLink|200807022300.19538.xzilla@users.sourceforge.net|merge of DTrace probe patches}}}}<br />
{{comment|Zdeněk Kotala|Review is {{messageLink|486E35CE.4070309@sun.com|there}} and [http://reviewdemo.postgresql.org/r/25/ here]. Merged patch does not work.}}<br />
<br />
{{patch|e51f66da0805141329p3604a350mef9e997a4379b62f@mail.gmail.com|PL/Proxy|Marko Kreen|reviewers=Greg Stark}}<br />
{{comment|David Fetter|Next revision of the {{messageLink|e51f66da0806280636p1c76a953p37eeb72ecfb0b3a8@mail.gmail.com|patch here.}}}}<br />
<br />
{{patch|c2d9e70e0805232156p20dbe2b9j2a88d35f951d625e@mail.gmail.com|Extending permissions on tables to sequences|Jaime Casanova|reviewers=Álvaro Herrera, Abhijit Menon-Sen|status=Waiting on response}}<br />
{{comment|The patch looks OK (modulo some {{messageLink|20080710031140.GA427@toroid.org|nits}}), but do we want the feature at all?}}<br />
<br />
{{patch|162867790806030403r221a6ae3s657f78ad2da9237f@mail.gmail.com|Table function support|Pavel Stěhule|reviewers=Marko Kreen}}<br />
{{review|e51f66da0807090405gd5ad4c3h892986181cf09d7a@mail.gmail.com|Marko Kreen|Minor issues}}<br />
{{comment|Pavel Stehule|update patch {{messageLink|162867790807100638n47c00eb4w304473a247a20aa7@mail.gmail.com|here}}}}<br />
<br />
{{patch|162867790806052206l7f6aae61ha29f6720c66caa6@mail.gmail.com|array_fill function|Pavel Stěhule|reviewers=Thomas Lee, Bruce Momjian}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|GIN fast insert()|Teodor Sigaev, Oleg Bartunov|reviewers=Thomas Lee, Bruce Momjian}} <br />
{{comment|teodor|updated patch {{messageLink|4849418C.6080909@sigaev.ru|here}}}}<br />
{{comment|thomas.lee|will likely need somebody else to assist with this particular review as I'm relatively new to postgres}}<br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|multicolumn GIN|Teodor Sigaev, Oleg Bartunov|reviewers=Neil Conway}} <br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}. I'd like to commit this patch before GIN fast insert one}}<br />
<br />
{{patch|2e78013d0806092232h6ca15ffejcbcd24e88401308f@mail.gmail.com|VACUUM Improvements - Avoiding second heap scan|Pavan Deolasee|status=WIP}}<br />
<br />
{{patch|36152.64.119.130.186.1213364119.squirrel@mail.mohawksoft.com|SSL configure patch|Mark Woodward|reviewers=Abhijit Menon-Sen|status=Waiting on response}}<br />
<br />
{{patch|48529B83.9010907@sun.com|page macros cleanup|Zdeněk Kotala|reviewers=Pavan Deolasee, Heikki Linnakangas}}<br />
{{comment|Zdeněk|updated patch {{messageLink|486E138F.4060900@sun.com|here}} - ver 04}}<br />
<br />
{{patch|485A3F20.1080907@cybertec.at|posix fadvises|Zoltan Boszormenyi, Greg Stark|reviewers=Greg Smith}}<br />
<br />
{{patch|162867790806230613w5719d25ejd2a03fac84792d18@mail.gmail.com|Custom variadic functions|Pavel Stěhule|reviewers=Jeff Davis}}<br />
{{comment|pavel|updated patch {{messageLink|162867790806240810jc61fd56le8563fe4f7d9a265@mail.gmail.com|here}}}}<br />
<br />
{{patch|4863B3DA.8030209@kaigai.gr.jp|SE-PostgreSQL patches|Kaigai Kohei|reviewers=Peter Eisentraut, Abhijit Menon-Sen}}<br />
<br />
{{patch|1210171029.4268.115.camel@ebony.site|pg_dump additional options for performance|Simon Riggs|reviewers=Stephen Frost}}<br />
<br />
{{patch|20080626214839.xbfsvm8740gsswos@webmail.cecs.pdx.edu|EXPLAIN in XML format|Tom Raney, Germán Poó Caamaño}}<br />
{{comment|David Fetter|updated patch {{messageLink|20080701214825.m2eczeym8404ss4g@webmail.cecs.pdx.edu|here}}}}<br />
{{comment|Simon Riggs|comments posted on-list - IMHO not ready for commit}}<br />
<br />
{{patch|4013F1AE-FE1B-427B-8C23-1A5681DA297E@kineticode.com|CITEXT: A case-insensitive TEXT type|David E. Wheeler|reviewers=Zdeněk Kotala}}<br />
{{comment|David Wheeler|v2 patch {{messageLink|ACEC459B-CB5B-4B71-87FA-55E6A649C17C@kineticode.com|here}}}}<br />
{{comment|David Wheeler|v3 patch {{messageLink|890EA230-DA04-4D65-996F-5E7107690BE8@kineticode.com|here}}}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|Relation forks|Heikki Linnakangas}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|FSM rewrite|Heikki Linnakangas}}<br />
{{comment|Simon Riggs|comments posted on-list with further ideas}}<br />
<br />
{{patch|87wsn82lda.fsf@oxford.xeocode.com|SIGINFO to EXPLAIN queries in progress|status=Proof of Concept|Greg Stark}}<br />
<br />
{{patch|1214865781.3845.525.camel@ebony.site|Hint Bits and Write I/O|Simon Riggs|reviewers=Pavan Deolasee}}<br />
<br />
{{patch|1214901548.3845.544.camel@ebony.site|pg_standby minor changes for Windows|Simon Riggs|reviewers=Martin Zaun}}<br />
<br />
{{patch|20080623150535.946E.52131E4D@oss.ntt.co.jp|executor_hook|Takahiro Itagaki|status=Ready to Commit|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|executor_hook.patch looks good. OK to commit}}<br />
{{comment|tgl|don't like exposing ExecutePlan, why not hook at ExecutorRun?}}<br />
<br />
{{patch|de5165440807010758h147d5fw1fc969bc66dcb300@mail.gmail.com|Collation at database level|status=WIP |Radek Strnad }}<br />
<br />
{{patch|48325386.2010506@esilo.com|libpq object hooks|Merlin Moncure, Andrew Chernow|status=WIP}}<br />
<br />
{{patch|BAY102-W42AF4FE40B20BE3C18EE73F2980@phx.gbl|Auto-Explain|Dean Rasheed|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|Code location needs review; change proposal made to -hackers}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Committed patches}}<br />
<br />
{{patch|20080512215757.GB22159@fetter.org|psql: \timing as a boolean arg|David Fetter|status=Committed 2008-06-11|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4851FC54.2020708@timbira.com|small typo in DTrace docs|Euler Taveira de Oliveira|status=Committed 2008-06-18|reviewers=Neil Conway}}<br />
{{comment|alvherre|Note another typo "definitons" in patch}}<br />
<br />
{{patch|486101C0.9090006@vector-seven.com|GUC variable to replace PGBE_ACTIVITY_SIZE|Thomas Lee|status=Committed 2008-06-30|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|482CBDB0.7020901@students.mimuw.edu.pl|extend VacAttrStats to allow stavalues of different types|Jan Urbański|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
{{comment|Jan Urbański|updated patch {{messageLink|484418B8.6060004@students.mimuw.edu.pl|here}}}}<br />
<br />
{{patch|20080612091033.40e02ebd@greg-laptop|Better formatting of functions in pg_dump|Greg Sabino Mullane|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4831BD6B.5070109@lelarge.info|multi-version support for psql \d commands|Guillaume Lelarge|status=Committed 2008-07-02|reviewers=Tom Lane}}<br />
{{comment|gleu|updated patch {{messageLink|48333337.4040105@lelarge.info|here}}}}<br />
<br />
{{patch|937d27e10805021354s70b24c0l29f7f18dc0ad0ec9@mail.gmail.com|pg_get_keywords()|Dave Page|status=Committed 2008-07-03|reviewers=Nikhil Sontakke, Tom Lane}} <br />
{{comment|dpage|updated patch {{messageLink|937d27e10805031344n440e5ea5mbff6f5cda9d548f9@mail.gmail.com|here}}}}<br />
{{comment|nikhils|review complete from my side. Updated patch along with comments posted back {{messageLink|d3c4af540807022352w1876fe8fu60c1533396f9eb2c@mail.gmail.com|here}}}}<br />
<br />
{{patch|1215258625.4051.281.camel@ebony.site|pg_standby keepfiles calc bug|Simon Riggs|status=Committed 2008-07-08|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|79636C5E324155408C7FA171@imhotep.credativ.de|Adding variables for segment_size, wal_segment_size and block sizes|Bernd Helmle|reviewers=Abhijit Menon-Sen, Tom Lane|status=Committed 2008-07-10}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Returned with Feedback}}<br />
<br />
{{patch|48649261.5040703@students.mimuw.edu.pl|text search selectivity and dllist enhancements|Jan Urbański|status=Returned for rework|reviewers=Tom Lane}}<br />
{{comment|tgl|review {{messageLink|6233.1215113144@sss.pgh.pa.us|here}} --- data structure could be improved and simplified}}<br />
<br />
{{patch|1214858056.3845.514.camel@ebony.site|Stats Hooks|Simon Riggs|status=Returned for rework|reviewers=Tom Lane}}<br />
{{comment|Josh Berkus|[http://archives.postgresql.org/pgsql-hackers/2008-06/msg00975.php discussion of this patch is here]}}<br />
{{comment|tgl|review {{messageLink|12400.1215716369@sss.pgh.pa.us|here}} --- won't work as submitted}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
== Round Robin Reviewers ==<br />
<br />
{| border="1" cellpadding="1" cellspacing="0" style="width: 60%; border-collapse: collapse; border: 1px solid #ccc; font-size: 90%;"<br />
|- style="background: #eee;"<br />
! width="30%" | Name<br />
!Status<br />
!Patch Count<br />
|-<br />
|Neil Conway || Available || 1<br />
|-<br />
|Jeff Davis || Available || 1<br />
|-<br />
|Pavan Deolasee || Available || 2<br />
|-<br />
|Andrew Dunstan || Vacation || 0<br />
|-<br />
|Stephen Frost || Available || 2<br />
|-<br />
|Álvaro Herrera || Available || 1<br />
|-<br />
|Zdeněk Kotala || Available || 1<br />
|-<br />
|Thomas Lee || Available || 2<br />
|-<br />
|Abhijit Menon-Sen || Available || 3<br />
|-<br />
|Bruce Momjian || Available || 2<br />
|-<br />
|Nikhil Sontakke || Available || 1<br />
|-<br />
|Greg Stark || Available || 1<br />
|-<br />
|Martin Zaun || Available || 1<br />
|}</div>Amshttps://wiki.postgresql.org/index.php?title=CommitFest_2008-07&diff=1694CommitFest 2008-072008-07-08T19:20:20Z<p>Ams: update status</p>
<hr />
<div>= CommitFest =<br />
<br />
This is the page for CommitFest starting July 2008<br />
<br />
{{CommitFestCurrent}}<br />
<br />
{{CommitFestSection|Pending patches}}<br />
<br />
{{patch|20080511113047.GV5673@sonic.net|pg_dump lock timeout|Dave Gould|status=WIP|reviewers=David Fetter, Stephen Frost}}<br />
{{review|20080703013346.GP31154@tamriel.snowman.net|Stephen Frost|Couple of minor changes requested, otherwise good}}<br />
{{comment|Stephen Frost|Dave Gould will make the changes agreed on and provide an update.}}<br />
<br />
{{patch|20080518.205129.86993930.t-ishii@sraoss.co.jp|WITH RECURSIVE|Yoshiyuki Asaba|status=Pending Review|reviewers=David Fetter}}<br />
{{review|20080518.205129.86993930.t-ishii@sraoss.co.jp|David Fetter|Updated with fixes from Yoshiyuki Asaba and Michael Meskes}}<br />
{{comment|alvherre|There's a long TODO list {{messageLink|20080527.101013.85412760.t-ishii@sraoss.co.jp|here}}}}<br />
{{comment|David Fetter|update patch {{messageLink|20080702231101.GH19610@fetter.org|here}}}}<br />
<br />
{{patch|482F955D.1050600@sun.com|DTrace probe additions|Robert Lor|reviewers=Zdeněk Kotala|status=Waiting on response}}<br />
{{comment|Theo Schlossnagle|another {{messageLink|BB9C10CC-EF5F-442F-949F-2D1E9C6F7A85@omniti.com|DTrace probe patch}}}}<br />
{{comment|Zdeněk Kotala|Robert Treat sent {{messageLink|200807022300.19538.xzilla@users.sourceforge.net|merge of DTrace probe patches}}}}<br />
{{comment|Zdeněk Kotala|Review is {{messageLink|486E35CE.4070309@sun.com|there}} and [http://reviewdemo.postgresql.org/r/25/ here]. Merged patch does not work.}}<br />
<br />
{{patch|e51f66da0805141329p3604a350mef9e997a4379b62f@mail.gmail.com|PL/Proxy|Marko Kreen|reviewers=Greg Stark}}<br />
{{comment|David Fetter|Next revision of the {{messageLink|e51f66da0806280636p1c76a953p37eeb72ecfb0b3a8@mail.gmail.com|patch here.}}}}<br />
<br />
{{patch|c2d9e70e0805232156p20dbe2b9j2a88d35f951d625e@mail.gmail.com|Extending permissions on tables to sequences|Jaime Casanova|reviewers=Álvaro Herrera}}<br />
<br />
{{patch|162867790806030403r221a6ae3s657f78ad2da9237f@mail.gmail.com|Table function support|Pavel Stěhule|reviewers=Marko Kreen}}<br />
<br />
{{patch|162867790806052206l7f6aae61ha29f6720c66caa6@mail.gmail.com|array_fill function|Pavel Stěhule|reviewers=Thomas Lee, Bruce Momjian}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|GIN fast insert()|Teodor Sigaev, Oleg Bartunov|reviewers=Thomas Lee, Bruce Momjian}} <br />
{{comment|teodor|updated patch {{messageLink|4849418C.6080909@sigaev.ru|here}}}}<br />
{{comment|thomas.lee|will likely need somebody else to assist with this particular review as I'm relatively new to postgres}}<br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|multicolumn GIN|Teodor Sigaev, Oleg Bartunov|reviewers=Neil Conway}} <br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}. I'd like to commit this patch before GIN fast insert one}}<br />
<br />
{{patch|2e78013d0806092232h6ca15ffejcbcd24e88401308f@mail.gmail.com|VACUUM Improvements - Avoiding second heap scan|Pavan Deolasee|status=WIP}}<br />
<br />
{{patch|36152.64.119.130.186.1213364119.squirrel@mail.mohawksoft.com|SSL configure patch|Mark Woodward|reviewers=Abhijit Menon-Sen|status=Waiting on response}}<br />
<br />
{{patch|48529B83.9010907@sun.com|page macros cleanup|Zdeněk Kotala|reviewers=Pavan Deolasee, Heikki Linnakangas}}<br />
{{comment|Zdeněk|updated patch {{messageLink|486E138F.4060900@sun.com|here}} - ver 04}}<br />
<br />
{{patch|485A3F20.1080907@cybertec.at|posix fadvises|Zoltan Boszormenyi, Greg Stark|reviewers=Greg Smith}}<br />
<br />
{{patch|162867790806230613w5719d25ejd2a03fac84792d18@mail.gmail.com|Custom variadic functions|Pavel Stěhule}}<br />
{{comment|pavel|updated patch {{messageLink|162867790806240810jc61fd56le8563fe4f7d9a265@mail.gmail.com|here}}}}<br />
<br />
{{patch|4863B3DA.8030209@kaigai.gr.jp|SE-PostgreSQL patches|Kaigai Kohei}}<br />
<br />
{{patch|1210171029.4268.115.camel@ebony.site|pg_dump additional options for performance|Simon Riggs|reviewers=Stephen Frost}}<br />
<br />
{{patch|20080626214839.xbfsvm8740gsswos@webmail.cecs.pdx.edu|EXPLAIN in XML format|Tom Raney, Germán Poó Caamaño}}<br />
{{comment|David Fetter|updated patch {{messageLink|20080701214825.m2eczeym8404ss4g@webmail.cecs.pdx.edu|here}}}}<br />
{{comment|Simon Riggs|comments posted on-list - IMHO not ready for commit}}<br />
<br />
{{patch|4013F1AE-FE1B-427B-8C23-1A5681DA297E@kineticode.com|CITEXT: A case-insensitive TEXT type|David E. Wheeler|reviewers=Zdeněk Kotala}}<br />
{{comment|David Wheeler|v2 patch {{messageLink|ACEC459B-CB5B-4B71-87FA-55E6A649C17C@kineticode.com|here}}}}<br />
{{comment|David Wheeler|v3 patch {{messageLink|890EA230-DA04-4D65-996F-5E7107690BE8@kineticode.com|here}}}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|Relation forks|Heikki Linnakangas}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|FSM rewrite|Heikki Linnakangas}}<br />
{{comment|Simon Riggs|comments posted on-list with further ideas}}<br />
<br />
{{patch|87wsn82lda.fsf@oxford.xeocode.com|SIGINFO to EXPLAIN queries in progress|status=Proof of Concept|Greg Stark}}<br />
<br />
{{patch|1214858056.3845.514.camel@ebony.site|Stats Hooks|Simon Riggs}}<br />
<br />
{{patch|1214865781.3845.525.camel@ebony.site|Hint Bits and Write I/O|Simon Riggs|reviewers=Pavan Deolasee}}<br />
<br />
{{patch|1214901548.3845.544.camel@ebony.site|pg_standby minor changes for Windows|Simon Riggs|reviewers=Martin Zaun}}<br />
<br />
{{patch|20080623150535.946E.52131E4D@oss.ntt.co.jp|executor_hook|Takahiro Itagaki|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|executor_hook.patch looks good. OK to commit}}<br />
<br />
{{patch|de5165440807010758h147d5fw1fc969bc66dcb300@mail.gmail.com|Collation at database level|status=WIP |Radek Strnad }}<br />
<br />
{{patch|48325386.2010506@esilo.com|libpq object hooks|Merlin Moncure, Andrew Chernow|status=WIP}}<br />
<br />
{{patch|BAY102-W42AF4FE40B20BE3C18EE73F2980@phx.gbl|Auto-Explain|Dean Rasheed|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|Code location needs review; change proposal made to -hackers}}<br />
<br />
{{patch|79636C5E324155408C7FA171@imhotep.credativ.de|Adding variables for segment_size, wal_segment_size and block sizes|Bernd Helmle|reviewers=Abhijit Menon-Sen|status=Waiting on response}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Committed patches}}<br />
<br />
{{patch|20080512215757.GB22159@fetter.org|psql: \timing as a boolean arg|David Fetter|status=Committed 2008-06-11|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4851FC54.2020708@timbira.com|small typo in DTrace docs|Euler Taveira de Oliveira|status=Committed 2008-06-18|reviewers=Neil Conway}}<br />
{{comment|alvherre|Note another typo "definitons" in patch}}<br />
<br />
{{patch|486101C0.9090006@vector-seven.com|GUC variable to replace PGBE_ACTIVITY_SIZE|Thomas Lee|status=Committed 2008-06-30|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|482CBDB0.7020901@students.mimuw.edu.pl|extend VacAttrStats to allow stavalues of different types|Jan Urbański|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
{{comment|Jan Urbański|updated patch {{messageLink|484418B8.6060004@students.mimuw.edu.pl|here}}}}<br />
<br />
{{patch|20080612091033.40e02ebd@greg-laptop|Better formatting of functions in pg_dump|Greg Sabino Mullane|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4831BD6B.5070109@lelarge.info|multi-version support for psql \d commands|Guillaume Lelarge|status=Committed 2008-07-02|reviewers=Tom Lane}}<br />
{{comment|gleu|updated patch {{messageLink|48333337.4040105@lelarge.info|here}}}}<br />
<br />
{{patch|937d27e10805021354s70b24c0l29f7f18dc0ad0ec9@mail.gmail.com|pg_get_keywords()|Dave Page|status=Committed 2008-07-03|reviewers=Nikhil Sontakke, Tom Lane}} <br />
{{comment|dpage|updated patch {{messageLink|937d27e10805031344n440e5ea5mbff6f5cda9d548f9@mail.gmail.com|here}}}}<br />
{{comment|nikhils|review complete from my side. Updated patch along with comments posted back {{messageLink|d3c4af540807022352w1876fe8fu60c1533396f9eb2c@mail.gmail.com|here}}}}<br />
<br />
{{patch|1215258625.4051.281.camel@ebony.site|pg_standby keepfiles calc bug|Simon Riggs|status=Committed 2008-07-08|reviewers=Heikki Linnakangas}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Returned with Feedback}}<br />
<br />
{{patch|48649261.5040703@students.mimuw.edu.pl|text search selectivity and dllist enhancements|Jan Urbański|status=Returned for rework|reviewers=Tom Lane}}<br />
{{comment|tgl|review {{messageLink|6233.1215113144@sss.pgh.pa.us|here}} --- data structure could be improved and simplified}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
== Round Robin Reviewers ==<br />
<br />
{| border="1" cellpadding="1" cellspacing="0" style="width: 60%; border-collapse: collapse; border: 1px solid #ccc; font-size: 90%;"<br />
|- style="background: #eee;"<br />
! width="30%" | Name<br />
!Status<br />
!Patch Count<br />
|-<br />
|Neil Conway || Available || 1<br />
|-<br />
|Pavan Deolasee || Available || 2<br />
|-<br />
|Andrew Dunstan || Vacation || 0<br />
|-<br />
|Stephen Frost || Available || 2<br />
|-<br />
|Álvaro Herrera || Available || 1<br />
|-<br />
|Zdeněk Kotala || Available || 1<br />
|-<br />
|Thomas Lee || Available || 2<br />
|-<br />
|Abhijit Menon-Sen || Available || 2<br />
|-<br />
|Bruce Momjian || Available || 2<br />
|-<br />
|Nikhil Sontakke || Available || 1<br />
|-<br />
|Greg Stark || Available || 1<br />
|-<br />
|Martin Zaun || Available || 1<br />
|}</div>Amshttps://wiki.postgresql.org/index.php?title=CommitFest_2008-07&diff=1686CommitFest 2008-072008-07-08T03:59:23Z<p>Ams: review posted</p>
<hr />
<div>= CommitFest =<br />
<br />
This is the page for CommitFest starting July 2008<br />
<br />
{{CommitFestCurrent}}<br />
<br />
{{CommitFestSection|Pending patches}}<br />
<br />
{{patch|20080511113047.GV5673@sonic.net|pg_dump lock timeout|Dave Gould|status=WIP|reviewers=David Fetter, Stephen Frost}}<br />
{{review|20080703013346.GP31154@tamriel.snowman.net|Stephen Frost|Couple of minor changes requested, otherwise good}}<br />
{{comment|Stephen Frost|Dave Gould will make the changes agreed on and provide an update.}}<br />
<br />
{{patch|20080518.205129.86993930.t-ishii@sraoss.co.jp|WITH RECURSIVE|Yoshiyuki Asaba|status=Pending Review|reviewers=David Fetter}}<br />
{{review|20080518.205129.86993930.t-ishii@sraoss.co.jp|David Fetter|Updated with fixes from Yoshiyuki Asaba and Michael Meskes}}<br />
{{comment|alvherre|There's a long TODO list {{messageLink|20080527.101013.85412760.t-ishii@sraoss.co.jp|here}}}}<br />
{{comment|David Fetter|update patch {{messageLink|20080702231101.GH19610@fetter.org|here}}}}<br />
<br />
{{patch|482F955D.1050600@sun.com|DTrace probe additions|Robert Lor|reviewers=Zdeněk Kotala}}<br />
{{comment|Theo Schlossnagle|another {{messageLink|BB9C10CC-EF5F-442F-949F-2D1E9C6F7A85@omniti.com|DTrace probe patch}}}}<br />
{{comment|Zdeněk Kotala|Robert Treat sent {{messageLink|200807022300.19538.xzilla@users.sourceforge.net|merge of DTrace probe patches}}}}<br />
<br />
{{patch|e51f66da0805141329p3604a350mef9e997a4379b62f@mail.gmail.com|PL/Proxy|Marko Kreen|reviewers=Greg Stark}}<br />
{{comment|David Fetter|Next revision of the {{messageLink|e51f66da0806280636p1c76a953p37eeb72ecfb0b3a8@mail.gmail.com|patch here.}}}}<br />
<br />
{{patch|c2d9e70e0805232156p20dbe2b9j2a88d35f951d625e@mail.gmail.com|Extending permissions on tables to sequences|Jaime Casanova|reviewers=Álvaro Herrera}}<br />
<br />
{{patch|162867790806030403r221a6ae3s657f78ad2da9237f@mail.gmail.com|Table function support|Pavel Stěhule|reviewers=Marko Kreen}}<br />
<br />
{{patch|162867790806052206l7f6aae61ha29f6720c66caa6@mail.gmail.com|array_fill function|Pavel Stěhule|reviewers=Thomas Lee, Bruce Momjian}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|GIN fast insert()|Teodor Sigaev, Oleg Bartunov|reviewers=Thomas Lee, Bruce Momjian}} <br />
{{comment|teodor|updated patch {{messageLink|4849418C.6080909@sigaev.ru|here}}}}<br />
{{comment|thomas.lee|will likely need somebody else to assist with this particular review as I'm relatively new to postgres}}<br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}}}<br />
<br />
{{patch|483FD205.4090904@sigaev.ru|multicolumn GIN|Teodor Sigaev, Oleg Bartunov|reviewers=Neil Conway}} <br />
{{comment|teodor|synced with CVS patch {{messageLink|486BC501.6060605@sigaev.ru|here}}. I'd like to commit this patch before GIN fast insert one}}<br />
<br />
{{patch|2e78013d0806092232h6ca15ffejcbcd24e88401308f@mail.gmail.com|VACUUM Improvements - Avoiding second heap scan|Pavan Deolasee|status=WIP}}<br />
<br />
{{patch|36152.64.119.130.186.1213364119.squirrel@mail.mohawksoft.com|SSL configure patch|Mark Woodward|reviewers=Abhijit Menon-Sen}}<br />
<br />
{{patch|48529B83.9010907@sun.com|page macros cleanup|Zdeněk Kotala|reviewers=Pavan Deolasee}}<br />
{{comment|Zdeněk|updated patch {{messageLink|486E138F.4060900@sun.com|here}} - ver 04}}<br />
<br />
{{patch|485A3F20.1080907@cybertec.at|posix fadvises|Zoltan Boszormenyi, Greg Stark|reviewers=Greg Smith}}<br />
<br />
{{patch|162867790806230613w5719d25ejd2a03fac84792d18@mail.gmail.com|Custom variadic functions|Pavel Stěhule}}<br />
{{comment|pavel|updated patch {{messageLink|162867790806240810jc61fd56le8563fe4f7d9a265@mail.gmail.com|here}}}}<br />
<br />
{{patch|4863B3DA.8030209@kaigai.gr.jp|SE-PostgreSQL patches|Kaigai Kohei}}<br />
<br />
{{patch|1210171029.4268.115.camel@ebony.site|pg_dump additional options for performance|Simon Riggs|reviewers=Stephen Frost}}<br />
<br />
{{patch|20080626214839.xbfsvm8740gsswos@webmail.cecs.pdx.edu|EXPLAIN in XML format|Tom Raney, Germán Poó Caamaño}}<br />
{{comment|David Fetter|updated patch {{messageLink|20080701214825.m2eczeym8404ss4g@webmail.cecs.pdx.edu|here}}}}<br />
{{comment|Simon Riggs|comments posted on-list - IMHO not ready for commit}}<br />
<br />
{{patch|4013F1AE-FE1B-427B-8C23-1A5681DA297E@kineticode.com|CITEXT: A case-insensitive TEXT type|David E. Wheeler|reviewers=Zdeněk Kotala}}<br />
{{comment|David Wheeler|v2 patch {{messageLink|ACEC459B-CB5B-4B71-87FA-55E6A649C17C@kineticode.com|here}}}}<br />
{{comment|David Wheeler|v3 patch {{messageLink|890EA230-DA04-4D65-996F-5E7107690BE8@kineticode.com|here}}}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|Relation forks|Heikki Linnakangas}}<br />
<br />
{{patch|48651722.8020600@enterprisedb.com|FSM rewrite|Heikki Linnakangas}}<br />
{{comment|Simon Riggs|comments posted on-list with further ideas}}<br />
<br />
{{patch|87wsn82lda.fsf@oxford.xeocode.com|SIGINFO to EXPLAIN queries in progress|status=Proof of Concept|Greg Stark}}<br />
<br />
{{patch|1214858056.3845.514.camel@ebony.site|Stats Hooks|Simon Riggs}}<br />
<br />
{{patch|1214865781.3845.525.camel@ebony.site|Hint Bits and Write I/O|Simon Riggs|reviewers=Pavan Deolasee}}<br />
<br />
{{patch|1214901548.3845.544.camel@ebony.site|pg_standby minor changes for Windows|Simon Riggs|reviewers=Martin Zaun}}<br />
<br />
{{patch|20080623150535.946E.52131E4D@oss.ntt.co.jp|executor_hook|Takahiro Itagaki|reviewers=Simon Riggs}}<br />
{{comment|Simon Riggs|executor_hook.patch looks good. OK to commit}}<br />
<br />
{{patch|de5165440807010758h147d5fw1fc969bc66dcb300@mail.gmail.com|Collation at database level|status=WIP |Radek Strnad }}<br />
<br />
{{patch|48325386.2010506@esilo.com|libpq object hooks|Merlin Moncure, Andrew Chernow|status=WIP}}<br />
<br />
{{patch|BAY102-W42AF4FE40B20BE3C18EE73F2980@phx.gbl|Auto-Explain|Dean Rasheed|reviewers=Simon Riggs}}<br />
<br />
{{patch|1215258625.4051.281.camel@ebony.site|pg_standby keepfiles calc bug|Simon Riggs}}<br />
<br />
{{patch|79636C5E324155408C7FA171@imhotep.credativ.de|Adding variables for segment_size, wal_segment_size and block sizes|Bernd Helmle|reviewers=Abhijit Menon-Sen}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Committed patches}}<br />
<br />
{{patch|20080512215757.GB22159@fetter.org|psql: \timing as a boolean arg|David Fetter|status=Committed 2008-06-11|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4851FC54.2020708@timbira.com|small typo in DTrace docs|Euler Taveira de Oliveira|status=Committed 2008-06-18|reviewers=Neil Conway}}<br />
{{comment|alvherre|Note another typo "definitons" in patch}}<br />
<br />
{{patch|486101C0.9090006@vector-seven.com|GUC variable to replace PGBE_ACTIVITY_SIZE|Thomas Lee|status=Committed 2008-06-30|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|482CBDB0.7020901@students.mimuw.edu.pl|extend VacAttrStats to allow stavalues of different types|Jan Urbański|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
{{comment|Jan Urbański|updated patch {{messageLink|484418B8.6060004@students.mimuw.edu.pl|here}}}}<br />
<br />
{{patch|20080612091033.40e02ebd@greg-laptop|Better formatting of functions in pg_dump|Greg Sabino Mullane|status=Committed 2008-07-01|reviewers=Heikki Linnakangas}}<br />
<br />
{{patch|4831BD6B.5070109@lelarge.info|multi-version support for psql \d commands|Guillaume Lelarge|status=Committed 2008-07-02|reviewers=Tom Lane}}<br />
{{comment|gleu|updated patch {{messageLink|48333337.4040105@lelarge.info|here}}}}<br />
<br />
{{patch|937d27e10805021354s70b24c0l29f7f18dc0ad0ec9@mail.gmail.com|pg_get_keywords()|Dave Page|status=Committed 2008-07-03|reviewers=Nikhil Sontakke, Tom Lane}} <br />
{{comment|dpage|updated patch {{messageLink|937d27e10805031344n440e5ea5mbff6f5cda9d548f9@mail.gmail.com|here}}}}<br />
{{comment|nikhils|review complete from my side. Updated patch along with comments posted back {{messageLink|d3c4af540807022352w1876fe8fu60c1533396f9eb2c@mail.gmail.com|here}}}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
{{CommitFestSection|Returned with Feedback}}<br />
<br />
{{patch|48649261.5040703@students.mimuw.edu.pl|text search selectivity and dllist enhancements|Jan Urbański|status=Returned for rework|reviewers=Tom Lane}}<br />
{{comment|tgl|review {{messageLink|6233.1215113144@sss.pgh.pa.us|here}} --- data structure could be improved and simplified}}<br />
<br />
{{CommitFestEndSection}}<br />
<br />
== Round Robin Reviewers ==<br />
<br />
{| border="1" cellpadding="1" cellspacing="0" style="width: 60%; border-collapse: collapse; border: 1px solid #ccc; font-size: 90%;"<br />
|- style="background: #eee;"<br />
! width="30%" | Name<br />
!Status<br />
!Patch Count<br />
|-<br />
|Neil Conway || Available || 1<br />
|-<br />
|Pavan Deolasee || Available || 2<br />
|-<br />
|Andrew Dunstan || Vacation || 0<br />
|-<br />
|Stephen Frost || Available || 2<br />
|-<br />
|Álvaro Herrera || Available || 1<br />
|-<br />
|Zdeněk Kotala || Available || 1<br />
|-<br />
|Thomas Lee || Available || 2<br />
|-<br />
|Abhijit Menon-Sen || Available || 2<br />
|-<br />
|Bruce Momjian || Available || 2<br />
|-<br />
|Nikhil Sontakke || Available || 1<br />
|-<br />
|Greg Stark || Available || 1<br />
|-<br />
|Martin Zaun || Available || 1<br />
|}</div>Amshttps://wiki.postgresql.org/index.php?title=Tuning_Your_PostgreSQL_Server&diff=1557Tuning Your PostgreSQL Server2008-06-28T02:27:50Z<p>Ams: /* autovacuum fsm-s */</p>
<hr />
<div>'''This page is still under construction''' (insert animated digging guy picture here)<br />
<br />
PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance. Odds are good the default parameters are very undersized for your system. Rather than get dragged into the details of everything you should eventually know, here we're going to sprint through a simplified view of the basics, with a look at the most common performance-related things people new to PostgreSQL aren't aware of.<br />
<br />
=listen_addresses max_connections=<br />
<br />
Setting listen_addresses = '*' makes the server accept connections on all network interfaces, rather than only on localhost.<br />
<br />
max_connections drives the sizing of the memory parameters below, since per-connection memory use (such as work_mem) scales with it.<br />
<br />
=Set shared_buffers and effective_cache_size based on total memory=<br />
The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL for caching data. The defaults are low because on some platforms (like older Solaris versions and SGI) having large values requires invasive action like recompiling the kernel. If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM, you'll have to account more carefully for how much of it the operating system is taking up; closer to 15% of total memory is more typical there. <br />
<br />
Note that on Windows, large values for shared_buffers aren't as effective, and you may find better results keeping it relatively low and using the OS cache more instead.<br />
<br />
It's likely you will have to increase the amount of shared memory your operating system allows a process to allocate in order to set shared_buffers this high; otherwise you will get a message like this:<br />
<br />
<code><pre><br />
IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument <br />
<br />
This error usually means that PostgreSQL's request for a shared memory <br />
segment exceeded your kernel's SHMMAX parameter. You can either <br />
reduce the request size or reconfigure the kernel with larger SHMMAX. <br />
To reduce the request size (currently 415776768 bytes), reduce <br />
PostgreSQL's shared_buffers parameter (currently 50000) and/or <br />
its max_connections parameter (currently 12).<br />
</pre></code><br />
<br />
See [http://www.postgresql.org/docs/current/static/kernel-resources.html Managing Kernel Resources] for details on how to correct this. <br />
<br />
effective_cache_size should be set to how much memory is leftover for disk caching after taking into account what's used by the operating system, dedicated PostgreSQL memory, and other applications. If it's set too low, indexes may not be used for executing queries the way you'd expect. Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting. You might find a better estimate by looking at your operating system's statistics. On UNIX-like systems, add the free+cached numbers from free or top. On Windows see the "System Cache" in the Windows Task Manager's Performance tab.<br />
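<br />
A minimal sketch of the above, assuming a dedicated machine with 4GB of RAM (the figures are illustrative starting points, not tuned recommendations for your workload):<br />
<br />
<code><pre><br />
# postgresql.conf -- illustrative starting points for a dedicated 4GB server<br />
shared_buffers = 1GB             # roughly 1/4 of total RAM<br />
effective_cache_size = 2GB       # roughly 1/2 of total RAM; a planner hint only, no memory is allocated<br />
</pre></code><br />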
<br />
<br />
=checkpoint_segments checkpoint_completion_target=<br />
<br />
<br />
<br />
=autovacuum fsm-s=<br />
<br />
Increase the value of max_fsm_pages and max_fsm_relations as needed<br />
<br />
The Free Space Map is used to track where there are dead tuples that may be reclaimed. You will only get effective nonblocking VACUUM queries if the dead tuples can be listed in the Free Space Map.<br />
<br />
As a result, if you do not plan to run VACUUM frequently, and if you expect a lot of updates, you should ensure these values are usefully large. It should be easy enough to set max_fsm_relations high enough; the problem that more typically occurs is that max_fsm_pages is not set high enough. Once the Free Space Map is full, VACUUM will be unable to track further dead pages. In a busy database, max_fsm_pages needs to be set much higher than its default.<br />
<br />
If you run VACUUM VERBOSE on your database, it'll tell you how many pages and relations are in use (and, under 8.3, what the current limits are). For example,<br />
<br />
<pre><br />
INFO: free space map contains 5293 pages in 214 relations<br />
DETAIL: A total of 8528 page slots are in use (including overhead).<br />
8528 page slots are required to track all free space.<br />
Current limits are: 204800 page slots, 1000 relations, using 1265 kB.<br />
</pre><br />
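<br />
Given output like the above, a hedged example of sizing these settings with some headroom (the numbers are illustrative; use what your own VACUUM VERBOSE reports):<br />
<br />
<code><pre><br />
# postgresql.conf -- changing these requires a server restart<br />
max_fsm_pages = 100000       # comfortably above the 8528 page slots reported<br />
max_fsm_relations = 1000     # well above the 214 relations reported<br />
</pre></code><br />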
<br />
=logging=<br />
<br />
=default_statistics_target=<br />
<br />
=work_mem maintenance_work_mem=<br />
<br />
If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.<br />
<br />
This size is applied to each and every sort done by each user, and complex queries can use multiple working-memory sort buffers. Set it to 50MB, with 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves merge sorts of 8 tables, it requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is where data warehouse systems, where users submit very large queries, can readily make use of many gigabytes of memory.<br />
<br />
maintenance_work_mem is used for operations like VACUUM. Using extremely large values here doesn't help very much; because you essentially need to reserve that memory for when vacuum kicks in, it is taken away from more useful purposes. Something in the 256MB range has anecdotally been a reasonably large setting here.<br />
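<br />
A minimal sketch, assuming a low-concurrency reporting server (values are illustrative; remember that work_mem applies per sort, per session):<br />
<br />
<code><pre><br />
# postgresql.conf -- illustrative values for a low-concurrency reporting server<br />
work_mem = 32MB                  # per sort/hash operation, per session<br />
maintenance_work_mem = 256MB     # used by VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY<br />
</pre></code><br />
A single session can also raise the limit for one expensive query by issuing SET work_mem = '512MB'; before running it.<br />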
<br />
=wal_sync_method wal_buffers=<br />
<br />
=constraint_exclusion max_prepared_transactions=<br />
<br />
=synchronous_commit=<br />
<br />
=random_page_cost=<br />
If you have particularly fast disks, as commonly found with RAID arrays of SCSI disks, it may be appropriate to lower random_page_cost, which will encourage the query optimizer to use random access index scans.</div>Amshttps://wiki.postgresql.org/index.php?title=Tuning_Your_PostgreSQL_Server&diff=1556Tuning Your PostgreSQL Server2008-06-28T02:26:07Z<p>Ams: example vacuum verbose output</p>
<hr />
<div>'''This page is still under construction''' (insert animated digging guy picture here)<br />
<br />
PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance. Odds are good the default parameters are very undersized for your system. Rather than get dragged into the details of everything you should eventually know, here we're going to sprint through a simplified view of the basics, with a look at the most common performance-related things people new to PostgreSQL aren't aware of.<br />
<br />
=listen_addresses max_connections=<br />
<br />
Setting listen_addresses = '*' makes the server accept connections on all network interfaces, rather than only on localhost.<br />
<br />
max_connections drives the sizing of the memory parameters below, since per-connection memory use (such as work_mem) scales with it.<br />
<br />
=Set shared_buffers and effective_cache_size based on total memory=<br />
The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL for caching data. The defaults are low because on some platforms (like older Solaris versions and SGI) having large values requires invasive action like recompiling the kernel. If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM, you'll have to account more carefully for how much of it the operating system is taking up; closer to 15% of total memory is more typical there. <br />
<br />
Note that on Windows, large values for shared_buffers aren't as effective, and you may find better results keeping it relatively low and using the OS cache more instead.<br />
<br />
It's likely you will have to increase the amount of shared memory your operating system allows a process to allocate in order to set shared_buffers this high; otherwise you will get a message like this:<br />
<br />
<code><pre><br />
IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument <br />
<br />
This error usually means that PostgreSQL's request for a shared memory <br />
segment exceeded your kernel's SHMMAX parameter. You can either <br />
reduce the request size or reconfigure the kernel with larger SHMMAX. <br />
To reduce the request size (currently 415776768 bytes), reduce <br />
PostgreSQL's shared_buffers parameter (currently 50000) and/or <br />
its max_connections parameter (currently 12).<br />
</pre></code><br />
<br />
See [http://www.postgresql.org/docs/current/static/kernel-resources.html Managing Kernel Resources] for details on how to correct this. <br />
<br />
effective_cache_size should be set to how much memory is leftover for disk caching after taking into account what's used by the operating system, dedicated PostgreSQL memory, and other applications. If it's set too low, indexes may not be used for executing queries the way you'd expect. Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting. You might find a better estimate by looking at your operating system's statistics. On UNIX-like systems, add the free+cached numbers from free or top. On Windows see the "System Cache" in the Windows Task Manager's Performance tab.<br />
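<br />
For example (illustrative figures for a dedicated 4GB machine, not tuned recommendations):<br />
<br />
<code><pre><br />
# postgresql.conf -- illustrative starting points for a dedicated 4GB server<br />
shared_buffers = 1GB             # roughly 1/4 of total RAM<br />
effective_cache_size = 2GB       # roughly 1/2 of total RAM; a planner hint only<br />
</pre></code><br />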
<br />
<br />
=checkpoint_segments checkpoint_completion_target=<br />
<br />
<br />
<br />
=autovacuum fsm-s=<br />
<br />
Increase the value of max_fsm_pages and max_fsm_relations as needed<br />
<br />
The Free Space Map is used to track where there are dead tuples that may be reclaimed. You will only get effective nonblocking VACUUM queries if the dead tuples can be listed in the Free Space Map.<br />
<br />
As a result, if you do not plan to run VACUUM frequently, and if you expect a lot of updates, you should ensure these values are usefully large. It should be easy enough to set max_fsm_relations high enough; the problem that more typically occurs is that max_fsm_pages is not set high enough. Once the Free Space Map is full, VACUUM will be unable to track further dead pages. In a busy database, max_fsm_pages needs to be set much higher than its default.<br />
<br />
If you run VACUUM VERBOSE on your database, it'll tell you how many pages and relations are in use (and, under 8.3, what the current limits are). For example,<br />
<br />
<pre><br />
INFO: free space map contains 5293 pages in 214 relations<br />
DETAIL: A total of 8528 page slots are in use (including overhead).<br />
8528 page slots are required to track all free space.<br />
Current limits are: 204800 page slots, 1000 relations, using 1265 kB.<br />
</pre><br />
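<br />
Given output like the above, an illustrative sizing with headroom (use what your own VACUUM VERBOSE reports):<br />
<br />
<code><pre><br />
# postgresql.conf -- changing these requires a server restart<br />
max_fsm_pages = 100000       # comfortably above the 8528 page slots reported<br />
max_fsm_relations = 1000     # well above the 214 relations reported<br />
</pre></code><br />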
<br />
=logging=<br />
<br />
=default_statistics_target=<br />
<br />
=work_mem maintenance_work_mem=<br />
<br />
If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.<br />
<br />
This size is applied to each and every sort done by each user, and complex queries can use multiple working-memory sort buffers. Set it to 50MB, with 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves merge sorts of 8 tables, it requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is where data warehouse systems, where users submit very large queries, can readily make use of many gigabytes of memory.<br />
<br />
maintenance_work_mem is used for operations like VACUUM. Using extremely large values here doesn't help very much; because you essentially need to reserve that memory for when vacuum kicks in, it is taken away from more useful purposes. Something in the 256MB range has anecdotally been a reasonably large setting here.<br />
<br />
=wal_sync_method wal_buffers=<br />
<br />
=constraint_exclusion max_prepared_transactions=<br />
<br />
=synchronous_commit=<br />
<br />
=random_page_cost=<br />
If you have particularly fast disks, as commonly found with RAID arrays of SCSI disks, it may be appropriate to lower random_page_cost, which will encourage the query optimizer to use random access index scans.</div>Amshttps://wiki.postgresql.org/index.php?title=RRReviewers&diff=1539RRReviewers2008-06-25T17:41:33Z<p>Ams: Add my name as requested by aglio2 on IRC</p>
<hr />
<div>== Round Robin Reviewers ==<br />
<br />
This is a list of reviewers available to be assigned patches for review:<br />
<br />
* Neil Conway<br />
* David Fetter<br />
* Álvaro Herrera<br />
* Bruce Momjian<br />
* Greg Stark<br />
* Martin Zaun<br />
* Thomas Lee<br />
* Nikhil Sontakke<br />
* Abhijit Menon-Sen</div>Ams