<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://wiki.postgresql.org/skins/common/feed.css?207"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
		<title>PostgreSQL wiki - User contributions [en]</title>
		<link>http://wiki.postgresql.org/wiki/Special:Contributions/Kgrittn</link>
		<description>From PostgreSQL wiki</description>
		<language>en</language>
		<generator>MediaWiki 1.15.5-2squeeze5</generator>
		<lastBuildDate>Tue, 21 May 2013 13:43:21 GMT</lastBuildDate>
		<item>
			<title>PgCon 2013 Developer Meeting</title>
			<link>http://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting</link>
			<guid>http://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Proposed Agenda Items */ Add name to proposed agenda item&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A meeting of the most active PostgreSQL developers is being planned for Wednesday 22nd May, 2013 near the University of Ottawa, prior to pgCon 2013. In order to keep the numbers manageable, this meeting is '''by invitation only'''. Unfortunately it is quite possible that we've overlooked important code developers during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org). &lt;br /&gt;
&lt;br /&gt;
Please note that this year the attendee numbers have been kept low in order to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.3 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies, unlike in previous years.&lt;br /&gt;
&lt;br /&gt;
This is a PostgreSQL Community event. Room and refreshments/food sponsored by EnterpriseDB. Other companies sponsored attendance for their developers.&lt;br /&gt;
 &lt;br /&gt;
== Time &amp;amp; Location ==&lt;br /&gt;
&lt;br /&gt;
The meeting will be from 8:30AM to 5PM, and will be in the &amp;quot;Red Experience&amp;quot; room at:&lt;br /&gt;
&lt;br /&gt;
 Novotel Ottawa&lt;br /&gt;
 33 Nicholas Street&lt;br /&gt;
 Ottawa&lt;br /&gt;
 Ontario&lt;br /&gt;
 K1N 9M7&lt;br /&gt;
 &lt;br /&gt;
Food and drink will be provided throughout the day, including breakfast from 8AM.&lt;br /&gt;
&lt;br /&gt;
[http://maps.google.ca/maps?f=q&amp;amp;source=s_q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=novotel+ottawa&amp;amp;aq=&amp;amp;sll=49.891235,-97.15369&amp;amp;sspn=36.237851,79.013672&amp;amp;ie=UTF8&amp;amp;hq=novotel+ottawa&amp;amp;hnear=&amp;amp;ll=45.421528,-75.683699&amp;amp;spn=0.036869,0.077162&amp;amp;z=14&amp;amp;iwloc=A&amp;amp;layer=c&amp;amp;cbll=45.425741,-75.689638&amp;amp;panoid=Z4FUGnkZkdHAOkIxyjjS9Q&amp;amp;cbp=12,25.83,,0,-0.6 View on Google Maps]&lt;br /&gt;
&lt;br /&gt;
== Attendees ==&lt;br /&gt;
&lt;br /&gt;
The following people have RSVPed to the meeting (in alphabetical order, by surname):&lt;br /&gt;
&lt;br /&gt;
* Josh Berkus (secretary)&lt;br /&gt;
* Jeff Davis&lt;br /&gt;
* Andrew Dunstan&lt;br /&gt;
* Peter Eisentraut&lt;br /&gt;
* Dimitri Fontaine&lt;br /&gt;
* Andres Freund&lt;br /&gt;
* Stephen Frost&lt;br /&gt;
* Peter Geoghegan&lt;br /&gt;
* Kevin Grittner&lt;br /&gt;
* Robert Haas&lt;br /&gt;
* Magnus Hagander&lt;br /&gt;
* KaiGai Kohei&lt;br /&gt;
* Alexander Korotkov&lt;br /&gt;
* Tom Lane&lt;br /&gt;
* Fujii Masao&lt;br /&gt;
* Noah Misch&lt;br /&gt;
* Bruce Momjian&lt;br /&gt;
* Dave Page (chair)&lt;br /&gt;
* Simon Riggs&lt;br /&gt;
&lt;br /&gt;
== Proposed Agenda Items ==&lt;br /&gt;
&lt;br /&gt;
Please list proposed agenda items here:&lt;br /&gt;
&lt;br /&gt;
* 9.4 Commitfest schedule&lt;br /&gt;
* [http://wiki.postgresql.org/wiki/Parallel_Query_Execution Parallel Query Execution] (Bruce, Noah)&lt;br /&gt;
* logical changeset generation review &amp;amp; integration (Andres)&lt;br /&gt;
* utilization of upcoming non-volatile RAM device (Kaigai)&lt;br /&gt;
* pluggable plan/exec nodes (Kaigai)&lt;br /&gt;
** to offload targetlist calculation, sorting, aggregates, ...&lt;br /&gt;
* [[GIN generalization]] (Alexander)&lt;br /&gt;
* An Extensibility Roadmap (dim)&lt;br /&gt;
* Representing severity - derive severity from SQLSTATE (Peter Geoghegan - see http://www.postgresql.org/message-id/CA+TgmoZEjq7va+SfDZQwk6E4emEWThENNyxfqEGhB3iuoT1OJw@mail.gmail.com)&lt;br /&gt;
* Error logging infrastructure - store normalized statistics about errors in a circular buffer (Peter Geoghegan). Arguably this could be discussed alongside SQLSTATE item.&lt;br /&gt;
* Failback with backup (Fujii Masao - related discussion is: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q@mail.gmail.com)&lt;br /&gt;
* Volume Management (Stephen Frost - wiki page will be forthcoming before the meeting)&lt;br /&gt;
* AXLE Project - Big data analytics for Postgres (Simon Riggs) - an overview of the feature plan, how the project works and what the community can expect&lt;br /&gt;
* Incremental maintenance of materialized views (Kevin) - differential REFRESH and infrastructure for ''counting'' algorithm&lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;4&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Item&lt;br /&gt;
!Presenter&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:00&lt;br /&gt;
|Breakfast&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:30 - 08:45&lt;br /&gt;
|Welcome and introductions&lt;br /&gt;
|Dave Page&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|08:45 - 09:45&lt;br /&gt;
|Parallel Query Execution&lt;br /&gt;
|Bruce/Noah&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|09:45 - 10:30&lt;br /&gt;
|Logical changeset generation review &amp;amp; integration&lt;br /&gt;
|Andres&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|10:30 - 10:45&lt;br /&gt;
|Coffee break&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|10:45 - 11:00&lt;br /&gt;
|Utilization of upcoming non-volatile RAM devices&lt;br /&gt;
|KaiGai&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:00 - 11:30&lt;br /&gt;
|Pluggable plan/exec nodes&lt;br /&gt;
|KaiGai&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:30 - 11:50&lt;br /&gt;
|Representing severity&lt;br /&gt;
|Peter G.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:50 - 12:30&lt;br /&gt;
|Error logging infrastructure&lt;br /&gt;
|Peter G.&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|12:30 - 13:30&lt;br /&gt;
|Lunch	&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|13:30 - 14:15&lt;br /&gt;
|GIN generalization&lt;br /&gt;
|Alexander&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|14:15 - 15:00&lt;br /&gt;
|An Extensibility Roadmap&lt;br /&gt;
|Dimitri&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|15:00 - 15:15&lt;br /&gt;
|Tea break&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|15:15 - 15:30&lt;br /&gt;
|9.4 Commitfest schedule&lt;br /&gt;
|All&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|15:30 - 16:45&lt;br /&gt;
|Goals, priorities, and resources for 9.4&lt;br /&gt;
|All&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|16:45 - 17:00&lt;br /&gt;
|Any other business/group photo&lt;br /&gt;
|Dave Page&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|17:00&lt;br /&gt;
|Finish&lt;br /&gt;
|	&lt;br /&gt;
|}&lt;/div&gt;</description>
			<pubDate>Mon, 13 May 2013 20:18:10 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:PgCon_2013_Developer_Meeting</comments>		</item>
		<item>
			<title>PgCon 2013 Developer Meeting</title>
			<link>http://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting</link>
			<guid>http://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Proposed Agenda Items */ Add proposed agenda item for incremental maintenance of materialized views&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A meeting of the most active PostgreSQL developers is being planned for Wednesday 22nd May, 2013 near the University of Ottawa, prior to pgCon 2013. In order to keep the numbers manageable, this meeting is '''by invitation only'''. Unfortunately it is quite possible that we've overlooked important code developers during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org). &lt;br /&gt;
&lt;br /&gt;
Please note that this year the attendee numbers have been kept low in order to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.3 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies, unlike in previous years.&lt;br /&gt;
&lt;br /&gt;
This is a PostgreSQL Community event. Room and refreshments/food sponsored by EnterpriseDB. Other companies sponsored attendance for their developers.&lt;br /&gt;
 &lt;br /&gt;
== Time &amp;amp; Location ==&lt;br /&gt;
&lt;br /&gt;
The meeting will be from 8:30AM to 5PM, and will be in the &amp;quot;Red Experience&amp;quot; room at:&lt;br /&gt;
&lt;br /&gt;
 Novotel Ottawa&lt;br /&gt;
 33 Nicholas Street&lt;br /&gt;
 Ottawa&lt;br /&gt;
 Ontario&lt;br /&gt;
 K1N 9M7&lt;br /&gt;
 &lt;br /&gt;
Food and drink will be provided throughout the day, including breakfast from 8AM.&lt;br /&gt;
&lt;br /&gt;
[http://maps.google.ca/maps?f=q&amp;amp;source=s_q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=novotel+ottawa&amp;amp;aq=&amp;amp;sll=49.891235,-97.15369&amp;amp;sspn=36.237851,79.013672&amp;amp;ie=UTF8&amp;amp;hq=novotel+ottawa&amp;amp;hnear=&amp;amp;ll=45.421528,-75.683699&amp;amp;spn=0.036869,0.077162&amp;amp;z=14&amp;amp;iwloc=A&amp;amp;layer=c&amp;amp;cbll=45.425741,-75.689638&amp;amp;panoid=Z4FUGnkZkdHAOkIxyjjS9Q&amp;amp;cbp=12,25.83,,0,-0.6 View on Google Maps]&lt;br /&gt;
&lt;br /&gt;
== Attendees ==&lt;br /&gt;
&lt;br /&gt;
The following people have RSVPed to the meeting (in alphabetical order, by surname):&lt;br /&gt;
&lt;br /&gt;
* Josh Berkus (secretary)&lt;br /&gt;
* Jeff Davis&lt;br /&gt;
* Andrew Dunstan&lt;br /&gt;
* Peter Eisentraut&lt;br /&gt;
* Dimitri Fontaine&lt;br /&gt;
* Andres Freund&lt;br /&gt;
* Stephen Frost&lt;br /&gt;
* Peter Geoghegan&lt;br /&gt;
* Kevin Grittner&lt;br /&gt;
* Robert Haas&lt;br /&gt;
* Magnus Hagander&lt;br /&gt;
* KaiGai Kohei&lt;br /&gt;
* Alexander Korotkov&lt;br /&gt;
* Tom Lane&lt;br /&gt;
* Fujii Masao&lt;br /&gt;
* Noah Misch&lt;br /&gt;
* Bruce Momjian&lt;br /&gt;
* Dave Page (chair)&lt;br /&gt;
* Simon Riggs&lt;br /&gt;
&lt;br /&gt;
== Proposed Agenda Items ==&lt;br /&gt;
&lt;br /&gt;
Please list proposed agenda items here:&lt;br /&gt;
&lt;br /&gt;
* 9.4 Commitfest schedule&lt;br /&gt;
* [http://wiki.postgresql.org/wiki/Parallel_Query_Execution Parallel Query Execution] (Bruce, Noah)&lt;br /&gt;
* logical changeset generation review &amp;amp; integration (Andres)&lt;br /&gt;
* utilization of upcoming non-volatile RAM device (Kaigai)&lt;br /&gt;
* pluggable plan/exec nodes (Kaigai)&lt;br /&gt;
** to offload targetlist calculation, sorting, aggregates, ...&lt;br /&gt;
* [[GIN generalization]] (Alexander)&lt;br /&gt;
* An Extensibility Roadmap (dim)&lt;br /&gt;
* Representing severity - derive severity from SQLSTATE (Peter Geoghegan - see http://www.postgresql.org/message-id/CA+TgmoZEjq7va+SfDZQwk6E4emEWThENNyxfqEGhB3iuoT1OJw@mail.gmail.com)&lt;br /&gt;
* Error logging infrastructure - store normalized statistics about errors in a circular buffer (Peter Geoghegan). Arguably this could be discussed alongside SQLSTATE item.&lt;br /&gt;
* Failback with backup (Fujii Masao - related discussion is: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q@mail.gmail.com)&lt;br /&gt;
* Volume Management (Stephen Frost - wiki page will be forthcoming before the meeting)&lt;br /&gt;
* AXLE Project - Big data analytics for Postgres (Simon Riggs) - an overview of the feature plan, how the project works and what the community can expect&lt;br /&gt;
* Incremental maintenance of materialized views - differential REFRESH and infrastructure for ''counting'' algorithm&lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;4&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Item&lt;br /&gt;
!Presenter&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:00&lt;br /&gt;
|Breakfast&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:30 - 08:45&lt;br /&gt;
|Welcome and introductions&lt;br /&gt;
|Dave Page&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|08:45 - 09:45&lt;br /&gt;
|Parallel Query Execution&lt;br /&gt;
|Bruce/Noah&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|09:45 - 10:30&lt;br /&gt;
|Logical changeset generation review &amp;amp; integration&lt;br /&gt;
|Andres&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|10:30 - 10:45&lt;br /&gt;
|Coffee break&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|10:45 - 11:00&lt;br /&gt;
|Utilization of upcoming non-volatile RAM devices&lt;br /&gt;
|KaiGai&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:00 - 11:30&lt;br /&gt;
|Pluggable plan/exec nodes&lt;br /&gt;
|KaiGai&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:30 - 11:50&lt;br /&gt;
|Representing severity&lt;br /&gt;
|Peter G.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:50 - 12:30&lt;br /&gt;
|Error logging infrastructure&lt;br /&gt;
|Peter G.&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|12:30 - 13:30&lt;br /&gt;
|Lunch	&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|13:30 - 14:15&lt;br /&gt;
|GIN generalization&lt;br /&gt;
|Alexander&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|14:15 - 15:00&lt;br /&gt;
|An Extensibility Roadmap&lt;br /&gt;
|Dimitri&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|15:00 - 15:15&lt;br /&gt;
|Tea break&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|15:15 - 15:30&lt;br /&gt;
|9.4 Commitfest schedule&lt;br /&gt;
|All&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|15:30 - 16:45&lt;br /&gt;
|Goals, priorities, and resources for 9.4&lt;br /&gt;
|All&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|16:45 - 17:00&lt;br /&gt;
|Any other business/group photo&lt;br /&gt;
|Dave Page&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|17:00&lt;br /&gt;
|Finish&lt;br /&gt;
|	&lt;br /&gt;
|}&lt;/div&gt;</description>
			<pubDate>Mon, 13 May 2013 20:17:03 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:PgCon_2013_Developer_Meeting</comments>		</item>
		<item>
			<title>What's new in PostgreSQL 9.3</title>
			<link>http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.3</link>
			<guid>http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.3</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Materialized Views */ Add a link to the overview in the docs (which is located under &amp;quot;The Rule System&amp;quot;)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains additional information about PostgreSQL Version 9.3's features, including descriptions, testing information, and usage information. See also the [http://www.postgresql.org/docs/devel/static/release-9-3.html Release Notes] and [[PostgreSQL_9.3_Blog_Posts|this page]] for a list of blog posts explaining some of the new features.&lt;br /&gt;
&lt;br /&gt;
= New features =&lt;br /&gt;
&lt;br /&gt;
''... in no particular order ...'' &lt;br /&gt;
&lt;br /&gt;
Items marked as &amp;quot;(DONE)&amp;quot; have a basic description + links, but can of course be improved / expanded.&lt;br /&gt;
&lt;br /&gt;
* Writeable Foreign Tables: write to external databases as well as read from them&lt;br /&gt;
* postgres_fdw driver for federation of PostgreSQL databases&lt;br /&gt;
* VIEW features&lt;br /&gt;
** Automatically updatable VIEWs (DONE)&lt;br /&gt;
** MATERIALIZED VIEWs declaration (DONE)&lt;br /&gt;
** Recursive view declaration  (DONE)&lt;br /&gt;
* LATERAL JOINs&lt;br /&gt;
* Additional JSON constructor and extractor functions&lt;br /&gt;
* Switch to Posix shared memory and mmap(). (DONE)&lt;br /&gt;
* Indexed regular expression search&lt;br /&gt;
* Disk page checksums to detect filesystem failures&lt;br /&gt;
* Replication improvements&lt;br /&gt;
** Streaming-only remastering of replicas&lt;br /&gt;
** Streaming replication protocol is now architecture-independent.&lt;br /&gt;
** Faster promotion of a streaming standby to primary (&amp;quot;Standby promotion is almost instant, allowing 99.999% availability for a replicated cluster.&amp;quot;)&lt;br /&gt;
* Performance and locking improvements for Foreign Key locks&lt;br /&gt;
* Parallel pg_dump for faster backups (DONE)&lt;br /&gt;
* Directories for configuration files (DONE)&lt;br /&gt;
* pg_isready database connection checker  (DONE)&lt;br /&gt;
* 64-bit Large Object API&lt;br /&gt;
* COPY FREEZE for reduced IO bulk loading&lt;br /&gt;
* User-defined background workers for automating database tasks (DONE)&lt;br /&gt;
&lt;br /&gt;
== Configuration directive 'include_dir' ==&lt;br /&gt;
&lt;br /&gt;
In addition to including separate configuration files via the 'include' directive, postgresql.conf now also provides the 'include_dir' directive which reads all files ending in &amp;quot;.conf&amp;quot; in the specified directory or directories.&lt;br /&gt;
&lt;br /&gt;
Directories can be specified either as an absolute path or relative from the location of the main configuration file. Directories will be read in the order they occur, while files will be read sorted by C locale rules. It is possible for included files to contain their own 'include_dir' directives. &lt;br /&gt;
&lt;br /&gt;
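The selection and ordering rule described above can be tried out with a small shell sketch. The helper ''list_included_confs'' and the file names are hypothetical and only mimic the rule (names ending in .conf, sorted by C locale); the sketch does not read or modify any PostgreSQL configuration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of PostgreSQL): list the entries of a
# directory that match the include_dir rule described above.
list_included_confs() {
    dir=$1
    # Keep only names ending in ".conf", then sort by C locale (byte order).
    ls -1 "$dir" | grep '\.conf$' | LC_ALL=C sort
}

# Build a scratch directory with made-up file names to try it on.
demo_dir=$(mktemp -d)
touch "$demo_dir/10-memory.conf" "$demo_dir/00-base.conf" \
      "$demo_dir/a-local.conf" "$demo_dir/Z-site.conf" "$demo_dir/notes.txt"

# notes.txt is skipped; digits sort before upper case, upper case before lower case.
list_included_confs "$demo_dir"
```
&lt;br /&gt;
Numeric prefixes such as 00- and 10- are one common way to make the read order explicit, since C locale ordering places digits before letters and upper case before lower case.&lt;br /&gt;
&lt;br /&gt;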
'''Links'''&lt;br /&gt;
&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/config-setting.html#CONFIG-INCLUDES Documentation]&lt;br /&gt;
&lt;br /&gt;
== Custom Background Workers ==&lt;br /&gt;
&lt;br /&gt;
This functionality enables modules to register themselves as &amp;quot;background worker processes&amp;quot;, effectively operating as customised server processes. This is a powerful new feature with a wide variety of possible use cases, such as monitoring server activity, performing tasks at pre-defined intervals, customised logging etc.&lt;br /&gt;
&lt;br /&gt;
Background worker processes can attach to PostgreSQL's shared memory area and connect to databases internally; by linking to libpq they can also connect to the server in the same way as a regular client application. Background worker processes are written in C, and as server processes they have unrestricted access to all data and can potentially impact other server processes, meaning they represent a potential security / stability risk. Consequently, background worker processes should be developed and deployed with appropriate caution.&lt;br /&gt;
&lt;br /&gt;
Providing an example would go beyond the scope of this article; please refer to the blogs linked below, which provide annotated sample code. The PostgreSQL source also contains a sample background worker process in contrib/worker_spi.&lt;br /&gt;
&lt;br /&gt;
'''Links'''&lt;br /&gt;
&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/bgworker.html Documentation]&lt;br /&gt;
* [http://www.depesz.com/2012/12/07/waiting-for-9-3-background-worker-processes/ Background worker processes] &lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-handling-signals-with-custom-bgworkers/ Postgres 9.3 feature highlight: handling signals with custom bgworkers] &lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-custom-background-workers/ Custom background workers] &lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-hello-world-with-custom-bgworkers/ &amp;quot;Hello World&amp;quot; with custom bgworkers]&lt;br /&gt;
&lt;br /&gt;
== Parallel pg_dump for faster backups ==&lt;br /&gt;
&lt;br /&gt;
The new ''-j '''njobs''''' (''--jobs='''njobs''''') option enables pg_dump to dump '''njobs''' tables simultaneously, reducing the time it takes to dump a database. Example:&lt;br /&gt;
&lt;br /&gt;
  pg_dump -U postgres -j4  -Fd -f /tmp/mydb-dump mydb&lt;br /&gt;
&lt;br /&gt;
This dumps the contents of database &amp;quot;mydb&amp;quot; to the directory &amp;quot;/tmp/mydb-dump&amp;quot; using four simultaneous connections.&lt;br /&gt;
&lt;br /&gt;
Caveats:&lt;br /&gt;
* Parallel dumps can only be in directory format&lt;br /&gt;
* Parallel dumps will place more load on the database, although total dump time should be shorter&lt;br /&gt;
* pg_dump will open njobs + 1 connections to the database, so max_connections should be set appropriately&lt;br /&gt;
* Requesting exclusive locks on database objects while running a parallel dump could cause the dump to fail&lt;br /&gt;
* Parallel dumps from pre-9.2 servers need special attention&lt;br /&gt;
&lt;br /&gt;
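The connection caveat can be turned into a quick pre-flight check. The following is a minimal sketch with hypothetical helper names; it only does the arithmetic and prints a command line built from the options shown above (-j, -Fd, -f), without contacting a server. The number of free connection slots is an assumed input that would have to be obtained separately.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helpers; nothing here talks to a PostgreSQL server.

# Connections a parallel dump opens: njobs + 1, per the caveat above.
dump_connections() {
    echo $(( $1 + 1 ))
}

# Succeeds if a dump with $1 jobs fits into $2 free connection slots.
dump_fits() {
    [ "$(dump_connections "$1")" -le "$2" ]
}

# Print (rather than run) a parallel pg_dump command line.
# Arguments: database name, output directory, number of jobs.
parallel_dump_command() {
    echo "pg_dump -j$3 -Fd -f $2 $1"
}

dump_connections 4
if dump_fits 4 10; then
    parallel_dump_command mydb /tmp/mydb-dump 4
fi
```
&lt;br /&gt;
Printing the command instead of running it keeps the sketch free of side effects; in a real script the printed line would be executed once the check passes.&lt;br /&gt;
&lt;br /&gt;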
An ad-hoc test of this feature on a 4.5GB database (which compresses to around 370MB as a dump) with different values of ''-j'' produced the following timings:&lt;br /&gt;
&lt;br /&gt;
* (''no -j''): 1m3s&lt;br /&gt;
* -j2: 0m28s&lt;br /&gt;
* -j3: 0m24s&lt;br /&gt;
* -j4: 0m24s&lt;br /&gt;
* -j5: 0m25s&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Links'''&lt;br /&gt;
&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/app-pgdump.html pg_dump documentation]&lt;br /&gt;
* [http://www.depesz.com/2013/03/26/2646/ Waiting for 9.3 – Add parallel pg_dump option]&lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-parallel-pg_dump/ Postgres 9.3 feature highlight: parallel pg_dump]&lt;br /&gt;
&lt;br /&gt;
== 'pg_isready' server monitoring tool ==&lt;br /&gt;
&lt;br /&gt;
pg_isready is a wrapper for PQping created as a standard client application. It accepts a libpq-style connection string and returns one of four exit statuses:&lt;br /&gt;
&lt;br /&gt;
* 0: server is accepting connections normally&lt;br /&gt;
* 1: server is rejecting connections (for example during startup)&lt;br /&gt;
* 2: server did not respond to the connection attempt&lt;br /&gt;
* 3: no connection attempt was made (e.g. due to invalid connection parameters)&lt;br /&gt;
&lt;br /&gt;
Example usage:&lt;br /&gt;
&lt;br /&gt;
 barwick@localhost:~$ pg_isready&lt;br /&gt;
 /tmp:5432 - accepting connections&lt;br /&gt;
 barwick@localhost:~$ pg_isready --quiet &amp;amp;&amp;amp; echo &amp;quot;OK&amp;quot;&lt;br /&gt;
 OK&lt;br /&gt;
 barwick@localhost:~$ pg_isready -p5431 -h localhost&lt;br /&gt;
 localhost:5431 - accepting connections&lt;br /&gt;
 barwick@localhost:~$ pg_isready -h example.com&lt;br /&gt;
 example.com:5432 - no response&lt;br /&gt;
&lt;br /&gt;
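In monitoring scripts the exit status is usually more useful than the printed message. A minimal sketch, assuming a hypothetical helper ''describe_pg_isready_status'' that maps the four statuses listed above to short labels:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: translate a pg_isready exit status into the
# meanings listed above.
describe_pg_isready_status() {
    case $1 in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections" ;;
        2) echo "no response" ;;
        3) echo "no attempt made" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Typical use in a script (assumes pg_isready is on the PATH):
#   pg_isready --quiet
#   describe_pg_isready_status $?
describe_pg_isready_status 2
```
&lt;br /&gt;
The catch-all branch is there so that a status outside the documented four is reported rather than silently ignored.&lt;br /&gt;
&lt;br /&gt;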
'''Links'''&lt;br /&gt;
&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/app-pg-isready.html Documentation]&lt;br /&gt;
* [http://www.depesz.com/2013/01/26/waiting-for-9-3-pg_isready/ pg_isready]&lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-server-monitoring-with-pg_isready/ Server monitoring with pg_isready]&lt;br /&gt;
&lt;br /&gt;
== postgres_fdw driver for federation of PostgreSQL databases ==&lt;br /&gt;
&lt;br /&gt;
New PostgreSQL-to-PostgreSQL foreign data wrapper, which allows writes and &amp;quot;pushdown&amp;quot; of some query clauses to the external server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Switch to Posix shared memory and mmap() ==&lt;br /&gt;
&lt;br /&gt;
In 9.3, PostgreSQL has switched from using SysV shared memory to using Posix shared memory and mmap for memory management.  This allows easier installation and configuration of PostgreSQL, and means that except in unusual cases, system parameters such as SHMMAX and SHMALL no longer need to be adjusted. We need users to rigorously test and ensure that no memory management issues have been introduced by the change. &lt;br /&gt;
&lt;br /&gt;
'''Links'''&lt;br /&gt;
&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/kernel-resources.html#SYSVIPC Documentation]&lt;br /&gt;
&lt;br /&gt;
== VIEW Features ==&lt;br /&gt;
=== Materialized Views ===&lt;br /&gt;
&lt;br /&gt;
Materialized views are a special kind of view which cache the view's output as a physical table, rather than executing the underlying query on every access. Conceptually they are similar to &amp;quot;CREATE TABLE AS&amp;quot;, but store the view definition so it can be easily refreshed.&lt;br /&gt;
&lt;br /&gt;
Note that materialized views cannot be auto-refreshed; refreshes are not incremental; and the base table cannot be manipulated. It is currently uncertain whether they will be automatically populated by pg_restore (''??? need to confirm this'')&lt;br /&gt;
&lt;br /&gt;
'''Contrived example'''&lt;br /&gt;
&lt;br /&gt;
Create and populate a table with some arbitrary data:&lt;br /&gt;
&lt;br /&gt;
 CREATE TABLE matview_test_table (&lt;br /&gt;
  id SERIAL PRIMARY KEY,&lt;br /&gt;
  ts TIMESTAMPTZ NOT NULL&lt;br /&gt;
 );&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO matview_test_table VALUES (&lt;br /&gt;
  DEFAULT,&lt;br /&gt;
  ((NOW() - '2 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ&lt;br /&gt;
 );&lt;br /&gt;
&lt;br /&gt;
Create a materialized view which lists the 5 most recent entries:&lt;br /&gt;
&lt;br /&gt;
 CREATE MATERIALIZED VIEW matview_test_view AS&lt;br /&gt;
   SELECT id, ts&lt;br /&gt;
     FROM matview_test_table&lt;br /&gt;
 ORDER BY id DESC &lt;br /&gt;
    LIMIT 5;&lt;br /&gt;
&lt;br /&gt;
 postgres=# SELECT * from matview_test_view ;&lt;br /&gt;
   id  |              ts               &lt;br /&gt;
 ------+-------------------------------&lt;br /&gt;
  1000 | 2013-05-06 12:02:10.974711+09&lt;br /&gt;
   999 | 2013-05-06 12:02:09.974711+09&lt;br /&gt;
   998 | 2013-05-06 12:02:08.974711+09&lt;br /&gt;
   997 | 2013-05-06 12:02:07.974711+09&lt;br /&gt;
   996 | 2013-05-06 12:02:06.974711+09&lt;br /&gt;
 (5 rows)&lt;br /&gt;
&lt;br /&gt;
Add more data to the table:&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO matview_test_table VALUES (&lt;br /&gt;
  DEFAULT,&lt;br /&gt;
  ((NOW() - '1 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ&lt;br /&gt;
 );&lt;br /&gt;
&lt;br /&gt;
View output does not change:&lt;br /&gt;
&lt;br /&gt;
 postgres=# SELECT * from matview_test_view ;&lt;br /&gt;
   id  |              ts               &lt;br /&gt;
 ------+-------------------------------&lt;br /&gt;
  1000 | 2013-05-06 12:02:10.974711+09&lt;br /&gt;
   999 | 2013-05-06 12:02:09.974711+09&lt;br /&gt;
   998 | 2013-05-06 12:02:08.974711+09&lt;br /&gt;
   997 | 2013-05-06 12:02:07.974711+09&lt;br /&gt;
   996 | 2013-05-06 12:02:06.974711+09&lt;br /&gt;
 (5 rows)&lt;br /&gt;
&lt;br /&gt;
Refresh the view to display the latest table entries:&lt;br /&gt;
&lt;br /&gt;
 postgres=# REFRESH MATERIALIZED VIEW matview_test_view ;&lt;br /&gt;
 REFRESH MATERIALIZED VIEW&lt;br /&gt;
 postgres=# SELECT * from matview_test_view ;&lt;br /&gt;
   id  |              ts               &lt;br /&gt;
 ------+-------------------------------&lt;br /&gt;
  2001 | 2013-05-07 12:03:10.696626+09&lt;br /&gt;
  2000 | 2013-05-07 12:03:09.696626+09&lt;br /&gt;
  1999 | 2013-05-07 12:03:08.696626+09&lt;br /&gt;
  1998 | 2013-05-07 12:03:07.696626+09&lt;br /&gt;
  1997 | 2013-05-07 12:03:06.696626+09&lt;br /&gt;
 (5 rows)&lt;br /&gt;
&lt;br /&gt;
The links below contain more detailed information and examples.&lt;br /&gt;
&lt;br /&gt;
'''Links'''&lt;br /&gt;
* Documentation:&lt;br /&gt;
** [http://www.postgresql.org/docs/devel/static/rules-materializedviews.html Overview]&lt;br /&gt;
** [http://www.postgresql.org/docs/devel/static/sql-creatematerializedview.html CREATE command]&lt;br /&gt;
* [http://www.depesz.com/2013/03/04/waiting-for-9-3-add-a-materialized-view-relations/ Waiting for 9.3 – Add a materialized view relations]&lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-materialized-views/ Postgres 9.3 feature highlight: Materialized views]&lt;br /&gt;
&lt;br /&gt;
=== Recursive View Syntax ===&lt;br /&gt;
&lt;br /&gt;
The CREATE RECURSIVE VIEW syntax provides a shorthand way of formulating a recursive common table expression (CTE) as a view.&lt;br /&gt;
&lt;br /&gt;
Taking the example from the [http://www.postgresql.org/docs/current/static/queries-with.html#QUERIES-WITH-SELECT CTE documentation]:&lt;br /&gt;
&lt;br /&gt;
 WITH RECURSIVE t(n) AS (&lt;br /&gt;
     VALUES (1)&lt;br /&gt;
   UNION ALL&lt;br /&gt;
     SELECT n+1 FROM t WHERE n &amp;lt; 100&lt;br /&gt;
 )&lt;br /&gt;
 SELECT * FROM t;&lt;br /&gt;
&lt;br /&gt;
This can be created as a recursive view as follows:&lt;br /&gt;
&lt;br /&gt;
 CREATE RECURSIVE VIEW t(n) AS&lt;br /&gt;
     VALUES (1)&lt;br /&gt;
   UNION ALL&lt;br /&gt;
     SELECT n+1 FROM t WHERE n &amp;lt; 100;&lt;br /&gt;
&lt;br /&gt;
'''Links'''&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/sql-createview.html Documentation]&lt;br /&gt;
* [http://www.depesz.com/2013/03/04/waiting-for-9-3-add-create-recursive-view-syntax/ Waiting for 9.3 – Add CREATE RECURSIVE VIEW syntax]&lt;br /&gt;
&lt;br /&gt;
=== Updatable Views ===&lt;br /&gt;
&lt;br /&gt;
Simple views can now be updated in the same way as regular tables. The view can only reference one table (or another updatable view) and must not contain more complex operators, join types etc. &lt;br /&gt;
&lt;br /&gt;
If the view has a WHERE condition, UPDATEs and DELETEs on the underlying table will be restricted to those rows it defines. However, UPDATEs may change a row so that it is no longer visible in the view, and an INSERT command can potentially insert rows which do not satisfy the WHERE condition.&lt;br /&gt;
&lt;br /&gt;
More complex views can be made updatable as before using INSTEAD OF triggers or INSTEAD rules.&lt;br /&gt;
&lt;br /&gt;
Simple example using the following table and view:&lt;br /&gt;
&amp;lt;code&amp;gt; &lt;br /&gt;
 CREATE TABLE postgres_versions (&lt;br /&gt;
  version VARCHAR(3) PRIMARY KEY,&lt;br /&gt;
  nickname TEXT NOT NULL&lt;br /&gt;
 );&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO postgres_versions VALUES&lt;br /&gt;
  ('8.0', 'Excitable Element'),&lt;br /&gt;
  ('8.1', 'Fishy Foreign Key'),&lt;br /&gt;
  ('8.2', 'Grumpy Grant'),&lt;br /&gt;
  ('8.3', 'Hysterical Hstore'),&lt;br /&gt;
  ('8.4', 'Insane Index'),&lt;br /&gt;
  ('9.0', 'Jumpy Join'),&lt;br /&gt;
  ('9.1', 'Killer Key'),&lt;br /&gt;
  ('9.2', 'Laconical Lexer'),&lt;br /&gt;
  ('9.3', 'Morose Module');&lt;br /&gt;
&lt;br /&gt;
 CREATE VIEW postgres_versions_9 AS&lt;br /&gt;
  SELECT version, nickname&lt;br /&gt;
    FROM postgres_versions&lt;br /&gt;
   WHERE version LIKE '9.%';&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt; &lt;br /&gt;
 postgres=# SELECT * from postgres_versions_9;&lt;br /&gt;
  version |    nickname     &lt;br /&gt;
 ---------+-----------------&lt;br /&gt;
  9.0     | Jumpy Join&lt;br /&gt;
  9.1     | Killer Key&lt;br /&gt;
  9.2     | Laconical Lexer&lt;br /&gt;
  9.3     | Morose Module&lt;br /&gt;
 (4 rows)&lt;br /&gt;
 &lt;br /&gt;
 postgres=# UPDATE postgres_versions_9 SET nickname='Maniac Master' WHERE version='9.3';&lt;br /&gt;
 UPDATE 1&lt;br /&gt;
 postgres=# SELECT * from postgres_versions_9;&lt;br /&gt;
  version |    nickname     &lt;br /&gt;
 ---------+-----------------&lt;br /&gt;
  9.0     | Jumpy Join&lt;br /&gt;
  9.1     | Killer Key&lt;br /&gt;
  9.2     | Laconical Lexer&lt;br /&gt;
  9.3     | Maniac Master&lt;br /&gt;
 (4 rows)&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Links'''&lt;br /&gt;
* [http://www.postgresql.org/docs/devel/static/sql-createview.html#SQL-CREATEVIEW-UPDATABLE-VIEWS Documentation]&lt;br /&gt;
* [http://www.depesz.com/2012/12/11/waiting-for-9-3-support-automatically-updatable-views/ Waiting for 9.3 – Support automatically-updatable views]&lt;br /&gt;
* [http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-auto-updatable-views/ Postgres 9.3 feature highlight: auto-updatable views]&lt;br /&gt;
&lt;br /&gt;
== Writeable Foreign Tables ==&lt;br /&gt;
&lt;br /&gt;
Foreign data sources can now be written to, as well as read, provided that the FDW driver supports it.  As of this writing, the Redis driver supports this, but other drivers are expected to add this enhancement before 9.3 final.&lt;br /&gt;
&lt;br /&gt;
=Backward compatibility=&lt;br /&gt;
&lt;br /&gt;
These changes may cause regressions in your applications.&lt;br /&gt;
&lt;br /&gt;
== CREATE TABLE output ==&lt;br /&gt;
&lt;br /&gt;
CREATE TABLE will no longer output messages about implicit index and sequence creation unless the log level is set to DEBUG1.&lt;br /&gt;
&lt;br /&gt;
== Server settings ==&lt;br /&gt;
&lt;br /&gt;
* Parameter 'commit_delay' is restricted to superusers only&lt;br /&gt;
* Parameter 'replication_timeout' has been renamed to 'wal_sender_timeout'&lt;br /&gt;
* Parameter 'unix_socket_directory' has been replaced by 'unix_socket_directories'&lt;br /&gt;
* In-memory sorts now use their full memory allocation; if work_mem was set on the basis of the pre-9.3 behavior, its value may need to be reviewed.&lt;br /&gt;
&lt;br /&gt;
== WAL filenames may end in FF ==&lt;br /&gt;
&lt;br /&gt;
WAL files will now be written in a continuous stream, rather than skipping the last 16MB segment every 4GB, meaning  WAL filenames may end in FF. WAL backup or restore scripts may need to be adapted.&lt;/div&gt;</description>
			<pubDate>Wed, 08 May 2013 20:56:34 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:What%27s_new_in_PostgreSQL_9.3</comments>		</item>
		<item>
			<title>PgCon2013CanadaClusterSummit</title>
			<link>http://wiki.postgresql.org/wiki/PgCon2013CanadaClusterSummit</link>
			<guid>http://wiki.postgresql.org/wiki/PgCon2013CanadaClusterSummit</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* RSVP List */ Add Kevin Grittner&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Clustering and Replication Developers Summit pgCon 2013 = &lt;br /&gt;
&lt;br /&gt;
Tuesday, May 21st&lt;br /&gt;
&lt;br /&gt;
9:30AM to 2:30pm&lt;br /&gt;
&lt;br /&gt;
Followed by PostgresXC Summit 3pm to 6pm&lt;br /&gt;
&lt;br /&gt;
University of Ottawa&lt;br /&gt;
&lt;br /&gt;
Room TBD&lt;br /&gt;
&lt;br /&gt;
'''Sponsored by NTT Open Source'''&lt;br /&gt;
&lt;br /&gt;
=== Agenda ===&lt;br /&gt;
&lt;br /&gt;
==== 9AM to 9:30AM ====&lt;br /&gt;
&lt;br /&gt;
Seating and coffee.  Please bring any last-minute agenda items to Josh Berkus at this time.&lt;br /&gt;
&lt;br /&gt;
==== 9:30AM to 10:15AM ====&lt;br /&gt;
&lt;br /&gt;
Introductions, and status reports from Replication/Clustering Projects:&lt;br /&gt;
&lt;br /&gt;
Status updates:&lt;br /&gt;
&lt;br /&gt;
* pgPoolII: Tatsuo Ishii&lt;br /&gt;
* Postgres-XC: Koichi Suzuki&lt;br /&gt;
* Built-in Replication: Simon Riggs&lt;br /&gt;
&lt;br /&gt;
If you are at the summit representing a specific replication or clustering tool, you should prepare a 1-3 minute summary of current progress and issues.  If you want to use slides, please provide slides in PDF form to Josh Berkus by Friday, May 17.&lt;br /&gt;
&lt;br /&gt;
==== 10:15AM to 10:30AM ====&lt;br /&gt;
&lt;br /&gt;
Break&lt;br /&gt;
&lt;br /&gt;
==== 10:30AM to 11:30AM ====&lt;br /&gt;
&lt;br /&gt;
Summary of Clustering API projects.&lt;br /&gt;
&lt;br /&gt;
Summit attendees who have been working on [[ClusterFeatures|core clustering features]] should give an update as to progress and current issues.  Please present a 5-10 minute summary.  Attendees may use their own laptops for slides, or give slides to Josh Berkus.&lt;br /&gt;
&lt;br /&gt;
Topics:&lt;br /&gt;
&lt;br /&gt;
* Event Triggers: &lt;br /&gt;
* Exportable Snapshots:&lt;br /&gt;
* &lt;br /&gt;
&lt;br /&gt;
==== 11:30AM to 12:30PM ====&lt;br /&gt;
&lt;br /&gt;
Discussion of priorities, progress and ideas for core clustering projects and APIs.&lt;br /&gt;
&lt;br /&gt;
The goal of this discussion is to revise the list of core clustering features and get commitments from hackers to work on specific features, and also to supply discussion items for the following day's Developer Meeting.&lt;br /&gt;
&lt;br /&gt;
==== 12:30PM to 1:30PM ====&lt;br /&gt;
&lt;br /&gt;
Lunch and follow-up discussion.  Box lunches will be supplied.&lt;br /&gt;
&lt;br /&gt;
==== 1:30PM to 2:30PM ====&lt;br /&gt;
&lt;br /&gt;
Follow-up discussion.  Consolidation of future clustering API goals and projects.&lt;br /&gt;
&lt;br /&gt;
=== RSVP List ===&lt;br /&gt;
&lt;br /&gt;
# Josh Berkus&lt;br /&gt;
# Koichi Suzuki&lt;br /&gt;
# Kevin Grittner&lt;br /&gt;
&lt;br /&gt;
= PostgresXC Summit =&lt;br /&gt;
&lt;br /&gt;
3pm to 6pm&lt;br /&gt;
&lt;br /&gt;
Same room as clustering summit.&lt;br /&gt;
&lt;br /&gt;
Agenda TBD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:PostgreSQL Events]]&lt;/div&gt;</description>
			<pubDate>Wed, 13 Feb 2013 10:56:26 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:PgCon2013CanadaClusterSummit</comments>		</item>
		<item>
			<title>User:Kgrittn</title>
			<link>http://wiki.postgresql.org/wiki/User:Kgrittn</link>
			<guid>http://wiki.postgresql.org/wiki/User:Kgrittn</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Bio */ Replace with new bio based on job at EDB&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about Kevin Grittner and his activities.&lt;br /&gt;
&lt;br /&gt;
[[image:Kevin-Grittner.jpg]]&lt;br /&gt;
&lt;br /&gt;
== Bio ==&lt;br /&gt;
&lt;br /&gt;
I've been making my living working with computers since 1972. In 1980 I founded a consulting company I ran until 2005; during that time I worked with a wide variety of applications, organizations, and technical environments. I'm now with EnterpriseDB as a database architect. I'm a committer for the free, open source version of PostgreSQL, and an active member of the community.&lt;br /&gt;
&lt;br /&gt;
By far the largest patch I've worked on for PostgreSQL has been the Serializable Snapshot Isolation (SSI) implementation which went into PostgreSQL version 9.1. This was a joint project with Dan R.K. Ports of MIT, with helpful input and support from many in the PostgreSQL community. Our paper on that effort can be found here:&lt;br /&gt;
&lt;br /&gt;
http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf&lt;br /&gt;
&lt;br /&gt;
For the PostgreSQL docs on the feature, see:&lt;br /&gt;
&lt;br /&gt;
http://www.postgresql.org/docs/current/interactive/transaction-iso.html&lt;br /&gt;
&lt;br /&gt;
For a number of examples see:&lt;br /&gt;
&lt;br /&gt;
http://wiki.postgresql.org/wiki/SSI&lt;br /&gt;
&lt;br /&gt;
== Current Work In Process ==&lt;br /&gt;
&lt;br /&gt;
=== Declarative materialized views ===&lt;br /&gt;
&lt;br /&gt;
For the 9.3 release.&lt;br /&gt;
&lt;br /&gt;
== Possible Future Work ==&lt;br /&gt;
&lt;br /&gt;
=== Rewrite tsearch parser to use regular expressions ===&lt;br /&gt;
&lt;br /&gt;
In reviewing a patch to fix some performance problems in the current parser, I became interested in the possibility of rewriting the current state machine implementation with a regular expression implementation.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/200912102005.16560.andres@anarazel.de Re: tsearch parser inefficiency if text includes urls or emails - new version]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B210D9E020000250002D344@gw.wicourts.gov tsearch parser overhaul]&lt;br /&gt;
&lt;br /&gt;
=== Deleted WAL files held open by backends in Linux ===&lt;br /&gt;
&lt;br /&gt;
I wasted some time tracking down an oddity which is more of an annoyance in system administration than a real problem, but might look at cleaning it up to save others the bother of investigating the issue.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us Deleted WAL files held open by backends in Linux]&lt;br /&gt;
&lt;br /&gt;
=== Temporal data improvements ===&lt;br /&gt;
&lt;br /&gt;
Temporal data handling is weak in SQL in general, and the PostgreSQL implementation seems skewed toward scientific or engineering applications, leaving it weaker than some databases on business applications.  (One example would be clean handling of monthly payment schedules.)  Enhancements to this area might be interesting.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4AE944BD.90809@comcast.net Proposal - temporal contrib module]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/48692c2d0911171231h6ab16a64yc4db35a6e26909e0@mail.gmail.com Re: Timezones (in 8.5?)]&lt;br /&gt;
&lt;br /&gt;
=== LSB script compliance ===&lt;br /&gt;
&lt;br /&gt;
I came up with a pretty LSB compliant script for Linux; however, the community feels that most of the logic handled in the shell in this script should be moved into pg_ctl.  There's a pretty long and winding thread on the topic.  The last version of the script might be useful to identify what issues need to be covered in the current pg_ctl code.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A8C41EC.3080708@agliodbs.com We should Axe /contrib/start-scripts]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A9581A5020000250002A37A@gw.wicourts.gov Linux LSB init script]&lt;br /&gt;
&lt;br /&gt;
=== TOAST improvements ===&lt;br /&gt;
&lt;br /&gt;
There have been a few posts to the lists about specific use cases where TOAST defaults are far from optimal.  Allowing some tuning here might be helpful.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A6088D50200002500028940@gw.wicourts.gov Higher TOAST compression]&lt;br /&gt;
&lt;br /&gt;
=== Literal and NULL handling anomalies ===&lt;br /&gt;
&lt;br /&gt;
There are a few corner cases where differences between standard and PostgreSQL behaviors with string literals or NULLs astonish those new to PostgreSQL.  These aren't easily solved, but might be worth the effort.&lt;br /&gt;
&lt;br /&gt;
=== README files ===&lt;br /&gt;
&lt;br /&gt;
Language is not always as readable as it could be, and some files still largely read like proposals for features which are now implemented.&lt;/div&gt;</description>
			<pubDate>Thu, 31 Jan 2013 20:29:36 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/User_talk:Kgrittn</comments>		</item>
		<item>
			<title>Server Configuration</title>
			<link>http://wiki.postgresql.org/wiki/Server_Configuration</link>
			<guid>http://wiki.postgresql.org/wiki/Server_Configuration</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Use two separate SELECT queries rather than a UNION.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This shows all of the server configuration changes made via updates to the postgresql.conf file, from a running server:&lt;br /&gt;
{{SnippetInfo|Dependency display|lang=SQL|category=Administrative}}&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sql&amp;quot;&amp;gt;&lt;br /&gt;
SELECT version();&lt;br /&gt;
SELECT name, current_setting(name), source&lt;br /&gt;
  FROM pg_settings&lt;br /&gt;
  WHERE source NOT IN ('default', 'override');&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
[[Category:SQL]]&lt;/div&gt;</description>
			<pubDate>Sun, 20 Jan 2013 14:51:17 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Server_Configuration</comments>		</item>
		<item>
			<title>Server Configuration</title>
			<link>http://wiki.postgresql.org/wiki/Server_Configuration</link>
			<guid>http://wiki.postgresql.org/wiki/Server_Configuration</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Add source and use NOT IN ('default', 'override') instead of explicit list of excluded settings, per recommendation of Tom Lane.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This shows all of the server configuration changes made via updates to the postgresql.conf file, from a running server:&lt;br /&gt;
{{SnippetInfo|Dependency display|lang=SQL|category=Administrative}}&lt;br /&gt;
&amp;lt;source lang=&amp;quot;sql&amp;quot;&amp;gt;&lt;br /&gt;
SELECT  'version'::text AS name, version() AS current_setting, 'version()'::text as source&lt;br /&gt;
UNION ALL&lt;br /&gt;
SELECT  name, current_setting(name), source&lt;br /&gt;
FROM    pg_settings&lt;br /&gt;
WHERE   source NOT IN ('default', 'override');&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
[[Category:SQL]]&lt;/div&gt;</description>
			<pubDate>Sat, 19 Jan 2013 17:55:48 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Server_Configuration</comments>		</item>
		<item>
			<title>RRReviewers</title>
			<link>http://wiki.postgresql.org/wiki/RRReviewers</link>
			<guid>http://wiki.postgresql.org/wiki/RRReviewers</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* What do I need to do to become an RRR? */ Delete reference to obsolete link.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What's a Round Robin Reviewer (RRR)? ==&lt;br /&gt;
&lt;br /&gt;
Round Robin Reviewers are hackers who volunteer to have the [[Running a CommitFest|CommitFest Manager]] assign them random patches (usually according to their level of ability) to review during a commitfest.  They generally have at least good C skills and a general knowledge of PostgreSQL, but some are accomplished PostgreSQL hackers.&lt;br /&gt;
&lt;br /&gt;
== What do I need to do to become an RRR? ==&lt;br /&gt;
&lt;br /&gt;
# Subscribe to the [http://archives.postgresql.org/pgsql-hackers/ pgsql-hackers] and the [http://archives.postgresql.org/pgsql-rrreviewers/ pgsql-rrreviewers] mailing lists.&lt;br /&gt;
# Volunteer to review on pgsql-rrreviewers.&lt;br /&gt;
# Sign up for an account on this wiki.&lt;br /&gt;
# Read the [[Reviewing a Patch]] page, and the pages &amp;amp; docs it links to.&lt;br /&gt;
&lt;br /&gt;
== What happens next? ==&lt;br /&gt;
&lt;br /&gt;
# The CommitFest Manager will assign you a patch by e-mail.&lt;br /&gt;
# Visit [https://commitfest.postgresql.org/ commitfest.postgresql.org], edit the patch, and put your name down as a reviewer.  This will confirm that you have set up the community account needed to make entries into the web app, and that you have accepted the assigned patch.&lt;br /&gt;
# If you can't or don't want to review that patch, you should reject it within 24 hours (and you will be assigned a new patch if appropriate).&lt;br /&gt;
# Read the information on the patch and search the mailing list archives for other comments related to the patch.  If you find threads related to the patch which are not yet referenced from the web app entry, please add Comment entries with the ID of a message from the thread.&lt;br /&gt;
# Review the patch within 4 days.&lt;br /&gt;
# Reply to the mail thread on -hackers with your comments; make sure the submitter is cc'd.&lt;br /&gt;
# Put a summary of your comments on commitfest.postgresql.org by clicking on the patch, setting Comment Type to Review, pasting in the message ID of your email to the -hackers list, and putting in a brief (one or two line) summary of the review.&lt;br /&gt;
# Unless this was a preliminary review, and you intend to post an additional review very soon (perhaps after a brief on-list discussion), the status of the patch should change:&lt;br /&gt;
## If the patch needs work, and it seems reasonable that the author could complete that work within the current CommitFest, change the status to &amp;quot;Waiting on Author&amp;quot;.&lt;br /&gt;
## If the patch seems to you ready to accept, change the status to &amp;quot;Ready for Committer&amp;quot;.&lt;br /&gt;
## If the purpose of the patch is something the community wants, and the overall approach is viable, but more work is needed than is reasonable to expect that the author can complete within the CF, change the status to &amp;quot;Returned with Feedback&amp;quot; and enter &amp;quot;Date Closed&amp;quot;.&lt;br /&gt;
## If the purpose of the patch is not something the community wants, or the overall approach is not appropriate, change the status to &amp;quot;Rejected&amp;quot; and enter &amp;quot;Date Closed&amp;quot;.&lt;br /&gt;
# In any case, please e-mail the CommitFest Manager that you are ready to work with another patch.&lt;br /&gt;
&lt;br /&gt;
Note that, if you are assigned a &amp;quot;WIP&amp;quot; patch, it is unlikely to be ready for committing.  Instead, you should test the patch as best you can, and report on it to pgsql-hackers.&lt;br /&gt;
&lt;br /&gt;
== Guidelines for Review ==&lt;br /&gt;
&lt;br /&gt;
* Be polite to the patch submitters.  You'll probably be one yourself someday.&lt;br /&gt;
* Raise ''any'' questions you have on the pgsql-hackers mailing list.&lt;br /&gt;
* Don't hesitate to ask for help; a poorly done review will slow down the CommitFest far more than not reviewing at all.&lt;br /&gt;
* You are helping out in order to reduce the workload on the committers. Do as much as you can, then stop.&lt;br /&gt;
* Contact the CommitFest Manager immediately if you won't be able to complete a review.&lt;br /&gt;
* When reading from the mail archives, make sure to read the ''entire'' thread about the patch.&lt;br /&gt;
&lt;br /&gt;
[[Category:CommitFest]]&lt;/div&gt;</description>
			<pubDate>Tue, 26 Jun 2012 19:50:37 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:RRReviewers</comments>		</item>
		<item>
			<title>Committing with Git</title>
			<link>http://wiki.postgresql.org/wiki/Committing_with_Git</link>
			<guid>http://wiki.postgresql.org/wiki/Committing_with_Git</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Dependent Clone per Branch, Pushing and Pulling From a Local Repository */ Local clone must fetch from master, not pull.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This document is intended for PostgreSQL project [[Committers]].  Regular project contributors should see the instructions for [[Submitting a Patch]].  This is not a complete tutorial on using git.  See [[Working with Git]] or the [http://git-scm.com/documentation git documentation].&lt;br /&gt;
&lt;br /&gt;
= Common setup =&lt;br /&gt;
&lt;br /&gt;
1. To connect to the gitmaster repository, you will need to have your PostgreSQL SSH key loaded into your SSH agent, or available somewhere that SSH knows to look for it, such as &amp;lt;tt&amp;gt;~/.ssh/id_rsa&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
2. Since merge commits may not be pushed, it is a good idea to set up your repository to rebase rather than merging.  Without this, if someone else pushes a commit after you pull and before you push, your local repository will contain a merge commit that you'll need to manually remove before you can push.&lt;br /&gt;
&lt;br /&gt;
 cd postgresql&lt;br /&gt;
 git config branch.master.rebase true&lt;br /&gt;
 git config branch.autosetuprebase always&lt;br /&gt;
&lt;br /&gt;
The last of these commands ensures that any subsequent tracking branches that you create will have branch.&amp;lt;name&amp;gt;.rebase configured automatically.&lt;br /&gt;
&lt;br /&gt;
3. Any commits you push must have matching author and committer tags, and your name and email address must match those configured on the server.  So, if you are &amp;lt;tt&amp;gt;Foo Bar &amp;amp;lt;fbar@postgresql.org&amp;amp;gt;&amp;lt;/tt&amp;gt;, do:&lt;br /&gt;
&lt;br /&gt;
 git config user.name &amp;quot;Foo Bar&amp;quot;&lt;br /&gt;
 git config user.email fbar@postgresql.org&lt;br /&gt;
&lt;br /&gt;
If you use the same name and email address for all of the git repositories where you commit, you can configure them globally instead (or perhaps already have); this will update &amp;lt;tt&amp;gt;~/.gitconfig&amp;lt;/tt&amp;gt; rather than the &amp;lt;tt&amp;gt;.git/config&amp;lt;/tt&amp;gt; for the current repository:&lt;br /&gt;
&lt;br /&gt;
 git config --global user.name &amp;quot;Foo Bar&amp;quot;&lt;br /&gt;
 git config --global user.email foo@bar.net&lt;br /&gt;
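&lt;br /&gt;
As a quick local check of this setup, the following sketch creates a throwaway repository, makes an empty commit, and prints the author and committer identities so they can be compared. The scratch directory and the empty commit are assumptions made for illustration and are not part of the committer workflow; the name and address are the placeholder values used above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Scratch repository; not a real PostgreSQL checkout.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Per-repository identity, using the placeholder values from the text.
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org

# An empty commit is enough to record an author and a committer.
git commit -q --allow-empty -m "Example commit"

# %an/%ae are the author name/email, %cn/%ce the committer name/email.
author=$(git log -1 --format="%an %ae")
committer=$(git log -1 --format="%cn %ce")
echo "author:    $author"
echo "committer: $committer"
```
&lt;br /&gt;
If the two printed lines differ (which can happen, for example, after applying someone else's patch with &amp;lt;tt&amp;gt;git am&amp;lt;/tt&amp;gt;), the commit would not satisfy the matching requirement described above.&lt;br /&gt;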
&lt;br /&gt;
4. Always run &quot;git push --dry-run&quot; before the real thing!&lt;br /&gt;
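&lt;br /&gt;
The effect of a dry run can be tried safely against a scratch repository. This sketch assumes a local bare repository standing in for the server (the paths and the empty commit are made up for illustration); it suggests that the dry run reports what would be sent but leaves the remote untouched.&lt;br /&gt;
&lt;br /&gt;
```shell
# Scratch setup: a local bare repository plays the role of the server.
work=$(mktemp -d)
git init -q --bare "$work/upstream.git"
git clone -q "$work/upstream.git" "$work/clone"   # git may warn that the repository is empty
cd "$work/clone"
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
git commit -q --allow-empty -m "Example commit"
branch=$(git symbolic-ref --short HEAD)

# Dry run: reports the ref that would be created, but sends nothing.
git push --dry-run origin "$branch"
refs_after_dry_run=$(git ls-remote origin | wc -l)

# Real push: the branch now exists on the remote.
git push -q origin "$branch"
refs_after_push=$(git ls-remote origin | wc -l)
echo "remote refs after dry run: $refs_after_dry_run, after push: $refs_after_push"
```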
&lt;br /&gt;
=Committing Using a Single Clone=&lt;br /&gt;
&lt;br /&gt;
1. Clone the master repository.  Note that this is not the same as the public repository; however, changes from the master repository are regularly pushed to the public repository.&lt;br /&gt;
&lt;br /&gt;
 git clone ssh://git@gitmaster.postgresql.org/postgresql.git&lt;br /&gt;
&lt;br /&gt;
2. To commit a patch to the master branch (the equivalent of CVS HEAD), you can use any of the normal git commands.  For example, if you've manually modified files or applied a patch from the mailing list that modifies existing files but does not create any new ones, you can just do:&lt;br /&gt;
&lt;br /&gt;
 git commit -a&lt;br /&gt;
&lt;br /&gt;
If you've added new files, you must &amp;lt;tt&amp;gt;git add&amp;lt;/tt&amp;gt; them first.&lt;br /&gt;
&lt;br /&gt;
 git add file1 file2 file3&lt;br /&gt;
 git commit -a&lt;br /&gt;
&lt;br /&gt;
Or, if the changes you want to commit are on a local branch, you can collapse the commits on the branch into a single commit on the tracking branch using:&lt;br /&gt;
&lt;br /&gt;
 git merge --squash branchname&lt;br /&gt;
&lt;br /&gt;
Make sure to use &amp;lt;tt&amp;gt;--squash&amp;lt;/tt&amp;gt;, or you'll end up with a merge commit.&lt;br /&gt;
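&lt;br /&gt;
To see what the squash does, here is a minimal sketch in a throwaway repository. The branch name &quot;feature&quot;, the file names, and the commit messages are hypothetical; the point is only that two commits on the local branch become one ordinary commit with a single parent.&lt;br /&gt;
&lt;br /&gt;
```shell
# Scratch repository with one base commit on the default branch.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
git commit -q --allow-empty -m "Base commit"
trunk=$(git symbolic-ref --short HEAD)

# Two commits on a hypothetical local working branch.
git checkout -q -b feature
touch file1
git add file1
git commit -q -m "Feature, part 1"
touch file2
git add file2
git commit -q -m "Feature, part 2"

# Collapse them into a single commit on the original branch.
git checkout -q "$trunk"
git merge --squash feature
git commit -q -m "Add feature as one commit"

# A merge commit would list two parents here; a squashed commit lists one.
parents=$(git log -1 --format=%P | wc -w)
total=$(git rev-list --count HEAD)
echo "parents of new commit: $parents, commits on $trunk: $total"
```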
&lt;br /&gt;
3. To back-patch, you can check out the appropriate branch; a local tracking branch will automatically be created.  For example:&lt;br /&gt;
&lt;br /&gt;
 git checkout REL9_0_STABLE&lt;br /&gt;
 ...hack, hack...&lt;br /&gt;
 git commit -a&lt;br /&gt;
&lt;br /&gt;
4. Finally, you must push your changes back to the server.&lt;br /&gt;
&lt;br /&gt;
 git push&lt;br /&gt;
&lt;br /&gt;
This will push changes in all branches you've updated, but only branches that also exist on the remote side will be pushed; thus, you can have local working branches that won't be pushed.  Or, for the avoidance of error, you can configure your repository to push only the current branch:&lt;br /&gt;
&lt;br /&gt;
 git config push.default tracking&lt;br /&gt;
&lt;br /&gt;
5. To pull down changes others have committed, you can of course use:&lt;br /&gt;
&lt;br /&gt;
 git pull&lt;br /&gt;
&lt;br /&gt;
If you have unpushed changes to any of the branches that have changed on the server, then (1) git will automatically attempt to rebase the currently checked-out branch (because of the configuration you did in step #2) and (2) each other branch that needs to be rebased will be out-of-sync with the server.  The easiest way to fix this is to just check out the offending branch and re-pull, e.g.&lt;br /&gt;
&lt;br /&gt;
 git checkout REL9_0_STABLE&lt;br /&gt;
 git pull&lt;br /&gt;
&lt;br /&gt;
6. If one of your tracking branches gets messed up somehow (e.g. you accidentally merge into it, or commit something with the wrong author name/tag) and you can't figure out how to fix it, you can just snap it back to the state in which it exists on the master, throwing away your local changes, e.g.&lt;br /&gt;
&lt;br /&gt;
 git checkout master&lt;br /&gt;
 git reset --hard origin/master&lt;br /&gt;
&lt;br /&gt;
Make sure to use the same branch name in both commands.&lt;br /&gt;
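&lt;br /&gt;
The recovery can be rehearsed in a scratch setup before trying it on a real clone. This sketch assumes a local bare repository standing in for the master server and uses empty commits with made-up messages; it discards a mistaken local commit and then confirms that the branch matches its remote-tracking counterpart again.&lt;br /&gt;
&lt;br /&gt;
```shell
# Scratch setup: a local bare repository plays the role of the master.
work=$(mktemp -d)
git init -q --bare "$work/upstream.git"
git clone -q "$work/upstream.git" "$work/clone"   # git may warn that the repository is empty
cd "$work/clone"
git config user.name "Foo Bar"
git config user.email fbar@postgresql.org
git commit -q --allow-empty -m "Good commit"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# A local commit we decide we do not want.
git commit -q --allow-empty -m "Mistaken local commit"

# Snap the branch back to the state recorded for the remote.
git reset -q --hard "origin/$branch"

local_head=$(git rev-parse HEAD)
remote_head=$(git rev-parse "origin/$branch")
echo "local:  $local_head"
echo "remote: $remote_head"
```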
&lt;br /&gt;
=Committing Using Multiple Clones=&lt;br /&gt;
&lt;br /&gt;
When applying a patch to many branches, it can become tedious to keep switching branches; it can be nice to be able to see all the different versions side by side.  Furthermore, the PostgreSQL build system isn't smart enough to do the right thing if you switch major releases without cleaning out all the intermediates, so if you do switch branches you'll need to do a complete rebuild each time.  (&amp;lt;tt&amp;gt;git clean -dfx&amp;lt;/tt&amp;gt; is a useful way to clean all the cruft out of your repository, but be careful that you don't have any untracked files there that you meant to keep.)&lt;br /&gt;
&lt;br /&gt;
The use of &amp;lt;tt&amp;gt;git clone --reference&amp;lt;/tt&amp;gt; is not recommended except for short-term, throw-away copies, because a subsequent &amp;lt;tt&amp;gt;git gc&amp;lt;/tt&amp;gt; can result in data loss.  Instead, use one of the techniques described below.&lt;br /&gt;
&lt;br /&gt;
==Independent Clone per Branch==&lt;br /&gt;
&lt;br /&gt;
The simplest way to commit using multiple clones is to create them as described above for a single clone, and keep a different branch checked out in each copy.  You can configure each clone to pull only the branch you care about for that clone.  For example, for the REL9_0_STABLE branch:&lt;br /&gt;
&lt;br /&gt;
 git clone ssh://git@gitmaster.postgresql.org/postgresql.git REL9_0_STABLE &lt;br /&gt;
 cd REL9_0_STABLE&lt;br /&gt;
 git checkout REL9_0_STABLE&lt;br /&gt;
 git config branch.REL9_0_STABLE.rebase true&lt;br /&gt;
 git config user.name &amp;quot;Foo Bar&amp;quot;&lt;br /&gt;
 git config user.email fbar@postgresql.org&lt;br /&gt;
 git config remote.origin.fetch '+refs/heads/REL9_0_STABLE:refs/remotes/origin/REL9_0_STABLE'&lt;br /&gt;
 git branch -D master&lt;br /&gt;
&lt;br /&gt;
One disadvantage of this approach is that you will use more disk space: the &amp;lt;tt&amp;gt;.git&amp;lt;/tt&amp;gt; directory for each repository, as of this writing, is a bit more than 220MB.  If this is a concern, use one of the methods described below.&lt;br /&gt;
&lt;br /&gt;
==Dependent Clone per Branch, Pushing and Pulling From a Local Repository==&lt;br /&gt;
&lt;br /&gt;
Git will automatically use hard links when cloning a repository stored on the local machine.  So, you could do this:&lt;br /&gt;
&lt;br /&gt;
 git clone --bare --mirror ssh://git@gitmaster.postgresql.org/postgresql.git&lt;br /&gt;
 git clone postgresql REL9_0_STABLE&lt;br /&gt;
 cd REL9_0_STABLE&lt;br /&gt;
 git checkout REL9_0_STABLE&lt;br /&gt;
 git config branch.REL9_0_STABLE.rebase true&lt;br /&gt;
 git config user.name &amp;quot;Foo Bar&amp;quot;&lt;br /&gt;
 git config user.email fbar@postgresql.org&lt;br /&gt;
 git config remote.origin.fetch '+refs/heads/REL9_0_STABLE:refs/remotes/origin/REL9_0_STABLE'&lt;br /&gt;
 git branch -D master&lt;br /&gt;
&lt;br /&gt;
All of these steps except the first should be repeated for each branch for which you wish to maintain a clone.  (If your user name and/or email are configured globally, you need not configure them again for each new repository.)&lt;br /&gt;
&lt;br /&gt;
With this approach, the REL9_0_STABLE repository will pull from and push to the local postgresql repository, which in turn will pull from and push to the master server.  Thus, you must do this to update (supposing both repositories are in your home directory):&lt;br /&gt;
&lt;br /&gt;
 cd ~/postgresql.git&lt;br /&gt;
 git fetch&lt;br /&gt;
 cd ~/REL9_0_STABLE&lt;br /&gt;
 git pull&lt;br /&gt;
&lt;br /&gt;
And to push, you must do this:&lt;br /&gt;
&lt;br /&gt;
 cd ~/REL9_0_STABLE&lt;br /&gt;
 git push&lt;br /&gt;
 cd ~/postgresql.git&lt;br /&gt;
 git push&lt;br /&gt;
&lt;br /&gt;
It would probably be wise to script this, if you plan to do it regularly and with multiple branches.&lt;br /&gt;
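&lt;br /&gt;
One possible shape for such a script is sketched below. The function names pg_update and pg_publish and the MIRROR variable are hypothetical, the default path assumes the mirror lives in your home directory as in the commands above, and the sketch assumes a git new enough to support the -C option.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helpers for the two-step update and push described above.
# MIRROR is an assumed variable naming the local bare mirror.
MIRROR=${MIRROR:-$HOME/postgresql.git}

# Refresh the mirror from the master server, then pull into one clone.
# Usage: pg_update ~/REL9_0_STABLE
pg_update() {
    git -C "$MIRROR" fetch || return 1
    git -C "$1" pull
}

# Push one clone into the mirror, then push the mirror to the master server.
# Usage: pg_publish ~/REL9_0_STABLE
pg_publish() {
    git -C "$1" push || return 1
    git -C "$MIRROR" push
}
```
&lt;br /&gt;
Each function stops at the first failing step, so a failed fetch or push is not silently followed by the second half. Running &quot;git push --dry-run&quot; by hand before pg_publish is still advisable.&lt;br /&gt;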
&lt;br /&gt;
==Clone Locally, Repoint Origin==&lt;br /&gt;
&lt;br /&gt;
The solution described in the previous section can be inconvenient, since it requires pushing and pulling each commit twice.  One possible way to avoid this is to create a single clone from the master, then clone it multiple times locally, then repoint the origin server for each such local clone at the master.  The history existing at the time of the initial clone will be shared among all the local clones (using hard links), but any new history fetched after the initial setup will consume separate storage in each local clone.  This still represents a substantial savings in disk space, while avoiding the inconvenience of pushing and pulling twice.&lt;br /&gt;
&lt;br /&gt;
To do this, set up each clone as described in the previous section and then perform the following additional steps:&lt;br /&gt;
&lt;br /&gt;
 git remote set-url origin ssh://git@gitmaster.postgresql.org/postgresql.git&lt;br /&gt;
 git remote update&lt;br /&gt;
 git remote prune origin &lt;br /&gt;
&lt;br /&gt;
==Dependent Clone per Branch, Pulling From a Local Repository and Pushing to the Remote Repository==&lt;br /&gt;
&lt;br /&gt;
This method is like the &quot;Dependent Clone per Branch, Pushing and Pulling From a Local Repository&quot; recipe, but instead of pushing back to the local repository you set each clone up to push directly to the remote repository, by running the following in each clone of your local mirror:&lt;br /&gt;
&lt;br /&gt;
  git remote set-url --push origin ssh://git@gitmaster.postgresql.org/postgresql.git&lt;br /&gt;
&lt;br /&gt;
This requires a fairly modern version of git.&lt;br /&gt;
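&lt;br /&gt;
The result of the command can be inspected without contacting the server. In this sketch a scratch bare repository stands in for your local mirror (the directory names are assumptions); after the change, the clone should still fetch from the mirror while its push URL points at the master.&lt;br /&gt;
&lt;br /&gt;
```shell
# Scratch setup: an empty bare repository stands in for the local mirror.
work=$(mktemp -d)
git init -q --bare "$work/mirror.git"
git clone -q "$work/mirror.git" "$work/REL9_0_STABLE"   # git may warn that the repository is empty
cd "$work/REL9_0_STABLE"

# Keep fetching from the mirror, but push straight to the master.
git remote set-url --push origin ssh://git@gitmaster.postgresql.org/postgresql.git

# Show both URLs; nothing is contacted by these commands.
git remote -v
fetch_url=$(git config --get remote.origin.url)
push_url=$(git config --get remote.origin.pushurl)
```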
&lt;br /&gt;
You can also avoid having to fetch into the mirror and then pull into the clone in separate steps by using a shell function. Here is one that works with bash to combine these steps:&lt;br /&gt;
&lt;br /&gt;
  function pgpull () &lt;br /&gt;
  {&lt;br /&gt;
    pushd /path/to/mirror &amp;gt; /dev/null&lt;br /&gt;
    git fetch&lt;br /&gt;
    popd &amp;gt; /dev/null&lt;br /&gt;
    git pull&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
= Committing Using a Single Clone and multiple workdirs =&lt;br /&gt;
&lt;br /&gt;
This method is similar to the single-clone method, but you can keep each active branch checked out all the time.&lt;br /&gt;
&lt;br /&gt;
1. Create a directory to hold the cloned repository and all the workdirs (this makes it easier to remember that they're all linked to the same clone).&lt;br /&gt;
&lt;br /&gt;
 mkdir pgsql-git; cd pgsql-git&lt;br /&gt;
&lt;br /&gt;
2. Clone the master repository.  Note that this is not the same as the public repository; however, changes from the master repository are regularly pushed to the public repository.&lt;br /&gt;
&lt;br /&gt;
 git clone ssh://git@gitmaster.postgresql.org/postgresql.git&lt;br /&gt;
&lt;br /&gt;
This creates a directory called &amp;quot;postgresql&amp;quot;, and the master branch is automatically checked out into that working directory. The cloned git repository is in &amp;quot;postgresql/.git&amp;quot;, which is shared with all the other workdirs we create later.&lt;br /&gt;
&lt;br /&gt;
3. Prevent automatic git garbage collection.&lt;br /&gt;
&lt;br /&gt;
 git --git-dir=postgresql/.git/ config gc.auto 0&lt;br /&gt;
&lt;br /&gt;
Rationale: With this method, you have multiple checkouts from a single repository, but &quot;git gc&quot; does not know about the other working directories. That is not a problem in general, but if you run &quot;git gc&quot; when you have staged but uncommitted changes in a workdir other than the master one, those changes can be lost. This is a known limitation of &amp;lt;tt&amp;gt;git-new-workdir&amp;lt;/tt&amp;gt;; see [http://kerneltrap.org/mailarchive/git/2007/10/11/335637]. As long as you run &quot;git gc&quot; only when all the back-branch checkouts are in a clean state, you should be safe.&lt;br /&gt;
&lt;br /&gt;
4. Create workdirs for all active back-branches. The &amp;lt;tt&amp;gt;git-new-workdir&amp;lt;/tt&amp;gt; tool is in the git contrib directory; the path in the example below is for Debian:&lt;br /&gt;
&lt;br /&gt;
 sh /usr/share/doc/git/contrib/workdir/git-new-workdir postgresql/.git/ 90stable&lt;br /&gt;
 cd 90stable&lt;br /&gt;
 git checkout -b REL9_0_STABLE origin/REL9_0_STABLE&lt;br /&gt;
 cd ..&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;repeat for every active back-branch&amp;gt;&lt;br /&gt;
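&lt;br /&gt;
The repetition could be scripted roughly as follows. This is only a sketch: &amp;lt;tt&amp;gt;branch_to_workdir&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;create_workdirs&amp;lt;/tt&amp;gt; are hypothetical helper names, the workdir naming simply follows the &amp;quot;90stable&amp;quot; pattern used above, and the branch list in the usage comment is an example rather than the actual set of active back-branches.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: derive a workdir name such as "90stable"
# from a branch name such as "REL9_0_STABLE".
branch_to_workdir() {
    echo "$1" | sed -e 's/^REL//' -e 's/_STABLE$/stable/' -e 's/_//g'
}

# Hypothetical helper: create one workdir per branch named on the command line.
# Run it from the directory that contains the "postgresql" clone.
create_workdirs() {
    for branch in "$@"; do
        dir=$(branch_to_workdir "$branch")
        # Debian path, as in the step above; adjust for other systems
        sh /usr/share/doc/git/contrib/workdir/git-new-workdir postgresql/.git/ "$dir" || return 1
        (cd "$dir" || exit 1; git checkout -b "$branch" "origin/$branch") || return 1
    done
}

# Usage (the branch list is only an example):
# create_workdirs REL9_0_STABLE REL9_1_STABLE
```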
&lt;br /&gt;
5. You can now work separately on the checkouts as you would on CVS. All the local branches, tracking the corresponding remote branches, are in the repository shared by all workdirs. To commit a patch to any of the branches, you can use any of the normal git commands.  For example, if you've manually modified files or applied a patch from the mailing list that modifies existing files but does not create any new ones, you can just do:&lt;br /&gt;
&lt;br /&gt;
 git commit -a&lt;br /&gt;
&lt;br /&gt;
If you've added new files, you must &amp;lt;tt&amp;gt;git add&amp;lt;/tt&amp;gt; them first.&lt;br /&gt;
&lt;br /&gt;
 git add file1 file2 file3&lt;br /&gt;
 git commit -a&lt;br /&gt;
&lt;br /&gt;
Or, if the changes you want to commit are on a local branch, you can collapse the commits on the branch into a single commit on the tracking branch using:&lt;br /&gt;
&lt;br /&gt;
 git merge --squash branchname&lt;br /&gt;
&lt;br /&gt;
Make sure to use &amp;lt;tt&amp;gt;--squash&amp;lt;/tt&amp;gt;, or you'll end up with a merge commit.&lt;br /&gt;
&lt;br /&gt;
6. Finally, you must push your changes back to the server. This will push changes in *all* branches you've committed to:&lt;br /&gt;
&lt;br /&gt;
 git push --dry-run&lt;br /&gt;
&lt;br /&gt;
This will show what's being pushed without doing anything yet. Double-check the changes, using &amp;quot;git log&amp;quot; and &amp;quot;git diff&amp;quot; with the commit ID ranges printed out. Once you're satisfied, push them for real:&lt;br /&gt;
&lt;br /&gt;
 git push&lt;br /&gt;
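&lt;br /&gt;
One way to do the double-check before the real push is to compare each local branch with its remote-tracking counterpart. The helpers below are a sketch; the names &amp;lt;tt&amp;gt;unpushed&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;unpushed_diff&amp;lt;/tt&amp;gt; are made up for illustration, and the comparison is against &amp;lt;tt&amp;gt;origin&amp;lt;/tt&amp;gt; as of your last fetch or pull.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: list local commits on a branch that origin lacks.
unpushed() {
    git log --oneline "origin/$1..$1"
}

# Hypothetical helper: show the changes made on the local branch
# since it diverged from the remote-tracking branch.
unpushed_diff() {
    git diff "origin/$1...$1"
}

# Usage, from inside the matching workdir:
# unpushed REL9_0_STABLE
# unpushed_diff REL9_0_STABLE
```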
&lt;br /&gt;
7. To pull down changes others have committed, you can of course use:&lt;br /&gt;
&lt;br /&gt;
 git pull&lt;br /&gt;
&lt;br /&gt;
If you have unpushed changes to any of the branches that have changed on the server, then (1) git will automatically attempt to rebase the currently checked-out branch (because of the configuration you did earlier) and (2) each other branch that needs to be rebased will be out-of-sync with the server.  The easiest way to fix this is to just check out the offending branch and re-pull, e.g.&lt;br /&gt;
&lt;br /&gt;
 git checkout REL9_0_STABLE&lt;br /&gt;
 git pull&lt;br /&gt;
&lt;br /&gt;
8. If one of your tracking branches gets messed up somehow (e.g. you accidentally merge into it, or commit something with the wrong author name/tag) and you can't figure out how to fix it, you can just snap it back to the state in which it exists on the master, throwing away your local changes, e.g.&lt;br /&gt;
&lt;br /&gt;
 git checkout master&lt;br /&gt;
 git reset --hard origin/master&lt;br /&gt;
&lt;br /&gt;
Make sure to use the branch name matching the workdir you're in, in both commands.&lt;br /&gt;
&lt;br /&gt;
= Making a new release branch on origin =&lt;br /&gt;
&lt;br /&gt;
To create a new branch in the gitmaster repo starting from the current tip of ''master'', do this:&lt;br /&gt;
&lt;br /&gt;
 git pull           # be sure you have the latest &amp;quot;master&amp;quot;&lt;br /&gt;
 git push origin master:refs/heads/&amp;quot;new-branch-name&amp;quot;&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
 git push origin master:refs/heads/REL9_1_STABLE&lt;br /&gt;
&lt;br /&gt;
After this, check out the branch locally following whichever of the previous arrangements you are using.&lt;br /&gt;
&lt;br /&gt;
By convention, only release branches should be pushed to the gitmaster repo; don't push experimental or feature branches there.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Git]]&lt;/div&gt;</description>
			<pubDate>Mon, 11 Jun 2012 16:46:03 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Committing_with_Git</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* How to Find the Optimal Database Connection Pool Size */ Change reference to 9.3 to 9.2.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more work through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.  You can generally improve both latency and throughput by limiting the number of database connections with active transactions to match the available number of resources, and queuing any requests to start a new database transaction which come in while at the limit.&lt;br /&gt;
&lt;br /&gt;
Contrary to many people's initial intuitive impulses, you will often see a transaction ''reach completion sooner'' if you queue it when it is ready but the system is busy enough to have saturated resources and ''start it later'' when resources become available.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.2 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past the &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area, causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that, the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note: in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
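&lt;br /&gt;
To put rough numbers on the RAM usage point, the worst case can be estimated as connections times memory-hungry plan nodes per query times work_mem. The sketch below uses a hypothetical helper name and made-up figures; real usage depends on the plans actually chosen and is usually well below this ceiling.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: worst-case work_mem demand in MB.
# Arguments: active connections, sort/hash nodes per query, work_mem in MB.
worst_case_work_mem_mb() {
    echo $(( $1 * $2 * $3 ))
}

# Illustrative figures only: 3 nodes per query, work_mem of 64 MB
worst_case_work_mem_mb 200 3 64   # 200 connections: prints 38400
worst_case_work_mem_mb 10 3 64    # pool of 10:      prints 1920
```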
&lt;br /&gt;
== How to Find the Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.2 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
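&lt;br /&gt;
As a worked example of the formula, here is a small sketch. The function name &amp;lt;tt&amp;gt;pool_size&amp;lt;/tt&amp;gt; is hypothetical, and because shell arithmetic is integer-only, the effective spindle count is assumed to be rounded to a whole number. Treat the result as a starting point, not a tuned value.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: starting pool size from the rule of thumb
# (core_count * 2) + effective_spindle_count.
# Arguments: physical cores (not HT threads), effective spindle count.
pool_size() {
    echo $(( $1 * 2 + $2 ))
}

pool_size 4 0   # 4 cores, data set fully cached: prints 8
pool_size 8 4   # 8 cores, about 4 effective spindles: prints 20
```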
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Thu, 17 May 2012 10:48:44 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* The Need for an External Pool */ Fix reference to 9.3 to say 9.2.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more work through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.  You can generally improve both latency and throughput by limiting the number of database connections with active transactions to match the available number of resources, and queuing any requests to start a new database transaction which come in while at the limit.&lt;br /&gt;
&lt;br /&gt;
Contrary to many people's initial intuitive impulses, you will often see a transaction ''reach completion sooner'' if you queue it when it is ready but the system is busy enough to have saturated resources and ''start it later'' when resources become available.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.2 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past the &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area; causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note, in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
&lt;br /&gt;
== How to Find the Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.3 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Thu, 17 May 2012 10:48:02 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* How to Find the Optimal Database Connection Pool Size */ Fix typo.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more work through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.  You can generally improve both latency and throughput by limiting the number of database connections with active transactions to match the available number of resources, and queuing any requests to start a new database transaction which come in while at the limit.&lt;br /&gt;
&lt;br /&gt;
Contrary to many people's initial intuitive impulses, you will often see a transaction ''reach completion sooner'' if you queue it when it is ready but the system is busy enough to have saturated resources and ''start it later'' when resources become available.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.3 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past the &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area; causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note, in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
&lt;br /&gt;
== How to Find the Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.3 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Fri, 11 May 2012 03:28:33 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Summary */ Add a rough description of the desired pool to the summary, and the counter-intuitive nature of a later start time resulting in earlier completion.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more work through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.  You can generally improve both latency and throughput by limiting the number of database connections with active transactions to match the available number of resources, and queuing any requests to start a new database transaction which come in while at the limit.&lt;br /&gt;
&lt;br /&gt;
Contrary to many people's initial intuitive impulses, you will often see a transaction ''reach completion sooner'' if you queue it when it is ready but the system is busy enough to have saturated resources and ''start it later'' when resources become available.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.3 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past the &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area; causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note, in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
&lt;br /&gt;
== How to Find the Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.3 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Thu, 10 May 2012 15:46:53 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* How to Find Optimal Database Connection Pool Size */ Add &amp;quot;the&amp;quot; to section title.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.3 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past the &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area; causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note, in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
&lt;br /&gt;
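As a rough illustration of the '''RAM usage''' point above, the worst-case work_mem demand can be estimated as work_mem multiplied by the number of memory-using plan nodes per query and by the number of connections.  The figures below (64MB of work_mem, 3 such nodes per query, 200 connections) are assumed values chosen only to show the arithmetic; they are not measurements or recommendations.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;-- Hypothetical inputs: 64MB work_mem, 3 sort/hash nodes per query, 200 connections&lt;br /&gt;
SELECT work_mem_mb * nodes_per_query * connections AS worst_case_mb&lt;br /&gt;
  FROM (VALUES (64, 3, 200)) AS est(work_mem_mb, nodes_per_query, connections);&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
With these assumed inputs the estimate is 38400 MB.  Pooling the same workload down to 20 active connections would cap the same estimate at 3840 MB.&lt;br /&gt;
&lt;br /&gt;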
== How to Find the Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.3 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
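&lt;br /&gt;
As a minimal sketch of the arithmetic in the formula above, assuming a hypothetical server with 4 physical cores and an effective spindle count of 2 (both values are illustrative and not taken from any benchmark):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;-- Hypothetical hardware: 4 physical cores (HT threads not counted), 2 effective spindles&lt;br /&gt;
SELECT (core_count * 2) + effective_spindle_count AS starting_pool_size&lt;br /&gt;
  FROM (VALUES (4, 2)) AS hw(core_count, effective_spindle_count);&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This gives a starting point of 10 active connections; with a fully cached data set (effective spindle count of zero) the same hardware would give 8.&lt;br /&gt;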
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Thu, 10 May 2012 15:36:36 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Reasons for Performance Reduction Past &amp;quot;Knee&amp;quot; */ add &amp;quot;the&amp;quot; to section title&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.3 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past the &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area; causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note, in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
&lt;br /&gt;
== How to Find Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.3 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Thu, 10 May 2012 15:35:42 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>Number Of Database Connections</title>
			<link>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</link>
			<guid>http://wiki.postgresql.org/wiki/Number_Of_Database_Connections</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Initial page creation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;You can often support more concurrent users by reducing the number of database connections and using some form of connection pooling.  This page attempts to explain why that is.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.  Once all of the resources are in use, you won't push any more through by having more connections competing for the resources.  In fact, throughput starts to fall off due to the overhead from that contention.&lt;br /&gt;
&lt;br /&gt;
== The Need for an External Pool ==&lt;br /&gt;
&lt;br /&gt;
If you look at any graph of PostgreSQL performance with number of connections on the ''x'' axis and tps on the ''y'' axis (with nothing else changing), you will see performance climb as connections rise until you hit saturation, and then you have a &amp;quot;knee&amp;quot; after which performance falls off.  A lot of work has been done for version 9.3 to push that knee to the right and make the fall-off more gradual, but the issue is intrinsic -- without a built-in connection pool or at least an admission control policy, the knee and subsequent performance degradation will always be there.&lt;br /&gt;
&lt;br /&gt;
The decision not to include a connection pooler inside the PostgreSQL server itself has been taken deliberately and with good reason.  In many cases you will get better performance if the connection pooler is running on a separate machine.  Also, you can get improved functionality by incorporating a connection pool into client-side software.  Many frameworks do the pooling in a process running on the database server machine (to minimize latency effects from the database protocol) and accept high-level requests to run a certain function with a given set of parameters, with the entire function running as a single database transaction. This ensures that network latency or connection failures can't cause a transaction to hang while waiting for something from the network, and provides a simple way to retry any database transaction which rolls back with a serialization failure or deadlock (SQLSTATE 40001 or 40P01).&lt;br /&gt;
&lt;br /&gt;
Since a pooler built in to the database engine would be inferior (for the above reasons), the community has decided not to go that route.&lt;br /&gt;
&lt;br /&gt;
== Reasons for Performance Reduction Past &amp;quot;Knee&amp;quot; ==&lt;br /&gt;
&lt;br /&gt;
There are a number of independent reasons that performance falls off with more database connections.&lt;br /&gt;
&lt;br /&gt;
* '''Context switches'''.  The processor is interrupted from working on one query and has to switch to another, which involves saving state and restoring state.  While the core is busy swapping states it is not doing any useful work on any query.&lt;br /&gt;
&lt;br /&gt;
* '''Cache line contention'''.  One query is likely to be working on a particular area of RAM, and the query taking its place is likely to be working on a different area; causing data cached on the CPU chip to be discarded, only to need to be reloaded to continue the other query.  Besides that the various processes will be grabbing control of cache lines from each other, causing stalls.  (Humorous note, in one oprofile run of a heavily contended load, 10% of CPU time was attributed to a 1-byte noop; analysis showed that it was because it needed to wait on a cache line for the following machine code operation.)&lt;br /&gt;
&lt;br /&gt;
* '''Lock contention'''.  This happens at various levels: spinlocks, LW locks, and all the locks that show up in pg_locks.  As more processes compete for the spinlocks (which protect LW locks acquisition and release, which in turn protect the heavyweight and predicate lock acquisition and release) they account for a high percentage of CPU time used.&lt;br /&gt;
&lt;br /&gt;
* '''RAM usage'''.  The work_mem setting can have a big impact on performance.  If it is too small, hash tables and sorts spill to disk, bitmap heap scans become &amp;quot;lossy&amp;quot;, requiring more work on each page access, etc.  So you want it to be big.  But work_mem RAM can be allocated for each node of a query on each connection, all at the same time.  So a big work_mem with a large number of connections can cause a lot of the OS cache to be periodically discarded, forcing more accesses to disk; or it could even put the system into swapping.  So the more connections you have, the more you need to make a choice between slow plans and trashing cache/swapping.&lt;br /&gt;
&lt;br /&gt;
* '''Disk access'''.  If you ''do'' need to go to disk for random access, a large number of connections can tend to force more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk.&lt;br /&gt;
&lt;br /&gt;
* '''General scaling'''.  Some internal structures allocated based on max_connections scale at O(N^2) or O(N*log(N)).  Some types of overhead which are negligible at a lower number of connections can become significant with a large number of connections.&lt;br /&gt;
&lt;br /&gt;
== How to Find Optimal Database Connection Pool Size ==&lt;br /&gt;
&lt;br /&gt;
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((''core_count'' * 2) + ''effective_spindle_count'').  Core count should not include HT threads, even if hyperthreading is enabled.  Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls.  Benchmarks of WIP for version 9.3 suggest that this formula will need adjustment on that release.  There hasn't been any analysis so far regarding how well the formula works with SSDs.&lt;br /&gt;
&lt;br /&gt;
However you choose a starting point for a connection pool size, you should probably try incremental adjustments with your production system to find the actual &amp;quot;sweet spot&amp;quot; for your hardware and workload.&lt;/div&gt;</description>
			<pubDate>Thu, 10 May 2012 15:34:36 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Number_Of_Database_Connections</comments>		</item>
		<item>
			<title>PgCon 2012 Developer Meeting</title>
			<link>http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting</link>
			<guid>http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Agenda */ Correct apparent typo in agenda item for CF schedule: we want to discuss 9.3 CF schedule, not 9.2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A meeting of the most active PostgreSQL developers is being planned for Wednesday 16th May, 2012 near the University of Ottawa, prior to pgCon 2012. In order to keep the numbers manageable, this meeting is '''by invitation only'''. Unfortunately it is quite possible that we've overlooked important code developers during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org). &lt;br /&gt;
&lt;br /&gt;
Please note that this year the attendee numbers have been cut to try to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.2 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies, unlike in previous years.&lt;br /&gt;
&lt;br /&gt;
This is a PostgreSQL Community event. Room and refreshments/food sponsored by EnterpriseDB. Other companies sponsored attendance for their developers.&lt;br /&gt;
 &lt;br /&gt;
== Time &amp;amp; Location ==&lt;br /&gt;
&lt;br /&gt;
The meeting will be from 9AM to 5PM, and will be in the &amp;quot;Red Experience&amp;quot; room at:&lt;br /&gt;
&lt;br /&gt;
 Novotel Ottawa&lt;br /&gt;
 33 Nicholas Street&lt;br /&gt;
 Ottawa&lt;br /&gt;
 Ontario&lt;br /&gt;
 K1N 9M7&lt;br /&gt;
 &lt;br /&gt;
Food and drink will be provided throughout the day, including breakfast from 8AM.&lt;br /&gt;
&lt;br /&gt;
[http://maps.google.ca/maps?f=q&amp;amp;source=s_q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=novotel+ottawa&amp;amp;aq=&amp;amp;sll=49.891235,-97.15369&amp;amp;sspn=36.237851,79.013672&amp;amp;ie=UTF8&amp;amp;hq=novotel+ottawa&amp;amp;hnear=&amp;amp;ll=45.421528,-75.683699&amp;amp;spn=0.036869,0.077162&amp;amp;z=14&amp;amp;iwloc=A&amp;amp;layer=c&amp;amp;cbll=45.425741,-75.689638&amp;amp;panoid=Z4FUGnkZkdHAOkIxyjjS9Q&amp;amp;cbp=12,25.83,,0,-0.6 View on Google Maps]&lt;br /&gt;
&lt;br /&gt;
== Attendees ==&lt;br /&gt;
&lt;br /&gt;
The following people have RSVPed to the meeting (in alphabetical order, by surname):&lt;br /&gt;
&lt;br /&gt;
* Oleg Bartunov&lt;br /&gt;
* Josh Berkus (Secretary)&lt;br /&gt;
* Jeff Davis&lt;br /&gt;
* Andrew Dunstan&lt;br /&gt;
* Dimitri Fontaine&lt;br /&gt;
* Stephen Frost&lt;br /&gt;
* Peter Geoghegan&lt;br /&gt;
* Kevin Grittner&lt;br /&gt;
* Robert Haas&lt;br /&gt;
* Magnus Hagander&lt;br /&gt;
* Shigeru Hanada&lt;br /&gt;
* Hitoshi Harada&lt;br /&gt;
* KaiGai Kohei&lt;br /&gt;
* Tom Lane&lt;br /&gt;
* Noah Misch&lt;br /&gt;
* Bruce Momjian&lt;br /&gt;
* Dave Page (Chair)&lt;br /&gt;
* Simon Riggs&lt;br /&gt;
* Teodor Sigaev&lt;br /&gt;
* Greg Smith&lt;br /&gt;
&lt;br /&gt;
== Proposed Agenda Items ==&lt;br /&gt;
&lt;br /&gt;
Please list proposed agenda items here:&lt;br /&gt;
&lt;br /&gt;
* Agree CommitFest schedule for 9.3 (Strawman from Simon)&lt;br /&gt;
** CF1 June 15, 2012 - 1 month&lt;br /&gt;
** CF2 Sep 15, 2012 - 1 month&lt;br /&gt;
** CF3 Nov 15, 2012 - 1 month&lt;br /&gt;
** CF4 Jan 15, 2013 - 2 months&lt;br /&gt;
* Priorities for 9.3 [All]&lt;br /&gt;
** Description: discuss what people are working on and what's likely to be in 9.3.&lt;br /&gt;
** Goals: set expectations and coordinate work schedules for 9.3.&lt;br /&gt;
* Queuing [Dimitri, Kevin]&lt;br /&gt;
** Description: efficient and transactional queuing is a very common need for applications using databases, and could help in implementing some internal features&lt;br /&gt;
** Goals: get an agreement that core is the right place to solve that problem, and exactly which parts of it we want in core&lt;br /&gt;
* Materialized views [Kevin]&lt;br /&gt;
** Description: Declarative materialized views are a frequently requested feature, but mean many things to many people.  It's not likely that an initial implementation will address everything.  We need a base set of functionality on which to build.&lt;br /&gt;
** Goals: Reach consensus on what a minimum feature set for commit would be.&lt;br /&gt;
* Partitioning and Segment Exclusion [Dimitri]&lt;br /&gt;
** Description: to solve partitioning, we need to agree on a global approach&lt;br /&gt;
** Goals: agreeing on SE as a basis for better partitioning, having a &amp;quot;GO&amp;quot; on working on SE&lt;br /&gt;
* The MERGE statement: Challenges and priorities [Peter G]&lt;br /&gt;
** Description: Implementing the MERGE statement for 9.3. It is envisaged specifically as an atomic &amp;quot;upsert&amp;quot; operation.&lt;br /&gt;
** Goals: To get buy-in on various aspects of the feature's development, and, ideally, to secure reviewer resources or other support. Because of the complexity of the feature, early interest from reviewers is preferable.&lt;br /&gt;
* Row-level Access Control and SELinux [KaiGai]&lt;br /&gt;
** Security label on user tables&lt;br /&gt;
** Dynamic expandable enum data types&lt;br /&gt;
** Enforcement of triggers by extension&lt;br /&gt;
* Enhancement of FDW at v9.3 [KaiGai]&lt;br /&gt;
** Writable foreign tables&lt;br /&gt;
** Operations to be pushed down (Join, Aggregate, Sort, ...)&lt;br /&gt;
** Inheritance of foreign/regular tables&lt;br /&gt;
** Constraint (PK/FK) &amp;amp; Trigger support.&lt;br /&gt;
* Type registry [Andrew]&lt;br /&gt;
** Provide for known OIDs for non-builtin types, and possibly for their IO functions too&lt;br /&gt;
** Would make it possible to write code in core or in extension X that handles a type defined in extension Y.&lt;br /&gt;
* Ending CommitFests in a timely fashion, especially the last one.  Avoiding a crush of massive feature patches at the end of the cycle.  Handling big patches that aren't quite ready yet.  Getting more people to help with patch review. [Robert]&lt;br /&gt;
* What Developers Want [Josh]&lt;br /&gt;
** Description: a top-5 list of features and obstacles to developer adoption of PostgreSQL (with slides)&lt;br /&gt;
** Goal: to set priorities for some features aimed at application users&lt;br /&gt;
* In-Place Upgrades &amp;amp; Checksums [Greg Smith, Simon]&lt;br /&gt;
** Description:  Revisit in-place upgrades of the page format, now that pg_upgrade is available and multiple checksum implementations needing it have been proposed.&lt;br /&gt;
** Goal:  Nail down some incremental milestones for 9.3 development to aim at.&lt;br /&gt;
* Autonomous Transactions [Simon]&lt;br /&gt;
** Overview of idea, relationship to stored procedures&lt;br /&gt;
** Feedback, buy-in and/or alternatives&lt;br /&gt;
* Parallel Query [Bruce Momjian]&lt;br /&gt;
** Hope to get buy-in for what parallel operations we are hoping to add in upcoming releases&lt;br /&gt;
* Report from Clustering Meeting [Josh] (10 min)&lt;br /&gt;
** Description: to summarize the discussions of the cluster-hackers meeting from the previous day&lt;br /&gt;
** Goal: inter-team synchronization.  Possibly, decisions requested on specific in-core features.&lt;br /&gt;
* Double Write Buffers [Simon]&lt;br /&gt;
** Is anyone committing to do that for 9.3?&lt;br /&gt;
* Summarise Commitments at End of Play [Simon]&lt;br /&gt;
** For roadmap and planning purposes, confirm who is doing what, assign interested reviewers at start&lt;br /&gt;
** Check gaps, identify priorities early on in the cycle&lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;4&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Item&lt;br /&gt;
!Presenter&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:00&lt;br /&gt;
|Breakfast&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:30 - 08:45&lt;br /&gt;
|Welcome and introductions&lt;br /&gt;
|Dave Page&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|08:45 - 09:10&lt;br /&gt;
|Goals for 9.3&lt;br /&gt;
|Josh Berkus&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|09:10 - 09:35&lt;br /&gt;
|Commitfest management&lt;br /&gt;
|Robert Haas&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|09:35 - 09:50&lt;br /&gt;
|9.3 commitfest schedule&lt;br /&gt;
|Simon Riggs&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|09:50 - 10:10&lt;br /&gt;
|Type registry&lt;br /&gt;
|Andrew Dunstan&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|10:10 - 10:30&lt;br /&gt;
|Access control and SELinux&lt;br /&gt;
|KaiGai Kohei&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|10:30 - 10:45&lt;br /&gt;
|Coffee break&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|10:45 - 11:15&lt;br /&gt;
|Enhancement of FDWs in 9.3&lt;br /&gt;
|KaiGai Kohei&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:15 - 11:40&lt;br /&gt;
|Autonomous transactions&lt;br /&gt;
|Simon Riggs&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|11:40 - 12:05&lt;br /&gt;
|Partitioning and segment exclusion&lt;br /&gt;
|Dimitri Fontaine&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|12:05 - 12:30&lt;br /&gt;
|Queuing&lt;br /&gt;
|Dimitri Fontaine/Kevin Grittner&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|12:30 - 13:30&lt;br /&gt;
|Lunch	&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|13:30 - 14:00&lt;br /&gt;
|What developers want&lt;br /&gt;
|Josh Berkus&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|14:00 - 14:30&lt;br /&gt;
|The MERGE statement: Challenges and priorities&lt;br /&gt;
|Peter Geoghegan&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|14:30 - 15:00&lt;br /&gt;
|Materialised views&lt;br /&gt;
|Kevin Grittner&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|15:00 - 15:15&lt;br /&gt;
|Tea break&lt;br /&gt;
|&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|15:15 - 15:45&lt;br /&gt;
|In place upgrades and checksums&lt;br /&gt;
|Simon Riggs/Greg Smith&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|15:45 - 16:15&lt;br /&gt;
|Parallel Query&lt;br /&gt;
|Bruce Momjian&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|16:15 - 16:25&lt;br /&gt;
|Report from the Clustering Meeting&lt;br /&gt;
|Josh Berkus&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|16:25 - 16:45&lt;br /&gt;
|Summarise commitments and identify priorities&lt;br /&gt;
|Simon Riggs&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|16:45 - 17:00&lt;br /&gt;
|Any other business/group photo&lt;br /&gt;
|Dave Page&lt;br /&gt;
&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|17:00&lt;br /&gt;
|Finish&lt;br /&gt;
|	&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Minutes==&lt;/div&gt;</description>
			<pubDate>Wed, 09 May 2012 21:48:07 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:PgCon_2012_Developer_Meeting</comments>		</item>
		<item>
			<title>Lock dependency information</title>
			<link>http://wiki.postgresql.org/wiki/Lock_dependency_information</link>
			<guid>http://wiki.postgresql.org/wiki/Lock_dependency_information</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Use section headers to visually separate the two views.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{SnippetInfo|Lock dependency info|lang=SQL|category=Performance}}&lt;br /&gt;
At times it is very useful to see which locks depend upon each other.&lt;br /&gt;
&lt;br /&gt;
== Flat View of Blocking ==&lt;br /&gt;
&lt;br /&gt;
All columns prefixed with ''waiting_'' hold information about the lock that has not been granted, while the columns prefixed with ''other_'' hold information about other locks on the same relation or transaction ID.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;SELECT &lt;br /&gt;
    waiting.locktype           AS waiting_locktype,&lt;br /&gt;
    waiting.relation::regclass AS waiting_table,&lt;br /&gt;
    waiting_stm.current_query  AS waiting_query,&lt;br /&gt;
    waiting.mode               AS waiting_mode,&lt;br /&gt;
    waiting.pid                AS waiting_pid,&lt;br /&gt;
    other.locktype             AS other_locktype,&lt;br /&gt;
    other.relation::regclass   AS other_table,&lt;br /&gt;
    other_stm.current_query    AS other_query,&lt;br /&gt;
    other.mode                 AS other_mode,&lt;br /&gt;
    other.pid                  AS other_pid,&lt;br /&gt;
    other.granted              AS other_granted&lt;br /&gt;
FROM&lt;br /&gt;
    pg_catalog.pg_locks AS waiting&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS waiting_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        waiting_stm.procpid = waiting.pid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_locks AS other&lt;br /&gt;
    ON (&lt;br /&gt;
        (&lt;br /&gt;
            waiting.&amp;quot;database&amp;quot; = other.&amp;quot;database&amp;quot;&lt;br /&gt;
        AND waiting.relation  = other.relation&lt;br /&gt;
        )&lt;br /&gt;
        OR waiting.transactionid = other.transactionid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS other_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        other_stm.procpid = other.pid&lt;br /&gt;
    )&lt;br /&gt;
WHERE&lt;br /&gt;
    NOT waiting.granted&lt;br /&gt;
AND&lt;br /&gt;
    waiting.pid &amp;lt;&amp;gt; other.pid&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It would be useful to add extra columns indicating how long the waiting statement has been blocked.&lt;br /&gt;
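&lt;br /&gt;
A rough sketch of that idea, not part of the original snippet: assuming a release where ''pg_stat_activity'' exposes ''procpid'', ''current_query'' and ''query_start'', the age of the waiting statement can serve as an upper bound on how long it has been blocked, since the statement may have run for a while before it started waiting. The alias ''waiting_statement_age'' is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;SELECT&lt;br /&gt;
    waiting.pid                     AS waiting_pid,&lt;br /&gt;
    waiting.mode                    AS waiting_mode,&lt;br /&gt;
    waiting_stm.current_query       AS waiting_query,&lt;br /&gt;
    -- Upper bound on the blocked time: time since the statement started&lt;br /&gt;
    now() - waiting_stm.query_start AS waiting_statement_age&lt;br /&gt;
FROM&lt;br /&gt;
    pg_catalog.pg_locks AS waiting&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS waiting_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        waiting_stm.procpid = waiting.pid&lt;br /&gt;
    )&lt;br /&gt;
WHERE&lt;br /&gt;
    NOT waiting.granted&lt;br /&gt;
ORDER BY&lt;br /&gt;
    waiting_statement_age DESC&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The same expression could probably be added to the select list of the flat view above.&lt;br /&gt;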
&lt;br /&gt;
== Recursive View of Blocking ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;WITH RECURSIVE&lt;br /&gt;
     c(requested, current) AS&lt;br /&gt;
       ( VALUES&lt;br /&gt;
         ('AccessShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)&lt;br /&gt;
       ),&lt;br /&gt;
     l AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             (locktype,database,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS target,&lt;br /&gt;
             virtualtransaction,&lt;br /&gt;
             pid,&lt;br /&gt;
             mode,&lt;br /&gt;
             granted&lt;br /&gt;
           FROM pg_catalog.pg_locks&lt;br /&gt;
       ),&lt;br /&gt;
     t AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target  AS blocker_target,&lt;br /&gt;
             blocker.pid     AS blocker_pid,&lt;br /&gt;
             blocker.mode    AS blocker_mode,&lt;br /&gt;
             '1'::int        AS depth,&lt;br /&gt;
             blocked.target  AS target,&lt;br /&gt;
             blocked.pid     AS pid,&lt;br /&gt;
             blocked.mode    AS mode,&lt;br /&gt;
             blocker.pid::text || ',' || blocked.pid::text AS seq&lt;br /&gt;
           FROM l blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.pid &amp;lt;&amp;gt; blocker.pid  -- a backend does not block itself&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
         UNION ALL&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target,&lt;br /&gt;
             blocker.pid,&lt;br /&gt;
             blocker.mode,&lt;br /&gt;
             depth + 1,&lt;br /&gt;
             blocked.target,&lt;br /&gt;
             blocked.pid,&lt;br /&gt;
             blocked.mode,&lt;br /&gt;
             blocker.seq || ',' || blocked.pid::text&lt;br /&gt;
           FROM t blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.pid &amp;lt;&amp;gt; blocker.pid  -- a backend does not block itself&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
           WHERE depth &amp;lt; 1000&lt;br /&gt;
       )&lt;br /&gt;
SELECT target, blocker_pid, blocker_mode, depth, pid as blocked_pid, mode as blocked_mode, seq&lt;br /&gt;
  FROM t ORDER BY seq;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</description>
			<pubDate>Mon, 07 May 2012 17:31:47 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Lock_dependency_information</comments>		</item>
		<item>
			<title>PgCon 2012 Developer Meeting</title>
			<link>http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting</link>
			<guid>http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Proposed Agenda Items */ Fill in description and goals for Materialized views.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A meeting of the most active PostgreSQL developers is being planned for Wednesday 16th May, 2012 near the University of Ottawa, prior to pgCon 2012. In order to keep the numbers manageable, this meeting is '''by invitation only'''. Unfortunately it is quite possible that we've overlooked important code developers during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org). &lt;br /&gt;
&lt;br /&gt;
Please note that this year the attendee numbers have been cut to try to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.2 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies, unlike in previous years.&lt;br /&gt;
&lt;br /&gt;
This is a PostgreSQL Community event. Room and refreshments/food sponsored by EnterpriseDB. Other companies sponsored attendance for their developers.&lt;br /&gt;
 &lt;br /&gt;
== Time &amp;amp; Location ==&lt;br /&gt;
&lt;br /&gt;
The meeting will be from 9AM to 5PM, and will be in the &amp;quot;Red Experience&amp;quot; room at:&lt;br /&gt;
&lt;br /&gt;
 Novotel Ottawa&lt;br /&gt;
 33 Nicholas Street&lt;br /&gt;
 Ottawa&lt;br /&gt;
 Ontario&lt;br /&gt;
 K1N 9M7&lt;br /&gt;
 &lt;br /&gt;
Food and drink will be provided throughout the day, including breakfast from 8AM.&lt;br /&gt;
&lt;br /&gt;
[http://maps.google.ca/maps?f=q&amp;amp;source=s_q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=novotel+ottawa&amp;amp;aq=&amp;amp;sll=49.891235,-97.15369&amp;amp;sspn=36.237851,79.013672&amp;amp;ie=UTF8&amp;amp;hq=novotel+ottawa&amp;amp;hnear=&amp;amp;ll=45.421528,-75.683699&amp;amp;spn=0.036869,0.077162&amp;amp;z=14&amp;amp;iwloc=A&amp;amp;layer=c&amp;amp;cbll=45.425741,-75.689638&amp;amp;panoid=Z4FUGnkZkdHAOkIxyjjS9Q&amp;amp;cbp=12,25.83,,0,-0.6 View on Google Maps]&lt;br /&gt;
&lt;br /&gt;
== Attendees ==&lt;br /&gt;
&lt;br /&gt;
The following people have RSVPed to the meeting (in alphabetical order, by surname):&lt;br /&gt;
&lt;br /&gt;
* Oleg Bartunov&lt;br /&gt;
* Josh Berkus (Secretary)&lt;br /&gt;
* Jeff Davis&lt;br /&gt;
* Andrew Dunstan&lt;br /&gt;
* Dimitri Fontaine&lt;br /&gt;
* Stephen Frost&lt;br /&gt;
* Peter Geoghegan&lt;br /&gt;
* Kevin Grittner&lt;br /&gt;
* Robert Haas&lt;br /&gt;
* Magnus Hagander&lt;br /&gt;
* Shigeru Hanada&lt;br /&gt;
* Hitoshi Harada&lt;br /&gt;
* KaiGai Kohei&lt;br /&gt;
* Tom Lane&lt;br /&gt;
* Noah Misch&lt;br /&gt;
* Bruce Momjian&lt;br /&gt;
* Dave Page (Chair)&lt;br /&gt;
* Simon Riggs&lt;br /&gt;
* Teodor Sigaev&lt;br /&gt;
* Greg Smith&lt;br /&gt;
&lt;br /&gt;
== Proposed Agenda Items ==&lt;br /&gt;
&lt;br /&gt;
Please list proposed agenda items here:&lt;br /&gt;
&lt;br /&gt;
* Agree CommitFest schedule for 9.3 (Strawman from Simon)&lt;br /&gt;
** CF1 June 15, 2012 - 1 month&lt;br /&gt;
** CF2 Sep 15, 2012 - 1 month&lt;br /&gt;
** CF3 Nov 15, 2012 - 1 month&lt;br /&gt;
** CF4 Jan 15, 2013 - 2 months&lt;br /&gt;
* Priorities for 9.3 [All]&lt;br /&gt;
** Description: discuss what people are working on and what's likely to be in 9.3.&lt;br /&gt;
** Goals: set expectations and coordinate work schedules for 9.3.&lt;br /&gt;
* Queuing [Dimitri, Kevin]&lt;br /&gt;
** Description: efficient and transactional queuing is a very common need for applications using databases, and could help implement some internal features&lt;br /&gt;
** Goals: reach agreement that core is the right place to solve that problem, and on exactly which parts of it we want in core&lt;br /&gt;
* Materialized views [Kevin]&lt;br /&gt;
** Description: Declarative materialized views are a frequently requested feature, but they mean different things to different people.  It's not likely that an initial implementation will address everything.  We need a base set of functionality on which to build.&lt;br /&gt;
** Goals: Reach consensus on what a minimum feature set for commit would be.&lt;br /&gt;
* Partitioning and Segment Exclusion [Dimitri]&lt;br /&gt;
** Description: to solve partitioning, we need to agree on a global approach&lt;br /&gt;
** Goals: agree on segment exclusion (SE) as a basis for better partitioning, and get a &amp;quot;GO&amp;quot; to work on SE&lt;br /&gt;
* The MERGE statement: Challenges and priorities [Peter G]&lt;br /&gt;
** Description: Implementing the MERGE statement for 9.3. It is envisaged specifically as an atomic &amp;quot;upsert&amp;quot; operation.&lt;br /&gt;
** Goals: To get buy-in on various aspects of the feature's development, and, ideally, to secure reviewer resources or other support. Because of the complexity of the feature, early interest from reviewers is preferable.&lt;br /&gt;
* Row-level Access Control and SELinux [KaiGai]&lt;br /&gt;
** Security label on user tables&lt;br /&gt;
** Dynamic expandable enum data types&lt;br /&gt;
** Enforcement of triggers by extension&lt;br /&gt;
* Enhancement of FDWs in 9.3 [KaiGai]&lt;br /&gt;
** Writable foreign tables&lt;br /&gt;
** Operations to be pushed down (Join, Aggregate, Sort, ...)&lt;br /&gt;
** Inheritance of foreign/regular tables&lt;br /&gt;
** Constraint (PK/FK) &amp;amp; Trigger support.&lt;br /&gt;
* Type registry [Andrew]&lt;br /&gt;
** Provide for known OIDs for non-builtin types, and possibly for their IO functions too&lt;br /&gt;
** Would make it possible to write code in core or in extension X that handles a type defined in extension Y.&lt;br /&gt;
* Ending CommitFests in a timely fashion, especially the last one.  Avoiding a crush of massive feature patches at the end of the cycle.  Handling big patches that aren't quite ready yet.  Getting more people to help with patch review. [Robert]&lt;br /&gt;
* What Developers Want [Josh]&lt;br /&gt;
** Description: a top-5 list of features and obstacles to developer adoption of PostgreSQL (with slides)&lt;br /&gt;
** Goal: to set priorities for some features aimed at application users&lt;br /&gt;
* In-Place Upgrades &amp;amp; Checksums [Greg Smith, Simon]&lt;br /&gt;
** Description:  Revisit in-place upgrades of the page format, now that pg_upgrade is available and multiple checksum implementations needing it have been proposed.&lt;br /&gt;
** Goal:  Nail down some incremental milestones for 9.3 development to aim at.&lt;br /&gt;
* Autonomous Transactions [Simon]&lt;br /&gt;
** Overview of idea, relationship to stored procedures&lt;br /&gt;
** Feedback, buy-in and/or alternatives&lt;br /&gt;
* Parallel Query [Bruce Momjian]&lt;br /&gt;
** Get buy-in on which parallel operations we hope to add in upcoming releases&lt;br /&gt;
* Report from Clustering Meeting [Josh] (10 min)&lt;br /&gt;
** Description: to summarize the discussions of the cluster-hackers meeting from the previous day&lt;br /&gt;
** Goal: inter-team synchronization.  Possibly, decisions requested on specific in-core features.&lt;br /&gt;
* Double Write Buffers [Simon]&lt;br /&gt;
** Is anyone committing to do that for 9.3?&lt;br /&gt;
* Summarise Commitments at End of Play [Simon]&lt;br /&gt;
** For roadmap and planning purposes, confirm who is doing what, assign interested reviewers at start&lt;br /&gt;
** Check gaps, identify priorities early in the cycle&lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;4&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Item&lt;br /&gt;
!Presenter&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:00&lt;br /&gt;
|Breakfast&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:45 - 09:00&lt;br /&gt;
|Welcome and introductions&lt;br /&gt;
|Dave Page&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|10:30 - 10:45&lt;br /&gt;
|Coffee break&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|12:30 - 13:30&lt;br /&gt;
|Lunch	&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|15:00 - 15:15&lt;br /&gt;
|Tea break&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|16:45 - 17:00&lt;br /&gt;
|Any other business/group photo&lt;br /&gt;
|Dave Page&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|17:00&lt;br /&gt;
|Finish&lt;br /&gt;
|	&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Minutes==&lt;/div&gt;</description>
			<pubDate>Mon, 07 May 2012 15:58:24 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:PgCon_2012_Developer_Meeting</comments>		</item>
		<item>
			<title>Lock dependency information</title>
			<link>http://wiki.postgresql.org/wiki/Lock_dependency_information</link>
			<guid>http://wiki.postgresql.org/wiki/Lock_dependency_information</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Eliminate duplicate target from final results, and rename some final result columns.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{SnippetInfo|Lock dependency info|lang=SQL|category=Performance}}&lt;br /&gt;
At times it is very useful to see which locks depend upon each other.&lt;br /&gt;
&lt;br /&gt;
All columns prefixed with ''waiting_'' hold information about the lock that has not been granted, while the columns prefixed with ''other_'' hold information about other locks on the same relation or transaction ID.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;SELECT &lt;br /&gt;
    waiting.locktype           AS waiting_locktype,&lt;br /&gt;
    waiting.relation::regclass AS waiting_table,&lt;br /&gt;
    waiting_stm.current_query  AS waiting_query,&lt;br /&gt;
    waiting.mode               AS waiting_mode,&lt;br /&gt;
    waiting.pid                AS waiting_pid,&lt;br /&gt;
    other.locktype             AS other_locktype,&lt;br /&gt;
    other.relation::regclass   AS other_table,&lt;br /&gt;
    other_stm.current_query    AS other_query,&lt;br /&gt;
    other.mode                 AS other_mode,&lt;br /&gt;
    other.pid                  AS other_pid,&lt;br /&gt;
    other.granted              AS other_granted&lt;br /&gt;
FROM&lt;br /&gt;
    pg_catalog.pg_locks AS waiting&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS waiting_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        waiting_stm.procpid = waiting.pid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_locks AS other&lt;br /&gt;
    ON (&lt;br /&gt;
        (&lt;br /&gt;
            waiting.&amp;quot;database&amp;quot; = other.&amp;quot;database&amp;quot;&lt;br /&gt;
        AND waiting.relation  = other.relation&lt;br /&gt;
        )&lt;br /&gt;
        OR waiting.transactionid = other.transactionid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS other_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        other_stm.procpid = other.pid&lt;br /&gt;
    )&lt;br /&gt;
WHERE&lt;br /&gt;
    NOT waiting.granted&lt;br /&gt;
AND&lt;br /&gt;
    waiting.pid &amp;lt;&amp;gt; other.pid&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It would be useful to add extra columns indicating how long the waiting statement has been blocked.&lt;br /&gt;
&lt;br /&gt;
Here's an attempt at a recursive version:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;WITH RECURSIVE&lt;br /&gt;
     c(requested, current) AS&lt;br /&gt;
       ( VALUES&lt;br /&gt;
         ('AccessShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)&lt;br /&gt;
       ),&lt;br /&gt;
     l AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             (locktype,database,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS target,&lt;br /&gt;
             virtualtransaction,&lt;br /&gt;
             pid,&lt;br /&gt;
             mode,&lt;br /&gt;
             granted&lt;br /&gt;
           FROM pg_catalog.pg_locks&lt;br /&gt;
       ),&lt;br /&gt;
     t AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target  AS blocker_target,&lt;br /&gt;
             blocker.pid     AS blocker_pid,&lt;br /&gt;
             blocker.mode    AS blocker_mode,&lt;br /&gt;
             '1'::int        AS depth,&lt;br /&gt;
             blocked.target  AS target,&lt;br /&gt;
             blocked.pid     AS pid,&lt;br /&gt;
             blocked.mode    AS mode,&lt;br /&gt;
             blocker.pid::text || ',' || blocked.pid::text AS seq&lt;br /&gt;
           FROM l blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.pid &amp;lt;&amp;gt; blocker.pid  -- a backend does not block itself&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
         UNION ALL&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target,&lt;br /&gt;
             blocker.pid,&lt;br /&gt;
             blocker.mode,&lt;br /&gt;
             depth + 1,&lt;br /&gt;
             blocked.target,&lt;br /&gt;
             blocked.pid,&lt;br /&gt;
             blocked.mode,&lt;br /&gt;
             blocker.seq || ',' || blocked.pid::text&lt;br /&gt;
           FROM t blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.pid &amp;lt;&amp;gt; blocker.pid  -- a backend does not block itself&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
           WHERE depth &amp;lt; 1000&lt;br /&gt;
       )&lt;br /&gt;
SELECT target, blocker_pid, blocker_mode, depth, pid as blocked_pid, mode as blocked_mode, seq&lt;br /&gt;
  FROM t ORDER BY seq;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</description>
			<pubDate>Wed, 02 May 2012 20:59:07 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Lock_dependency_information</comments>		</item>
		<item>
			<title>Lock dependency information</title>
			<link>http://wiki.postgresql.org/wiki/Lock_dependency_information</link>
			<guid>http://wiki.postgresql.org/wiki/Lock_dependency_information</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Start depth of recursive query at 1 instead of 0.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{SnippetInfo|Lock dependency info|lang=SQL|category=Performance}}&lt;br /&gt;
At times it is very useful to see which locks depend upon each other.&lt;br /&gt;
&lt;br /&gt;
All columns prefixed with ''waiting_'' hold information about the lock that has not been granted, while the columns prefixed with ''other_'' hold information about other locks on the same relation or transaction ID.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;SELECT &lt;br /&gt;
    waiting.locktype           AS waiting_locktype,&lt;br /&gt;
    waiting.relation::regclass AS waiting_table,&lt;br /&gt;
    waiting_stm.current_query  AS waiting_query,&lt;br /&gt;
    waiting.mode               AS waiting_mode,&lt;br /&gt;
    waiting.pid                AS waiting_pid,&lt;br /&gt;
    other.locktype             AS other_locktype,&lt;br /&gt;
    other.relation::regclass   AS other_table,&lt;br /&gt;
    other_stm.current_query    AS other_query,&lt;br /&gt;
    other.mode                 AS other_mode,&lt;br /&gt;
    other.pid                  AS other_pid,&lt;br /&gt;
    other.granted              AS other_granted&lt;br /&gt;
FROM&lt;br /&gt;
    pg_catalog.pg_locks AS waiting&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS waiting_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        waiting_stm.procpid = waiting.pid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_locks AS other&lt;br /&gt;
    ON (&lt;br /&gt;
        (&lt;br /&gt;
            waiting.&amp;quot;database&amp;quot; = other.&amp;quot;database&amp;quot;&lt;br /&gt;
        AND waiting.relation  = other.relation&lt;br /&gt;
        )&lt;br /&gt;
        OR waiting.transactionid = other.transactionid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS other_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        other_stm.procpid = other.pid&lt;br /&gt;
    )&lt;br /&gt;
WHERE&lt;br /&gt;
    NOT waiting.granted&lt;br /&gt;
AND&lt;br /&gt;
    waiting.pid &amp;lt;&amp;gt; other.pid&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It would be useful to add extra columns indicating how long the waiting statement has been blocked.&lt;br /&gt;
&lt;br /&gt;
Here's an attempt at a recursive version:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;WITH RECURSIVE&lt;br /&gt;
     c(requested, current) AS&lt;br /&gt;
       ( VALUES&lt;br /&gt;
         ('AccessShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)&lt;br /&gt;
       ),&lt;br /&gt;
     l AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             (locktype,database,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS target,&lt;br /&gt;
             virtualtransaction,&lt;br /&gt;
             pid,&lt;br /&gt;
             mode,&lt;br /&gt;
             granted&lt;br /&gt;
           FROM pg_catalog.pg_locks&lt;br /&gt;
       ),&lt;br /&gt;
     t AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target  AS blocker_target,&lt;br /&gt;
             blocker.pid     AS blocker_pid,&lt;br /&gt;
             blocker.mode    AS blocker_mode,&lt;br /&gt;
             '1'::int        AS depth,&lt;br /&gt;
             blocked.target  AS target,&lt;br /&gt;
             blocked.pid     AS pid,&lt;br /&gt;
             blocked.mode    AS mode,&lt;br /&gt;
             blocker.pid::text || ',' || blocked.pid::text AS seq&lt;br /&gt;
           FROM l blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
         UNION ALL&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target,&lt;br /&gt;
             blocker.pid,&lt;br /&gt;
             blocker.mode,&lt;br /&gt;
             depth + 1,&lt;br /&gt;
             blocked.target,&lt;br /&gt;
             blocked.pid,&lt;br /&gt;
             blocked.mode,&lt;br /&gt;
             blocker.seq || ',' || blocked.pid::text&lt;br /&gt;
           FROM t blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
           WHERE depth &amp;lt; 1000&lt;br /&gt;
       )&lt;br /&gt;
SELECT * FROM t ORDER BY seq;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</description>
			<pubDate>Wed, 02 May 2012 20:53:08 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Lock_dependency_information</comments>		</item>
		<item>
			<title>Lock dependency information</title>
			<link>http://wiki.postgresql.org/wiki/Lock_dependency_information</link>
			<guid>http://wiki.postgresql.org/wiki/Lock_dependency_information</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Add a first attempt at a recursive version of the query.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{SnippetInfo|Lock dependency info|lang=SQL|category=Performance}}&lt;br /&gt;
At times it is very useful to see which locks depend upon each other.&lt;br /&gt;
&lt;br /&gt;
All columns prefixed with ''waiting_'' hold information about the lock that has not been granted, while the columns prefixed with ''other_'' hold information about other locks on the same relation or transactionid.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;SELECT &lt;br /&gt;
    waiting.locktype           AS waiting_locktype,&lt;br /&gt;
    waiting.relation::regclass AS waiting_table,&lt;br /&gt;
    waiting_stm.current_query  AS waiting_query,&lt;br /&gt;
    waiting.mode               AS waiting_mode,&lt;br /&gt;
    waiting.pid                AS waiting_pid,&lt;br /&gt;
    other.locktype             AS other_locktype,&lt;br /&gt;
    other.relation::regclass   AS other_table,&lt;br /&gt;
    other_stm.current_query    AS other_query,&lt;br /&gt;
    other.mode                 AS other_mode,&lt;br /&gt;
    other.pid                  AS other_pid,&lt;br /&gt;
    other.granted              AS other_granted&lt;br /&gt;
FROM&lt;br /&gt;
    pg_catalog.pg_locks AS waiting&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS waiting_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        waiting_stm.procpid = waiting.pid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_locks AS other&lt;br /&gt;
    ON (&lt;br /&gt;
        (&lt;br /&gt;
            waiting.&amp;quot;database&amp;quot; = other.&amp;quot;database&amp;quot;&lt;br /&gt;
        AND waiting.relation  = other.relation&lt;br /&gt;
        )&lt;br /&gt;
        OR waiting.transactionid = other.transactionid&lt;br /&gt;
    )&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS other_stm&lt;br /&gt;
    ON (&lt;br /&gt;
        other_stm.procpid = other.pid&lt;br /&gt;
    )&lt;br /&gt;
WHERE&lt;br /&gt;
    NOT waiting.granted&lt;br /&gt;
AND&lt;br /&gt;
    waiting.pid &amp;lt;&amp;gt; other.pid&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It would be useful to add extra columns indicating how long the waiting statement has been blocked.&lt;br /&gt;
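&lt;br /&gt;
One possible sketch of such a column, assuming the same ''pg_stat_activity'' layout as the query above (''procpid'', ''current_query''): ''query_start'' records when the waiting statement began, so the elapsed time is only an upper bound on how long it has actually been blocked. The alias ''waiting_duration'' is made up for this example.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;-- Approximate wait time: time since the waiting statement started,&lt;br /&gt;
-- which may include work done before it began waiting for the lock.&lt;br /&gt;
SELECT&lt;br /&gt;
    waiting.pid                     AS waiting_pid,&lt;br /&gt;
    waiting.mode                    AS waiting_mode,&lt;br /&gt;
    waiting_stm.current_query       AS waiting_query,&lt;br /&gt;
    now() - waiting_stm.query_start AS waiting_duration&lt;br /&gt;
FROM&lt;br /&gt;
    pg_catalog.pg_locks AS waiting&lt;br /&gt;
JOIN&lt;br /&gt;
    pg_catalog.pg_stat_activity AS waiting_stm&lt;br /&gt;
    ON (waiting_stm.procpid = waiting.pid)&lt;br /&gt;
WHERE&lt;br /&gt;
    NOT waiting.granted&lt;br /&gt;
ORDER BY&lt;br /&gt;
    waiting_duration DESC;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;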
&lt;br /&gt;
Here's an attempt at a recursive version:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;WITH RECURSIVE&lt;br /&gt;
     c(requested, current) AS&lt;br /&gt;
       ( VALUES&lt;br /&gt;
         ('AccessShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'ExclusiveLock'::text),&lt;br /&gt;
         ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)&lt;br /&gt;
       ),&lt;br /&gt;
     l AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             (locktype,database,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS target,&lt;br /&gt;
             virtualtransaction,&lt;br /&gt;
             pid,&lt;br /&gt;
             mode,&lt;br /&gt;
             granted&lt;br /&gt;
           FROM pg_catalog.pg_locks&lt;br /&gt;
       ),&lt;br /&gt;
     t AS&lt;br /&gt;
       (&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target  AS blocker_target,&lt;br /&gt;
             blocker.pid     AS blocker_pid,&lt;br /&gt;
             blocker.mode    AS blocker_mode,&lt;br /&gt;
             '0'::int        AS depth,&lt;br /&gt;
             blocked.target  AS target,&lt;br /&gt;
             blocked.pid     AS pid,&lt;br /&gt;
             blocked.mode    AS mode,&lt;br /&gt;
             blocker.pid::text || ',' || blocked.pid::text AS seq&lt;br /&gt;
           FROM l blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
         UNION ALL&lt;br /&gt;
         SELECT&lt;br /&gt;
             blocker.target,&lt;br /&gt;
             blocker.pid,&lt;br /&gt;
             blocker.mode,&lt;br /&gt;
             depth + 1,&lt;br /&gt;
             blocked.target,&lt;br /&gt;
             blocked.pid,&lt;br /&gt;
             blocked.mode,&lt;br /&gt;
             blocker.seq || ',' || blocked.pid::text&lt;br /&gt;
           FROM t blocker&lt;br /&gt;
           JOIN l blocked&lt;br /&gt;
             ON ( not blocked.granted&lt;br /&gt;
              AND blocked.target IS NOT DISTINCT FROM blocker.target)&lt;br /&gt;
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)&lt;br /&gt;
           WHERE depth &amp;lt; 1000&lt;br /&gt;
       )&lt;br /&gt;
SELECT * FROM t ORDER BY seq;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 27 Apr 2012 20:14:31 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Lock_dependency_information</comments>		</item>
		<item>
			<title>Guide to reporting problems</title>
			<link>http://wiki.postgresql.org/wiki/Guide_to_reporting_problems</link>
			<guid>http://wiki.postgresql.org/wiki/Guide_to_reporting_problems</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Replace &amp;quot;TL;DR:&amp;quot; section with (hopefully) friendlier text making the same point.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Please read all the way through this document, even though it is pretty long.  It provides information which will help you to frame your question or state your problem in a way which will allow others to be more helpful.  If any of the suggested information is missing, you may need to go back-and-forth with people while they try to gather enough information to help, or they may make guesses in the absence of actual data which could lead to off-target answers.&lt;br /&gt;
&lt;br /&gt;
== Why were you sent this link? ==&lt;br /&gt;
&lt;br /&gt;
Your question or post probably '''didn't include enough information for anyone to be able to help you''' or required too much guesswork to answer.&lt;br /&gt;
&lt;br /&gt;
'''If you don't read and act on the advice in this document, you will probably not get useful help. If you read this and act on it, you will probably get better help, faster.'''&lt;br /&gt;
&lt;br /&gt;
As a bonus, you'll often figure out your own question half way through writing an explanation of it to the mailing list.&lt;br /&gt;
&lt;br /&gt;
== How to get it right (and get good help more quickly) ==&lt;br /&gt;
&lt;br /&gt;
When writing requests for help, think about the reader and read your message from their point of view. What questions will they have to ask?&lt;br /&gt;
&lt;br /&gt;
Remember, they can't see your screen. They don't know what you are trying to do. They can't tell what programs or OS you use.&lt;br /&gt;
&lt;br /&gt;
'''The people reading your question only know what you tell them.''' The better the information you give, the better help they can offer.&lt;br /&gt;
&lt;br /&gt;
Remember that the postgresql-general mailing list is populated by people helping out ''in their spare time''. Show that you respect the time they're spending to help you by following the advice given here before posting.&lt;br /&gt;
&lt;br /&gt;
== Particular kinds of problem ==&lt;br /&gt;
&lt;br /&gt;
'''Slow query?''' Read: [[SlowQueryQuestions|Guide to Posting Slow Query Questions]].&lt;br /&gt;
&lt;br /&gt;
'''Installation problems?''' Read: [[Troubleshooting Installation]] and (on Windows) the [[Running &amp;amp; Installing PostgreSQL On Native Windows#Common installation errors|Windows installation FAQ]].&lt;br /&gt;
&lt;br /&gt;
'''For (rare but nasty) suspected data/index corruption issues:''' [[Corruption|What to do if you suspect your database or indexes are corrupt]]&lt;br /&gt;
&lt;br /&gt;
=== Things you need to mention in problem reports ===&lt;br /&gt;
&lt;br /&gt;
To get a quick and helpful response, you must include at least the information shown in the list below. '''If you leave things out, your question may not be answered or you will be sent another link to this page and told to try again'''. Save yourself time: do it right the first time.&lt;br /&gt;
&lt;br /&gt;
Please do not send screen shots or photographs of text. Make sure you copy and paste the text into the report email instead.&lt;br /&gt;
&lt;br /&gt;
Make sure you include:&lt;br /&gt;
&lt;br /&gt;
* '''A description of what you are trying to achieve and what results you expect'''.&lt;br /&gt;
** Describe ''in as much detail as possible'', step by step, including command lines, SQL output, etc.&lt;br /&gt;
* The '''EXACT PostgreSQL version''' you are running&lt;br /&gt;
** Run &amp;quot;SELECT version();&amp;quot; in &amp;lt;code&amp;gt;psql&amp;lt;/code&amp;gt; or PgAdmin III and ''provide the full, exact output''. Paste it as text.&lt;br /&gt;
* '''How you installed PostgreSQL'''&lt;br /&gt;
** Downloaded the EnterpriseDB One-click installer?&lt;br /&gt;
** From Linux distro package management? If so, what repository?&lt;br /&gt;
** Direct downloads of rpm/deb packages? From where?&lt;br /&gt;
** From BSD ports, MacPorts, etc?&lt;br /&gt;
** Downloaded and compiled the sources? If so, what options did you pass to &amp;lt;code&amp;gt;configure&amp;lt;/code&amp;gt;? What compiler and version did you use?&lt;br /&gt;
** If you're having installer problems, make sure to include the installer log from the temporary folder. Windows users see [[Running &amp;amp; Installing PostgreSQL On Native Windows]].&lt;br /&gt;
* '''Changes made to the settings in the postgresql.conf file''':  see [[Server Configuration]] for a quick way to list them all.&lt;br /&gt;
* '''Operating system and version'''&lt;br /&gt;
** Linux users:&lt;br /&gt;
*** Linux distro and version&lt;br /&gt;
*** Kernel details (run &amp;lt;code&amp;gt;uname -a&amp;lt;/code&amp;gt; on the terminal)&lt;br /&gt;
** Windows users:&lt;br /&gt;
*** This means your Windows OS version, variant, and service pack. For example, &amp;quot;Windows XP Pro Service Pack 3&amp;quot;. You can get this from the &amp;quot;winver&amp;quot; command - &amp;quot;Start -&amp;gt; Run -&amp;gt; winver.exe&amp;quot;.&lt;br /&gt;
*** Whether you're running a 32-bit or 64-bit version of Windows&lt;br /&gt;
* For questions about any kind of error:&lt;br /&gt;
** '''What you were doing when the error happened / how to cause the error.'''&lt;br /&gt;
** '''The EXACT TEXT of the error message you're getting''' if there is one. Copy and paste the message to the email, do not send a screenshot.&lt;br /&gt;
* '''What program you're using to connect to PostgreSQL'''&lt;br /&gt;
** What version of the ODBC/JDBC/ADO/etc driver you're using, if any&lt;br /&gt;
** If you're using a connection pool, load balancer or application server, which one you're using and its version&lt;br /&gt;
* '''Is there anything remotely unusual in the PostgreSQL server logs?'''&lt;br /&gt;
** On Linux this depends a bit on distro, but you'll usually find them in &amp;lt;code&amp;gt;/var/log/postgresql/&amp;lt;/code&amp;gt;.&lt;br /&gt;
** On Windows these are in your data directory. On a default PostgreSQL install that'll be in &amp;lt;code&amp;gt;C:\Program Files\PostgreSQL\8.4\data\pg_log&amp;lt;/code&amp;gt; (assuming you're using 8.4)&lt;br /&gt;
** Windows users should also check the Event Viewer for service startup messages.&lt;br /&gt;
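&lt;br /&gt;
The version and configuration details requested above can be collected in one &amp;lt;code&amp;gt;psql&amp;lt;/code&amp;gt; session. This is a minimal sketch: the second query assumes that the &amp;lt;code&amp;gt;source&amp;lt;/code&amp;gt; column of the &amp;lt;code&amp;gt;pg_settings&amp;lt;/code&amp;gt; view is enough to tell changed settings from defaults, and the [[Server Configuration]] page remains the reference for that step.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;-- Exact server version string; paste the whole line.&lt;br /&gt;
SELECT version();&lt;br /&gt;
&lt;br /&gt;
-- Settings whose value does not come from the built-in defaults.&lt;br /&gt;
SELECT name, current_setting(name), source&lt;br /&gt;
  FROM pg_catalog.pg_settings&lt;br /&gt;
 WHERE source NOT IN ('default', 'override');&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;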
&lt;br /&gt;
For errors or problems with queries:&lt;br /&gt;
&lt;br /&gt;
* '''The EXACT text of the query you ran, if any'''&lt;br /&gt;
* '''The EXACT output of that query''' if it's short enough to be reasonable to post, otherwise a sample of it&lt;br /&gt;
* The SQL definition of any tables, views and user-defined functions your query uses, or psql &amp;lt;code&amp;gt;\d+&amp;lt;/code&amp;gt; output for them.&lt;br /&gt;
* If at all possible, a '''self contained test case''' demonstrating your problem&lt;br /&gt;
* If you think the query result is wrong, what you think should've been produced instead and why&lt;br /&gt;
* For slow query problems, the information in the [[SlowQueryQuestions|Guide to Posting Slow Query Questions]] page.&lt;br /&gt;
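&lt;br /&gt;
A self-contained test case is a short script that anyone can paste into an empty database to reproduce the problem. The sketch below is purely illustrative: the table &amp;lt;code&amp;gt;t&amp;lt;/code&amp;gt;, its columns and the query are hypothetical placeholders for your own schema and failing statement.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;SQL&amp;quot;&amp;gt;-- Hypothetical example: replace table, data and query with your own.&lt;br /&gt;
CREATE TABLE t (id integer PRIMARY KEY, val text);&lt;br /&gt;
INSERT INTO t (id, val)&lt;br /&gt;
  SELECT n, 'row ' || n FROM generate_series(1, 1000) AS n;&lt;br /&gt;
ANALYZE t;&lt;br /&gt;
&lt;br /&gt;
-- The statement that shows the problem, followed by cleanup.&lt;br /&gt;
SELECT count(*) FROM t WHERE val LIKE 'row 1%';&lt;br /&gt;
DROP TABLE t;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;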
&lt;br /&gt;
Additionally, if your question relates to performance (query speed, memory use, etc) and/or data corruption, please include information about:&lt;br /&gt;
&lt;br /&gt;
* CPU manufacturer and model, eg &amp;quot;AMD Athlon X2&amp;quot; or &amp;quot;Intel Core 2 Duo&amp;quot;&lt;br /&gt;
* Amount and size of RAM installed, eg &amp;quot;2GB RAM&amp;quot;&lt;br /&gt;
* Storage details (important for performance and corruption questions)&lt;br /&gt;
** Do you use a RAID controller? If so, what type of controller? eg &amp;quot;3Ware Escalade 8500-8&amp;quot;&lt;br /&gt;
*** Does it have a battery backed cache module?&lt;br /&gt;
*** Is write-back caching enabled?&lt;br /&gt;
** Do you use software RAID? If so, what software and what version? eg &amp;quot;Linux software RAID (md) 2.6.18-5-686 SMP mod_unload 686 REGPARM gcc-4.1&amp;quot;.&lt;br /&gt;
*** In the case of Linux software RAID you can get the details from the &amp;quot;modinfo md_mod&amp;quot; command&lt;br /&gt;
** Is your PostgreSQL database on a SAN?&lt;br /&gt;
*** Who made it, what kind, etc? Provide what details you can.&lt;br /&gt;
** How many hard disks are connected to the system and what types are they? You need to say more than just &amp;quot;6 disks&amp;quot;. At least give maker, rotational speed and interface type, eg &amp;quot;6 15,000rpm Seagate SAS disks&amp;quot;.&lt;br /&gt;
** How are your disks arranged for storage? Are you using RAID? If so, what RAID level(s)? What PostgreSQL data is on what disks / disk sets? What file system(s) are in use?&lt;br /&gt;
*** eg: &amp;quot;Two disks in RAID 1, with all PostgreSQL data and programs stored on one ext3 file system.&amp;quot; &lt;br /&gt;
*** eg: &amp;quot;4 disks in RAID 5 holding the pg data directory on an ext3 file system. 2 disks in RAID 1 holding pg_clog, pg_xlog, the temporary tablespace, and the sort scratch space, also on ext3.&amp;quot;.&lt;br /&gt;
*** eg: &amp;quot;Default Windows install of PostgreSQL&amp;quot;&lt;br /&gt;
** In case of data corruption reports:&lt;br /&gt;
*** Have you ''ever'' set &amp;lt;code&amp;gt;fsync=off&amp;lt;/code&amp;gt; in the postgresql config file?&lt;br /&gt;
*** Have you had any unexpected power loss lately? Replaced a failed RAID disk? Had an operating system crash?&lt;br /&gt;
*** Have you run a file system check? (&amp;lt;code&amp;gt;chkdsk&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;fsck&amp;lt;/code&amp;gt;)&lt;br /&gt;
*** Are there any error messages in the system logs? (unix/linux: &amp;lt;code&amp;gt;dmesg&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/var/log/syslog&amp;lt;/code&amp;gt; ; Windows: Event Viewer in Control Panel -&amp;gt; Administrative Tools )&lt;br /&gt;
&lt;br /&gt;
== Things Not To Do ==&lt;br /&gt;
&lt;br /&gt;
People who answer your questions on mailing lists aren't being paid to do so.  They're doing it out of community spirit and a desire to help other PostgreSQL users, so that they can get help when they need it.  As such, they have to ''want'' to help you; if you make yourself obnoxious, you won't get any assistance.  Here are a number of bad practices which will make people more likely to ignore your request than answer it:&lt;br /&gt;
&lt;br /&gt;
* '''&amp;quot;It's an Emergency!&amp;quot;''': community support is peer-to-peer, free, and at the helper's convenience.  If it's a real emergency, get a support contract with a commercial support company. Most of them will do per-incident support without an existing contract.&lt;br /&gt;
* '''Cross-Posting''': do not post the same question to 2 or more mailing lists at the same time.  Not only does this annoy the people you want help from, it's likely to get your e-mails spam-filtered out.  If you post it to one list, and don't get a response in 2 days or more, then it's ok to post it to another list.&lt;br /&gt;
* '''User Questions on Hacker Lists''': pgsql-hackers, pgsql-committers, pgsql-testers, pgsql-www and pgsql-rrreviewers are all for people who work on the PostgreSQL database engine and our community infrastructure.  Asking user questions on any of those is more liable to get you flamed than any answers.  There are over 3 dozen user mailing lists, use them. Many code contributors and committers do read the -general, -bugs, etc lists, so you'll be heard if there's something wrong.&lt;br /&gt;
* '''Insist on Asking on the Wrong List''': sometimes you will post a question on one mailing list and will be told that you need to post it on a different list.  Don't insist on pursuing the topic on the original list to everyone's irritation.&lt;br /&gt;
* '''Send Email Directly to a List Member Without Copying the List''':  when someone responds on the list to try to help, any reply you make to that should normally be kept on the list.  There are several reasons for this; among them are that the discussion may be useful to others who later run into the same problem, and that there may be other members of the community who can also provide help.&lt;br /&gt;
* '''Re-posting''': don't repost your question multiple times to the same list.  If people haven't answered, they're busy, don't know the answer, or don't want to help you.  If a week goes by without a response, you might &amp;quot;reply&amp;quot; to your own post, with a &amp;quot;Help?  Anyone?&amp;quot;, but more likely you should ask about what other list might be more appropriate for the question. If you re-post, sometimes discussion gets split across the threads under your different posts, confusing everybody and making it harder to help you.&lt;br /&gt;
* '''Comparisons with Other DBMSes''' are generally not helpful, unless your request is highly technical and is the result of serious comparison testing.  We're generally not interested in replicating exactly how Oracle or MySQL do things; if we were, we'd work on those databases instead. Furthermore, many readers of the -general list may not be familiar with the database system you're talking about, so saying &amp;quot;it's like &amp;amp;lt;blah&amp;amp;gt;&amp;quot; doesn't help much. Explain what you're trying to do, not how you did it in some other database. (That said, if you notice an issue where Pg supports something, but not with quite the same syntax as another major database, it's worth mentioning that since sometimes the other syntax can be easily added for compatibility).&lt;br /&gt;
* '''&amp;quot;Postgres Is Broken!&amp;quot;''' as well as variations on &amp;quot;I'm going to abandon Postgres if you don't help me&amp;quot; will not usually get you help, and certainly don't get you help any faster.  The general reaction you get will be: &amp;quot;Nobody is forcing you to use PostgreSQL, feel free to use MySQL/Oracle/CouchDB&amp;quot;.  Similarly, starting off your discussion by accusing PostgreSQL of having a bug because it doesn't work the way you expect is also not a good way to begin.&lt;br /&gt;
* '''Insist That Someone's Answer is Wrong (without testing)''': if you think you know better than they do, why are you asking for help? Usually what you actually mean is ''thanks for your suggestion, but you seem to have misunderstood my question. Perhaps I explained it poorly, let me try again''. Or, perhaps, ''That doesn't quite achieve what I'm trying to do, because it doesn't ''blah'' - is there anything that might?''. Think about how you say things.&lt;br /&gt;
&lt;br /&gt;
[[Category:FAQ]] [[Category:Asking Questions]] [[Category:Administration]]&lt;/div&gt;</description>
			<pubDate>Wed, 25 Apr 2012 17:29:28 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Guide_to_reporting_problems</comments>		</item>
		<item>
			<title>Todo</title>
			<link>http://wiki.postgresql.org/wiki/Todo</link>
			<guid>http://wiki.postgresql.org/wiki/Todo</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Locking */ Correct minor vertical whitespace anomaly.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;margin: 1ex 1em; float: right;&amp;quot;&amp;gt;&lt;br /&gt;
__TOC__&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This list contains '''all known PostgreSQL bugs and feature requests'''. If you would like to work on an item, please read the [[Developer FAQ]] first. There is also a [[Development_information|development information page]].&lt;br /&gt;
&lt;br /&gt;
* {{TodoPending}} - marks ordinary, incomplete items&lt;br /&gt;
* {{TodoEasy}} - marks items that are easier to implement&lt;br /&gt;
* {{TodoDone}} - marks changes that are done, and will appear in the PostgreSQL 9.2 release.&lt;br /&gt;
&lt;br /&gt;
For help on editing this list, please see [[Talk:Todo]]. &amp;lt;b&amp;gt;Please do not add items here without discussion on the mailing list.&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;padding: 1ex 4em;&amp;quot;&amp;gt;&lt;br /&gt;
== Administration ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow administrators to cancel multi-statement idle transactions&lt;br /&gt;
|This allows locks to be released, but it is complex to report the cancellation back to the client.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php &amp;lt;nowiki&amp;gt;Cancelling idle in transaction state&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00441.php &amp;lt;nowiki&amp;gt;Re: Cancelling idle in transaction state&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Check for unreferenced table files created by transactions that were in-progress when the server terminated abruptly&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00096.php &amp;lt;nowiki&amp;gt;Removing unreferenced files&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Set proper permissions on non-system schemas during db creation&lt;br /&gt;
|Currently all schemas are owned by the super-user because they are copied from the template1 database.  However, since all objects are inherited from the template database, it is not clear that setting schemas to the db owner is correct.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow log_min_messages to be specified on a per-module basis&lt;br /&gt;
|This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. Another idea is to allow separate configuration files for each module, or allow arbitrary SET commands to be passed to them. See also [[Logging Brainstorm]].}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Simplify creation of partitioned tables&lt;br /&gt;
|This would allow creation of partitioned tables without requiring creation of triggers or rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection.  Options could include range and hash partition selection. See also [[Table partitioning]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow custom variables to appear in pg_settings()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00850.php &amp;lt;nowiki&amp;gt;Re: count(*) performance improvement ideas&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have custom variables be transaction-safe&lt;br /&gt;
* {{MessageLink|4B577E9F.8000505@dunslane.net|Custom GUCs still a bit broken}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement the SQL-standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-05/msg00010.php &amp;lt;nowiki&amp;gt;Re: Grantor name gets lost when grantor role dropped&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent query cancel packets from being replayed by an attacker, especially when using SSL&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg00345.php &amp;lt;nowiki&amp;gt;Replay attack of query cancel&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a way to query the log collector subprocess to determine the name of the currently active log file&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-11/msg00418.php &amp;lt;nowiki&amp;gt;Current log files when rotating?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow simpler reporting of the unix domain socket directory and allow easier configuration of its default location&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg01555.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg01482.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow custom daemons to be automatically stopped/started along with the postmaster&lt;br /&gt;
|This allows easier administration of daemons like user job schedulers or replication-related daemons.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01701.php &amp;lt;nowiki&amp;gt;Re: scheduler in core&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve logging of prepared transactions recovered during startup&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg00092.php &amp;lt;nowiki&amp;gt;&amp;amp;quot;recovering prepared transaction&amp;amp;quot; after server restart message&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using POSIX shared memory to avoid System V shared memory kernel limits&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4DFA2673.3010009@enterprisedb.com &amp;lt;nowiki&amp;gt;POSIX shared memory patch status&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Address problem where superusers are assumed to be members of all groups&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00337.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Configuration files ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Change pg_ident.conf parsing to be the same as pg_hba.conf&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg02204.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow postgresql.conf file values to be changed via an SQL API, perhaps using SET GLOBAL&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00764.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider normalizing fractions in postgresql.conf, perhaps using '%'&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00550.php &amp;lt;nowiki&amp;gt;Fractions in GUC variables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow Kerberos to disable stripping of realms so we can check the username@realm against multiple realms&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00009.php &amp;lt;nowiki&amp;gt;krb_match_realm patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve LDAP authentication configuration options&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01745.php &amp;lt;nowiki&amp;gt;Proposed Patch - LDAPS support for servers on port 636 w/o TLS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add external tool to auto-tune some postgresql.conf parameters&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00000.php &amp;lt;nowiki&amp;gt;Re: Overhauling GUCS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00033.php &amp;lt;nowiki&amp;gt;Simple postgresql.conf wizard&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add 'hostgss' pg_hba.conf option to allow GSS link-level encryption&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01454.php &amp;lt;nowiki&amp;gt;Re: Plans for 8.4&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Process pg_hba.conf keywords as case-insensitive&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00432.php &amp;lt;nowiki&amp;gt;More robust pg_hba.conf parsing/error logging&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create utility to compute accurate random_page_cost value&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00162.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00362.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow configuration files to be independently validated&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php&lt;br /&gt;
* http://archives.postgresql.org/message-id/12666.1310774573@sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow postgresql.conf settings to be accepted by backends even if some settings are invalid for those backends&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow all backends to receive postgresql.conf setting changes at the same time&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Tablespaces ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow a database in tablespace t1 with tables created in tablespace t2 to be used as a template for a new database created with default tablespace t2&lt;br /&gt;
|Currently all objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If you mix default tablespace tables and tablespace-specified tables in the same directory, creating a new database from such a mixed directory would create a new database with tables that had incorrect explicit tablespaces.  To fix this would require modifying pg_class in the newly copied database, which we don't currently do.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow reporting of which objects are in which tablespaces&lt;br /&gt;
|This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow per-tablespace quotas}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow tablespaces on RAM-based partitions for unlogged tables&lt;br /&gt;
* http://archives.postgresql.org/pgsql-advocacy/2011-05/msg00033.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow toast tables to be moved to a different tablespace&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00980.php&lt;br /&gt;
* {{messageLink|CAFEQCbH756DyyAPQ1ykh3+b+kE1-EhWRww1WO_x5v38C-uLnUg@mail.gmail.com|patch : Allow toast tables to be moved to a different tablespace}} (issues remain)&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CAFEQCbEq07OopgE5xFYv2Q3eMq45hRSJkjCBO+kvpJq9NEVhow@mail.gmail.com Allow toast tables to be moved to a different tablespace]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Statistics Collector ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the last vacuum/analyze execution times to be displayed without requiring track_counts to be enabled&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2007-04/msg00028.php &amp;lt;nowiki&amp;gt;row-level stats and last analyze time&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Clear table counters on TRUNCATE&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php &amp;lt;nowiki&amp;gt;Small TRUNCATE glitch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== SSL ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SSL authentication/encryption over unix domain sockets&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00924.php &amp;lt;nowiki&amp;gt;Re: Spoofing as the postmaster&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SSL key file permission checks to be optionally disabled when sharing SSL keys with other applications&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00069.php &amp;lt;nowiki&amp;gt;BUG #3809: SSL &amp;amp;quot;unsafe&amp;amp;quot; private key permissions bug&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SSL CRL files to be re-read during configuration file reload, rather than requiring a server restart&lt;br /&gt;
|Unlike SSL CRT files, CRL (Certificate Revocation List) files are updated frequently&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-12/msg00832.php &amp;lt;nowiki&amp;gt;Automatic CRL reload&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
Alternatively or additionally, supporting OCSP (Online Certificate Status Protocol) would provide real-time revocation discovery without reloading&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Allow automatic selection of SSL client certificates from a certificate store&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00406.php &amp;lt;nowiki&amp;gt;Allow multiple certificates or keys in the postgresql.crt/.key files&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Send the full server certificate chain to the client&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-12/msg00145.php BUG #5245: Full Server Certificate Chain Not Sent to client]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Point-In-Time Recovery (PITR) ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Create dump tool for write-ahead logs for use in determining transaction id for point-in-time recovery&lt;br /&gt;
|This is useful for checking PITR recovery.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow archive_mode to be changed without server restart?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01655.php &amp;lt;nowiki&amp;gt;Enabling archive_mode without restart&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider avoiding WAL switching via archive_timeout if there has been no database activity&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg01469.php &amp;lt;nowiki&amp;gt;archive_timeout behavior for no activity&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg00395.php &amp;lt;nowiki&amp;gt;Re: archive_timeout behavior for no activity&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Standby server mode ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Allow pg_xlogfile_name() to be used in recovery mode&lt;br /&gt;
* [http://archives.postgresql.org/message-id/3f0b79eb1001190135vd9f62f1sa7868abc1ea61d12@mail.gmail.com &amp;lt;nowiki&amp;gt;Streaming replication and pg_xlogfile_name()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Prevent variables inherited from the server environment from being used for making streaming replication connections.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01011.php &amp;lt;nowiki&amp;gt;Re: Parameter name standby_mode&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
| Allow hot file system backups on standby servers&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01727.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Change walsender so that it applies per-role settings&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg00642.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
| Add more control over waiting for synchronous commit&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01611.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Restructure configuration parameters for standby mode&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01820.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Allow time-delayed application of logs on the standby&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00992.php&lt;br /&gt;
}}&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Add -X parameter to pg_basebackup to specify a different directory for pg_xlog, like initdb&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Data Types ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix data types where equality comparison is not intuitive, e.g. box&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg01643.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for public SYNONYMs&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00519.php &amp;lt;nowiki&amp;gt;Proposal for SYNONYMS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-general/2010-12/msg00139.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for SQL-standard GENERATED/IDENTITY columns&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-07/msg00543.php &amp;lt;nowiki&amp;gt;Re: Three weeks left until feature freeze&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-08/msg00038.php &amp;lt;nowiki&amp;gt;GENERATED ... AS IDENTITY, Was: Re: Feature Freeze&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00344.php &amp;lt;nowiki&amp;gt;Behavior of GENERATED columns per SQL2003&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00076.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Behavior of GENERATED columns per SQL2003&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00604.php &amp;lt;nowiki&amp;gt;IDENTITY/GENERATED patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider placing all sequences in a single table, or create a system view&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php &amp;lt;nowiki&amp;gt;Re: newbie: renaming sequences task&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2012-02/msg00258.php Removing special case OID generation]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a special data type for regular expressions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php &amp;lt;nowiki&amp;gt;Why is there a tsquery data type?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce BIT data type overhead using short varlena headers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-12/msg00273.php &amp;lt;nowiki&amp;gt;storage size of &amp;amp;quot;bit&amp;amp;quot; data type..&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow renaming and deleting enumerated values from an existing enumerated data type&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support scoped IPv6 addresses in the inet type&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-05/msg00111.php &amp;lt;nowiki&amp;gt;strange problem with ip6&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Add a JSON (JavaScript Object Notation) data type&lt;br /&gt;
|This would behave similarly to the XML data type, which is stored as text, but allows element lookup and conversion functions.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01494.php &amp;lt;nowiki&amp;gt;PATCH: Add hstore_to_json()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg00001.php &amp;lt;nowiki&amp;gt;Re: PATCH: Add hstore_to_json()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-03/msg01092.php &amp;lt;nowiki&amp;gt;Proposal: Add JSON support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00057.php &amp;lt;nowiki&amp;gt;Re: Proposal: Add JSON support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01694.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-12/msg00219.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider improving performance of computing CHAR() value lengths&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00900.php &amp;lt;nowiki&amp;gt;char() overhead on read-only workloads not so insignifcant as the docs claim it is...&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01787.php &amp;lt;nowiki&amp;gt;Re: [PATCH] backend: compare word-at-a-time in bcTruelen&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add overlaps geometric operators that ignore point overlaps&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-03/msg00861.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Add IMMUTABLE column attribute&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00623.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Domains ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow functions defined as casts to domains to be called during casting&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-05/msg00072.php &amp;lt;nowiki&amp;gt;bug? non working casts for domain&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-09/msg01681.php &amp;lt;nowiki&amp;gt;TODO: Fix CREATE CAST on DOMAINs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow values to be cast to domain types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2003-06/msg01206.php &amp;lt;nowiki&amp;gt;Domain casting still doesn't work right&amp;lt;/nowiki&amp;gt;] &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00289.php &amp;lt;nowiki&amp;gt;domain casting?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00812.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make domains work better with polymorphic functions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4887.1228700773@sss.pgh.pa.us Polymorphic types vs. domains]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/15535.1238774571@sss.pgh.pa.us some difficulties with fixing it]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Dates and Times ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow infinite intervals just like infinite timestamps&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00076.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Determine how to represent date/time field extraction on infinite timestamps&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+mi_8bda-Fnev9iXeUbnqhVaCWzbYhHkWoxPQfBca9eDPpRMw@mail.gmail.com extract(epoch from infinity) is not 0]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CADAkt-icuESH16uLOCXbR-dKpcvwtUJE4JWXnkdAjAAwP6j12g@mail.gmail.com converting between infinity timestamp and float8]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow TIMESTAMP WITH TIME ZONE to store the original timezone information, either zone name or offset from UTC&lt;br /&gt;
|If the TIMESTAMP value is stored with a time zone name, interval computations should adjust based on the time zone rules. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-10/msg00705.php &amp;lt;nowiki&amp;gt;timestamp with time zone a la sql99&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have timestamp subtraction not call justify_hours()?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-sql/2006-10/msg00059.php &amp;lt;nowiki&amp;gt;timestamp subtraction (was Re: formatting intervals with to_char)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve TIMESTAMP WITH TIME ZONE subtraction to be DST-aware&lt;br /&gt;
|Currently, subtracting one date from another across a daylight saving time adjustment can return '1 day 1 hour', but adding that back to the first date returns a time one hour in the future.  This is caused by the adjustment of '25 hours' to '1 day 1 hour', and '1 day' is the same time the next day, even if daylight saving time adjustments are involved.}}&lt;br /&gt;
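The arithmetic behind this item can be sketched outside the server. The following is only an illustration in Python, not PostgreSQL code: the two fixed offsets are an assumption standing in for a zone that leaves daylight saving time overnight, and the justify_hours-style step is imitated with a plain divmod.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets standing in for one zone around a DST change
# (assumption: UTC-4 before the change, UTC-5 after it).
BEFORE = timezone(timedelta(hours=-4))
AFTER = timezone(timedelta(hours=-5))

start = datetime(2023, 11, 4, 12, 0, tzinfo=BEFORE)  # noon before the change
end = datetime(2023, 11, 5, 12, 0, tzinfo=AFTER)     # noon after the change

# The real elapsed time between the two instants is 25 hours.
elapsed_hours = (end - start) // timedelta(hours=1)

# justify_hours-style adjustment: every 24 hours becomes one calendar day.
days, hours = divmod(elapsed_hours, 24)

# Adding a calendar day keeps the wall-clock time (noon stays noon), so the
# day part alone already lands on `end`; the leftover hour overshoots it.
wall_clock = start.replace(tzinfo=None) + timedelta(days=days, hours=hours)
round_trip = wall_clock.replace(tzinfo=AFTER)

overshoot = round_trip - end
print(elapsed_hours, (days, hours), overshoot)
```

Under these assumptions the round trip ends one hour after the original second timestamp, which is the discrepancy the item describes.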
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix interval display to support values exceeding 2^31 hours}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add overflow checking to timestamp and interval arithmetic}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add function to allow the creation of timestamps using parameters&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-06/msg00232.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Arrays ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for arrays of domains&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00114.php &amp;lt;nowiki&amp;gt;Re: updated WIP: arrays of composites&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow single-byte header storage for array elements}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add function to detect if an array is empty&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00475.php &amp;lt;nowiki&amp;gt;Re: array_length()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of empty arrays&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01033.php &amp;lt;nowiki&amp;gt;So what's an &amp;amp;quot;empty&amp;amp;quot; array anyway?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of NULLs in arrays&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-11/msg00009.php &amp;lt;nowiki&amp;gt;BUG #4509: array_cat's null behaviour is inconsistent&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01040.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Binary Data ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve vacuum of large objects, like contrib/vacuumlo?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Auto-delete large objects when referencing row is deleted&lt;br /&gt;
|contrib/lo offers this functionality.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow read/write into TOAST values like large objects&lt;br /&gt;
|Writing might require the TOAST column to be stored EXTERNAL.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00049.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add API for 64-bit large object access&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-09/msg00781.php &amp;lt;nowiki&amp;gt;64-bit API for large objects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01790.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== MONEY Data Type ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add locale-aware MONEY type, and support multiple currencies&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2005-08/msg01432.php &amp;lt;nowiki&amp;gt;A real currency type&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01181.php &amp;lt;nowiki&amp;gt;Money type todos?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|MONEY dumps in a locale-specific format, making it difficult to restore to a system with a different locale}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Text Search ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow dictionaries to change the token that is passed on to later dictionaries&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-11/msg00081.php &amp;lt;nowiki&amp;gt;a tsearch2 (8.2.4) dictionary that only filters out stopwords&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a function-based API for '@@' searches&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00511.php &amp;lt;nowiki&amp;gt;Simplifying Text Search&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve text search error messages&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00966.php &amp;lt;nowiki&amp;gt;Poorly designed tsearch NOTICEs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg01146.php &amp;lt;nowiki&amp;gt;Re: Poorly designed tsearch NOTICEs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider changing error to warning for strings larger than one megabyte&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-02/msg00190.php &amp;lt;nowiki&amp;gt;BUG #3975: tsearch2 index should not bomb out of 1Mb limit&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-03/msg00062.php &amp;lt;nowiki&amp;gt;Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|tsearch and tsdicts regression tests fail in Turkish locale on glibc&lt;br /&gt;
* [http://archives.postgresql.org/message-id/49749645.5070801@gmx.net tsearch with Turkish locale]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|tsquery negator operator treated as part of lexeme&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-06/msg00346.php BUG #4887: inclusion operator (@&amp;gt;) on tsqeries behaves not conforming to documentation]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of dash and plus signs in email address user names, and perhaps improve URL parsing&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00772.php&lt;br /&gt;
* [http://archives.postgresql.org/message-id/E1Ri8il-0008Ct-9p@wrigleys.postgresql.org tsearch does not recognize all valid emails]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve default parser, to more easily allow adding new tokens&lt;br /&gt;
* http://archives.postgresql.org/message-id/23485.1297727826@sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add additional support functions&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00319.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== XML ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow XML arrays to be cast to other data types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00981.php &amp;lt;nowiki&amp;gt;proposal casting from XML[] to int[], numeric[], text[]&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00231.php &amp;lt;nowiki&amp;gt;Re: proposal casting from XML[] to int[], numeric[], text[]&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00471.php &amp;lt;nowiki&amp;gt;Re: proposal casting from XML[] to int[], numeric[], text[]&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add XML Schema validation and xmlvalidate functions (SQL:2008)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add xmlvalidatedtd variant to support validating against a DTD?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Relax-NG validation; libxml2 supports this already}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow reliable XML operation in non-UTF8 server encodings (xpath(), in particular, is known to not work)&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-01/msg00135.php &amp;lt;nowiki&amp;gt;BUG #4622: xpath only work in utf-8 server encoding&amp;lt;/nowiki&amp;gt;] &lt;br /&gt;
* http://archives.postgresql.org/message-id/4110.1238973350@sss.pgh.pa.us}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add functions from SQL:2006: XMLDOCUMENT, XMLCAST, XMLTEXT}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add XMLNAMESPACES support in XMLELEMENT and elsewhere}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move XSLT from contrib/xml2 to a more reasonable location&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00539.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Report errors returned by the XSLT library&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00562.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve the XSLT parameter passing API&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00416.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|XML Canonical: Convert XML documents to canonical form to compare them. libxml2 has support for this.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add pretty-printed XML output option&lt;br /&gt;
|Parse a document and serialize it back in some indented form. libxml2 might support this.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add XMLQUERY (from the SQL/XML standard)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow XML shredding&lt;br /&gt;
|In some cases shredding could be a better option (if there is no need to keep XML docs in their entirety, e.g. if we have already developed tools that understand only relational data).  This would be a separate module that implements an annotated schema decomposition technique, similar to DB2 and SQL Server functionality.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix nested or repeated xpath() calls that apparently mess up namespaces [http://archives.postgresql.org/pgsql-bugs/2008-03/msg00097.php] [http://archives.postgresql.org/pgsql-bugs/2008-03/msg00144.php] [http://archives.postgresql.org/pgsql-general/2008-03/msg00295.php] [http://archives.postgresql.org/pgsql-bugs/2008-07/msg00054.php] [http://archives.postgresql.org/message-id/004f01c90e91$138e9d10$3aabd730$@anstett@iaas.uni-stuttgart.de]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|XPath: Adding the &amp;lt;x&amp;gt; at the root causes problems [http://archives.postgresql.org/pgsql-bugs/2008-05/msg00184.php] [http://archives.postgresql.org/pgsql-bugs/2008-07/msg00054.php] [http://archives.postgresql.org/pgsql-general/2008-07/msg00613.php]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|xpath_table needs to be implemented/implementable to get rid of contrib/xml2 [http://archives.postgresql.org/pgsql-general/2008-05/msg00823.php]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|xpath_table is pretty broken anyway [http://archives.postgresql.org/pgsql-hackers/2010-02/msg02424.php]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Better handling of XPath data types [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php] [http://archives.postgresql.org/message-id/004a01c90e90$4b986d90$e2c948b0$@anstett@iaas.uni-stuttgart.de]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of PIs and DTDs in xmlconcat() [http://archives.postgresql.org/message-id/200904211211.n3LCB09p008988@wwwmaster.postgresql.org]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Restructure XML and /contrib/xml2 functionality&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02314.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00017.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Functions ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow INET subnet comparisons using non-constants to be indexed}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add an INET overlaps operator, for use by exclusion constraints &lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-03/msg00845.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg01403.php &amp;lt;nowiki&amp;gt;Re: BUG #2917: spi_prepare doesn't accept typename aliases&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg01160.php &amp;lt;nowiki&amp;gt;RFC for adding typmods to functions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix IS OF so it matches the ISO specification, and add documentation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2003-08/msg00060.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] IS OF&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00060.php &amp;lt;nowiki&amp;gt;ToDo: add documentation for operator IS OF&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement Boyer-Moore searching in LIKE queries&lt;br /&gt;
* {{messageLink|27645.1220635769@sss.pgh.pa.us|TODO item: Implement Boyer-Moore searching (First time hacker)}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent malicious functions from being executed with the permissions of unsuspecting users&lt;br /&gt;
|Index functions are safe, so VACUUM and ANALYZE are safe too.  Triggers, CHECK and DEFAULT expressions, and rules are still vulnerable. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00268.php &amp;lt;nowiki&amp;gt;Some notes about the index-functions security vulnerability&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce memory usage of aggregates in set returning functions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2008-01/msg00031.php &amp;lt;nowiki&amp;gt;Re: Performance of aggregates over set-returning functions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix /contrib/ltree operator&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php &amp;lt;nowiki&amp;gt;BUG #3720: wrong results at using ltree&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix /contrib/btree_gist's implementation of inet indexing&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2010-10/msg00099.php &amp;lt;nowiki&amp;gt;BUG #5705: btree_gist: Index on inet changes query result&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Fix inconsistent precedence of =, &amp;amp;gt;, and &amp;amp;lt; compared to &amp;amp;lt;&amp;amp;gt;, &amp;amp;gt;=, and &amp;amp;lt;=&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00145.php &amp;lt;nowiki&amp;gt;BUG #3822: Nonstandard precedence for comparison operators&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix regular expression bug when using complex back-references&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php &amp;lt;nowiki&amp;gt;BUG #3645: regular expression back references seem broken&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have /contrib/dblink reuse unnamed connections&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php &amp;lt;nowiki&amp;gt;dblink un-named connection doesn't get re-used&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve formatting of pg_get_viewdef() output&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg01648.php &amp;lt;nowiki&amp;gt;pg_get_viewdef formattiing&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01885.php &amp;lt;nowiki&amp;gt;Re: pretty print viewdefs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-12/msg00906.php reprise: pretty print viewdefs]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add function to dump pg_depend information cleanly&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00226.php &amp;lt;nowiki&amp;gt;Elementary dependency look-up&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28488.1286461610@sss.pgh.pa.us pg_relation_size / could not open relation with OID #]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Character Formatting ===&lt;br /&gt;
&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow to_date() and to_timestamp() to accept localized month names}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add missing parameter handling in to_char()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-12/msg00948.php &amp;lt;nowiki&amp;gt;Re: to_char and i18n&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Throw an error from to_char() instead of printing a string of &amp;quot;#&amp;quot; when a number doesn't fit in the desired output format.&lt;br /&gt;
* discussed in [http://archives.postgresql.org/message-id/37ed240d0907290836w42187222n18664dfcbcb445b1@mail.gmail.com &amp;quot;to_char, support for EEEE format&amp;quot;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow to_char() on interval values to accumulate the highest unit requested&lt;br /&gt;
|2= Some special format flag would be required to request such accumulation.  Such functionality could also be added to EXTRACT. Prevent accumulation that crosses the month/day boundary because of the uneven number of days in a month.&lt;br /&gt;
* to_char(INTERVAL '1 hour 5 minutes', 'MI') =&amp;amp;gt; 65&lt;br /&gt;
* to_char(INTERVAL '43 hours 20 minutes', 'MI' ) =&amp;amp;gt; 2600&lt;br /&gt;
* to_char(INTERVAL '43 hours 20 minutes', 'WK:DD:HR:MI') =&amp;amp;gt; 0:1:19:20&lt;br /&gt;
* to_char(INTERVAL '3 years 5 months','MM') =&amp;amp;gt; 41&lt;br /&gt;
}}&lt;br /&gt;
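&lt;br /&gt;
The four examples above can be checked with a small arithmetic sketch. The helper names are hypothetical and nothing here is PostgreSQL code; it only assumes 60 minutes per hour, 24 hours per day, 7 days per week and 12 months per year, and it keeps months separate from days, as the item requires.&lt;br /&gt;

```python
# Hypothetical helpers: express an interval in the requested units, letting the
# highest requested unit accumulate everything above it.
SECONDS = {"WK": 7 * 86400, "DD": 86400, "HR": 3600, "MI": 60}


def accumulate_time(hours, minutes, units):
    """Split a day/time interval across units, listed from largest to smallest."""
    remaining = hours * 3600 + minutes * 60
    parts = []
    for unit in units:
        count, remaining = divmod(remaining, SECONDS[unit])
        parts.append(count)
    return parts


def accumulate_months(years, months):
    """Years fold into months; months never cross into days."""
    return years * 12 + months


print(accumulate_time(1, 5, ["MI"]))                      # [65]
print(accumulate_time(43, 20, ["MI"]))                    # [2600]
print(accumulate_time(43, 20, ["WK", "DD", "HR", "MI"]))  # [0, 1, 19, 20]
print(accumulate_months(3, 5))                            # 41
```

Keeping the month arithmetic in its own function mirrors the rule that accumulation must not cross the month/day boundary.&lt;br /&gt;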
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix to_number() handling for values not matching the format string&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg01447.php &amp;lt;nowiki&amp;gt;Re: numeric_to_number() function skipping some digits&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Multi-Language Support ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add NCHAR (as distinguished from ordinary varchar)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a cares-about-collation column to pg_proc, so that unresolved-collation errors can be thrown at parse time&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-03/msg01520.php &amp;lt;nowiki&amp;gt;Open issues for collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Integrate collations with text search configurations&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28887.1303579034@sss.pgh.pa.us &amp;lt;nowiki&amp;gt;Some TODO items for collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Integrate collations with to_char() and related functions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28887.1303579034@sss.pgh.pa.us &amp;lt;nowiki&amp;gt;Some TODO items for collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support collation-sensitive equality and hashing functions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-06/msg00472.php &amp;lt;nowiki&amp;gt; contrib/citext versus collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a LOCALE option to CREATE DATABASE, as a shorthand&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00119.php &amp;lt;nowiki&amp;gt; Re: 8.4 open items list&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support multiple simultaneous character sets, per SQL:2008}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve UTF8 combined character handling?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add octet_length_server() and octet_length_client()}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make octet_length_client() the same as octet_length()?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problems with wrong runtime encoding conversion for NLS message files}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add URL to more complete multi-byte regression tests&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-07/msg00272.php &amp;lt;nowiki&amp;gt;Multi-byte and client side character encoding tests for copy command..&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix contrib/fuzzystrmatch to work with multibyte encodings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-04/msg00047.php &amp;lt;nowiki&amp;gt; soundex function returns UTF-16 characters&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00138.php &amp;lt;nowiki&amp;gt; dmetaphone woes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change memory allocation for multi-byte functions so memory is allocated inside conversion functions&lt;br /&gt;
|Currently we preallocate memory based on worst-case usage.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ability to use case-insensitive regular expressions on multi-byte characters&lt;br /&gt;
|Currently it works for UTF-8, but not other multi-byte encodings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00433.php &amp;lt;nowiki&amp;gt;Regexps vs. locale&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* {{MessageLink|20091201210024.B1393753FB7@cvs.postgresql.org|A partial solution for UTF-8}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve encoding of connection startup messages sent to the client&lt;br /&gt;
|Currently some authentication error messages are sent in the server encoding&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-12/msg00801.php &amp;lt;nowiki&amp;gt;encoding of PostgreSQL messages&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2009-01/msg00005.php &amp;lt;nowiki&amp;gt;Re: encoding of PostgreSQL messages&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|More sensible support for Unicode combining characters, normal forms&lt;br /&gt;
* http://archives.postgresql.org/message-id/200904141532.44618.peter_e@gmx.net&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Views / Rules ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Automatically create rules on views so they are updateable, per SQL:2008&lt;br /&gt;
|We can only auto-create rules for simple views.  For more complex cases users will still have to write rules manually.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00586.php &amp;lt;nowiki&amp;gt;Proposal for updatable views&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-08/msg00255.php &amp;lt;nowiki&amp;gt;Updatable views&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php &amp;lt;nowiki&amp;gt;Re: [COMMITTERS] pgsql: Automatic view update rules Bernd Helmle&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Updatable_views&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add the functionality of the WITH CHECK OPTION clause to CREATE VIEW}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow VIEW/RULE recompilation when the underlying tables change&lt;br /&gt;
|This is both difficult and controversial.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01723.php Re: About &amp;quot;Allow VIEW/RULE recompilation when the underlying tables change&amp;quot;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01724.php Re: About &amp;quot;Allow VIEW/RULE recompilation when the underlying tables change&amp;quot;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CACk=U9NFSzWrEba8G5dZ=TZLy3_hx3QXGyCcKVWT=4iA1FjMuA@mail.gmail.com VIEW still referring to old name of field]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00577.php &amp;lt;nowiki&amp;gt;RETURNING and DO INSTEAD ... Intentional or not?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add the ability to automatically create materialized views&lt;br /&gt;
|Right now materialized views require the user to create triggers on the main table to keep the summary table current.  SQL syntax should be able to manage the triggers and summary table automatically.  A more sophisticated implementation would automatically retrieve from the summary table when the main table is referenced, if possible.  See [[Materialized Views]] for implementation details&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00479.php &amp;lt;nowiki&amp;gt;GSoC - proposal - Materialized Views in PostgreSQL&amp;lt;/nowiki&amp;gt;] &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve ability to modify views via ALTER TABLE&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00691.php &amp;lt;nowiki&amp;gt;Re: idea: storing view source in system catalogs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01410.php &amp;lt;nowiki&amp;gt;modifying views&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg00300.php &amp;lt;nowiki&amp;gt;Re: patch: Add columns via CREATE OR REPLACE VIEW&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Prevent low-cost functions from seeing unauthorized view rows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg01346.php &amp;lt;nowiki&amp;gt;Using views for row-level access control is leaky&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SQL Commands ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT&lt;br /&gt;
* [http://dissipatedheat.com/2011/11/10/how-not-to-write-a-patch-for-postgresql/ How not to write this patch.]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve type determination of unknown (NULL or quoted literal) result columns for UNION/INTERSECT/EXCEPT&lt;br /&gt;
* [http://archives.postgresql.org/message-id/9799.1302719551@sss.pgh.pa.us &amp;lt;nowiki&amp;gt;UNION construct type cast gives poor error message&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ROLLUP, CUBE, GROUPING SETS options to GROUP BY&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00838.php &amp;lt;nowiki&amp;gt;WIP: grouping sets support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00466.php &amp;lt;nowiki&amp;gt;Implementation of GROUPING SETS (T431: Extended grouping capabilities)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow prepared transactions with temporary tables created and dropped in the same transaction, and when an ON COMMIT DELETE ROWS temporary table is accessed&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00047.php &amp;lt;nowiki&amp;gt;Re: &amp;amp;quot;could not open relation 1663/16384/16584: No such file or directory&amp;amp;quot; in a specific combination of transactions with temp tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/492543D5.9050904@enterprisedb.com A suggestion on how to implement this]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a GUC variable to warn about non-standard SQL usage in queries}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add SQL-standard MERGE/REPLACE/UPSERT command&lt;br /&gt;
|MERGE is typically used to merge two tables.  A REPLACE or UPSERT command performs an UPDATE or, if the UPDATE fails to find a row, an INSERT. See [[SQL MERGE]] for notes on the implementation details.&lt;br /&gt;
}}&lt;br /&gt;
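&lt;br /&gt;
A minimal sketch of the UPDATE-then-INSERT behaviour described above, as applications emulate it today. It uses Python's built-in sqlite3 module as a stand-in for the server; the table, column and function names are made up for illustration.&lt;br /&gt;

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; the table and column names
# are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")


def upsert(name, hits):
    """UPDATE the row if it exists, otherwise INSERT it."""
    cur = conn.execute("UPDATE counters SET hits = ? WHERE name = ?", (hits, name))
    if cur.rowcount == 0:  # no row matched, so fall back to INSERT
        conn.execute("INSERT INTO counters (name, hits) VALUES (?, ?)", (name, hits))


upsert("home", 1)  # inserts
upsert("home", 5)  # updates
print(conn.execute("SELECT name, hits FROM counters").fetchall())  # [('home', 5)]
```

With concurrent sessions, two callers can both see zero updated rows and both attempt the INSERT, which is one reason a single built-in command is wanted.&lt;br /&gt;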
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add NOVICE output level for helpful messages&lt;br /&gt;
|For example, have it warn about unjoined tables.  This could also control automatic sequence/index creation messages.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow NOTIFY in rules involving conditionals}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow EXPLAIN to identify tables that were skipped because of constraint_exclusion&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Simplify dropping roles that have objects in several databases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the count returned by SELECT, etc., to be represented as an int64 so that a larger range of values is possible}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for WITH RECURSIVE ... CYCLE&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00291.php &amp;lt;nowiki&amp;gt;WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows&amp;lt;/nowiki&amp;gt;]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add DEFAULT .. AS OWNER so permission checks are done as the table owner&lt;br /&gt;
|This would be useful for SERIAL nextval() calls and CHECK constraints.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow DISTINCT to work in multiple-argument aggregate calls}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add comments on system tables/columns using the information in catalogs.sgml&lt;br /&gt;
|Ideally the information would be pulled from the SGML file automatically.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent the specification of conflicting transaction read/write options&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00684.php &amp;lt;nowiki&amp;gt;Re: SET TRANSACTION and SQL Standard&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support LATERAL subqueries&lt;br /&gt;
|Lateral subqueries can reference columns of tables defined outside the subquery at the same level, i.e. ''laterally''.&lt;br /&gt;
For example, a LATERAL subquery in a FROM clause could reference tables defined in the same FROM clause.&lt;br /&gt;
Currently only the columns of tables defined ''above'' subqueries are recognized.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00292.php &amp;lt;nowiki&amp;gt;LATERAL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg00991.php &amp;lt;nowiki&amp;gt;Re: LATERAL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4F5AA202.9020906@gmail.com lateral function as a subquery - WIP patch]&lt;br /&gt;
}}&lt;br /&gt;
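&lt;br /&gt;
The lateral reference described above can be pictured as a nested loop in which the subquery is re-evaluated for each row of an earlier FROM item. The sketch below is only an analogy with made-up data and a hypothetical top_paid function; it says nothing about how the planner would implement it.&lt;br /&gt;

```python
# Made-up sample data: two "tables" as lists of dicts.
departments = [{"id": 1, "name": "dev"}, {"id": 2, "name": "ops"}]
employees = [
    {"dept_id": 1, "name": "ann", "salary": 90},
    {"dept_id": 1, "name": "bob", "salary": 70},
    {"dept_id": 2, "name": "cy", "salary": 80},
]


def top_paid(dept, limit):
    """The 'subquery': it reads a column of the outer row (dept['id'])."""
    staff = [e for e in employees if e["dept_id"] == dept["id"]]
    return sorted(staff, key=lambda e: e["salary"], reverse=True)[:limit]


# Analogous to: FROM departments d, LATERAL (subquery referencing d) AS t
result = [(d["name"], t["name"]) for d in departments for t in top_paid(d, 1)]
print(result)  # [('dev', 'ann'), ('ops', 'cy')]
```

Without the lateral reference, the subquery could not see dept at all and would have to be computed once for all departments.&lt;br /&gt;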
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent temporary tables created with ON COMMIT DELETE ROWS from repeatedly truncating the table on every commit if the table is already empty&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00842.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-03/msg00392.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-04/msg00046.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow DELETE and UPDATE to be used with LIMIT and ORDER BY&lt;br /&gt;
* http://archives.postgresql.org/pgadmin-hackers/2010-04/msg00078.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01997.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00021.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Improve caching of prepared query plans&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow PREPARE of cursors}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have DISCARD PLANS discard plans cached by functions&lt;br /&gt;
|DISCARD ALL should do the same.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00431.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid multiple-evaluation of BETWEEN and IN arguments containing volatile expressions&lt;br /&gt;
* http://archives.postgresql.org/message-id/4D95B605.2020709@enterprisedb.com&lt;br /&gt;
}}&lt;br /&gt;
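&lt;br /&gt;
To illustrate the problem: if BETWEEN is expanded into two separate comparisons, a volatile argument is evaluated once per comparison and the two halves can see different values. The sketch below models that with a hypothetical counter function; the names are assumptions and this is not the parser's code.&lt;br /&gt;

```python
import itertools
import operator


def make_volatile():
    """Return a function that yields 1, 2, 3, ... on successive calls."""
    counter = itertools.count(1)
    return lambda: next(counter)


def between_expanded(expr, low, high):
    # Naive rewrite into two comparisons: expr is evaluated twice.
    return operator.ge(expr(), low) and operator.le(expr(), high)


def between_once(expr, low, high):
    # Evaluate the argument a single time, then compare the saved value.
    value = expr()
    return operator.ge(value, low) and operator.le(value, high)


print(between_expanded(make_volatile(), 1, 1))  # False: compares 1, then 2
print(between_once(make_volatile(), 1, 1))      # True: compares 1 both times
```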
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix nested CASE-WHEN constructs&lt;br /&gt;
* http://archives.postgresql.org/message-id/4DDCEEB8.50602@enterprisedb.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== CREATE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow CREATE TABLE AS to determine column lengths for complex expressions like SELECT col1 || col2}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have WITH CONSTRAINTS also create constraint indexes&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-04/msg00149.php &amp;lt;nowiki&amp;gt;Re: CREATE TABLE LIKE INCLUDING INDEXES support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move NOT NULL constraint information to pg_constraint&lt;br /&gt;
|Currently NOT NULL constraints are stored in pg_attribute without any designation of their origins, e.g. primary keys.  One manifest problem is that dropping a PRIMARY KEY constraint does not remove the NOT NULL constraint designation.  Another issue is that we should probably force NOT NULL to be propagated from parent tables to children, just as CHECK constraints are.  (But then does dropping PRIMARY KEY affect children?)&lt;br /&gt;
* http://archives.postgresql.org/message-id/19768.1238680878@sss.pgh.pa.us&lt;br /&gt;
* http://archives.postgresql.org/message-id/200909181005.n8IA5Ris061239@wwwmaster.postgresql.org&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg01223.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent concurrent CREATE TABLE from sometimes returning a cryptic error message&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00169.php &amp;lt;nowiki&amp;gt;BUG #3692: Conflicting create table statements throw unexpected error&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add CREATE SCHEMA ... LIKE that copies a schema}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix CREATE OR REPLACE FUNCTION to not leave objects depending on the function in inconsistent state&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-08/msg00985.php indexes on functions and create or replace function]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow temporary tables to exist as empty by default in all sessions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00006.php &amp;lt;nowiki&amp;gt;what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg01329.php &amp;lt;nowiki&amp;gt;idea: global temp tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00016.php &amp;lt;nowiki&amp;gt;Re: idea: global temp tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg01098.php &amp;lt;nowiki&amp;gt;global temporary tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the creation of &amp;quot;distinct&amp;quot; types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01647.php &amp;lt;nowiki&amp;gt;Distinct types&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider analyzing temporary tables when they are first used in a query&lt;br /&gt;
|Autovacuum cannot analyze or vacuum temporary tables.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00416.php &amp;lt;nowiki&amp;gt;autovacuum and temp tables support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow an unlogged table to be changed to logged&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00437.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00323.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00237.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== UPDATE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Allow UPDATE tab SET ROW (col, ...) = (SELECT...)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-07/msg01308.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] extension for sql update&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00865.php &amp;lt;nowiki&amp;gt;UPDATE using sub selects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-04/msg00315.php &amp;lt;nowiki&amp;gt;UPDATE using sub selects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-03/msg00237.php &amp;lt;nowiki&amp;gt;Re: UPDATE using sub selects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Research self-referential UPDATEs that see inconsistent row versions in read-committed mode&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00507.php &amp;lt;nowiki&amp;gt;Concurrently updating an updatable view&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00016.php &amp;lt;nowiki&amp;gt;Re: Do we need a TODO? (was Re: Concurrently updating an updatable view)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve performance of EvalPlanQual mechanism that rechecks already-updated rows&lt;br /&gt;
|This is related to the previous item, which questions whether it even has the right semantics&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-09/msg00045.php &amp;lt;nowiki&amp;gt;BUG #4401: concurrent updates to a table blocks one update indefinitely&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-07/msg00302.php &amp;lt;nowiki&amp;gt;BUG #4945: Parallel update(s) gone wild&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== ALTER ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have ALTER TABLE RENAME of a SERIAL column rename the sequence&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php &amp;lt;nowiki&amp;gt;Re: newbie: renaming sequences task&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CADLWmXUV4LbLhMZL8rYMhCy72aZZLB5BSARPQVgoX0BrxA0FFg@mail.gmail.com renaming implicit sequences]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have ALTER SEQUENCE RENAME rename the sequence name stored in the sequence table&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-09/msg00092.php &amp;lt;nowiki&amp;gt;BUG #3619: Renaming sequence does not update its 'sequence_name' field&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00007.php &amp;lt;nowiki&amp;gt;Re: BUG #3619: Renaming sequence does not update its 'sequence_name' field&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php &amp;lt;nowiki&amp;gt;Re: newbie: renaming sequences task&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME or ALTER TABLE RENAME CONSTRAINT&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-02/msg00168.php &amp;lt;nowiki&amp;gt;ALTER CONSTRAINT RENAME patch reverted&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ALTER DOMAIN to modify the underlying data type}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow ALTER TABLESPACE to move the tablespace to different directories}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow moving system tables to other tablespaces, where possible&lt;br /&gt;
|Currently non-global system tables must be in the default database tablespace. Global system tables can never be moved.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have ALTER INDEX update the name of a constraint using that index}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow column display reordering by recording a display, storage, and permanent id for every column?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00782.php &amp;lt;nowiki&amp;gt;Re: column ordering, was Re: [PATCHES] Enums patch v2&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg01029.php &amp;lt;nowiki&amp;gt;Column reordering in pg_dump&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/message-id/1324412114-sup-9608@alvh.no-ip.org&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow deactivating (and reactivating) indexes via ALTER TABLE&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg01191.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ALTER OPERATOR ... RENAME&lt;br /&gt;
|Needs to consider the effects of changing operator precedence&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1322948781.26266.9.camel@vanquo.pezone.net Missing rename support]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ALTER TABLE ... RENAME RULE&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1322948781.26266.9.camel@vanquo.pezone.net Missing rename support]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== CLUSTER ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Automatically maintain clustering on a table&lt;br /&gt;
|This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization.  Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too.  A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2004-08/msg00350.php &amp;lt;nowiki&amp;gt;Equivalent praxis to CLUSTERED INDEX?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00155.php &amp;lt;nowiki&amp;gt;Re: Grouped Index Tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://community.enterprisedb.com/git/&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2009-10/msg00346.php &amp;lt;nowiki&amp;gt;Re: maintain_cluster_order_v5.patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== COPY ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY to report error lines and continue&lt;br /&gt;
|This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00572.php &amp;lt;nowiki&amp;gt;Re: VLDB Features&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
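&lt;br /&gt;
The savepoint-per-line approach above can be sketched from the client side. In PostgreSQL an error aborts the enclosing transaction until it is rolled back, which is why each line needs its own savepoint. The example uses Python's built-in sqlite3 module as a stand-in; the table, the input rows and the savepoint name are made up.&lt;br /&gt;

```python
import sqlite3

# SQLite stands in for the server; table name and input rows are made up.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")

rows = [(1, "apple"), (2, None), (1, "duplicate"), (3, "cherry")]
rejected = []

conn.execute("BEGIN")
for line_no, row in enumerate(rows, start=1):
    conn.execute("SAVEPOINT copy_line")
    try:
        conn.execute("INSERT INTO items (id, label) VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # Undo only this line, record it, and keep going.
        conn.execute("ROLLBACK TO SAVEPOINT copy_line")
        rejected.append((line_no, str(exc)))
    conn.execute("RELEASE SAVEPOINT copy_line")
conn.execute("COMMIT")

loaded = conn.execute("SELECT id, label FROM items ORDER BY id").fetchall()
print(loaded)                    # [(1, 'apple'), (3, 'cherry')]
print([n for n, _ in rejected])  # [2, 3]
```

The per-line savepoint is also the main cost of this design, since every input line pays for one subtransaction.&lt;br /&gt;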
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY FROM to create index entries in bulk&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php &amp;lt;nowiki&amp;gt;Batch update of indexes on data loading&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY in CSV mode to control whether a quoted zero-length string is treated as NULL&lt;br /&gt;
|Currently this is always treated as a zero-length string, which generates an error when loading into an integer column.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] allow CSV quote in NULL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve COPY performance&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00954.php &amp;lt;nowiki&amp;gt;Re: 8.3 / 8.2.6 restore comparison&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01882.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY to report errors sooner&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php &amp;lt;nowiki&amp;gt;Timely reporting of COPY errors&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY to handle other number formats&lt;br /&gt;
|For example, the German notation, which uses a comma as the decimal separator. The best approach would be something like WITH DECIMAL ','.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow a stalled COPY to exit if the backend is terminated&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-04/msg00067.php &amp;lt;nowiki&amp;gt;Re: possible bug not in open items&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== GRANT/REVOKE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SERIAL sequences to inherit permissions from the base table?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow dropping of a role that has connection rights&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00736.php &amp;lt;nowiki&amp;gt;DROP ROLE dependency tracking ...&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== DECLARE CURSOR ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent DROP TABLE from dropping a table referenced by its own open cursor?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide some guarantees about the behavior of cursors that invoke volatile functions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20997.1244563664@sss.pgh.pa.us Re: Cursor with hold emits the same row more than once across commits in 8.3.7]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== INSERT ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow INSERT/UPDATE of the system-generated oid value for a row}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|In rules, allow VALUES() to contain a mixture of 'old' and 'new' references}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== SHOW/SET ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and CLUSTER}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rationalize the discrepancy between settings that use values in bytes and SHOW that returns the object count&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2008-07/msg00007.php &amp;lt;nowiki&amp;gt;Re: [ADMIN] shared_buffers and shmmax&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== ANALYZE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified percentage}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have EXPLAIN ANALYZE report rows as floating-point numbers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg01363.php &amp;lt;nowiki&amp;gt;explain analyze rows=%.0f&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00108.php &amp;lt;nowiki&amp;gt;Re: explain analyze rows=%.0f&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve how ANALYZE computes in-doubt tuples&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00771.php &amp;lt;nowiki&amp;gt;VACUUM/ANALYZE counting of in-doubt tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Window Functions ===&lt;br /&gt;
See {{messageLink|357.1230492361@sss.pgh.pa.us|TODO items for window functions}}.&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support creation of user-defined window functions&lt;br /&gt;
|We have the ability to create new window functions written in C.  Is it&lt;br /&gt;
worth the effort to create an API that would let them be written in PL/pgSQL, etc.?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement full support for window framing clauses&lt;br /&gt;
|In addition to the clauses already implemented, which are described in the [http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS latest doc], the following clauses are not implemented yet:&lt;br /&gt;
* RANGE BETWEEN ... PRECEDING/FOLLOWING&lt;br /&gt;
* EXCLUDE&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Investigate tuplestore performance issues&lt;br /&gt;
|The tuplestore_in_memory() workaround is just a band-aid; we ought to solve the problem properly.  tuplestore_advance seems like a weak spot as well.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00152.php &amp;lt;nowiki&amp;gt;tuplestore potential performance problem&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem|Do we really need so much duplicated code between Agg and WindowAgg?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Teach planner to evaluate multiple windows in the optimal order&lt;br /&gt;
|Currently windows are always evaluated in the query-specified order.&lt;br /&gt;
* http://archives.postgresql.org/message-id/3CDAD71E9D70417290FCF66F0178D1E1@amd64&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement DISTINCT clause in window aggregates&lt;br /&gt;
|Some proprietary RDBMSs have implemented it already, so it helps with porting from those.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Integrity Constraints ==&lt;br /&gt;
=== Keys ===&lt;br /&gt;
&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve deferrable unique constraints for cases with many conflicts&lt;br /&gt;
|The current implementation fires a trigger for each potentially conflicting row.  This might not scale well for an update that changes many key values at once.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Referential Integrity ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add MATCH PARTIAL referential integrity}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change foreign key constraint for array -&amp;amp;gt; element to mean element in array?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-10/msg01814.php &amp;lt;nowiki&amp;gt;foreign keys for array/period contains relationships&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix the problem in which cascading referential triggers that make changes on cascaded tables see those tables in an intermediate state&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-09/msg00174.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Work-in-progress referential action trigger timing&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Optimize referential integrity checks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php &amp;lt;nowiki&amp;gt;Re: Effects of cascading references in foreign keys&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php &amp;lt;nowiki&amp;gt;Can't ri_KeysEqual() consider two nulls as equal?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Check Constraints ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Run check constraints only when affected columns are changed&lt;br /&gt;
* http://archives.postgresql.org/message-id/1326055327.15293.13.camel@vanquo.pezone.net&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Server-Side Languages ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for polymorphic arguments and return types to languages other than PL/pgSQL}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for OUT and INOUT parameters to languages other than PL/pgSQL}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add more fine-grained specification of functions taking arbitrary data types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00367.php &amp;lt;nowiki&amp;gt;RfD: more powerful &amp;amp;quot;any&amp;amp;quot; types&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement stored procedures&lt;br /&gt;
|This might involve the control of transaction state and the return of multiple result sets&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-10/msg00454.php &amp;lt;nowiki&amp;gt;PL/pgSQL stored procedure returning multiple result sets (SELECTs)?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01375.php &amp;lt;nowiki&amp;gt;Proposal: real procedures again (8.4)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-04/msg01149.php &amp;lt;nowiki&amp;gt;Gathering specs and discussion on feature (post 9.1)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow holdable cursors in SPI}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Add SPI_gettypmod() to return a field's typemod from a TupleDesc&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2005-11/msg00250.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== SQL-Language Functions ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow SQL-language functions to reference parameters by parameter name&lt;br /&gt;
|Currently SQL-language functions can only refer to dollar parameters, e.g. $1&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01479.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01519.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00221.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rethink query plan caching and timing of parse analysis within SQL-language functions&lt;br /&gt;
|They should work more like plpgsql functions do ...&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2011-05/msg00078.php &amp;lt;nowiki&amp;gt;Re: BUG #6019: invalid cached plan on inherited table&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/pgSQL ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Allow listing of record column names, and access to record columns via variables, e.g. columns := r.(*), tval2 := r.(colname)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2005-07/msg00458.php &amp;lt;nowiki&amp;gt;Re: PL/PGSQL: Dynamic Record Introspection&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-05/msg00302.php &amp;lt;nowiki&amp;gt;Re: PL/PGSQL: Dynamic Record Introspection&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00031.php &amp;lt;nowiki&amp;gt;Re: PL/PGSQL: Dynamic Record Introspection&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables&lt;br /&gt;
|Because a row is not scalar, do not allow assignment from NULL-valued scalars.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg00070.php &amp;lt;nowiki&amp;gt;NULL and plpgsql rows&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider keeping separate cached copies when search_path changes&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg01009.php &amp;lt;nowiki&amp;gt;pl/pgsql Plan Invalidation and search_path&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of NULL row values vs. NULL rows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-09/msg01758.php &amp;lt;nowiki&amp;gt;Null row vs. row of nulls in plpgsql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg01973.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve PERFORM handling of WITH queries or document limitation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00309.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/Perl ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow regex operations in plperl using UTF8 characters in non-UTF8 encoded databases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/Python ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Develop a trusted variant of PL/Python.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create a new restricted execution class that will allow passing function arguments in as locals.  Passing them as globals means functions cannot be called recursively.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-02/msg01468.php &amp;lt;nowiki&amp;gt;Re: pl/python do not delete function arguments&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a DB-API compliant interface on top of the SPI interface&lt;br /&gt;
* http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|For functions returning a setof record with a composite type, cache the I/O functions for the composite type&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02007.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix loss of information during conversion of numeric type to Python float}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/Tcl ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add table function support}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Check encoding validity of values passed back to Postgres in function returns, trigger tuple changes, and SPI calls.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Clients ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a function like pg_get_indexdef() that reports more detailed index information&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00166.php &amp;lt;nowiki&amp;gt;BUG #3829: Wrong index reporting from pgAdmin III (v1.8.0 rev 6766-6767)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Split out pg_resetxlog output into pre- and post-sections&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg02040.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== pg_ctl ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow pg_ctl to work properly with configuration files located outside the PGDATA directory&lt;br /&gt;
|pg_ctl cannot read the PID file because it is located in the PGDATA directory rather than in the config directory.  The solution is to allow pg_ctl to read and understand postgresql.conf to find the data_directory value.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-10/msg00024.php &amp;lt;nowiki&amp;gt;BUG #5103: &amp;amp;quot;pg_ctl -w (re)start&amp;amp;quot; fails with custom unix_socket_directory&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Modify pg_ctl behavior and exit codes to make it easier to write an LSB-conforming init script&lt;br /&gt;
|It may be desirable to make some of the changes conditional on a command-line switch, to avoid breaking existing scripts.  The Linux shell (sh) script referenced below has been tested and seems to provide a high degree of conformance in multiple environments.  Studying it might suggest areas where pg_ctl could be modified to make writing an LSB-conforming script easier; however, some aspects of the script would be unnecessary with other suggested changes to pg_ctl, and discussion on the lists did not reach consensus on support for all aspects of it.  Further discussion of particular changes is needed before beginning any work.&lt;br /&gt;
* [[Lsb_conforming_init_script|LSB conforming init script]]&lt;br /&gt;
These threads should be studied for other ideas on improvements:&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01390.php &amp;lt;nowiki&amp;gt;We should Axe /contrib/start-scripts&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01843.php &amp;lt;nowiki&amp;gt;Linux LSB init script&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00008.php &amp;lt;nowiki&amp;gt;Re: Linux LSB init script&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve pg_ctl's detection of running postmasters&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00000.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-committers/2011-06/msg00001.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== psql ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have psql \ds show all sequences and their settings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00916.php &amp;lt;nowiki&amp;gt;Re: TODO item: Have psql show current values for a sequence&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00401.php &amp;lt;nowiki&amp;gt;Quick patch: Display sequence owner&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Have \d on a sequence indicate if the sequence is owned by a table}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move psql backslash database information into the backend, use mnemonic commands?&lt;br /&gt;
|This would allow non-psql clients to pull the same information out of the database as psql. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-01/msg00191.php &amp;lt;nowiki&amp;gt;Re: psql \d option list overloaded&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make psql's \d commands more consistent in their handling of schemas&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00014.php &amp;lt;nowiki&amp;gt;Re: psql and schemas&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make psql's \d commands distinguish default privileges from no privileges&lt;br /&gt;
|ACL displays were visibly different for the two cases before we &amp;quot;improved&amp;quot; them by using array_to_string.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2011-05/msg00082.php &amp;lt;nowiki&amp;gt;BUG #6021: There is no difference between default and empty access privileges with \dp&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consistently display privilege information for all objects in psql}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Add &amp;amp;quot;auto&amp;amp;quot; expanded mode that outputs in expanded format if &amp;amp;quot;wrapped&amp;amp;quot; mode can't wrap the output to the screen width&lt;br /&gt;
|Consider using auto-expanded mode for backslash commands like \df+.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00417.php &amp;lt;nowiki&amp;gt;Re: psql wrapped format default for backslash-d commands&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg01638.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent tab completion of SET TRANSACTION from querying the database and therefore preventing the transaction isolation level from being set.&lt;br /&gt;
|Currently SET &amp;amp;lt;tab&amp;amp;gt; causes a database lookup to check all supported session variables.  This query causes problems because setting the transaction isolation level must be the first statement of a transaction.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Fix \s without arguments (display history), which fails with libedit and does not use the pager&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2011-06/msg00114.php &amp;lt;nowiki&amp;gt; psql \s not working - OS X&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a \set variable to control whether \s displays line numbers&lt;br /&gt;
|Another option is to add \# which lists line numbers, and allows command execution.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00255.php &amp;lt;nowiki&amp;gt;Re: psql possible TODO&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Include the symbolic SQLSTATE name in verbose error reports&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-09/msg00438.php &amp;lt;nowiki&amp;gt;Re: Checking is TSearch2 query is valid&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add prompt escape to display the client and server versions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00310.php &amp;lt;nowiki&amp;gt;WIP patch for TODO Item: Add prompt escape to display the client and server versions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add option to wrap column values at whitespace boundaries, rather than chopping them at a fixed width.&lt;br /&gt;
|Currently, &amp;amp;quot;wrapped&amp;amp;quot; format chops values into fixed widths.  Perhaps the word wrapping could use the same algorithm documented in the W3C specification. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00404.php &amp;lt;nowiki&amp;gt;Re: psql wrapped format default for backslash-d commands&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://www.w3.org/TR/CSS21/tables.html#auto-table-layout&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support the ReST table output format&lt;br /&gt;
|Details about the ReST format:  http://docutils.sourceforge.net/rst.html#reference-documentation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg01007.php &amp;lt;nowiki&amp;gt;Proposal: new border setting in psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00518.php &amp;lt;nowiki&amp;gt;Re: Proposal: new border setting in psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00609.php &amp;lt;nowiki&amp;gt;Re: Proposal: new border setting in psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add option to print advice for people familiar with other databases&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg01845.php &amp;lt;nowiki&amp;gt;MySQL-ism help patch for psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|\dd is missing comments for several types of objects&lt;br /&gt;
|Comments are not handled at all for some object types, and are handled by both \dd and the individual backslash command for others. Consider a system view like pg_comments to manage this mess.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2009-09/msg00199.php &amp;lt;nowiki&amp;gt;comment on constraint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-09/msg01080.php &amp;lt;nowiki&amp;gt;pg_comments&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-05/msg00885.php &amp;lt;nowiki&amp;gt;patch: Allow \dd to show constraint comments&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ability to edit views with \ev&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00023.php &amp;lt;nowiki&amp;gt;Adding \ev view editor?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix FETCH_COUNT to handle SELECT ... INTO and WITH queries&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01565.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00192.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent psql from sending remaining single-line multi-statement queries after reconnecting&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01283.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Add \i option to bring in the specified file as a quoted literal&lt;br /&gt;
|This would be useful for creating functions and other areas.  Details still need to be worked out.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-02/msg00016.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-02/msg00020.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having psql -c read .psqlrc, for consistency&lt;br /&gt;
|psql -f already reads .psqlrc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow processing of multiple -f (file) options&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve line drawing characters&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00386.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider improving the continuation prompt&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01772.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== pg_dump / pg_restore ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Add the full object name to the tag field, e.g. for operators we need '=(integer, integer)' instead of just '='.&amp;lt;/nowiki&amp;gt;}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add pg_dumpall custom format dumps?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2010-05/msg00509.php pg_dumpall custom format]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid using platform-dependent locale names in pg_dumpall output&lt;br /&gt;
|Using native locale names puts roadblocks in the way of porting a dump to another platform.  One possible solution is to get&lt;br /&gt;
CREATE DATABASE to accept some agreed-on set of locale names and fix them up to meet the platform's requirements.&lt;br /&gt;
* http://archives.postgresql.org/message-id/21396.1241716688@sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow selection of individual object(s) of all types, not just tables}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|In a selective dump, allow dumping of an object and all its dependencies}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add options like pg_restore -l and -L to pg_dump}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for multiple pg_restore -t options, like pg_dump&lt;br /&gt;
|pg_restore's -t switch is less useful than pg_dump's in quite a few ways: no multiple switches, no pattern matching, no ability to pick up indexes and other dependent items for a selected table.  It should be made to handle this switch just like pg_dump does.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Stop dumping CASCADE on DROP TYPE commands in clean mode}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow pg_dump --clean to drop roles that own objects or have privileges&lt;br /&gt;
|tgl says: if this is about pg_dumpall, it's done as of 8.4.  If it's really about pg_dump, what does it mean?  pg_dump has no business dropping roles.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow pg_dump to utilize multiple CPUs and I/O channels by dumping multiple objects simultaneously&lt;br /&gt;
|The difficulty with this is getting multiple dump processes to produce a single dump output file.  It also would require several sessions to share the same snapshot. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php &amp;lt;nowiki&amp;gt;pg_dump additional options for performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00135.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00040.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02454.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow pg_restore to load different parts of the COPY data for a single table simultaneously}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Remove support for dumping from pre-7.3 servers&lt;br /&gt;
|In 7.3 and later, we can get accurate dependency information from the server.  pg_dump still contains a lot of crufty code&lt;br /&gt;
to try to deal with the lack of dependency info in older servers, but the usefulness of maintaining that code is diminishing.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow pre/data/post files when schema and data are dumped separately, for performance reasons&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php &amp;lt;nowiki&amp;gt;pg_dump additional options for performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00185.php &amp;lt;nowiki&amp;gt;Re: pg_dump additional options for performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00821.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00135.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Refactor handling of database attributes between pg_dump and pg_dumpall&lt;br /&gt;
|Currently only pg_dumpall emits database attributes, such as ALTER DATABASE SET commands and database-level GRANTs.&lt;br /&gt;
Many people wish that pg_dump would do that.  One proposal is to let pg_dump issue such commands if the -C switch was used,&lt;br /&gt;
but it's unclear whether that will satisfy the demand.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg01031.php &amp;lt;nowiki&amp;gt;ALTER DATABASE vs pg_dump&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2010-05/msg00010.php summary of the issues]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change pg_dump so that a comment on the dumped database is applied to the loaded database, even if the database has a different name.&lt;br /&gt;
|This will require new backend syntax, perhaps COMMENT ON CURRENT DATABASE. This is related to the previous item.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow parallel restore of tar dumps&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-02/msg01154.php &amp;lt;nowiki&amp;gt;Re: parallel restore&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow pg_dumpall to output restorable ALTER USER/DATABASE SET settings&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00916.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00394.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02359.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg00489.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== ecpg ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Docs&lt;br /&gt;
|Document differences between ecpg and the SQL standard and information about the Informix-compatibility module.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Solve cardinality &amp;amp;gt; 1 for input descriptors / variables?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a semantic check level, e.g. check if a table really exists}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix handling of DB attributes that are arrays}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix nested C comments}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|sqlwarn[6] should be 'W' if the PRECISION or SCALE value is specified}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make SET CONNECTION thread-aware, non-standard?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow multidimensional arrays}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement COPY FROM STDIN}} &lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a way to specify size of a bytea parameter&lt;br /&gt;
* [http://archives.postgresql.org/message-id/200906192131.n5JLVoMo044178@wwwmaster.postgresql.org &amp;lt;nowiki&amp;gt;BUG #4866: ECPG and BYTEA&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Fix small memory leaks in ecpg&lt;br /&gt;
|Memory leaks in a short-running application like ecpg are not really a problem, but they make debugging more complicated}} &lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow reuse of cursor name variables&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20100329113435.GA3430@feivel.credativ.lan &amp;lt;nowiki&amp;gt;Problems with variable cursorname in ecpg&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== libpq ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent PQfnumber() from lowercasing unquoted column names&lt;br /&gt;
|PQfnumber() should never have been doing lowercasing, but historically it has, so we need a way to prevent it}}&lt;br /&gt;
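&lt;br /&gt;
The folding behaviour behind this item can be modelled roughly as follows. This is a simplified Python sketch, not libpq's C code: an unquoted name is lowercased before lookup, while a double-quoted name is matched exactly. The helper names fold_column_name and find_column are hypothetical, and doubled quotes inside a quoted name are ignored for brevity.&lt;br /&gt;

```python
def fold_column_name(name):
    """Fold a column name the way an SQL identifier is folded (simplified)."""
    # A name wrapped in double quotes keeps its case; the quotes are stripped.
    if name.startswith('"') and name.endswith('"') and len(name) != 1:
        return name[1:-1]
    # Anything else is lowercased, which is the behaviour this item questions.
    return name.lower()


def find_column(columns, name):
    """Return the index of the matching column, or -1 if there is none."""
    wanted = fold_column_name(name)
    return columns.index(wanted) if wanted in columns else -1


# Hypothetical result set, as if produced by: SELECT 1 AS FOO, 2 AS "BAR"
columns = ["foo", "BAR"]
```

With this model, looking up BAR without quotes misses the mixed-case column, which is why callers currently have to pass the name with embedded double quotes.&lt;br /&gt;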
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow statement results to be automatically batched to the client&lt;br /&gt;
|Currently all statement results are transferred to the libpq client before libpq makes the results available to the application.  This feature would allow the application to make use of the first result rows while the rest are transferred, or held on the server waiting for them to be requested by libpq. One complexity is that a statement like SELECT 1/col could error out mid-way through the result set.}}&lt;br /&gt;
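&lt;br /&gt;
The mid-stream failure mentioned for SELECT 1/col can be illustrated with a small Python sketch. It is only an analogy under assumed names (stream_inverse is hypothetical and has nothing to do with the libpq API): rows are handed to the consumer one at a time, so the consumer has already acted on the early rows when a later row raises an error.&lt;br /&gt;

```python
def stream_inverse(values):
    """Yield 1/v for each value, one row at a time, like a batched result."""
    for v in values:
        yield 1 / v  # raises ZeroDivisionError when v is 0


received = []      # rows the application has already consumed
failed = False
try:
    for row in stream_inverse([4, 2, 0, 1]):
        received.append(row)
except ZeroDivisionError:
    # The error arrives after two rows were delivered; the application
    # must decide what to do with the partial result it has already used.
    failed = True
```

An application consuming batched results would need a comparable plan for discarding or compensating for rows it processed before the error.&lt;br /&gt;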
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider disallowing multiple queries in PQexec() as an additional barrier to SQL injection attacks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg00184.php &amp;lt;nowiki&amp;gt;Re: InitPostgres and flatfiles question&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add PQexecf() that allows complex parameter substitution&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01803.php &amp;lt;nowiki&amp;gt;Last minute mini-proposal (I know, know) for PQexecf()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add SQLSTATE and severity to errors generated within libpq itself&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-interfaces/2007-11/msg00015.php &amp;lt;nowiki&amp;gt;v8.1: Error severity on libpq PGconn*&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01425.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for interface/ipaddress binding to libpq&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01811.php &amp;lt;nowiki&amp;gt;SR/libpq - outbound interface/ipaddress binding&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Triggers ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve storage of deferred trigger queue&lt;br /&gt;
|Right now all deferred trigger information is stored in backend memory.  This could exhaust memory for very large trigger queues. This item involves dumping large queues into files, or doing some kind of join to process all the triggers, some bulk operation, or a bitmap. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00876.php &amp;lt;nowiki&amp;gt;Re: BUG #4204: COPY to table with FK has memory leak&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg00464.php &amp;lt;nowiki&amp;gt;Scaling up deferred unique checks and the after trigger queue&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-08/msg00023.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow triggers to be disabled in only the current session.&lt;br /&gt;
|This is currently possible by starting a multi-statement transaction, modifying the system tables, performing the desired SQL, restoring the system tables, and committing the transaction.  ALTER TABLE ... TRIGGER requires a table lock so it is not ideal for this usage.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY&lt;br /&gt;
|If the dump is known to be valid, allow foreign keys to be added without revalidating the data.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow statement-level triggers to access modified rows}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|When statement-level triggers are defined on a parent table, have them fire only on the parent table, and fire child table triggers only where appropriate&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg01883.php &amp;lt;nowiki&amp;gt;Statement-level triggers and inheritance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow AFTER triggers on system tables&lt;br /&gt;
|System tables are modified in many places in the backend without going through the executor, so these modifications do not cause triggers to fire. To complete this item, the functions that modify system tables will have to fire triggers.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01665.php&lt;br /&gt;
* http://wiki.postgresql.org/wiki/DDL_Triggers&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-12/msg00022.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Tighten trigger permission checks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00564.php &amp;lt;nowiki&amp;gt;Security leak with trigger functions?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow BEFORE INSERT triggers on views&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-02/msg01466.php &amp;lt;nowiki&amp;gt;Re: Why can't I put a BEFORE EACH ROW trigger on a view?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add database and transaction-level triggers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00451.php &amp;lt;nowiki&amp;gt;Proposal for db level triggers&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00620.php &amp;lt;nowiki&amp;gt;triggers on prepare, commit, rollback... ?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce locking requirements for creating a trigger&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00635.php &amp;lt;nowiki&amp;gt;Re: Change lock requirements for adding a trigger&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid requirement for &amp;quot;AFTER&amp;quot; trigger functions to return a value&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02384.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow creation of inline triggers&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-02/msg00708.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Inheritance ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow inherited tables to inherit indexes, UNIQUE constraints, and primary/foreign keys&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-05/msg00285.php &amp;lt;nowiki&amp;gt;Partitioning/inherited tables vs FKs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00039.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00305.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Honor UNIQUE INDEX on base column in INSERTs/UPDATEs on inherited table, e.g.  INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail&lt;br /&gt;
|The main difficulty with this item is the problem of creating an index that can span multiple tables.}}&lt;br /&gt;
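&lt;br /&gt;
A toy Python model of why the duplicate is not caught today, assuming each table keeps only its own unique index. The Table class and scan_hierarchy helper are hypothetical illustrations, not PostgreSQL internals.&lt;br /&gt;

```python
class Table:
    """A table whose unique index covers only its own rows."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.unique_keys = set()  # stands in for this table's own unique index

    def insert(self, key):
        if key in self.unique_keys:
            raise ValueError("duplicate key in " + self.name)
        self.unique_keys.add(key)
        self.rows.append(key)


def scan_hierarchy(*tables):
    """Return the keys seen when the parent and its children are scanned."""
    return [key for table in tables for key in table.rows]


parent = Table("base_table")
child = Table("inherit_table")
parent.insert(1)
child.insert(1)  # accepted: the child's index knows nothing about the parent

combined = scan_hierarchy(parent, child)
```

A query over the whole hierarchy then sees the key twice. Rejecting the second insert would require one index structure consulted by both tables, which is the difficulty this item describes.&lt;br /&gt;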
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Determine whether ALTER TABLE / SET SCHEMA should work on inheritance hierarchies (and thus support ONLY).  If yes, implement it.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|ALTER TABLE variants sometimes support recursion and sometimes do not, but this is poorly documented or not documented at all, and where recursion is not supported the ONLY marker is silently ignored. Clarify the documentation, and reject ONLY if it is not supported.}}&lt;br /&gt;
&lt;br /&gt;
== Indexes ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent index uniqueness checks when UPDATE does not modify the column&lt;br /&gt;
|Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE.&lt;br /&gt;
However, HOT already short-circuits this in common cases, so more work might not be helpful.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the creation of on-disk bitmap indexes which can be quickly combined with other bitmap indexes&lt;br /&gt;
|Such indexes could be more compact if there are only a few distinct values. Such indexes can also be compressed.  Keeping such indexes updated can be costly.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2005-07/msg00512.php &amp;lt;nowiki&amp;gt;Re: Bitmap index AM&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg01107.php &amp;lt;nowiki&amp;gt;Bitmap index thoughts&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00265.php &amp;lt;nowiki&amp;gt;Stream bitmaps&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01214.php &amp;lt;nowiki&amp;gt;Re: Bitmapscan changes - Requesting further feedback&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00013.php &amp;lt;nowiki&amp;gt;Updated bitmap index patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00741.php &amp;lt;nowiki&amp;gt;Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01023.php &amp;lt;nowiki&amp;gt;Bitmap Indexes: request for feedback&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/message-id/800923.27831.qm@web29010.mail.ird.yahoo.com &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2006-10/msg00222.php &amp;lt;nowiki&amp;gt;Re: Simple join optimized badly?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01131.php &amp;lt;nowiki&amp;gt;Stats for multi-column indexes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00741.php &amp;lt;nowiki&amp;gt;Cross-column statistics revisited&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg01431.php &amp;lt;nowiki&amp;gt;Multi-Dimensional Histograms&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00913.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02179.php &lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00459.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02054.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01731.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00894.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-09/msg00679.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having a larger statistics target for indexed columns and expression indexes. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider smaller indexes that record a range of values per heap page, rather than having one index entry for every heap row&lt;br /&gt;
|This is useful if the heap is clustered by the indexed values. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php &amp;lt;nowiki&amp;gt;Grouped Index Tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php &amp;lt;nowiki&amp;gt;Grouped Index Tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php &amp;lt;nowiki&amp;gt;Grouped Index Tuples / Clustered Indexes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php &amp;lt;nowiki&amp;gt;Bitmapscan changes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00014.php &amp;lt;nowiki&amp;gt;Re: GIT patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00487.php &amp;lt;nowiki&amp;gt;Re: Index Tuple Compression Approach?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01589.php &amp;lt;nowiki&amp;gt;Re: Index AM change proposals, redux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
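&lt;br /&gt;
The idea of one range entry per heap page can be sketched in Python as below. This is a minimal illustration assuming integer keys; build_range_index and pages_to_scan are hypothetical names, and none of the linked patches is claimed to work this way.&lt;br /&gt;

```python
def build_range_index(pages):
    """Return one (min, max) summary per heap page instead of one entry per row."""
    return [(min(page), max(page)) for page in pages]


def pages_to_scan(range_index, key):
    """Return the page numbers whose summarized range could contain key."""
    # Integer keys are assumed, so range membership tests the closed interval.
    return [number for number, (low, high) in enumerate(range_index)
            if key in range(low, high + 1)]


clustered_heap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unclustered_heap = [[1, 9, 5], [2, 8, 4]]

clustered_index = build_range_index(clustered_heap)
unclustered_index = build_range_index(unclustered_heap)
```

On the clustered heap a lookup for 5 touches a single page, while on the unclustered heap every page range overlaps and the index filters out nothing, which is why the item notes that clustering matters.&lt;br /&gt;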
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY&lt;br /&gt;
|This is difficult because you must upgrade to an exclusive table lock to replace the existing index file.  CREATE INDEX CONCURRENTLY does not have this complication.  This would allow index compaction without downtime. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-08/msg00289.php &amp;lt;nowiki&amp;gt;Re: When/if to Reindex&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow multiple indexes to be created concurrently, ideally via a single heap scan&lt;br /&gt;
|pg_restore allows parallel index builds, but it is done via subprocesses, and there is no SQL interface for this.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00093.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider sorting entries before inserting into btree index&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-01/msg01010.php &amp;lt;nowiki&amp;gt;Re: ATTN: Clodaldo was Performance problem. Could it be related to 8.3-beta4?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow index scans to return matching index keys, not just the matching heap locations&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01657.php &amp;lt;nowiki&amp;gt;Re: Is this TODO item done?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01477.php &amp;lt;nowiki&amp;gt;Index-only quals&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow creation of an index that can do comparisons to test if a value is between two column values&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00757.php &amp;lt;nowiki&amp;gt;Proposal: temporal extension &amp;amp;quot;period&amp;amp;quot; data type&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using &amp;quot;effective_io_concurrency&amp;quot; for index scans&lt;br /&gt;
* Currently only bitmap scans use this, which might be fine because most multi-row index scans use bitmap scans.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem with btree page splits during checkpoints&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00052.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-09/msg00184.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== GIST ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add more GIST index support for geometric data types}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow GIST indexes to create certain complex index types, like digital trees (see Aoki)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix performance issues in contrib/seg and contrib/cube GiST support&lt;br /&gt;
* [http://archives.postgresql.org/message-id/alpine.DEB.2.00.0904161633160.4053@aragorn.flymine.org GiST index performance]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/alpine.DEB.2.00.0904221704470.22330@aragorn.flymine.org draft patch]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2009-05/msg00069.php &amp;lt;nowiki&amp;gt;Re: GiST index performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2009-06/msg00068.php &amp;lt;nowiki&amp;gt;GiST index performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Hash ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add UNIQUE capability to hash indexes}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add hash WAL logging for crash recovery&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-09/msg00196.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow multi-column hash indexes}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Sorting ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider whether duplicate keys should be sorted by block/offset&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00558.php &amp;lt;nowiki&amp;gt;Remove hacks for old bad qsort() implementations?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider being smarter about memory and external files used during sorts&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg01101.php &amp;lt;nowiki&amp;gt;Sorting Improvements for 8.4&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00045.php &amp;lt;nowiki&amp;gt;Re: Sorting Improvements for 8.4&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider detoasting keys before sorting}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow sorts to use more available memory&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2007-11/msg01026.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01123.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg01957.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Fsync ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options and whether fsync does anything&lt;br /&gt;
|Ideally this requires a separate test program like /contrib/pg_test_fsync that can be run at initdb time or optionally later.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider sorting writes during checkpoint&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php &amp;lt;nowiki&amp;gt;Sorted writes in checkpoint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00050.php &amp;lt;nowiki&amp;gt;Re: Sorting writes during checkpoint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg02012.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00278.php&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+TgmoaHu1zuNohoE=cEP0nSc+0wtuRSyEAj_Af2XhxU+ry6-w@mail.gmail.com checkpoint writeback via sync_file_range]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Cache Usage ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Speed up COUNT(*)&lt;br /&gt;
|We could use a fixed row count and a +/- count to follow MVCC visibility rules, or a single cached value could be used and invalidated if anyone modifies the table.  Another idea is to get a count directly from a unique index, but for this to be faster than a sequential scan it must avoid access to the heap to obtain tuple visibility information.  Note that index-only scans are now implemented, which dramatically speeds up some COUNT(*) cases.&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Slow_Counting&lt;br /&gt;
}}&lt;br /&gt;
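&lt;br /&gt;
The "single cached value, invalidated on modification" idea from the item above can be sketched in Python. This is a toy model under loud assumptions: CountedTable is a hypothetical class, and it ignores MVCC visibility entirely, which is the hard part in a real server.&lt;br /&gt;

```python
class CountedTable:
    """Toy table that caches its row count until the next modification."""

    def __init__(self):
        self.rows = []
        self.cached_count = None  # None means the cache is invalid
        self.full_scans = 0       # how many times the rows were actually counted

    def insert(self, row):
        self.rows.append(row)
        self.cached_count = None  # any modification invalidates the cache

    def count(self):
        if self.cached_count is None:
            self.full_scans += 1
            self.cached_count = sum(1 for _ in self.rows)  # the "sequential scan"
        return self.cached_count


table = CountedTable()
table.insert("a")
table.insert("b")
first = table.count()
second = table.count()  # served from the cache, no second scan
```

The cache pays off only when reads outnumber writes, since every insert forces the next count to scan again.&lt;br /&gt;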
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a way to calculate an &amp;amp;quot;estimated COUNT(*)&amp;amp;quot;&lt;br /&gt;
|Perhaps by using the optimizer's cardinality estimates or random sampling.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-11/msg00943.php &amp;lt;nowiki&amp;gt;Re: Improving count(*)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Slow_Counting&lt;br /&gt;
}}&lt;br /&gt;
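&lt;br /&gt;
A sampling-based estimate might look like the following Python sketch. It uses systematic sampling (every Nth page) rather than random sampling so that the result is reproducible; estimate_count and its sample_every parameter are hypothetical, and the list of per-page row counts stands in for reading heap pages.&lt;br /&gt;

```python
def estimate_count(page_row_counts, sample_every=10):
    """Estimate the total row count from every Nth page's row count."""
    sampled = page_row_counts[::sample_every]
    if not sampled:
        return 0
    average = sum(sampled) / len(sampled)
    # Scale the average rows per sampled page up to the whole table.
    return round(average * len(page_row_counts))


uniform_table = [10] * 100          # 100 pages holding 10 rows each
skewed_table = list(range(100))     # row counts grow from page to page
```

The estimate is exact when pages are equally full and only approximate otherwise, so callers would have to accept some error in exchange for reading a fraction of the pages.&lt;br /&gt;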
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow data to be pulled directly from indexes&lt;br /&gt;
|Currently indexes do not have enough tuple visibility information to allow data to be pulled from the index without also accessing the heap.  The idea is to use the visibility map used for vacuum to avoid heap lookups on pages where all tuples are visible.&lt;br /&gt;
* [http://wiki.postgresql.org/wiki/Index-only_scans Index-Only Scans wiki]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider automatic caching of statements at various levels:&lt;br /&gt;
* Parsed query tree&lt;br /&gt;
* Query execution plan&lt;br /&gt;
* Query results &lt;br /&gt;
&lt;br /&gt;
:&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg00823.php &amp;lt;nowiki&amp;gt;Cached Query Plans (was: global prepared statements)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider increasing internal areas (NUM_CLOG_BUFFERS) when shared buffers is increased&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-10/msg01419.php &amp;lt;nowiki&amp;gt;Re: slru.c race condition (was Re: TRAP: FailedAssertion(&amp;amp;quot;!((itemid)-&amp;amp;gt;lp_flags &amp;amp;amp; 0x01)&amp;amp;quot;,)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00030.php &amp;lt;nowiki&amp;gt;clog_buffers to 64 in 8.3?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-08/msg00024.php &amp;lt;nowiki&amp;gt;CLOG Patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider decreasing the amount of memory used by PrivateRefCount&lt;br /&gt;
|&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php &amp;lt;nowiki&amp;gt;PrivateRefCount (for 8.3)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php &amp;lt;nowiki&amp;gt;Re: PrivateRefCount (for 8.3)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider allowing higher priority queries to have referenced buffer cache pages stay in memory longer&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00562.php &amp;lt;nowiki&amp;gt;Re: How to keep a table in memory?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Vacuum ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php &amp;lt;nowiki&amp;gt;Dead Space Map&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00011.php &amp;lt;nowiki&amp;gt;Re: Automatic free space map filling&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow concurrent inserts to use recently created pages rather than creating new ones&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having single-page pruning update the visibility map&lt;br /&gt;
* &amp;lt;nowiki&amp;gt;https://commitfest.postgresql.org/action/patch_view?id=75&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php &amp;lt;nowiki&amp;gt;Re: visibility maps and heap_prune&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve tracking of total relation tuple counts now that vacuum doesn't always scan the whole heap&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00531.php Partial vacuum versus pg_class.reltuples]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php &amp;lt;nowiki&amp;gt;FSM search modes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a more compact data representation for dead tuple locations within VACUUM&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php &amp;lt;nowiki&amp;gt;Re: Have vacuum emit a warning when it runs out of maintenance_work_mem&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide more information in order to improve user-side estimates of dead space bloat in relations&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2009-05/msg01039.php &amp;lt;nowiki&amp;gt;Re: Bloated Table&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve locking behaviour of vacuum during trailing page truncation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00319.php&lt;br /&gt;
* http://archives.postgresql.org/message-id/4D8DF88E.7080205@Yahoo.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce the number of table scans performed by vacuum&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg01119.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00605.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg00624.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Auto-vacuum ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Issue log message to suggest VACUUM FULL if a table is nearly empty?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent long-lived temporary tables from causing frozen-xid advancement starvation&lt;br /&gt;
|The problem is that autovacuum cannot vacuum them to set frozen xids; only the session that created them can do that. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php &amp;lt;nowiki&amp;gt;Re: AutoVacuum Behaviour Question&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent autovacuum from running if an old transaction is still running from the last vacuum&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00899.php &amp;lt;nowiki&amp;gt;Re: Autovacuum and OldestXmin&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have autoanalyze of parent tables occur when child tables are modified&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-06/msg00137.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-10/msg00271.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow parallel cores to be used by vacuumdb&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4F10A728.7090403@agliodbs.com vacuumdb -j]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Locking ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix priority ordering of read and write light-weight locks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00893.php &amp;lt;nowiki&amp;gt;lwlocks and starvation&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00905.php &amp;lt;nowiki&amp;gt;Re: lwlocks and starvation&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem when multiple subtransactions of the same outer transaction hold different types of locks, and one subtransaction aborts&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg01011.php &amp;lt;nowiki&amp;gt;FOR SHARE vs FOR UPDATE locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00001.php &amp;lt;nowiki&amp;gt;Re: FOR SHARE vs FOR UPDATE locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00435.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00773.php &amp;lt;nowiki&amp;gt;Re: savepoints and upgrading locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow UPDATEs that modify only non-referential-integrity columns to avoid conflicting with referential integrity locks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00073.php &amp;lt;nowiki&amp;gt;Referential Integrity and SHARE locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add idle_in_transaction_timeout GUC so locks are not held for long periods of time}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve deadlock detection when a page cleaning lock conflicts with a shared buffer that is pinned&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-01/msg00138.php &amp;lt;nowiki&amp;gt;BUG #3883: Autovacuum deadlock with truncate?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00873.php &amp;lt;nowiki&amp;gt;Thoughts about bug #3883&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-committers/2008-01/msg00365.php &amp;lt;nowiki&amp;gt;Re: pgsql: Add checks to TRUNCATE, CLUSTER, and REINDEX to prevent&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Detect deadlocks involving LockBufferForCleanup()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00873.php &amp;lt;nowiki&amp;gt;Thoughts about bug #3883&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow finer control over who is cancelled in a deadlock&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01727.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a lock timeout parameter&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00485.php &amp;lt;nowiki&amp;gt;SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Reduce number of unnecessary false positives in Serializable Snapshot Isolation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-06/msg00609.php &amp;lt;nowiki&amp;gt;SSI heap_insert and page-level predicate locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Startup Time Improvements ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Experiment with multi-threaded backend for backend creation&lt;br /&gt;
|This would avoid the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading.  Also explore the idea of a single session using multiple threads to execute a statement faster.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow backends to change their database without restart&lt;br /&gt;
|This allows for faster server startup.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00843.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00336.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Write-Ahead Log ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Eliminate need to write full pages to WAL before page modification&lt;br /&gt;
|Currently, to protect against partial disk page writes, we write full page images to WAL before they are modified so we can correct any partial page writes during recovery.  These pages can also be eliminated from point-in-time archive files. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2002-06/msg00655.php &amp;lt;nowiki&amp;gt;Re: Index Scans become Seq Scans after VACUUM ANALYSE&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg01191.php&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20120105061916.GB21048@fetter.org WIP double writes]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4EFC449F02000025000441CD@gw.wicourts.gov double writes]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20120110214344.GB21106@fetter.org Double-write with Fast Checksums]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1962493974.656458.1327703514780.JavaMail.root@zimbra-prod-mbox-4.vmware.com double writes using &amp;quot;double-write buffer&amp;quot; approach]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|When full page writes are off, write CRC to WAL and check file system blocks on recovery&lt;br /&gt;
|If CRC check fails during recovery, remember the page in case a later CRC for that page properly matches.  The difficulty is that hint bits are not WAL logged, meaning a valid page might not match the earlier CRC.}}&lt;br /&gt;
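&lt;br /&gt;
The hint-bit difficulty described above can be illustrated with a small Python sketch. It is not PostgreSQL code: the one-byte flags field, FLAGS_OFFSET, HINT_VALUES, and page_crc are hypothetical names for a toy page layout, and masking the hint bits before computing the CRC is only one possible approach, not one this item prescribes.&lt;br /&gt;

```python
import zlib

# Hypothetical toy layout: byte 0 of a page is a flags byte whose low two
# bits stand in for hint bits. This is not PostgreSQL's real page format.
FLAGS_OFFSET = 0
HINT_VALUES = 4  # the low two bits can hold 4 distinct values


def page_crc(page, mask_hints):
    """Return the CRC-32 of a page, optionally clearing the toy hint bits first."""
    buf = bytearray(page)
    if mask_hints:
        flags = buf[FLAGS_OFFSET]
        buf[FLAGS_OFFSET] = flags - flags % HINT_VALUES  # zero the low two bits
    return zlib.crc32(bytes(buf))


# Page image as it was when its CRC was written to WAL.
logged_page = bytes([0b10000000]) + b'tuple data'
# Same page later, after a hint bit was set without being WAL-logged.
disk_page = bytes([0b10000001]) + b'tuple data'

raw_match = page_crc(logged_page, False) == page_crc(disk_page, False)
masked_match = page_crc(logged_page, True) == page_crc(disk_page, True)
print(raw_match, masked_match)  # prints: False True
```

In this toy model the unmasked CRCs disagree even though the page is valid, while the masked CRCs agree.&lt;br /&gt;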
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Write full pages during file system write and not when the page is modified in the buffer cache&lt;br /&gt;
|This allows most full page writes to happen in the background writer.  It might cause problems for applying WAL on recovery into a partially-written page, but later the full page will be replaced from WAL.&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CAGvK12UST-tPhyLrSLuSpwFxZbAO79yYrhV2xaLmS2MkUxNUVQ@mail.gmail.com Page Checksums + Double Writes]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce WAL traffic so only modified values are written rather than entire rows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01589.php &amp;lt;nowiki&amp;gt;Reduction in WAL for UPDATEs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow WAL information to recover corrupted pg_controldata&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] pg_resetxlog -r flag&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Find a way to reduce rotational delay when repeatedly writing last WAL page&lt;br /&gt;
|Currently, each fsync of the last WAL page requires the disk platter to complete a full rotation before the page can be fsynced again. One idea is to write the WAL at different offsets, which might reduce the rotational delay.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2002-11/msg00483.php &amp;lt;nowiki&amp;gt;500 tpsQL + WAL log implementation&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Speed WAL recovery by allowing more than one page to be prefetched&lt;br /&gt;
|This should be done utilizing the same infrastructure used for prefetching in general to avoid introducing complex error-prone code in WAL replay. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php &amp;lt;nowiki&amp;gt;Slow PITR restore&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php &amp;lt;nowiki&amp;gt;Re: [GENERAL] Slow PITR restore&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg01279.php &amp;lt;nowiki&amp;gt;Read-ahead and parallelism in redo recovery&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve WAL concurrency by increasing lock granularity&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00556.php &amp;lt;nowiki&amp;gt;Reworking WAL locking&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Be more aggressive about creating WAL files&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01325.php &amp;lt;nowiki&amp;gt;Re: PANIC caused by open_sync on Linux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-07/msg01075.php &amp;lt;nowiki&amp;gt;PreallocXlogFiles&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-04/msg00556.php &amp;lt;nowiki&amp;gt;WAL/PITR additional items&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have resource managers report the duration of their status changes&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01468.php &amp;lt;nowiki&amp;gt;Recovery of Multi-stage WAL actions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move pgfoundry's xlogdump to /contrib and have it rely more closely on the WAL backend code&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00035.php &amp;lt;nowiki&amp;gt;xlogdump&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Close deleted WAL files held open in *nix by long-lived read-only backends&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg01754.php &amp;lt;nowiki&amp;gt;Deleted WAL files held open by backends in Linux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00060.php &amp;lt;nowiki&amp;gt;Re: Deleted WAL files held open by backends in Linux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Optimizer / Executor ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve selectivity functions for geometric operators}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider increasing the default values of from_collapse_limit, join_collapse_limit, and/or geqo_threshold&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4136ffa0905210551u22eeb31bn5655dbe7c9a3aed5@mail.gmail.com from_collapse_limit vs. geqo_threshold]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve ability to display optimizer analysis using OPTIMIZER_DEBUG}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Log statements where the optimizer row estimates were dramatically different from the number of rows actually found?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider compressed annealing to search for query plans&lt;br /&gt;
|This might replace GEQO.&lt;br /&gt;
* http://archives.postgresql.org/message-id/15658.1241278636%40sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve use of expression indexes for ORDER BY &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01553.php &amp;lt;nowiki&amp;gt;Resjunk sort columns, Heikki's index-only quals patch, and bug #5000&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Modify the planner to better estimate caching effects&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-11/msg00117.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow shared buffer cache contents to affect index cost computations&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg01140.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Hashing ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using a hash for joining to a large IN (VALUES ...) list&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00450.php &amp;lt;nowiki&amp;gt;Planning large IN lists&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow single batch hash joins to preserve outer pathkeys&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php Re: Potential Join Performance Issue]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;quot;lazy&amp;quot; hash tables - look up only the tuples that are actually requested&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid building the same hash table more than once during the same query&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid hashing for distinct and then re-hashing for hash join&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4136ffa0902191346g62081081v8607f0b92c206f0a@mail.gmail.com Re: Fixing Grittner's planner issues]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Background Writer ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having the background writer update the transaction status hint bits before writing out the page&lt;br /&gt;
|Implementing this requires the background writer to have access to system catalogs and the transaction status log.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider adding buffers the background writer finds reusable to the free list &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php &amp;lt;nowiki&amp;gt;Background LRU Writer/free list&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+U5nMKtvyDcV4zTr7bq7t6cA2nBfLxCJ8tQgVBnc5ddRPO+Bg@mail.gmail.com our buffer replacement strategy is kind of lame]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Automatically tune bgwriter_delay based on activity rather than using a fixed interval&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php &amp;lt;nowiki&amp;gt;Background LRU Writer/free list&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+U5nMKtvyDcV4zTr7bq7t6cA2nBfLxCJ8tQgVBnc5ddRPO+Bg@mail.gmail.com our buffer replacement strategy is kind of lame]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider whether increasing BM_MAX_USAGE_COUNT improves performance&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg01007.php &amp;lt;nowiki&amp;gt;Bgwriter LRU cleaning: we've been going at this all wrong&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Test to see if calling PreallocXlogFiles() from the background writer will help with WAL segment creation latency&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-06/msg00340.php &amp;lt;nowiki&amp;gt;Re: Load Distributed Checkpoints, final patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Concurrent Use of Resources ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Do async I/O for faster random read-ahead of data&lt;br /&gt;
|Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php &amp;lt;nowiki&amp;gt;Asynchronous I/O Support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-09/msg00255.php &amp;lt;nowiki&amp;gt;Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00027.php &amp;lt;nowiki&amp;gt;There's random access and then there's random access&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-01/msg00170.php &amp;lt;nowiki&amp;gt;Bitmap index scan preread using posix_fadvise (Was: There's random access and then there's random access)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
The above patch is already applied as of 8.4, but it still remains to figure out how to handle plain indexscans effectively.&lt;br /&gt;
* [http://archives.postgresql.org//pgsql-hackers/2009-01/msg00806.php Problems with the patch submitted for posix_fadvise in index scans]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Experiment with multi-threaded backend for better I/O utilization&lt;br /&gt;
|This would allow a single query to make use of multiple I/O channels simultaneously.  One idea is to create a background reader that can pre-fetch sequential and index scan pages needed by other backends. This could be expanded to allow concurrent reads from multiple devices in a partitioned table.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-02/msg00123.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Experiment with multi-threaded backend for better CPU utilization&lt;br /&gt;
|This would allow several CPUs to be used for a single query, such as for sorting or query execution.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00945.php &amp;lt;nowiki&amp;gt;Multi CPU Queries - Feedback and/or suggestions wanted!&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|SMP scalability improvements&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php &amp;lt;nowiki&amp;gt;Straightforward changes for increased SMP scalability&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00206.php &amp;lt;nowiki&amp;gt;Re: Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php &amp;lt;nowiki&amp;gt;Re: Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== TOAST ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow user configuration of TOAST thresholds&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php &amp;lt;nowiki&amp;gt;Re: Proposed adjustments in MaxTupleSize and toastthresholds&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php &amp;lt;nowiki&amp;gt;pg_lzcompress strategy parameters&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce unnecessary cases of deTOASTing&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00895.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Eliminate more detoast copies for packed varlenas&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce costs of repeat de-TOASTing of values&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg01096.php &amp;lt;nowiki&amp;gt;WIP patch: reducing overhead for repeat de-TOASTing&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Monitoring ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Expand pg_stat_activity for easier integration with monitoring tools&lt;br /&gt;
* http://archives.postgresql.org/message-id/4DFA13A5.2060200@2ndQuadrant.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add column to pg_stat_activity that shows the progress of long-running commands like CREATE INDEX and VACUUM&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-04/msg00203.php &amp;lt;nowiki&amp;gt;EXPLAIN progress info&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* The CLUSTER/VACUUM FULL implementation would also be useful to track this way&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have pg_stat_activity display query strings in the correct client encoding&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00131.php &amp;lt;nowiki&amp;gt;pg_stats queries versus per-database encodings&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Expose pg_controldata via an SQL interface&lt;br /&gt;
|Helpful for monitoring replicated databases&lt;br /&gt;
* http://archives.postgresql.org/message-id/4B901D73.8030003@agliodbs.com&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4B959D7A.6010907@joeconway.com initial patch]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous Performance ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Use mmap() rather than SYSV for shared buffers?&lt;br /&gt;
|This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead. We could also consider mmap() for writing WAL.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00750.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00756.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rather than consider mmap()-ing in 8k pages, consider mmap()-ing entire files into a backend?&lt;br /&gt;
|Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping.  Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings.  Another problem is that there is no way to _prevent_ I/O to disk from the dirty shared buffers so changes could hit disk before WAL is written.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01239.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider ways of storing rows more compactly on disk:&lt;br /&gt;
* Reduce the row header size?&lt;br /&gt;
* Consider reducing on-disk varlena length from four bytes to two because a heap row cannot be more than 64k in length}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider transaction start/end performance improvements&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php &amp;lt;nowiki&amp;gt;Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php &amp;lt;nowiki&amp;gt;Re: Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow configuration of backend priorities via the operating system&lt;br /&gt;
|Though backend priorities make priority inversion during lock waits possible, research shows that this is not a huge problem.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-02/msg00493.php &amp;lt;nowiki&amp;gt;Priorities for users or queries?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider increasing the minimum allowed number of shared buffers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-02/msg00157.php &amp;lt;nowiki&amp;gt;Re: [PATCH] Don't bail with legitimate -N/-B options&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider if CommandCounterIncrement() can avoid its AcceptInvalidationMessages() call&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-committers/2007-11/msg00585.php &amp;lt;nowiki&amp;gt;pgsql: Avoid incrementing the CommandCounter when&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php &amp;lt;nowiki&amp;gt;Re: TB-sized databases&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider not storing a NULL bitmap on disk if all the NULLs are trailing&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00624.php &amp;lt;nowiki&amp;gt;Proposal for Null Bitmap Optimization(for Trailing NULLs)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-12/msg00109.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
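&lt;br /&gt;
The trailing-NULL idea can be sketched in Python with a toy row model. Everything here is an assumption made for illustration: encode_row and decode_row are hypothetical helpers, rows are plain lists with None standing for NULL, and the sketch assumes a reader can treat any column beyond the stored attribute count as NULL. It does not reproduce PostgreSQL's heap tuple format.&lt;br /&gt;

```python
def encode_row(values):
    """Return (stored_attribute_count, null_bitmap_or_None, non_null_values)."""
    n = len(values)
    while n and values[n - 1] is None:
        n -= 1  # drop trailing NULLs from the stored part
    prefix = values[:n]
    if None not in prefix:
        # All NULLs are trailing: the attribute count alone implies them,
        # so no bitmap is stored.
        return n, None, list(prefix)
    # Some NULL sits before a non-NULL value: keep a full bitmap.
    bitmap = [v is not None for v in values]
    return len(values), bitmap, [v for v in values if v is not None]


def decode_row(natts, bitmap, stored, total_columns):
    """Rebuild the row; columns past natts are read back as NULL."""
    if bitmap is None:
        return list(stored) + [None] * (total_columns - natts)
    it = iter(stored)
    row = [next(it) if present else None for present in bitmap]
    return row + [None] * (total_columns - natts)


print(encode_row([1, 'a', None, None]))  # prints: (2, None, [1, 'a'])
```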
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Sort large UPDATE/DELETEs so they are done in heap order&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg01119.php &amp;lt;nowiki&amp;gt;Possible future performance improvement: sort updates/deletes by ctid&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
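&lt;br /&gt;
A rough Python illustration of why heap order can help: if each target row is identified by a (block, offset) pair in the manner of a ctid, sorting the targets groups rows on the same block together. The page_switches helper and the sample targets are hypothetical, and counting block changes is only a crude stand-in for real I/O cost.&lt;br /&gt;

```python
def page_switches(ctids):
    """Count how often consecutive row visits move to a different heap block."""
    blocks = [block for block, _offset in ctids]
    return sum(1 for a, b in zip(blocks, blocks[1:]) if a != b)


# Hypothetical (block, offset) targets in the order a plan might produce them.
targets = [(7, 2), (1, 5), (7, 1), (3, 4), (1, 1), (3, 9)]
heap_order = sorted(targets)  # tuples sort by block first, then offset

print(page_switches(targets), page_switches(heap_order))  # prints: 5 2
```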
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow one transaction to see tuples using the snapshot of another transaction&lt;br /&gt;
|This would assist multiple backends in working together. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00400.php &amp;lt;nowiki&amp;gt;Transaction Snapshot Cloning&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00135.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00260.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00466.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-08/msg00684.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider decreasing the I/O caused by updating tuple hint bits&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00847.php &amp;lt;nowiki&amp;gt;Hint Bits and Write I/O&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00199.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Hint Bits and Write I/O&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00695.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00792.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg01063.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01408.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01453.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid the requirement of freezing pages that are infrequently modified &lt;br /&gt;
|If all rows on a page are visible, it is possible to set a bit in the visibility map (once the visibility map is 100% reliable) and not need to freeze the page, avoiding a page rewrite&lt;br /&gt;
* http://archives.postgresql.org/message-id/4BF701CF.2090205@agliodbs.com&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-06/msg00082.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid reading in b-tree pages when replaying vacuum records in hot standby mode&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1272571938.4161.14739.camel@ebony &amp;lt;nowiki&amp;gt;Hot Standby tuning for btree_xlog_vacuum()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Restructure truncation logic to be more resistant to failure&lt;br /&gt;
|This also involves not writing dirty buffers for a truncated or dropped relation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01032.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider adding logic to increase large tables by more than 8k&lt;br /&gt;
|This would reduce file system fragmentation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00337.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous Other ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Deal with encoding issues for filenames in the server filesystem&lt;br /&gt;
* {{MessageLink|20090413184335.39BE.52131E4D@oss.ntt.co.jp|a proposed patch here}}&lt;br /&gt;
* {{MessageLink|8484.1244655656@sss.pgh.pa.us|some issues about it here}}&lt;br /&gt;
* {{MessageLink|20100107103740.97A5.52131E4D@oss.ntt.co.jp|Windows-specific patch here}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Deal with encoding issues in the output of localeconv()&lt;br /&gt;
* [http://archives.postgresql.org/message-id/40c6d9160904210658y590377cfw6dbbecb53d2b8be0@mail.gmail.com bug report]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/49EF8DA0.90008@tpf.co.jp draft patch]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/21710.1243620986@sss.pgh.pa.us review of patch]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide schema name and other fields available from SQL GET DIAGNOSTICS in error reports&lt;br /&gt;
* [http://archives.postgresql.org/message-id/dcc563d10810211907n3c59a920ia9eb7cd2a6d5ea58@mail.gmail.com &amp;lt;nowiki&amp;gt;How to get schema name which violates fk constraint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg00846.php &amp;lt;nowiki&amp;gt;patch - Report the schema along table name in a referential failure error message&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* {{MessageLink|3191.1263306359@sss.pgh.pa.us|Re: NOT NULL violation and error-message}}&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg00213.php &amp;lt;nowiki&amp;gt;the case for machine-readable error fields&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
| Provide [http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html#LIBPQ-CONNECT-FALLBACK-APPLICATION-NAME fallback_application_name] in contrib/pgbench, oid2name, and dblink.&lt;br /&gt;
* {{MessageLink|w2g9837222c1004070216u3bc46b3ahbddfdffdbfb46212@mail.gmail.com|fallback_application_name and pgbench}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add 64-bit support to /contrib/pgbench&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-07/msg00153.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00705.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Source Code ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add use of 'const' for variables in source tree&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00473.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Remove warnings created by -Wcast-align}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move platform-specific ps status display info from ps_status.c to ports}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add optional CRC checksum to heap and index pages&lt;br /&gt;
|One difficulty is how to prevent hint bit changes from affecting the computed CRC checksum.&lt;br /&gt;
* http://archives.postgresql.org/message-id/19934.1226601952%40sss.pgh.pa.us&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00002.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01028.php &amp;lt;nowiki&amp;gt;double-buffering page writes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00524.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01101.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00011.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00249.php&lt;br /&gt;
* http://archives.postgresql.org/message-id/20111221215913.GA4536@fetter.org&lt;br /&gt;
* http://archives.postgresql.org/message-id/CA+U5nMJzQyxcObkpNAf1SYTX-gO_Mom3O9JXHnGpxRo1kXJ7ww@mail.gmail.com&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-01/msg00128.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-01/msg00113.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-02/msg00172.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-03/msg00001.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-03/msg00188.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a faster CRC32 algorithm&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow cross-compiling by generating the zic database on the target system}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve NLS maintenance of libpgport messages linked into applications}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Use UTF8 encoding for NLS messages so all server encodings can read them properly}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow creation of universal binaries for Darwin&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00884.php &amp;lt;nowiki&amp;gt;Getting to universal binaries for Darwin&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider GnuTLS if OpenSSL license becomes a problem&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00892.php&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-05/msg00040.php &amp;lt;nowiki&amp;gt;[PATCH] Add support for GnuTLS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg01213.php &amp;lt;nowiki&amp;gt;TODO: GNU TLS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider making NAMEDATALEN more configurable in future releases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Research use of signals and sleep wake ups&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php &amp;lt;nowiki&amp;gt;Restartable signals 'n all that&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow C++ code to more easily access backend code&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00302.php &amp;lt;nowiki&amp;gt;Mostly Harmless: Welcoming our C++ friends&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider simplifying how memory context resets handle child contexts&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-08/msg00067.php &amp;lt;nowiki&amp;gt;Re: Memory leak in nodeAgg&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create three versions of libpgport to simplify client code&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00154.php &amp;lt;nowiki&amp;gt;8.4 TODO item: make src/port support libpq and ecpg directly&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve detection of shared memory segments being used by others by checking the SysV shared memory field 'nattch'&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php &amp;lt;nowiki&amp;gt;postgresql in FreeBSD jails: proposal&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00673.php &amp;lt;nowiki&amp;gt;Re: postgresql in FreeBSD jails: proposal&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement the non-threaded Avahi service discovery protocol&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00939.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-02/msg00097.php &amp;lt;nowiki&amp;gt;Re: Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg01211.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-04/msg00001.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce data row alignment requirements on some 64-bit systems&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00369.php &amp;lt;nowiki&amp;gt;[WIP] Reduce alignment requirements on 64-bit systems.&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Restructure TOAST internal storage format for greater flexibility&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00049.php &amp;lt;nowiki&amp;gt;Re: PG_PAGE_LAYOUT_VERSION 5 - time for change&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Add regression tests for pg_dump/restore&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01967.php &amp;lt;nowiki&amp;gt;&amp;quot;make install-check-pg_dump&amp;quot; target in src/regress&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Research different memory allocation methods for lists&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01467.php &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Consider removing the attribute options cache&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00039.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Restructure /contrib section&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00705.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== /contrib/pg_upgrade ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Handle large object comments&lt;br /&gt;
|This is difficult to do because the large object doesn't exist when --schema-only is loaded.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using pg_depend for checking object usage in version.c&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|If reindex is necessary, allow it to be done in parallel with pg_dump custom format&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Migrate pg_statistic by dumping it out as a flat file, so analyze is not necessary&lt;br /&gt;
|pg_class.oid is not preserved so schema.tablename must be used.&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com pg_upgrade and statistics]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve testing, perhaps using the buildfarm&lt;br /&gt;
|The buildfarm has access to multiple versions of PostgreSQL.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create machine-readable output of pg_controldata&lt;br /&gt;
|This would avoid parsing its output.  The problem is we need pg_controldata output from both the old and new clusters so we would need to support both formats.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Windows ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Remove configure.in check for link failure when cause is found}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Remove readdir() errno patch when runtime/mingwex/dirent.c rev 1.4 is released}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow psql to use readline once non-US code pages work with backslashes}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem with shared memory on the Win32 Terminal Server}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve signal handling&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2005-06/msg00027.php &amp;lt;nowiki&amp;gt;Simplify Win32 Signaling code&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Convert MSVC build system to remove most batch files&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00961.php &amp;lt;nowiki&amp;gt;MSVC build system&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support pgxs when using MSVC}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix MSVC NLS support, like for to_char()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00485.php &amp;lt;nowiki&amp;gt;NLS on MSVC  strikes back!&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-02/msg00038.php &amp;lt;nowiki&amp;gt;Fix for 8.3 MSVC locale (Was  [HACKERS] NLS on MSVC strikes back!)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Find a correct rint() substitute on Windows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00808.php &amp;lt;nowiki&amp;gt;Minor bug in src/port/rint.c&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix global namespace issues when using multiple terminal server sessions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/48F3BFCC.8030107@dunslane.net problems with Windows global namespace]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change from the current autoconf/gmake build system to cmake&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01869.php &amp;lt;nowiki&amp;gt;About CMake (was Re: [COMMITTERS] pgsql: Append major version number and for libraries soname major)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve consistency of path separator usage&lt;br /&gt;
* http://archives.postgresql.org/message-id/49C0BDC5.4010002@hagander.net&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix cross-compiling on Windows&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2010-10/msg00110.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow multiple Postgres clusters running on the same machine to distinguish themselves in the event log&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01297.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00574.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Wire Protocol Changes ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow dynamic character set handling}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add decoded type, length, precision}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Mark result columns as known-not-null when possible&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-11/msg01029.php &amp;lt;nowiki&amp;gt;Adding nullable indicator to Describe&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide more control over planner treatment of statements being prepared}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Use compression?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Update clients to use data types, typmod, schema.table.column names of result sets using new statement protocol}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Set protocol for wire format negotiation&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CACMqXCKkGrGXxQhjHCKCe0B8hn6sTt-1sdgHZOSGQMxrusOsQA@mail.gmail.com GUC_REPORT for protocol tunables]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make sure upgrading to a 4.1 protocol version will actually work smoothly&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28307.1318255008@sss.pgh.pa.us Re: libpq, PQdescribePrepared -&amp;gt; PQftype, PQfmod, no PQnullable]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Convert single quotes to apostrophes in the PDF documentation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2007-12/msg00059.php &amp;lt;nowiki&amp;gt;SGML docs and pdf single-quotes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a manpage for postgresql.conf&lt;br /&gt;
* {{messageLink|20080819194311.GH4428@alvh.no-ip.org|A smaller default postgresql.conf}}&lt;br /&gt;
* {{messageLink|200808211910.37524.peter_e@gmx.net|A smaller default postgresql.conf}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change the manpage-generating toolchain to use the new XML-based docbook2x tools&lt;br /&gt;
* {{messageLink|200808211910.37524.peter_e@gmx.net|A smaller default postgresql.conf}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider changing documentation format from SGML to XML&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2006-12/msg00152.php &amp;lt;nowiki&amp;gt;Re: Authoring Tools WAS: Switching to XML&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-docs/2011-04/msg00020.php&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Switching_PostgreSQL_documentation_from_SGML_to_XML&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Document support for N&amp;lt;nowiki&amp;gt;' '&amp;lt;/nowiki&amp;gt; national character string literals, if it matches the SQL standard&lt;br /&gt;
* http://archives.postgresql.org/message-id/1275895438.1849.1.camel@fsopti579.F-Secure.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add diagrams to the documentation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-docs/2010-07/msg00001.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Exotic Features ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add pre-parsing phase that converts non-ISO syntax to supported syntax&lt;br /&gt;
|This could allow SQL written for other databases to run without modification.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow plug-in modules to emulate features from other databases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add features of Oracle-style packages&lt;br /&gt;
|A package would be a schema with session-local variables, public/private functions, and initialization functions.  It is also possible to implement these capabilities in any schema and not use a separate &amp;amp;quot;packages&amp;amp;quot; syntax at all.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-08/msg00384.php &amp;lt;nowiki&amp;gt;proposal for PL packages for 8.3.&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider allowing control of upper/lower case folding of unquoted identifiers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php &amp;lt;nowiki&amp;gt;Bringing PostgreSQL torwards the standard regarding case folding&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php &amp;lt;nowiki&amp;gt;Re: [SQL] Case Preservation disregarding case sensitivity?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php &amp;lt;nowiki&amp;gt;TODO Item: Consider allowing control of upper/lower case folding of unquoted,  identifiers&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00415.php &amp;lt;nowiki&amp;gt;Identifier case folding notes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add autonomous transactions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php &amp;lt;nowiki&amp;gt;autonomous transactions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Give query progress indication&lt;br /&gt;
* [[Query progress indication]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rethink our type system&lt;br /&gt;
* [[Rethinking datatypes]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Features We Do ''Not'' Want ==&lt;br /&gt;
&lt;br /&gt;
The following features have been discussed ad nauseam on the PostgreSQL mailing lists, and the consensus has been that the project is not interested in them.  If you are going to bring them up as potential features, you will want to be familiar with all of the arguments that have been made against them over the years.  If you decide to work on such features anyway, be aware that you face a higher-than-normal barrier to getting the project to accept them.&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|All backends running as threads in a single process (not wanted)&lt;br /&gt;
|This eliminates the process protection we get from the current setup. Thread creation usually has the same overhead as process creation on modern systems, so it seems unwise to use a pure threaded model, and MySQL and DB2 have demonstrated that threads introduce as many issues as they solve.  Threading specific operations such as I/O, seq scans, and connection management has been discussed and will probably be implemented to enable specific performance features.  Moving to a threaded engine would also require halting all other work on PostgreSQL for one to two years.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;quot;Oracle-style&amp;quot; optimizer hints (not wanted)&lt;br /&gt;
|Optimizer hints, as implemented in Oracle and other RDBMSes, are used to work around problems in the optimizer and introduce upgrade and maintenance issues.  We would rather have such problems reported and fixed.  We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. See [[OptimizerHintsDiscussion|Optimizer Hints Discussion]] for further information.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Embedded server (not wanted)&lt;br /&gt;
|While PostgreSQL clients run fine in limited-resource environments, the server requires multiple processes and a stable pool of resources to run reliably and efficiently. Stripping down the PostgreSQL server to run in the same process address space as the client application would add too much complexity and too many failure cases. Besides, there are several very mature embedded SQL databases already available.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Obfuscated function source code (not wanted)&lt;br /&gt;
|Obfuscating function source code has minimal protective benefits because anyone with super-user access can find a way to view the code. At the same time, it would greatly complicate backups and other administrative tasks. To prevent non-super-users from viewing function source code, remove SELECT permission on pg_proc.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-09/msg00668.php &amp;lt;nowiki&amp;gt;Obfuscated stored procedures (was Re: Oracle and Postgresql)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Indeterminate behavior for the GROUP BY clause (not wanted)&lt;br /&gt;
|At least one other database product allows the GROUP BY clause to list only a subset of the result columns that would be needed to give predictable results; for the remaining columns the server is free to return any value from the group.  This is not viewed as a desirable feature.  PostgreSQL 9.1 allows result columns that are not referenced by GROUP BY if a primary key for the same table is referenced in GROUP BY.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-03/msg00297.php &amp;lt;nowiki&amp;gt;Re: SQL compatibility reminder: MySQL vs PostgreSQL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
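&lt;br /&gt;
The difference between the two GROUP BY cases can be sketched with a short Python script. It uses SQLite through the standard sqlite3 module purely as a stand-in for a product that accepts such queries (an assumption made for illustration; the thread linked above concerns MySQL), and the emp table and its rows are hypothetical. The first query leaves name unconstrained within each group, while the second groups by the primary key, which is the case PostgreSQL 9.1 accepts because every other column of that table is then determined.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

# Hypothetical sample data; SQLite stands in for a permissive product.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, name TEXT)')
conn.executemany(
    'INSERT INTO emp (id, dept, name) VALUES (?, ?, ?)',
    [(1, 'ops', 'alice'), (2, 'ops', 'bob'), (3, 'dev', 'carol')],
)

# name is neither grouped nor aggregated: SQLite accepts the query and
# takes the value from some row of each group.
loose = conn.execute(
    'SELECT dept, name FROM emp GROUP BY dept ORDER BY dept'
).fetchall()

# Grouping by the primary key leaves exactly one row per group, so dept
# and name are fully determined.
strict = conn.execute(
    'SELECT id, dept, name FROM emp GROUP BY id ORDER BY id'
).fetchall()
```
&lt;br /&gt;
In the first result the name reported for the ops group may be either member of that group, which is the unpredictability this item objects to.&lt;br /&gt;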
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Todo]]&lt;/div&gt;</description>
			<pubDate>Tue, 20 Mar 2012 16:56:11 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Todo</comments>		</item>
		<item>
			<title>Todo</title>
			<link>http://wiki.postgresql.org/wiki/Todo</link>
			<guid>http://wiki.postgresql.org/wiki/Todo</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Locking */ Mark SSI optimization item done which was committed to 9.2 and a 9.1.1.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;margin: 1ex 1em; float: right;&amp;quot;&amp;gt;&lt;br /&gt;
__TOC__&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This list contains '''all known PostgreSQL bugs and feature requests'''. If you would like to work on an item, please read the [[Developer FAQ]] first. There is also a [[Development_information|development information page]].&lt;br /&gt;
&lt;br /&gt;
* {{TodoPending}} - marks ordinary, incomplete items&lt;br /&gt;
* {{TodoEasy}} - marks items that are easier to implement&lt;br /&gt;
* {{TodoDone}} - marks changes that are done, and will appear in the PostgreSQL 9.2 release.&lt;br /&gt;
&lt;br /&gt;
For help on editing this list, please see [[Talk:Todo]]. &amp;lt;b&amp;gt;Please do not add items here without discussion on the mailing list.&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;padding: 1ex 4em;&amp;quot;&amp;gt;&lt;br /&gt;
== Administration ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow administrators to cancel multi-statement idle transactions&lt;br /&gt;
|This allows locks to be released, but it is complex to report the cancellation back to the client.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php &amp;lt;nowiki&amp;gt;Cancelling idle in transaction state&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00441.php &amp;lt;nowiki&amp;gt;Re: Cancelling idle in transaction state&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Check for unreferenced table files created by transactions that were in-progress when the server terminated abruptly&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00096.php &amp;lt;nowiki&amp;gt;Removing unreferenced files&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Set proper permissions on non-system schemas during db creation&lt;br /&gt;
|Currently all schemas are owned by the super-user because they are copied from the template1 database.  However, since all objects are inherited from the template database, it is not clear that setting schemas to the db owner is correct.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow log_min_messages to be specified on a per-module basis&lt;br /&gt;
|This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. Another idea is to allow separate configuration files for each module, or allow arbitrary SET commands to be passed to them. See also [[Logging Brainstorm]].}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Simplify creation of partitioned tables&lt;br /&gt;
|This would allow creation of partitioned tables without requiring creation of triggers or rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection.  Options could include range and hash partition selection. See also [[Table partitioning]].&lt;br /&gt;
}}&lt;br /&gt;
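&lt;br /&gt;
The range and hash selection options mentioned for partitioned tables can be sketched in a few lines of Python. This is only an illustration of the two routing rules, not PostgreSQL code; the function names route_range and route_hash, the bounds list, and the use of CRC32 as the hash are all assumptions made for the example.&lt;br /&gt;
&lt;br /&gt;
```python
import bisect
import zlib


def route_range(upper_bounds, key):
    # upper_bounds is a sorted list of exclusive upper bounds, one for each
    # partition except the last, which takes every remaining key.
    return bisect.bisect_right(upper_bounds, key)


def route_hash(partition_count, key):
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(str(key).encode('utf-8')) % partition_count


# Three range partitions: below 100, 100 to 199, and 200 or above.
bounds = [100, 200]
range_targets = [route_range(bounds, k) for k in (50, 100, 199, 200)]
hash_target = route_hash(4, 'customer-42')
```
&lt;br /&gt;
With declarative support of this kind, the server rather than user-written triggers or rules would decide which partition receives each row.&lt;br /&gt;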
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow custom variables to appear in pg_settings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00850.php &amp;lt;nowiki&amp;gt;Re: count(*) performance improvement ideas&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have custom variables be transaction-safe&lt;br /&gt;
* {{MessageLink|4B577E9F.8000505@dunslane.net|Custom GUCs still a bit broken}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement the SQL-standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-05/msg00010.php &amp;lt;nowiki&amp;gt;Re: Grantor name gets lost when grantor role dropped&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent query cancel packets from being replayed by an attacker, especially when using SSL&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg00345.php &amp;lt;nowiki&amp;gt;Replay attack of query cancel&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a way to query the log collector subprocess to determine the name of the currently active log file&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-11/msg00418.php &amp;lt;nowiki&amp;gt;Current log files when rotating?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow simpler reporting of the unix domain socket directory and allow easier configuration of its default location&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg01555.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg01482.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow custom daemons to be automatically stopped/started along with the postmaster&lt;br /&gt;
|This allows easier administration of daemons like user job schedulers or replication-related daemons.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01701.php &amp;lt;nowiki&amp;gt;Re: scheduler in core&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve logging of prepared transactions recovered during startup&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg00092.php &amp;lt;nowiki&amp;gt;&amp;amp;quot;recovering prepared transaction&amp;amp;quot; after server restart message&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using POSIX shared memory to avoid System V shared memory kernel limits&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4DFA2673.3010009@enterprisedb.com &amp;lt;nowiki&amp;gt;POSIX shared memory patch status&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Address problem where superusers are assumed to be members of all groups&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00337.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Configuration files ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Change pg_ident.conf parsing to be the same as pg_hba.conf&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg02204.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow postgresql.conf file values to be changed via an SQL API, perhaps using SET GLOBAL&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00764.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider normalizing fractions in postgresql.conf, perhaps using '%'&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00550.php &amp;lt;nowiki&amp;gt;Fractions in GUC variables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow Kerberos to disable stripping of realms so we can check the username@realm against multiple realms&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00009.php &amp;lt;nowiki&amp;gt;krb_match_realm patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve LDAP authentication configuration options&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01745.php &amp;lt;nowiki&amp;gt;Proposed Patch - LDAPS support for servers on port 636 w/o TLS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add external tool to auto-tune some postgresql.conf parameters&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00000.php &amp;lt;nowiki&amp;gt;Re: Overhauling GUCS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00033.php &amp;lt;nowiki&amp;gt;Simple postgresql.conf wizard&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add 'hostgss' pg_hba.conf option to allow GSS link-level encryption&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01454.php &amp;lt;nowiki&amp;gt;Re: Plans for 8.4&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Process pg_hba.conf keywords as case-insensitive&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00432.php &amp;lt;nowiki&amp;gt;More robust pg_hba.conf parsing/error logging&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create utility to compute accurate random_page_cost value&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00162.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00362.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow configuration files to be independently validated&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php&lt;br /&gt;
* http://archives.postgresql.org/message-id/12666.1310774573@sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow postgresql.conf settings to be accepted by backends even if some settings are invalid for those backends&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow all backends to receive postgresql.conf setting changes at the same time&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Tablespaces ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow a database in tablespace t1 with tables created in tablespace t2 to be used as a template for a new database created with default tablespace t2&lt;br /&gt;
|Currently all objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If you mix default tablespace tables and tablespace-specified tables in the same directory, creating a new database from such a mixed directory would create a new database with tables that had incorrect explicit tablespaces.  To fix this would require modifying pg_class in the newly copied database, which we don't currently do.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow reporting of which objects are in which tablespaces&lt;br /&gt;
|This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow per-tablespace quotas}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow tablespaces on RAM-based partitions for unlogged tables&lt;br /&gt;
* http://archives.postgresql.org/pgsql-advocacy/2011-05/msg00033.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow toast tables to be moved to a different tablespace&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00980.php&lt;br /&gt;
* {{messageLink|CAFEQCbH756DyyAPQ1ykh3+b+kE1-EhWRww1WO_x5v38C-uLnUg@mail.gmail.com|patch : Allow toast tables to be moved to a different tablespace}} (issues remain)&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CAFEQCbEq07OopgE5xFYv2Q3eMq45hRSJkjCBO+kvpJq9NEVhow@mail.gmail.com Allow toast tables to be moved to a different tablespace]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Statistics Collector ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow statistics last vacuum/analyze execution times to be displayed without requiring track_counts to be enabled&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2007-04/msg00028.php &amp;lt;nowiki&amp;gt;row-level stats and last analyze time&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Clear table counters on TRUNCATE&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php &amp;lt;nowiki&amp;gt;Small TRUNCATE glitch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== SSL ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SSL authentication/encryption over unix domain sockets&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00924.php &amp;lt;nowiki&amp;gt;Re: Spoofing as the postmaster&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SSL key file permission checks to be optionally disabled when sharing SSL keys with other applications&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00069.php &amp;lt;nowiki&amp;gt;BUG #3809: SSL &amp;amp;quot;unsafe&amp;amp;quot; private key permissions bug&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SSL CRL files to be re-read during configuration file reload, rather than requiring a server restart&lt;br /&gt;
|Unlike SSL CRT files, CRL (Certificate Revocation List) files are updated frequently&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-12/msg00832.php &amp;lt;nowiki&amp;gt;Automatic CRL reload&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
Alternatively or additionally, supporting OCSP (Online Certificate Status Protocol) would provide real-time revocation discovery without reloading&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Allow automatic selection of SSL client certificates from a certificate store&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00406.php &amp;lt;nowiki&amp;gt;Allow multiple certificates or keys in the postgresql.crt/.key files&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Send the full server certificate chain to the client&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-12/msg00145.php BUG #5245: Full Server Certificate Chain Not Sent to client]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Point-In-Time Recovery (PITR) ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Create dump tool for write-ahead logs for use in determining transaction id for point-in-time recovery&lt;br /&gt;
|This is useful for checking PITR recovery.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow archive_mode to be changed without server restart?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01655.php &amp;lt;nowiki&amp;gt;Enabling archive_mode without restart&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider avoiding WAL switching via archive_timeout if there has been no database activity&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg01469.php &amp;lt;nowiki&amp;gt;archive_timeout behavior for no activity&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg00395.php &amp;lt;nowiki&amp;gt;Re: archive_timeout behavior for no activity&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Standby server mode ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Allow pg_xlogfile_name() to be used in recovery mode&lt;br /&gt;
* [http://archives.postgresql.org/message-id/3f0b79eb1001190135vd9f62f1sa7868abc1ea61d12@mail.gmail.com &amp;lt;nowiki&amp;gt;Streaming replication and pg_xlogfile_name()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Prevent variables inherited from the server environment from being used for making streaming replication connections.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01011.php &amp;lt;nowiki&amp;gt;Re: Parameter name standby_mode&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
| Allow hot file system backups on standby servers&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01727.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Change walsender so that it applies per-role settings&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg00642.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
| Add more control over waiting for synchronous commit&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01611.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Restructure configuration parameters for standby mode&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01820.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Allow time-delayed application of logs on the standby&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00992.php&lt;br /&gt;
}}&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Add -X parameter to pg_basebackup to specify a different directory for pg_xlog, like initdb&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Data Types ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix data types where equality comparison is not intuitive, e.g. box&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg01643.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for public SYNONYMs&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00519.php &amp;lt;nowiki&amp;gt;Proposal for SYNONYMS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg02043.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-general/2010-12/msg00139.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for SQL-standard GENERATED/IDENTITY columns&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-07/msg00543.php &amp;lt;nowiki&amp;gt;Re: Three weeks left until feature freeze&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-08/msg00038.php &amp;lt;nowiki&amp;gt;GENERATED ... AS IDENTITY, Was: Re: Feature Freeze&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00344.php &amp;lt;nowiki&amp;gt;Behavior of GENERATED columns per SQL2003&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00076.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Behavior of GENERATED columns per SQL2003&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00604.php &amp;lt;nowiki&amp;gt;IDENTITY/GENERATED patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider placing all sequences in a single table, or create a system view&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php &amp;lt;nowiki&amp;gt;Re: newbie: renaming sequences task&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2012-02/msg00258.php Removing special case OID generation]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a special data type for regular expressions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php &amp;lt;nowiki&amp;gt;Why is there a tsquery data type?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce BIT data type overhead using short varlena headers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-12/msg00273.php &amp;lt;nowiki&amp;gt;storage size of &amp;amp;quot;bit&amp;amp;quot; data type..&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow renaming and deleting enumerated values from an existing enumerated data type&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support scoped IPv6 addresses in the inet type&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-05/msg00111.php &amp;lt;nowiki&amp;gt;strange problem with ip6&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Add a JSON (JavaScript Object Notation) data type&lt;br /&gt;
|This would behave similarly to the XML data type, which is stored as text, but allows element lookup and conversion functions.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01494.php &amp;lt;nowiki&amp;gt;PATCH: Add hstore_to_json()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg00001.php &amp;lt;nowiki&amp;gt;Re: PATCH: Add hstore_to_json()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-03/msg01092.php &amp;lt;nowiki&amp;gt;Proposal: Add JSON support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00057.php &amp;lt;nowiki&amp;gt;Re: Proposal: Add JSON support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01694.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-12/msg00219.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider improving performance of computing CHAR() value lengths&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00900.php &amp;lt;nowiki&amp;gt;char() overhead on read-only workloads not so insignifcant as the docs claim it is...&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01787.php &amp;lt;nowiki&amp;gt;Re: [PATCH] backend: compare word-at-a-time in bcTruelen&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add overlaps geometric operators that ignore point overlaps&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-03/msg00861.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Add IMMUTABLE column attribute&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00623.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Domains ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow functions defined as casts to domains to be called during casting&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-05/msg00072.php &amp;lt;nowiki&amp;gt;bug? non working casts for domain&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-09/msg01681.php &amp;lt;nowiki&amp;gt;TODO: Fix CREATE CAST on DOMAINs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow values to be cast to domain types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2003-06/msg01206.php &amp;lt;nowiki&amp;gt;Domain casting still doesn't work right&amp;lt;/nowiki&amp;gt;] &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00289.php &amp;lt;nowiki&amp;gt;domain casting?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00812.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make domains work better with polymorphic functions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4887.1228700773@sss.pgh.pa.us Polymorphic types vs. domains]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/15535.1238774571@sss.pgh.pa.us some difficulties with fixing it]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Dates and Times ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow infinite intervals just like infinite timestamps&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00076.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Determine how to represent date/time field extraction on infinite timestamps&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+mi_8bda-Fnev9iXeUbnqhVaCWzbYhHkWoxPQfBca9eDPpRMw@mail.gmail.com extract(epoch from infinity) is not 0]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CADAkt-icuESH16uLOCXbR-dKpcvwtUJE4JWXnkdAjAAwP6j12g@mail.gmail.com converting between infinity timestamp and float8]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow TIMESTAMP WITH TIME ZONE to store the original timezone information, either zone name or offset from UTC&lt;br /&gt;
|If the TIMESTAMP value is stored with a time zone name, interval computations should adjust based on the time zone rules. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-10/msg00705.php &amp;lt;nowiki&amp;gt;timestamp with time zone a la sql99&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have timestamp subtraction not call justify_hours()?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-sql/2006-10/msg00059.php &amp;lt;nowiki&amp;gt;timestamp subtraction (was Re: formatting intervals with to_char)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve TIMESTAMP WITH TIME ZONE subtraction to be DST-aware&lt;br /&gt;
|Currently, subtracting one date from another across a daylight saving time adjustment can return '1 day 1 hour', but adding that back to the first date returns a time one hour in the future.  This is caused by the adjustment of '25 hours' to '1 day 1 hour', and '1 day' is the same time the next day, even if daylight saving adjustments are involved.}}&lt;br /&gt;
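&lt;br /&gt;
A minimal Python sketch of the round-trip problem described in the item above. The two fixed UTC offsets and the dates are assumptions chosen to imitate a fall-back transition; this models the arithmetic only and is not PostgreSQL's implementation.&lt;br /&gt;

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets: local time falls back from UTC-4 to UTC-5 overnight.
BEFORE = timezone(timedelta(hours=-4))
AFTER = timezone(timedelta(hours=-5))

start = datetime(2023, 11, 4, 12, 0, tzinfo=BEFORE)
end = datetime(2023, 11, 5, 12, 0, tzinfo=AFTER)

# Subtraction yields the absolute elapsed time: 25 hours.
elapsed = end - start

# justify_hours-style adjustment: 25 hours becomes 1 day 1 hour.
days, rest = divmod(elapsed, timedelta(days=1))
hours = rest // timedelta(hours=1)

# Adding back: each day keeps the same wall-clock time on the next
# calendar day (now in the new offset); the hours are elapsed time.
same_wall_clock = start.replace(day=start.day + days, tzinfo=AFTER)
round_trip = same_wall_clock + timedelta(hours=hours)

# The round trip lands after the original end timestamp.
overshoot = round_trip - end
print(days, hours, overshoot)
```

The overshoot comes from treating the day part as a calendar day while the hour part is elapsed time, which is the mismatch a DST-aware subtraction would need to avoid.&lt;br /&gt;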
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix interval display to support values exceeding 2^31 hours}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add overflow checking to timestamp and interval arithmetic}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add function to allow the creation of timestamps using parameters&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-06/msg00232.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Arrays ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for arrays of domains&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00114.php &amp;lt;nowiki&amp;gt;Re: updated WIP: arrays of composites&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow single-byte header storage for array elements}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add function to detect if an array is empty&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00475.php &amp;lt;nowiki&amp;gt;Re: array_length()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of empty arrays&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01033.php &amp;lt;nowiki&amp;gt;So what's an &amp;amp;quot;empty&amp;amp;quot; array anyway?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of NULLs in arrays&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-11/msg00009.php &amp;lt;nowiki&amp;gt;BUG #4509: array_cat's null behaviour is inconsistent&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01040.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Binary Data ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve vacuum of large objects, like contrib/vacuumlo?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Auto-delete large objects when referencing row is deleted&lt;br /&gt;
|contrib/lo offers this functionality.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow read/write into TOAST values like large objects&lt;br /&gt;
|Writing might require the TOAST column to be stored EXTERNAL.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00049.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add API for 64-bit large object access&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-09/msg00781.php &amp;lt;nowiki&amp;gt;64-bit API for large objects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01790.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== MONEY Data Type ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add locale-aware MONEY type, and support multiple currencies&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2005-08/msg01432.php &amp;lt;nowiki&amp;gt;A real currency type&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01181.php &amp;lt;nowiki&amp;gt;Money type todos?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|MONEY dumps in a locale-specific format, making it difficult to restore to a system with a different locale}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Text Search ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow dictionaries to change the token that is passed on to later dictionaries&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-11/msg00081.php &amp;lt;nowiki&amp;gt;a tsearch2 (8.2.4) dictionary that only filters out stopwords&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a function-based API for '@@' searches&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00511.php &amp;lt;nowiki&amp;gt;Simplifying Text Search&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve text search error messages&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00966.php &amp;lt;nowiki&amp;gt;Poorly designed tsearch NOTICEs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg01146.php &amp;lt;nowiki&amp;gt;Re: Poorly designed tsearch NOTICEs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider changing error to warning for strings larger than one megabyte&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-02/msg00190.php &amp;lt;nowiki&amp;gt;BUG #3975: tsearch2 index should not bomb out of 1Mb limit&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-03/msg00062.php &amp;lt;nowiki&amp;gt;Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|tsearch and tsdicts regression tests fail in Turkish locale on glibc&lt;br /&gt;
* [http://archives.postgresql.org/message-id/49749645.5070801@gmx.net tsearch with Turkish locale]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|tsquery negator operator treated as part of lexeme&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-06/msg00346.php BUG #4887: inclusion operator (@&amp;gt;) on tsqeries behaves not conforming to documentation]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of dash and plus signs in email address user names, and perhaps improve URL parsing&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00772.php&lt;br /&gt;
* [http://archives.postgresql.org/message-id/E1Ri8il-0008Ct-9p@wrigleys.postgresql.org tsearch does not recognize all valid emails]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve default parser, to more easily allow adding new tokens&lt;br /&gt;
* http://archives.postgresql.org/message-id/23485.1297727826@sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add additional support functions&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00319.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== XML ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow XML arrays to be cast to other data types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00981.php &amp;lt;nowiki&amp;gt;proposal casting from XML[] to int[], numeric[], text[]&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00231.php &amp;lt;nowiki&amp;gt;Re: proposal casting from XML[] to int[], numeric[], text[]&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00471.php &amp;lt;nowiki&amp;gt;Re: proposal casting from XML[] to int[], numeric[], text[]&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add XML Schema validation and xmlvalidate functions (SQL:2008)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add xmlvalidatedtd variant to support validating against a DTD?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Relax-NG validation; libxml2 supports this already}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow reliable XML operation in non-UTF8 server encodings (xpath(), in particular, is known to not work)&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-01/msg00135.php &amp;lt;nowiki&amp;gt;BUG #4622: xpath only work in utf-8 server encoding&amp;lt;/nowiki&amp;gt;] &lt;br /&gt;
* http://archives.postgresql.org/message-id/4110.1238973350@sss.pgh.pa.us}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add functions from SQL:2006: XMLDOCUMENT, XMLCAST, XMLTEXT}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add XMLNAMESPACES support in XMLELEMENT and elsewhere}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move XSLT from contrib/xml2 to a more reasonable location&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00539.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Report errors returned by the XSLT library&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00562.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve the XSLT parameter passing API&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg00416.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|XML Canonical: Convert XML documents to canonical form to compare them. libxml2 has support for this.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add pretty-printed XML output option&lt;br /&gt;
|Parse a document and serialize it back in some indented form. libxml2 might support this.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add XMLQUERY (from the SQL/XML standard)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow XML shredding&lt;br /&gt;
|In some cases shredding could be a better option (if there is no need to keep XML docs entirely, e.g. if we have already developed tools that understand only relational data).  This would be a separate module that implements an annotated schema decomposition technique, similar to DB2 and SQL Server functionality.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix nested or repeated xpath() calls that apparently mess up namespaces [http://archives.postgresql.org/pgsql-bugs/2008-03/msg00097.php] [http://archives.postgresql.org/pgsql-bugs/2008-03/msg00144.php] [http://archives.postgresql.org/pgsql-general/2008-03/msg00295.php] [http://archives.postgresql.org/pgsql-bugs/2008-07/msg00054.php] [http://archives.postgresql.org/message-id/004f01c90e91$138e9d10$3aabd730$@anstett@iaas.uni-stuttgart.de]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|XPath: Adding the &amp;lt;x&amp;gt; at the root causes problems [http://archives.postgresql.org/pgsql-bugs/2008-05/msg00184.php] [http://archives.postgresql.org/pgsql-bugs/2008-07/msg00054.php] [http://archives.postgresql.org/pgsql-general/2008-07/msg00613.php]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|xpath_table needs to be implemented/implementable to get rid of contrib/xml2 [http://archives.postgresql.org/pgsql-general/2008-05/msg00823.php]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|xpath_table is pretty broken anyway [http://archives.postgresql.org/pgsql-hackers/2010-02/msg02424.php]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Better handling of XPath data types [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php] [http://archives.postgresql.org/message-id/004a01c90e90$4b986d90$e2c948b0$@anstett@iaas.uni-stuttgart.de]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of PIs and DTDs in xmlconcat() [http://archives.postgresql.org/message-id/200904211211.n3LCB09p008988@wwwmaster.postgresql.org]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Restructure XML and /contrib/xml2 functionality&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02314.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00017.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Functions ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow INET subnet comparisons using non-constants to be indexed}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add an INET overlaps operator, for use by exclusion constraints &lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-03/msg00845.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg01403.php &amp;lt;nowiki&amp;gt;Re: BUG #2917: spi_prepare doesn't accept typename aliases&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg01160.php &amp;lt;nowiki&amp;gt;RFC for adding typmods to functions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix IS OF so it matches the ISO specification, and add documentation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2003-08/msg00060.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] IS OF&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00060.php &amp;lt;nowiki&amp;gt;ToDo: add documentation for operator IS OF&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement Boyer-Moore searching in LIKE queries&lt;br /&gt;
* {{messageLink|27645.1220635769@sss.pgh.pa.us|TODO item: Implement Boyer-Moore searching (First time hacker)}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent malicious functions from being executed with the permissions of unsuspecting users&lt;br /&gt;
|Index functions are safe, so VACUUM and ANALYZE are safe too.  Triggers, CHECK and DEFAULT expressions, and rules are still vulnerable. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00268.php &amp;lt;nowiki&amp;gt;Some notes about the index-functions security vulnerability&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce memory usage of aggregates in set returning functions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2008-01/msg00031.php &amp;lt;nowiki&amp;gt;Re: Performance of aggregates over set-returning functions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix /contrib/ltree operator&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php &amp;lt;nowiki&amp;gt;BUG #3720: wrong results at using ltree&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix /contrib/btree_gist's implementation of inet indexing&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2010-10/msg00099.php &amp;lt;nowiki&amp;gt;BUG #5705: btree_gist: Index on inet changes query result&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Fix inconsistent precedence of =, &amp;amp;gt;, and &amp;amp;lt; compared to &amp;amp;lt;&amp;amp;gt;, &amp;amp;gt;=, and &amp;amp;lt;=&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00145.php &amp;lt;nowiki&amp;gt;BUG #3822: Nonstandard precedence for comparison operators&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix regular expression bug when using complex back-references&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php &amp;lt;nowiki&amp;gt;BUG #3645: regular expression back references seem broken&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have /contrib/dblink reuse unnamed connections&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php &amp;lt;nowiki&amp;gt;dblink un-named connection doesn't get re-used&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve formatting of pg_get_viewdef() output&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg01648.php &amp;lt;nowiki&amp;gt;pg_get_viewdef formattiing&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01885.php &amp;lt;nowiki&amp;gt;Re: pretty print viewdefs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-12/msg00906.php reprise: pretty print viewdefs]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add function to dump pg_depend information cleanly&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00226.php &amp;lt;nowiki&amp;gt;Elementary dependency look-up&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28488.1286461610@sss.pgh.pa.us pg_relation_size / could not open relation with OID #]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Character Formatting ===&lt;br /&gt;
&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow to_date() and to_timestamp() to accept localized month names}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add missing parameter handling in to_char()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-12/msg00948.php &amp;lt;nowiki&amp;gt;Re: to_char and i18n&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Throw an error from to_char() instead of printing a string of &amp;quot;#&amp;quot; when a number doesn't fit in the desired output format.&lt;br /&gt;
* discussed in [http://archives.postgresql.org/message-id/37ed240d0907290836w42187222n18664dfcbcb445b1@mail.gmail.com &amp;quot;to_char, support for EEEE format&amp;quot;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow to_char() on interval values to accumulate the highest unit requested&lt;br /&gt;
|2= Some special format flag would be required to request such accumulation.  Such functionality could also be added to EXTRACT. Prevent accumulation that crosses the month/day boundary because of the uneven number of days in a month.&lt;br /&gt;
* to_char(INTERVAL '1 hour 5 minutes', 'MI') =&amp;amp;gt; 65&lt;br /&gt;
* to_char(INTERVAL '43 hours 20 minutes', 'MI' ) =&amp;amp;gt; 2600&lt;br /&gt;
* to_char(INTERVAL '43 hours 20 minutes', 'WK:DD:HR:MI') =&amp;amp;gt; 0:1:19:20&lt;br /&gt;
* to_char(INTERVAL '3 years 5 months','MM') =&amp;amp;gt; 41&lt;br /&gt;
}}&lt;br /&gt;
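&lt;br /&gt;
The accumulation in the examples above can be sketched in Python as follows. The helper name, its arguments, and the unit table are hypothetical and only illustrate the arithmetic; they are not an existing to_char() interface. Months and years are left out because of the month/day boundary mentioned in the item.&lt;br /&gt;

```python
# Minutes per unit for the units that convert evenly into each other.
MINUTES_PER_UNIT = {"WK": 7 * 24 * 60, "DD": 24 * 60, "HR": 60, "MI": 1}


def accumulate(hours, minutes, fmt):
    """Hypothetical helper: spread an interval over the units in fmt.

    fmt lists units from largest to smallest, separated by colons.
    The first (highest) unit absorbs everything above it.
    """
    remaining = hours * 60 + minutes
    parts = []
    for unit in fmt.split(":"):
        count, remaining = divmod(remaining, MINUTES_PER_UNIT[unit])
        parts.append(str(count))
    return ":".join(parts)


print(accumulate(1, 5, "MI"))
print(accumulate(43, 20, "MI"))
print(accumulate(43, 20, "WK:DD:HR:MI"))
```

The three calls reproduce the first three to_char() examples listed in the item.&lt;br /&gt;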
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix to_number() handling for values not matching the format string&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg01447.php &amp;lt;nowiki&amp;gt;Re: numeric_to_number() function skipping some digits&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Multi-Language Support ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add NCHAR (as distinguished from ordinary varchar)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a cares-about-collation column to pg_proc, so that unresolved-collation errors can be thrown at parse time&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-03/msg01520.php &amp;lt;nowiki&amp;gt;Open issues for collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Integrate collations with text search configurations&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28887.1303579034@sss.pgh.pa.us &amp;lt;nowiki&amp;gt;Some TODO items for collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Integrate collations with to_char() and related functions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28887.1303579034@sss.pgh.pa.us &amp;lt;nowiki&amp;gt;Some TODO items for collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support collation-sensitive equality and hashing functions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-06/msg00472.php &amp;lt;nowiki&amp;gt; contrib/citext versus collations&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a LOCALE option to CREATE DATABASE, as a shorthand&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00119.php &amp;lt;nowiki&amp;gt; Re: 8.4 open items list&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support multiple simultaneous character sets, per SQL:2008}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve UTF8 combined character handling?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add octet_length_server() and octet_length_client()}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make octet_length_client() the same as octet_length()?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problems with wrong runtime encoding conversion for NLS message files}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add URL to more complete multi-byte regression tests&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-07/msg00272.php &amp;lt;nowiki&amp;gt;Multi-byte and client side character encoding tests for copy command..&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix contrib/fuzzystrmatch to work with multibyte encodings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-04/msg00047.php &amp;lt;nowiki&amp;gt; soundex function returns UTF-16 characters&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00138.php &amp;lt;nowiki&amp;gt; dmetaphone woes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change memory allocation for multi-byte functions so memory is allocated inside conversion functions&lt;br /&gt;
|Currently we preallocate memory based on worst-case usage.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ability to use case-insensitive regular expressions on multi-byte characters&lt;br /&gt;
|Currently it works for UTF-8, but not other multi-byte encodings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00433.php &amp;lt;nowiki&amp;gt;Regexps vs. locale&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* {{MessageLink|20091201210024.B1393753FB7@cvs.postgresql.org|A partial solution for UTF-8}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve encoding of connection startup messages sent to the client&lt;br /&gt;
|Currently some authentication error messages are sent in the server encoding&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-12/msg00801.php &amp;lt;nowiki&amp;gt;encoding of PostgreSQL messages&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2009-01/msg00005.php &amp;lt;nowiki&amp;gt;Re: encoding of PostgreSQL messages&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide more sensible support for Unicode combining characters and normalization forms&lt;br /&gt;
* http://archives.postgresql.org/message-id/200904141532.44618.peter_e@gmx.net&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Views / Rules ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Automatically create rules on views so they are updatable, per SQL:2008&lt;br /&gt;
|We can only auto-create rules for simple views.  For more complex cases users will still have to write rules manually.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00586.php &amp;lt;nowiki&amp;gt;Proposal for updatable views&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-08/msg00255.php &amp;lt;nowiki&amp;gt;Updatable views&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php &amp;lt;nowiki&amp;gt;Re: [COMMITTERS] pgsql: Automatic view update rules Bernd Helmle&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Updatable_views&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add the functionality of the WITH CHECK OPTION clause to CREATE VIEW}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow VIEW/RULE recompilation when the underlying tables change&lt;br /&gt;
|This is both difficult and controversial.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01723.php Re: About &amp;quot;Allow VIEW/RULE recompilation when the underlying tables change&amp;quot;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg01724.php Re: About &amp;quot;Allow VIEW/RULE recompilation when the underlying tables change&amp;quot;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CACk=U9NFSzWrEba8G5dZ=TZLy3_hx3QXGyCcKVWT=4iA1FjMuA@mail.gmail.com VIEW still referring to old name of field]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00577.php &amp;lt;nowiki&amp;gt;RETURNING and DO INSTEAD ... Intentional or not?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add the ability to automatically create materialized views&lt;br /&gt;
|Right now materialized views require the user to create triggers on the main table to keep the summary table current.  SQL syntax should be able to manage the triggers and summary table automatically.  A more sophisticated implementation would automatically retrieve from the summary table when the main table is referenced, if possible.  See [[Materialized Views]] for implementation details.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00479.php &amp;lt;nowiki&amp;gt;GSoC - proposal - Materialized Views in PostgreSQL&amp;lt;/nowiki&amp;gt;] &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve ability to modify views via ALTER TABLE&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00691.php &amp;lt;nowiki&amp;gt;Re: idea: storing view source in system catalogs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01410.php &amp;lt;nowiki&amp;gt;modifying views&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg00300.php &amp;lt;nowiki&amp;gt;Re: patch: Add columns via CREATE OR REPLACE VIEW&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Prevent low-cost functions from seeing unauthorized view rows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg01346.php &amp;lt;nowiki&amp;gt;Using views for row-level access control is leaky&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SQL Commands ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT&lt;br /&gt;
* [http://dissipatedheat.com/2011/11/10/how-not-to-write-a-patch-for-postgresql/ How not to write this patch.]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve type determination of unknown (NULL or quoted literal) result columns for UNION/INTERSECT/EXCEPT&lt;br /&gt;
* [http://archives.postgresql.org/message-id/9799.1302719551@sss.pgh.pa.us &amp;lt;nowiki&amp;gt;UNION construct type cast gives poor error message&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ROLLUP, CUBE, GROUPING SETS options to GROUP BY&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00838.php &amp;lt;nowiki&amp;gt;WIP: grouping sets support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00466.php &amp;lt;nowiki&amp;gt;Implementation of GROUPING SETS (T431: Extended grouping capabilities)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow prepared transactions with temporary tables created and dropped in the same transaction, and when an ON COMMIT DELETE ROWS temporary table is accessed&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00047.php &amp;lt;nowiki&amp;gt;Re: &amp;amp;quot;could not open relation 1663/16384/16584: No such file or directory&amp;amp;quot; in a specific combination of transactions with temp tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/492543D5.9050904@enterprisedb.com A suggestion on how to implement this]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a GUC variable to warn about non-standard SQL usage in queries}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add SQL-standard MERGE/REPLACE/UPSERT command&lt;br /&gt;
|MERGE is typically used to merge two tables.  A REPLACE or UPSERT command performs an UPDATE or, if that fails, an INSERT.  See [[SQL MERGE]] for notes on the implementation details.&lt;br /&gt;
}}&lt;br /&gt;
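&lt;br /&gt;
The "UPDATE, then INSERT on failure" behavior described above can be sketched as follows. This is a minimal illustration, not a proposed implementation: it uses Python's sqlite3 module as a stand-in for a PostgreSQL session, the table kv and the function upsert are hypothetical names, and "failure" is assumed to mean that the UPDATE matched no rows.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3


def upsert(conn, key, value):
    """UPDATE the row for key; if no row was updated, INSERT it instead."""
    cur = conn.execute("UPDATE kv SET value = ? WHERE key = ?", (value, key))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO kv (key, value) VALUES (?, ?)", (key, value))


# Hypothetical demo table; SQLite stands in for a real server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value INTEGER)")
upsert(conn, "a", 1)  # no existing row, so this inserts
upsert(conn, "a", 2)  # the row exists, so this updates
rows = conn.execute("SELECT key, value FROM kv").fetchall()
```
&lt;br /&gt;
Done by hand like this, the two statements are not atomic: a concurrent session could insert the same key between the UPDATE and the INSERT, which is one reason a dedicated command is wanted.&lt;br /&gt;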
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add NOVICE output level for helpful messages&lt;br /&gt;
|For example, have it warn about unjoined tables.  This could also control automatic sequence/index creation messages.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow NOTIFY in rules involving conditionals}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow EXPLAIN to identify tables that were skipped because of constraint_exclusion&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Simplify dropping roles that have objects in several databases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the count returned by SELECT, etc. to be represented as an int64 to permit a larger range of values}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for WITH RECURSIVE ... CYCLE&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00291.php &amp;lt;nowiki&amp;gt;WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows&amp;lt;/nowiki&amp;gt;]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add DEFAULT .. AS OWNER so permission checks are done as the table owner&lt;br /&gt;
|This would be useful for SERIAL nextval() calls and CHECK constraints.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow DISTINCT to work in multiple-argument aggregate calls}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add comments on system tables/columns using the information in catalogs.sgml&lt;br /&gt;
|Ideally the information would be pulled from the SGML file automatically.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent the specification of conflicting transaction read/write options&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00684.php &amp;lt;nowiki&amp;gt;Re: SET TRANSACTION and SQL Standard&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support LATERAL subqueries&lt;br /&gt;
|Lateral subqueries can reference columns of tables defined outside the subquery at the same level, i.e. ''laterally''.&lt;br /&gt;
For example, a LATERAL subquery in a FROM clause could reference tables defined in the same FROM clause.&lt;br /&gt;
Currently only the columns of tables defined ''above'' subqueries are recognized.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00292.php &amp;lt;nowiki&amp;gt;LATERAL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg00991.php &amp;lt;nowiki&amp;gt;Re: LATERAL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4F5AA202.9020906@gmail.com lateral function as a subquery - WIP patch]&lt;br /&gt;
}}&lt;br /&gt;
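&lt;br /&gt;
The semantics described above amount to a dependent nested loop: the subquery is re-evaluated for each row of the table beside it. A rough Python sketch follows, where lateral_join, staff_of and the sample rows are all hypothetical names invented for the illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def lateral_join(outer_rows, dependent_subquery):
    """Join each outer row with the rows the subquery returns for that row."""
    result = []
    for outer in outer_rows:
        # The subquery may read columns of the outer row at the same level.
        for inner in dependent_subquery(outer):
            result.append({**outer, **inner})
    return result


# Hypothetical sample data.
departments = [{"dept": "eng"}, {"dept": "ops"}]
employees = [
    {"name": "ann", "works_in": "eng"},
    {"name": "bob", "works_in": "ops"},
    {"name": "cy", "works_in": "eng"},
]


def staff_of(dept_row):
    """Plays the role of the LATERAL subquery: it depends on dept_row."""
    return [
        {"name": e["name"]}
        for e in employees
        if e["works_in"] == dept_row["dept"]
    ]


joined = lateral_join(departments, staff_of)
```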
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent temporary tables created with ON COMMIT DELETE ROWS from repeatedly truncating the table on every commit if the table is already empty&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00842.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-03/msg00392.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-04/msg00046.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow DELETE and UPDATE to be used with LIMIT and ORDER BY&lt;br /&gt;
* http://archives.postgresql.org/pgadmin-hackers/2010-04/msg00078.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01997.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00021.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Improve caching of prepared query plans&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow PREPARE of cursors}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have DISCARD PLANS discard plans cached by functions&lt;br /&gt;
|DISCARD ALL should do the same.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00431.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid multiple evaluation of BETWEEN and IN arguments containing volatile expressions&lt;br /&gt;
* http://archives.postgresql.org/message-id/4D95B605.2020709@enterprisedb.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix nested CASE-WHEN constructs&lt;br /&gt;
* http://archives.postgresql.org/message-id/4DDCEEB8.50602@enterprisedb.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== CREATE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow CREATE TABLE AS to determine column lengths for complex expressions like SELECT col1 || col2}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have WITH CONSTRAINTS also create constraint indexes&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-04/msg00149.php &amp;lt;nowiki&amp;gt;Re: CREATE TABLE LIKE INCLUDING INDEXES support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move NOT NULL constraint information to pg_constraint&lt;br /&gt;
|Currently NOT NULL constraints are stored in pg_attribute without any designation of their origins, e.g. primary keys.  One manifest problem is that dropping a PRIMARY KEY constraint does not remove the NOT NULL constraint designation.  Another issue is that we should probably force NOT NULL to be propagated from parent tables to children, just as CHECK constraints are.  (But then does dropping PRIMARY KEY affect children?)&lt;br /&gt;
* http://archives.postgresql.org/message-id/19768.1238680878@sss.pgh.pa.us&lt;br /&gt;
* http://archives.postgresql.org/message-id/200909181005.n8IA5Ris061239@wwwmaster.postgresql.org&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg01223.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent concurrent CREATE TABLE from sometimes returning a cryptic error message&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00169.php &amp;lt;nowiki&amp;gt;BUG #3692: Conflicting create table statements throw unexpected error&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add CREATE SCHEMA ... LIKE that copies a schema}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix CREATE OR REPLACE FUNCTION to not leave objects depending on the function in an inconsistent state&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-08/msg00985.php indexes on functions and create or replace function]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow temporary tables to exist as empty by default in all sessions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00006.php &amp;lt;nowiki&amp;gt;what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg01329.php &amp;lt;nowiki&amp;gt;idea: global temp tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org//pgsql-hackers/2009-05/msg00016.php &amp;lt;nowiki&amp;gt;Re: idea: global temp tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg01098.php &amp;lt;nowiki&amp;gt;global temporary tables&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the creation of &amp;quot;distinct&amp;quot; types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01647.php &amp;lt;nowiki&amp;gt;Distinct types&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider analyzing temporary tables when they are first used in a query&lt;br /&gt;
|Autovacuum cannot analyze or vacuum temporary tables.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-04/msg00416.php &amp;lt;nowiki&amp;gt;autovacuum and temp tables support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow an unlogged table to be changed to logged&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00437.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00323.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00237.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== UPDATE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Allow UPDATE tab SET ROW (col, ...) = (SELECT...)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-07/msg01308.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] extension for sql update&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00865.php &amp;lt;nowiki&amp;gt;UPDATE using sub selects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-04/msg00315.php &amp;lt;nowiki&amp;gt;UPDATE using sub selects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-03/msg00237.php &amp;lt;nowiki&amp;gt;Re: UPDATE using sub selects&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Research self-referential UPDATEs that see inconsistent row versions in read-committed mode&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00507.php &amp;lt;nowiki&amp;gt;Concurrently updating an updatable view&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00016.php &amp;lt;nowiki&amp;gt;Re: Do we need a TODO? (was Re: Concurrently updating an updatable view)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve performance of EvalPlanQual mechanism that rechecks already-updated rows&lt;br /&gt;
|This is related to the previous item, which questions whether the mechanism even has the right semantics.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-09/msg00045.php &amp;lt;nowiki&amp;gt;BUG #4401: concurrent updates to a table blocks one update indefinitely&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-07/msg00302.php &amp;lt;nowiki&amp;gt;BUG #4945: Parallel update(s) gone wild&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== ALTER ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have ALTER TABLE RENAME of a SERIAL column rename the sequence&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php &amp;lt;nowiki&amp;gt;Re: newbie: renaming sequences task&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CADLWmXUV4LbLhMZL8rYMhCy72aZZLB5BSARPQVgoX0BrxA0FFg@mail.gmail.com renaming implicit sequences]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have ALTER SEQUENCE RENAME rename the sequence name stored in the sequence table&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-09/msg00092.php &amp;lt;nowiki&amp;gt;BUG #3619: Renaming sequence does not update its 'sequence_name' field&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-10/msg00007.php &amp;lt;nowiki&amp;gt;Re: BUG #3619: Renaming sequence does not update its 'sequence_name' field&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php &amp;lt;nowiki&amp;gt;Re: newbie: renaming sequences task&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME or ALTER TABLE RENAME CONSTRAINT&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-02/msg00168.php &amp;lt;nowiki&amp;gt;ALTER CONSTRAINT RENAME patch reverted&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ALTER DOMAIN to modify the underlying data type}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow ALTER TABLESPACE to move the tablespace to different directories}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow moving system tables to other tablespaces, where possible&lt;br /&gt;
|Currently non-global system tables must be in the default database tablespace. Global system tables can never be moved.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have ALTER INDEX update the name of a constraint using that index}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow column display reordering by recording a display, storage, and permanent id for every column?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00782.php &amp;lt;nowiki&amp;gt;Re: column ordering, was Re: [PATCHES] Enums patch v2&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg01029.php &amp;lt;nowiki&amp;gt;Column reordering in pg_dump&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/message-id/1324412114-sup-9608@alvh.no-ip.org&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow deactivating (and reactivating) indexes via ALTER TABLE&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg01191.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ALTER OPERATOR ... RENAME&lt;br /&gt;
|Needs to consider the effects of changing operator precedence&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1322948781.26266.9.camel@vanquo.pezone.net Missing rename support]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ALTER TABLE ... RENAME RULE&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1322948781.26266.9.camel@vanquo.pezone.net Missing rename support]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== CLUSTER ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Automatically maintain clustering on a table&lt;br /&gt;
|This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization.  Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too.  A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2004-08/msg00350.php &amp;lt;nowiki&amp;gt;Equivalent praxis to CLUSTERED INDEX?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00155.php &amp;lt;nowiki&amp;gt;Re: Grouped Index Tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://community.enterprisedb.com/git/&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2009-10/msg00346.php &amp;lt;nowiki&amp;gt;Re: maintain_cluster_order_v5.patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== COPY ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY to report error lines and continue&lt;br /&gt;
|This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00572.php &amp;lt;nowiki&amp;gt;Re: VLDB Features&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
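&lt;br /&gt;
The savepoint-per-line approach described above can be sketched as follows. This is a hedged illustration only: Python's sqlite3 module stands in for the server, load_rows and the items table are hypothetical names, and a real implementation inside COPY would look quite different.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3


def load_rows(conn, rows):
    """Insert rows one by one; return (line number, message) for rejected rows."""
    errors = []
    conn.execute("BEGIN")
    for lineno, row in enumerate(rows, start=1):
        conn.execute("SAVEPOINT copy_row")
        try:
            conn.execute("INSERT INTO items (id, qty) VALUES (?, ?)", row)
        except sqlite3.Error as exc:
            # Undo only the failed row; earlier rows stay in the transaction.
            conn.execute("ROLLBACK TO SAVEPOINT copy_row")
            errors.append((lineno, str(exc)))
        conn.execute("RELEASE SAVEPOINT copy_row")
    conn.execute("COMMIT")
    return errors


# isolation_level=None lets the code issue BEGIN and COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
rows = [(1, 5), (1, 7), (2, None), (3, 9)]  # lines 2 and 3 violate constraints
errors = load_rows(conn, rows)
loaded = conn.execute("SELECT id, qty FROM items ORDER BY id").fetchall()
```
&lt;br /&gt;
One savepoint per line keeps the good rows and reports the bad ones, at the cost of extra work for every input line.&lt;br /&gt;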
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY FROM to create index entries in bulk&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php &amp;lt;nowiki&amp;gt;Batch update of indexes on data loading&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY in CSV mode to control whether a quoted zero-length string is treated as NULL&lt;br /&gt;
|Currently this is always treated as a zero-length string, which generates an error when loading into an integer column &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] allow CSV quote in NULL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve COPY performance&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00954.php &amp;lt;nowiki&amp;gt;Re: 8.3 / 8.2.6 restore comparison&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01882.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY to report errors sooner&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php &amp;lt;nowiki&amp;gt;Timely reporting of COPY errors&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow COPY to handle other number formats&lt;br /&gt;
|For example, German notation uses a comma as the decimal separator. Something like WITH DECIMAL ',' would be best.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow a stalled COPY to exit if the backend is terminated&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-04/msg00067.php &amp;lt;nowiki&amp;gt;Re: possible bug not in open items&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== GRANT/REVOKE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow SERIAL sequences to inherit permissions from the base table?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow dropping of a role that has connection rights&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00736.php &amp;lt;nowiki&amp;gt;DROP ROLE dependency tracking ...&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== DECLARE CURSOR ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent DROP TABLE from dropping a table referenced by its own open cursor?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide some guarantees about the behavior of cursors that invoke volatile functions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20997.1244563664@sss.pgh.pa.us Re: Cursor with hold emits the same row more than once across commits in 8.3.7]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== INSERT ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow INSERT/UPDATE of the system-generated oid value for a row}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|In rules, allow VALUES() to contain a mixture of 'old' and 'new' references}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== SHOW/SET ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and CLUSTER}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rationalize the discrepancy between settings that use values in bytes and SHOW that returns the object count&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2008-07/msg00007.php &amp;lt;nowiki&amp;gt;Re: [ADMIN] shared_buffers and shmmax&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== ANALYZE ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified percentage}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have EXPLAIN ANALYZE report rows as floating-point numbers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg01363.php &amp;lt;nowiki&amp;gt;explain analyze rows=%.0f&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00108.php &amp;lt;nowiki&amp;gt;Re: explain analyze rows=%.0f&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve how ANALYZE computes in-doubt tuples&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00771.php &amp;lt;nowiki&amp;gt;VACUUM/ANALYZE counting of in-doubt tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Window Functions ===&lt;br /&gt;
See {{messageLink|357.1230492361@sss.pgh.pa.us|TODO items for window functions}}.&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support creation of user-defined window functions&lt;br /&gt;
|We have the ability to create new window functions written in C.  Is it&lt;br /&gt;
worth the effort to create an API that would let them be written in PL/pgSQL, etc.?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement full support for window framing clauses&lt;br /&gt;
|Beyond the clauses already implemented, which are described in the [http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS latest documentation], the following clauses are not implemented yet:&lt;br /&gt;
* RANGE BETWEEN ... PRECEDING/FOLLOWING&lt;br /&gt;
* EXCLUDE&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Investigate tuplestore performance issues&lt;br /&gt;
|The tuplestore_in_memory() workaround is just a band-aid; we ought to solve the problem properly.  tuplestore_advance seems like a weak spot as well.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00152.php &amp;lt;nowiki&amp;gt;tuplestore potential performance problem&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem|Do we really need so much duplicated code between Agg and WindowAgg?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Teach planner to evaluate multiple windows in the optimal order&lt;br /&gt;
|Currently windows are always evaluated in the query-specified order.&lt;br /&gt;
* http://archives.postgresql.org/message-id/3CDAD71E9D70417290FCF66F0178D1E1@amd64&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement DISTINCT clause in window aggregates&lt;br /&gt;
|Some proprietary RDBMSs have implemented it already, so it helps with porting from those.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Integrity Constraints ==&lt;br /&gt;
=== Keys ===&lt;br /&gt;
&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve deferrable unique constraints for cases with many conflicts&lt;br /&gt;
|The current implementation fires a trigger for each potentially conflicting row.  This might not scale well for an update that changes many key values at once.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Referential Integrity ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add MATCH PARTIAL referential integrity}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change foreign key constraint for array -&amp;amp;gt; element to mean element in array?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-10/msg01814.php &amp;lt;nowiki&amp;gt;foreign keys for array/period contains relationships&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem when cascading referential triggers make changes on cascaded tables, seeing the tables in an intermediate state&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-09/msg00174.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Work-in-progress referential action trigger timing&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Optimize referential integrity checks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php &amp;lt;nowiki&amp;gt;Re: Effects of cascading references in foreign keys&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php &amp;lt;nowiki&amp;gt;Can't ri_KeysEqual() consider two nulls as equal?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Check Constraints ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Run check constraints only when affected columns are changed&lt;br /&gt;
* http://archives.postgresql.org/message-id/1326055327.15293.13.camel@vanquo.pezone.net&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Server-Side Languages ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for polymorphic arguments and return types to languages other than PL/pgSQL}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for OUT and INOUT parameters to languages other than PL/pgSQL}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add more fine-grained specification of functions taking arbitrary data types&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00367.php &amp;lt;nowiki&amp;gt;RfD: more powerful &amp;amp;quot;any&amp;amp;quot; types&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement stored procedures&lt;br /&gt;
|This might involve the control of transaction state and the return of multiple result sets&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-10/msg00454.php &amp;lt;nowiki&amp;gt;PL/pgSQL stored procedure returning multiple result sets (SELECTs)?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01375.php &amp;lt;nowiki&amp;gt;Proposal: real procedures again (8.4)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-04/msg01149.php &amp;lt;nowiki&amp;gt;Gathering specs and discussion on feature (post 9.1)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow holdable cursors in SPI}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Add SPI_gettypmod() to return a field's typemod from a TupleDesc&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2005-11/msg00250.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== SQL-Language Functions ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow SQL-language functions to reference parameters by parameter name&lt;br /&gt;
|Currently SQL-language functions can only refer to dollar parameters, e.g. $1&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01479.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01519.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00221.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rethink query plan caching and timing of parse analysis within SQL-language functions&lt;br /&gt;
|They should work more like plpgsql functions do ...&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2011-05/msg00078.php &amp;lt;nowiki&amp;gt;Re: BUG #6019: invalid cached plan on inherited table&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/pgSQL ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Allow listing of record column names, and access to record columns via variables, e.g. columns := r.(*), tval2 := r.(colname)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2005-07/msg00458.php &amp;lt;nowiki&amp;gt;Re: PL/PGSQL: Dynamic Record Introspection&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-05/msg00302.php &amp;lt;nowiki&amp;gt;Re: PL/PGSQL: Dynamic Record Introspection&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00031.php &amp;lt;nowiki&amp;gt;Re: PL/PGSQL: Dynamic Record Introspection&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables&lt;br /&gt;
|Because a row is not scalar, do not allow assignment from NULL-valued scalars.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg00070.php &amp;lt;nowiki&amp;gt;NULL and plpgsql rows&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider keeping separate cached copies when search_path changes&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg01009.php &amp;lt;nowiki&amp;gt;pl/pgsql Plan Invalidation and search_path&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve handling of NULL row values vs. NULL rows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-09/msg01758.php &amp;lt;nowiki&amp;gt;Null row vs. row of nulls in plpgsql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg01973.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve PERFORM handling of WITH queries or document limitation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00309.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/Perl ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow regex operations in plperl using UTF8 characters in non-UTF8 encoded databases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/Python ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Develop a trusted variant of PL/Python.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create a new restricted execution class that will allow passing function arguments in as locals.  Passing them as globals means functions cannot be called recursively.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-02/msg01468.php &amp;lt;nowiki&amp;gt;Re: pl/python do not delete function arguments&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
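The recursion problem named in the item above can be illustrated outside PL/Python. The following is a minimal sketch in plain Python, not PL/Python's actual implementation: the hypothetical names shared_globals, sum_to_globals and sum_to_locals only mimic keeping an argument in a namespace shared by every call versus keeping it in a per-call local.&lt;br /&gt;

```python
# Hypothetical stand-in for a namespace shared by every call of one function.
shared_globals = {}


def sum_to_globals(n):
    """Sum 0..n recursively, keeping the argument in the shared namespace."""
    shared_globals["n"] = n
    if shared_globals["n"] == 0:
        return 0
    inner = sum_to_globals(n - 1)
    # The nested call overwrote "n", so the caller's own argument is gone.
    return shared_globals["n"] + inner


def sum_to_locals(n):
    """Same computation with the argument held in a per-call local."""
    if n == 0:
        return 0
    return n + sum_to_locals(n - 1)


if __name__ == "__main__":
    print(sum_to_globals(3), sum_to_locals(3))
```

For an argument of 3, the shared-namespace version returns 0 because each nested call clobbers the caller's value, while the per-call version returns 6.&lt;br /&gt;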
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a DB-API compliant interface on top of the SPI interface&lt;br /&gt;
* http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|For functions returning a setof record with a composite type, cache the I/O functions for the composite type&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02007.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix loss of information during conversion of numeric type to Python float}}&lt;br /&gt;
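The kind of loss this item refers to can be reproduced with Python's standard decimal module alone. This is a sketch under assumptions: it does not run inside PL/Python, and the sample value is arbitrary, chosen only because it has more significant digits than a binary double can hold.&lt;br /&gt;

```python
from decimal import Decimal

# A value with more significant digits than a binary double can hold.
original = Decimal("12345678901234567890.123456789")

# Converting to float rounds to the nearest representable double.
as_float = float(original)

# Converting back exactly shows that the original value did not survive.
round_tripped = Decimal(as_float)
lossless = round_tripped == original

if __name__ == "__main__":
    print(original, as_float, lossless)
```

Keeping the value as a Decimal, or as its string form, preserves every digit; which representation PL/Python should use is left open here.&lt;br /&gt;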
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== PL/Tcl ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add table function support}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Check encoding validity of values passed back to Postgres in function returns, trigger tuple changes, and SPI calls.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Clients ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a function like pg_get_indexdef() that reports more detailed index information&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2007-12/msg00166.php &amp;lt;nowiki&amp;gt;BUG #3829: Wrong index reporting from pgAdmin III (v1.8.0 rev 6766-6767)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Split out pg_resetxlog output into pre- and post-sections&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg02040.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== pg_ctl ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow pg_ctl to work properly with configuration files located outside the PGDATA directory&lt;br /&gt;
|pg_ctl cannot read the pid file because it is located in the PGDATA directory rather than in the config directory.  The solution is to allow pg_ctl to read and understand postgresql.conf to find the data_directory value.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2009-10/msg00024.php &amp;lt;nowiki&amp;gt;BUG #5103: &amp;amp;quot;pg_ctl -w (re)start&amp;amp;quot; fails with custom unix_socket_directory&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Modify pg_ctl behavior and exit codes to make it easier to write an LSB conforming init script&lt;br /&gt;
|It may be desirable to condition some of the changes on a command-line switch, to avoid breaking existing scripts.  A Linux shell (sh) script is referenced which has been tested and seems to provide a high degree of conformance in multiple environments.  Study of this script might suggest areas where pg_ctl could be modified to make writing an LSB conforming script easier; however, some aspects of that script would be unnecessary with other suggested changes to pg_ctl, and discussion on the lists did not reach consensus on support for all aspects of this script.  Further discussion of particular changes is needed before beginning any work.&lt;br /&gt;
* [[Lsb_conforming_init_script|LSB conforming init script]]&lt;br /&gt;
These threads should be studied for other ideas on improvements:&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01390.php &amp;lt;nowiki&amp;gt;We should Axe /contrib/start-scripts&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01843.php &amp;lt;nowiki&amp;gt;Linux LSB init script&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00008.php &amp;lt;nowiki&amp;gt;Re: Linux LSB init script&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve pg_ctl's detection of running postmasters&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00000.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-committers/2011-06/msg00001.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== psql ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have psql \ds show all sequences and their settings&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00916.php &amp;lt;nowiki&amp;gt;Re: TODO item: Have psql show current values for a sequence&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00401.php &amp;lt;nowiki&amp;gt;Quick patch: Display sequence owner&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Have \d on a sequence indicate if the sequence is owned by a table}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move psql backslash database information into the backend, use mnemonic commands?&lt;br /&gt;
|This would allow non-psql clients to pull the same information out of the database as psql. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-01/msg00191.php &amp;lt;nowiki&amp;gt;Re: psql \d option list overloaded&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make psql's \d commands more consistent in their handling of schemas&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00014.php &amp;lt;nowiki&amp;gt;Re: psql and schemas&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make psql's \d commands distinguish default privileges from no privileges&lt;br /&gt;
|ACL displays were visibly different for the two cases before we &amp;quot;improved&amp;quot; them by using array_to_string.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2011-05/msg00082.php &amp;lt;nowiki&amp;gt;BUG #6021: There is no difference between default and empty access privileges with \dp&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consistently display privilege information for all objects in psql}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Add &amp;amp;quot;auto&amp;amp;quot; expanded mode that outputs in expanded format if &amp;amp;quot;wrapped&amp;amp;quot; mode can't wrap the output to the screen width&lt;br /&gt;
|Consider using auto-expanded mode for backslash commands like \df+.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00417.php &amp;lt;nowiki&amp;gt;Re: psql wrapped format default for backslash-d commands&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg01638.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent tab completion of SET TRANSACTION from querying the database and therefore preventing the transaction isolation level from being set.&lt;br /&gt;
|Currently SET &amp;amp;lt;tab&amp;amp;gt; causes a database lookup to check all supported session variables.  This query causes problems because setting the transaction isolation level must be the first statement of a transaction.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|\s without arguments (display history) fails with libedit, doesn't use pager either&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2011-06/msg00114.php &amp;lt;nowiki&amp;gt; psql \s not working - OS X&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a \set variable to control whether \s displays line numbers&lt;br /&gt;
|Another option is to add \# which lists line numbers, and allows command execution.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00255.php &amp;lt;nowiki&amp;gt;Re: psql possible TODO&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Include the symbolic SQLSTATE name in verbose error reports&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-09/msg00438.php &amp;lt;nowiki&amp;gt;Re: Checking is TSearch2 query is valid&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add prompt escape to display the client and server versions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00310.php &amp;lt;nowiki&amp;gt;WIP patch for TODO Item: Add prompt escape to display the client and server versions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add option to wrap column values at whitespace boundaries, rather than chopping them at a fixed width.&lt;br /&gt;
|Currently, &amp;amp;quot;wrapped&amp;amp;quot; format chops values into fixed widths.  Perhaps the word wrapping could use the same algorithm documented in the W3C specification. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00404.php &amp;lt;nowiki&amp;gt;Re: psql wrapped format default for backslash-d commands&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://www.w3.org/TR/CSS21/tables.html#auto-table-layout}}&lt;br /&gt;
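The difference between chopping at a fixed width and wrapping at whitespace can be sketched in a few lines of Python. This is only an illustration using the standard textwrap module; it is not psql's code and not the W3C table layout algorithm, and the helper names chop_fixed and wrap_at_whitespace are hypothetical.&lt;br /&gt;

```python
import textwrap


def chop_fixed(value, width):
    """Split a value every `width` characters, ignoring word boundaries."""
    return [value[i:i + width] for i in range(0, len(value), width)]


def wrap_at_whitespace(value, width):
    """Break a value at whitespace so that words stay intact when they fit."""
    return textwrap.wrap(value, width=width)


if __name__ == "__main__":
    sample = "alpha beta gamma"
    print(chop_fixed(sample, 6))
    print(wrap_at_whitespace(sample, 6))
```

With a width of 6, the fixed-width version splits "gamma" across two lines, while the whitespace version keeps each word whole.&lt;br /&gt;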
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support the ReST table output format&lt;br /&gt;
|Details about the ReST format:  http://docutils.sourceforge.net/rst.html#reference-documentation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-08/msg01007.php &amp;lt;nowiki&amp;gt;Proposal: new border setting in psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00518.php &amp;lt;nowiki&amp;gt;Re: Proposal: new border setting in psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00609.php &amp;lt;nowiki&amp;gt;Re: Proposal: new border setting in psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add option to print advice for people familiar with other databases&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-01/msg01845.php &amp;lt;nowiki&amp;gt;MySQL-ism help patch for psql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|\dd is missing comments for several types of objects&lt;br /&gt;
|Comments are not handled at all for some object types, and are handled by both \dd and the individual backslash command for others. Consider a system view like pg_comments to manage this mess.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2009-09/msg00199.php &amp;lt;nowiki&amp;gt;comment on constraint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-09/msg01080.php &amp;lt;nowiki&amp;gt;pg_comments&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-05/msg00885.php &amp;lt;nowiki&amp;gt;patch: Allow \dd to show constraint comments&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add ability to edit views with \ev&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg00023.php &amp;lt;nowiki&amp;gt;Adding \ev view editor?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix FETCH_COUNT to handle SELECT ... INTO and WITH queries&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01565.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00192.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent psql from sending remaining single-line multi-statement queries after reconnecting&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01283.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Add \i option to bring in the specified file as a quoted literal&lt;br /&gt;
|This would be useful for creating functions and other areas.  Details still need to be worked out.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-02/msg00016.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-02/msg00020.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having psql -c read .psqlrc, for consistency&lt;br /&gt;
|psql -f already reads .psqlrc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow processing of multiple -f (file) options&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve line drawing characters&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00386.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider improving the continuation prompt&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01772.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== pg_dump / pg_restore ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|&amp;lt;nowiki&amp;gt;Add the full object name to the tag field, e.g. for operators we need '=(integer, integer)' instead of just '='.&amp;lt;/nowiki&amp;gt;}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add pg_dumpall custom format dumps?&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2010-05/msg00509.php pg_dumpall custom format]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid using platform-dependent locale names in pg_dumpall output&lt;br /&gt;
|Using native locale names puts roadblocks in the way of porting a dump to another platform.  One possible solution is to get&lt;br /&gt;
CREATE DATABASE to accept some agreed-on set of locale names and fix them up to meet the platform's requirements.&lt;br /&gt;
* http://archives.postgresql.org/message-id/21396.1241716688@sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow selection of individual object(s) of all types, not just tables}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|In a selective dump, allow dumping of an object and all its dependencies}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add options like pg_restore -l and -L to pg_dump}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for multiple pg_restore -t options, like pg_dump&lt;br /&gt;
|pg_restore's -t switch is less useful than pg_dump's in quite a few ways: no multiple switches, no pattern matching, no ability to pick up indexes and other dependent items for a selected table.  It should be made to handle this switch just like pg_dump does.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Stop dumping CASCADE on DROP TYPE commands in clean mode}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow pg_dump --clean to drop roles that own objects or have privileges&lt;br /&gt;
|tgl says: if this is about pg_dumpall, it's done as of 8.4.  If it's really about pg_dump, what does it mean?  pg_dump has no business dropping roles.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow pg_dump to utilize multiple CPUs and I/O channels by dumping multiple objects simultaneously&lt;br /&gt;
|The difficulty with this is getting multiple dump processes to produce a single dump output file.  It also would require several sessions to share the same snapshot. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php &amp;lt;nowiki&amp;gt;pg_dump additional options for performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00135.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00040.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02454.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow pg_restore to load different parts of the COPY data for a single table simultaneously}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Remove support for dumping from pre-7.3 servers&lt;br /&gt;
|In 7.3 and later, we can get accurate dependency information from the server.  pg_dump still contains a lot of crufty code to work around the lack of dependency info in older servers, but maintaining that code is less and less worthwhile.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow pre/data/post files when schema and data are dumped separately, for performance reasons&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php &amp;lt;nowiki&amp;gt;pg_dump additional options for performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00185.php &amp;lt;nowiki&amp;gt;Re: pg_dump additional options for performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00821.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00135.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Refactor handling of database attributes between pg_dump and pg_dumpall&lt;br /&gt;
|Currently only pg_dumpall emits database attributes, such as ALTER DATABASE SET commands and database-level GRANTs.&lt;br /&gt;
Many people wish that pg_dump would do that.  One proposal is to let pg_dump issue such commands if the -C switch was used,&lt;br /&gt;
but it's unclear whether that will satisfy the demand.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg01031.php &amp;lt;nowiki&amp;gt;ALTER DATABASE vs pg_dump&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2010-05/msg00010.php summary of the issues]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change pg_dump so that a comment on the dumped database is applied to the loaded database, even if the database has a different name.&lt;br /&gt;
|This will require new backend syntax, perhaps COMMENT ON CURRENT DATABASE. This is related to the previous item.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow parallel restore of tar dumps&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-02/msg01154.php &amp;lt;nowiki&amp;gt;Re: parallel restore&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow pg_dumpall to output restorable ALTER USER/DATABASE SET settings&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00916.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00394.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02359.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-10/msg00489.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== ecpg ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Docs&lt;br /&gt;
|Document differences between ecpg and the SQL standard and information about the Informix-compatibility module.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Solve cardinality &amp;amp;gt; 1 for input descriptors / variables?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add a semantic check level, e.g. check if a table really exists}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix handling of DB attributes that are arrays}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix nested C comments}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|sqlwarn[6] should be 'W' if the PRECISION or SCALE value is specified}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make SET CONNECTION thread-aware, non-standard?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow multidimensional arrays}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement COPY FROM STDIN}} &lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a way to specify size of a bytea parameter&lt;br /&gt;
* [http://archives.postgresql.org/message-id/200906192131.n5JLVoMo044178@wwwmaster.postgresql.org &amp;lt;nowiki&amp;gt;BUG #4866: ECPG and BYTEA&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Fix small memory leaks in ecpg&lt;br /&gt;
|Memory leaks in a short-running application like ecpg are not really a problem, but they make debugging more complicated}} &lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow reuse of cursor name variables&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20100329113435.GA3430@feivel.credativ.lan &amp;lt;nowiki&amp;gt;Problems with variable cursorname in ecpg&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== libpq ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent PQfnumber() from lowercasing unquoted column names&lt;br /&gt;
|PQfnumber() should never have been doing lowercasing, but historically it has, so we need a way to prevent it}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow statement results to be automatically batched to the client&lt;br /&gt;
|Currently all statement results are transferred to the libpq client before libpq makes the results available to the application.  This feature would allow the application to make use of the first result rows while the rest are transferred, or held on the server waiting for them to be requested by libpq. One complexity is that a statement like SELECT 1/col could error out mid-way through the result set.}}&lt;br /&gt;
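The complexity mentioned above, an error arriving after some rows have already been handed to the application, can be sketched with a plain Python generator. This is an analogy only and does not use libpq: stream_quotients and consume are hypothetical names, and the generator merely stands in for a result set delivered row by row.&lt;br /&gt;

```python
def stream_quotients(cols):
    """Yield 1/col row by row, the way a streamed result set would arrive."""
    for col in cols:
        yield 1 / col


def consume(rows):
    """Collect rows until the stream fails; return the rows and the error."""
    seen = []
    try:
        for row in rows:
            seen.append(row)
    except ZeroDivisionError as exc:
        return seen, exc
    return seen, None


if __name__ == "__main__":
    rows, error = consume(stream_quotients([1, 2, 0, 4]))
    print(rows, error)
```

The consumer has already processed two rows when the division by zero surfaces, so an application using batched results would need a way to discard or compensate for rows it acted on before the failure.&lt;br /&gt;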
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider disallowing multiple queries in PQexec() as an additional barrier to SQL injection attacks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg00184.php &amp;lt;nowiki&amp;gt;Re: InitPostgres and flatfiles question&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add PQexecf() that allows complex parameter substitution&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01803.php &amp;lt;nowiki&amp;gt;Last minute mini-proposal (I know, know) for PQexecf()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add SQLSTATE and severity to errors generated within libpq itself&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-interfaces/2007-11/msg00015.php &amp;lt;nowiki&amp;gt;v8.1: Error severity on libpq PGconn*&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01425.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add support for interface/ipaddress binding to libpq&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01811.php &amp;lt;nowiki&amp;gt;SR/libpq - outbound interface/ipaddress binding&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Triggers ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve storage of deferred trigger queue&lt;br /&gt;
|Right now all deferred trigger information is stored in backend memory.  This could exhaust memory for very large trigger queues. This item involves dumping large queues into files, or doing some kind of join to process all the triggers, some bulk operation, or a bitmap. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00876.php &amp;lt;nowiki&amp;gt;Re: BUG #4204: COPY to table with FK has memory leak&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-10/msg00464.php &amp;lt;nowiki&amp;gt;Scaling up deferred unique checks and the after trigger queue&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-08/msg00023.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow triggers to be disabled in only the current session.&lt;br /&gt;
|This is currently possible by starting a multi-statement transaction, modifying the system tables, performing the desired SQL, restoring the system tables, and committing the transaction.  ALTER TABLE ... TRIGGER requires a table lock so it is not ideal for this usage.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY&lt;br /&gt;
|If the dump is known to be valid, allow foreign keys to be added without revalidating the data.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow statement-level triggers to access modified rows}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|When statement-level triggers are defined on a parent table, have them fire only on the parent table, and fire child table triggers only where appropriate&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg01883.php &amp;lt;nowiki&amp;gt;Statement-level triggers and inheritance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow AFTER triggers on system tables&lt;br /&gt;
|System tables are modified in many places in the backend without going through the executor, and therefore without causing triggers to fire. To complete this item, the functions that modify system tables will have to fire triggers.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01665.php&lt;br /&gt;
* http://wiki.postgresql.org/wiki/DDL_Triggers&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-12/msg00022.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Tighten trigger permission checks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00564.php &amp;lt;nowiki&amp;gt;Security leak with trigger functions?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow BEFORE INSERT triggers on views&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-02/msg01466.php &amp;lt;nowiki&amp;gt;Re: Why can't I put a BEFORE EACH ROW trigger on a view?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add database and transaction-level triggers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00451.php &amp;lt;nowiki&amp;gt;Proposal for db level triggers&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00620.php &amp;lt;nowiki&amp;gt;triggers on prepare, commit, rollback... ?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce locking requirements for creating a trigger&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg00635.php &amp;lt;nowiki&amp;gt;Re: Change lock requirements for adding a trigger&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid requirement for &amp;quot;AFTER&amp;quot; trigger functions to return a value&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02384.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow creation of inline triggers&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-02/msg00708.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Inheritance ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow inherited tables to inherit indexes, UNIQUE constraints, and primary/foreign keys&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-05/msg00285.php &amp;lt;nowiki&amp;gt;Partitioning/inherited tables vs FKs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00039.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00305.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Honor UNIQUE INDEX on base column in INSERTs/UPDATEs on inherited table, e.g.  INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail&lt;br /&gt;
|The main difficulty with this item is the problem of creating an index that can span multiple tables.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Determine whether ALTER TABLE / SET SCHEMA should work on inheritance hierarchies (and thus support ONLY).  If yes, implement it.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Some ALTER TABLE variants support recursion and others do not, but this is poorly documented or not documented at all, and where recursion is unsupported the ONLY marker is silently ignored. Clarify the documentation, and reject ONLY if it is not supported.}}&lt;br /&gt;
&lt;br /&gt;
== Indexes ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent index uniqueness checks when UPDATE does not modify the column&lt;br /&gt;
|Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE.&lt;br /&gt;
However, HOT already short-circuits this in common cases, so more work might not be helpful.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow the creation of on-disk bitmap indexes which can be quickly combined with other bitmap indexes&lt;br /&gt;
|Such indexes could be more compact if there are only a few distinct values, and they can also be compressed.  Keeping them updated can be costly.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2005-07/msg00512.php &amp;lt;nowiki&amp;gt;Re: Bitmap index AM&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg01107.php &amp;lt;nowiki&amp;gt;Bitmap index thoughts&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00265.php &amp;lt;nowiki&amp;gt;Stream bitmaps&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01214.php &amp;lt;nowiki&amp;gt;Re: Bitmapscan changes - Requesting further feedback&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00013.php &amp;lt;nowiki&amp;gt;Updated bitmap index patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00741.php &amp;lt;nowiki&amp;gt;Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01023.php &amp;lt;nowiki&amp;gt;Bitmap Indexes: request for feedback&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/message-id/800923.27831.qm@web29010.mail.ird.yahoo.com &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2006-10/msg00222.php &amp;lt;nowiki&amp;gt;Re: Simple join optimized badly?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01131.php &amp;lt;nowiki&amp;gt;Stats for multi-column indexes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00741.php &amp;lt;nowiki&amp;gt;Cross-column statistics revisited&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg01431.php &amp;lt;nowiki&amp;gt;Multi-Dimensional Histograms&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00913.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg02179.php &lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00459.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg02054.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01731.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00894.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-09/msg00679.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having a larger statistics target for indexed columns and expression indexes. &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider smaller indexes that record a range of values per heap page, rather than having one index entry for every heap row&lt;br /&gt;
|This is useful if the heap is clustered by the indexed values. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php &amp;lt;nowiki&amp;gt;Grouped Index Tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php &amp;lt;nowiki&amp;gt;Grouped Index Tuples&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php &amp;lt;nowiki&amp;gt;Grouped Index Tuples / Clustered Indexes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php &amp;lt;nowiki&amp;gt;Bitmapscan changes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00014.php &amp;lt;nowiki&amp;gt;Re: GIT patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00487.php &amp;lt;nowiki&amp;gt;Re: Index Tuple Compression Approach?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01589.php &amp;lt;nowiki&amp;gt;Re: Index AM change proposals, redux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
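&lt;br /&gt;
To make the range-per-heap-page idea above concrete, here is a minimal Python sketch that stores one (min, max) pair per page and uses it to pick the pages that could contain a key. The function names and the use of integer keys are assumptions made for this example only; they do not correspond to any PostgreSQL index access method.&lt;br /&gt;

```python
def build_range_summary(heap_pages):
    # heap_pages is a list of pages; each page is a non-empty list of
    # integer keys. The summary holds one (min, max) pair per page.
    return [(min(page), max(page)) for page in heap_pages]


def candidate_pages(summary, key):
    # A page can contain the key only if the key falls inside the
    # page's (min, max) range; every other page is skipped.
    return [page_no for page_no, (lo, hi) in enumerate(summary)
            if key in range(lo, hi + 1)]


clustered = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scattered = [[1, 9, 4], [2, 8, 5], [3, 7, 6]]
print(candidate_pages(build_range_summary(clustered), 5))
print(candidate_pages(build_range_summary(scattered), 5))
```

On the clustered heap only one page has to be visited, while on the scattered heap every page's range covers the key, which is why the item notes that this is useful when the heap is clustered by the indexed values.&lt;br /&gt;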
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY&lt;br /&gt;
|This is difficult because the lock must be upgraded to an exclusive table lock to replace the existing index file.  CREATE INDEX CONCURRENTLY does not have this complication.  REINDEX CONCURRENTLY would allow index compaction without downtime. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-08/msg00289.php &amp;lt;nowiki&amp;gt;Re: When/if to Reindex&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow multiple indexes to be created concurrently, ideally via a single heap scan&lt;br /&gt;
|pg_restore allows parallel index builds, but it is done via subprocesses, and there is no SQL interface for this.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-04/msg00093.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider sorting entries before inserting into btree index&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-01/msg01010.php &amp;lt;nowiki&amp;gt;Re: ATTN: Clodaldo was Performance problem. Could it be related to 8.3-beta4?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow index scans to return matching index keys, not just the matching heap locations&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01657.php &amp;lt;nowiki&amp;gt;Re: Is this TODO item done?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01477.php &amp;lt;nowiki&amp;gt;Index-only quals&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow creation of an index that can do comparisons to test if a value is between two column values&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00757.php &amp;lt;nowiki&amp;gt;Proposal: temporal extension &amp;amp;quot;period&amp;amp;quot; data type&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using &amp;quot;effective_io_concurrency&amp;quot; for index scans&lt;br /&gt;
* Currently only bitmap scans use this, which might be fine because most multi-row index scans use bitmap scans.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem with btree page splits during checkpoints&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00052.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-09/msg00184.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== GIST ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add more GIST index support for geometric data types}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow GIST indexes to create certain complex index types, like digital trees (see Aoki)}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix performance issues in contrib/seg and contrib/cube GiST support&lt;br /&gt;
* [http://archives.postgresql.org/message-id/alpine.DEB.2.00.0904161633160.4053@aragorn.flymine.org GiST index performance]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/alpine.DEB.2.00.0904221704470.22330@aragorn.flymine.org draft patch]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2009-05/msg00069.php &amp;lt;nowiki&amp;gt;Re: GiST index performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2009-06/msg00068.php &amp;lt;nowiki&amp;gt;GiST index performance&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Hash ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add UNIQUE capability to hash indexes}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add hash WAL logging for crash recovery&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-09/msg00196.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow multi-column hash indexes}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Sorting ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider whether duplicate keys should be sorted by block/offset&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00558.php &amp;lt;nowiki&amp;gt;Remove hacks for old bad qsort() implementations?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider being smarter about memory and external files used during sorts&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg01101.php &amp;lt;nowiki&amp;gt;Sorting Improvements for 8.4&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00045.php &amp;lt;nowiki&amp;gt;Re: Sorting Improvements for 8.4&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider detoasting keys before sorting}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow sorts to use more available memory&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2007-11/msg01026.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-09/msg01123.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg01957.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Fsync ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options and whether fsync does anything&lt;br /&gt;
|Ideally this requires a separate test program like /contrib/pg_test_fsync that can be run at initdb time or optionally later.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider sorting writes during checkpoint&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php &amp;lt;nowiki&amp;gt;Sorted writes in checkpoint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00050.php &amp;lt;nowiki&amp;gt;Re: Sorting writes during checkpoint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg02012.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00278.php&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+TgmoaHu1zuNohoE=cEP0nSc+0wtuRSyEAj_Af2XhxU+ry6-w@mail.gmail.com checkpoint writeback via sync_file_range]&lt;br /&gt;
}}&lt;br /&gt;
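&lt;br /&gt;
The sorting idea in the item above can be sketched in a few lines of Python: dirty buffers are ordered by relation file and block number before being written, so that each file is written in ascending block order. The function name and the (relfile, block_no) tuple layout are assumptions for illustration, not the backend's buffer structures.&lt;br /&gt;

```python
def checkpoint_write_order(dirty_buffers):
    # dirty_buffers is an iterable of (relfile, block_no) pairs in
    # arbitrary buffer-pool order. Tuples sort by relfile first and
    # then by block_no, which gives per-file sequential writes.
    return sorted(dirty_buffers)


dirty = [('rel_b', 7), ('rel_a', 3), ('rel_b', 2), ('rel_a', 1)]
for relfile, block_no in checkpoint_write_order(dirty):
    print(relfile, block_no)
```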
&lt;br /&gt;
== Cache Usage ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Speed up COUNT(*)&lt;br /&gt;
|We could use a fixed row count and a +/- count to follow MVCC visibility rules, or a single cached value could be used and invalidated if anyone modifies the table.  Another idea is to get a count directly from a unique index, but for this to be faster than a sequential scan it must avoid access to the heap to obtain tuple visibility information.  Note that index-only scans are now implemented, which dramatically speeds up some COUNT(*) cases.&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Slow_Counting&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a way to calculate an &amp;amp;quot;estimated COUNT(*)&amp;amp;quot;&lt;br /&gt;
|Perhaps by using the optimizer's cardinality estimates or random sampling.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-11/msg00943.php &amp;lt;nowiki&amp;gt;Re: Improving count(*)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Slow_Counting&lt;br /&gt;
}}&lt;br /&gt;
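&lt;br /&gt;
The random-sampling approach mentioned above can be sketched as follows: count the visible rows on a random subset of heap pages and scale the average up to the whole table. The function name, its parameters, and the list-of-page-counts input are assumptions for this Python example; it is not an existing PostgreSQL interface.&lt;br /&gt;

```python
import random


def estimate_row_count(tuples_per_page, sample_size, seed=0):
    # tuples_per_page[i] is the number of visible rows on heap page i.
    # Sample pages without replacement, then scale the sample mean by
    # the total number of pages.
    total_pages = len(tuples_per_page)
    if total_pages == 0:
        return 0
    sampled_pages = max(1, min(sample_size, total_pages))
    rng = random.Random(seed)
    sample = rng.sample(tuples_per_page, sampled_pages)
    return round(sum(sample) / sampled_pages * total_pages)


uniform_table = [10] * 1000
print(estimate_row_count(uniform_table, sample_size=50))
```

When every page holds the same number of rows the estimate is exact; on skewed tables the error depends on how representative the sampled pages are, which is the trade-off an estimated COUNT(*) accepts in exchange for not scanning the whole heap.&lt;br /&gt;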
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow data to be pulled directly from indexes&lt;br /&gt;
|Currently indexes do not have enough tuple visibility information to allow data to be pulled from the index without also accessing the heap.  The idea is to use the visibility map used for vacuum to avoid heap lookups on pages where all tuples are visible.&lt;br /&gt;
* [http://wiki.postgresql.org/wiki/Index-only_scans Index-Only Scans wiki]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider automatic caching of statements at various levels:&lt;br /&gt;
* Parsed query tree&lt;br /&gt;
* Query execution plan&lt;br /&gt;
* Query results &lt;br /&gt;
&lt;br /&gt;
:&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-04/msg00823.php &amp;lt;nowiki&amp;gt;Cached Query Plans (was: global prepared statements)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider increasing internal areas (NUM_CLOG_BUFFERS) when shared buffers is increased&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-10/msg01419.php &amp;lt;nowiki&amp;gt;Re: slru.c race condition (was Re: TRAP: FailedAssertion(&amp;amp;quot;!((itemid)-&amp;amp;gt;lp_flags &amp;amp;amp; 0x01)&amp;amp;quot;,)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00030.php &amp;lt;nowiki&amp;gt;clog_buffers to 64 in 8.3?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-08/msg00024.php &amp;lt;nowiki&amp;gt;CLOG Patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider decreasing the amount of memory used by PrivateRefCount&lt;br /&gt;
|&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php &amp;lt;nowiki&amp;gt;PrivateRefCount (for 8.3)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php &amp;lt;nowiki&amp;gt;Re: PrivateRefCount (for 8.3)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider allowing higher priority queries to have referenced buffer cache pages stay in memory longer&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00562.php &amp;lt;nowiki&amp;gt;Re: How to keep a table in memory?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Vacuum ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php &amp;lt;nowiki&amp;gt;Dead Space Map&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-03/msg00011.php &amp;lt;nowiki&amp;gt;Re: Automatic free space map filling&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow concurrent inserts to use recently created pages rather than creating new ones&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having single-page pruning update the visibility map&lt;br /&gt;
* &amp;lt;nowiki&amp;gt;https://commitfest.postgresql.org/action/patch_view?id=75&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php &amp;lt;nowiki&amp;gt;Re: visibility maps and heap_prune&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve tracking of total relation tuple counts now that vacuum doesn't always scan the whole heap&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-06/msg00531.php Partial vacuum versus pg_class.reltuples]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php &amp;lt;nowiki&amp;gt;FSM search modes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a more compact data representation for dead tuple locations within VACUUM&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php &amp;lt;nowiki&amp;gt;Re: Have vacuum emit a warning when it runs out of maintenance_work_mem&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide more information in order to improve user-side estimates of dead space bloat in relations&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2009-05/msg01039.php &amp;lt;nowiki&amp;gt;Re: Bloated Table&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve locking behaviour of vacuum during trailing page truncation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00319.php&lt;br /&gt;
* http://archives.postgresql.org/message-id/4D8DF88E.7080205@Yahoo.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce the number of table scans performed by vacuum&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg01119.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00605.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-07/msg00624.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Auto-vacuum ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Issue log message to suggest VACUUM FULL if a table is nearly empty?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent long-lived temporary tables from causing frozen-xid advancement starvation&lt;br /&gt;
|The problem is that autovacuum cannot vacuum them to set frozen xids; only the session that created them can do that. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php &amp;lt;nowiki&amp;gt;Re: AutoVacuum Behaviour Question&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Prevent autovacuum from running if a transaction that was already running during the last vacuum is still running&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00899.php &amp;lt;nowiki&amp;gt;Re: Autovacuum and OldestXmin&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have autoanalyze of parent tables occur when child tables are modified&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-06/msg00137.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-10/msg00271.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow parallel cores to be used by vacuumdb&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4F10A728.7090403@agliodbs.com vacuumdb -j]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Locking ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix priority ordering of read and write light-weight locks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00893.php &amp;lt;nowiki&amp;gt;lwlocks and starvation&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-11/msg00905.php &amp;lt;nowiki&amp;gt;Re: lwlocks and starvation&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem when multiple subtransactions of the same outer transaction hold different types of locks, and one subtransaction aborts&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-11/msg01011.php &amp;lt;nowiki&amp;gt;FOR SHARE vs FOR UPDATE locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg00001.php &amp;lt;nowiki&amp;gt;Re: FOR SHARE vs FOR UPDATE locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00435.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00773.php &amp;lt;nowiki&amp;gt;Re: savepoints and upgrading locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow UPDATEs that modify only non-referential-integrity columns to avoid conflicting with referential integrity locks&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00073.php &amp;lt;nowiki&amp;gt;Referential Integrity and SHARE locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add idle_in_transaction_timeout GUC so locks are not held for long periods of time}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve deadlock detection when a page cleaning lock conflicts with a shared buffer that is pinned&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-01/msg00138.php &amp;lt;nowiki&amp;gt;BUG #3883: Autovacuum deadlock with truncate?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00873.php &amp;lt;nowiki&amp;gt;Thoughts about bug #3883&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-committers/2008-01/msg00365.php &amp;lt;nowiki&amp;gt;Re: pgsql: Add checks to TRUNCATE, CLUSTER, and REINDEX to prevent&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Detect deadlocks involving LockBufferForCleanup()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00873.php &amp;lt;nowiki&amp;gt;Thoughts about bug #3883&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow finer control over who is cancelled in a deadlock&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01727.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a lock timeout parameter&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-05/msg00485.php &amp;lt;nowiki&amp;gt;SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Reduce number of unnecessary false positives in Serializable Snapshot Isolation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2011-06/msg00609.php &amp;lt;nowiki&amp;gt;SSI heap_insert and page-level predicate locks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Startup Time Improvements ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Experiment with multi-threaded backend for backend creation&lt;br /&gt;
|This would avoid the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading.  Also explore the idea of a single session using multiple threads to execute a statement faster.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow backends to change their database without restart&lt;br /&gt;
|This allows for faster server startup.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00843.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00336.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Write-Ahead Log ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Eliminate need to write full pages to WAL before page modification&lt;br /&gt;
|Currently, to protect against partial disk page writes, we write full page images to WAL before the pages are modified, so that any partial page writes can be corrected during recovery.  These pages can also be eliminated from point-in-time archive files. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2002-06/msg00655.php &amp;lt;nowiki&amp;gt;Re: Index Scans become Seq Scans after VACUUM ANALYSE&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg01191.php&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20120105061916.GB21048@fetter.org WIP double writes]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4EFC449F02000025000441CD@gw.wicourts.gov double writes]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/20120110214344.GB21106@fetter.org Double-write with Fast Checksums]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1962493974.656458.1327703514780.JavaMail.root@zimbra-prod-mbox-4.vmware.com double writes using &amp;quot;double-write buffer&amp;quot; approach]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|When full page writes are off, write CRC to WAL and check file system blocks on recovery&lt;br /&gt;
|If the CRC check fails during recovery, remember the page in case a later CRC for that page properly matches.  The difficulty is that hint bits are not WAL logged, meaning a valid page might not match the earlier CRC.}}&lt;br /&gt;
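&lt;br /&gt;
The hint-bit difficulty described above can be shown with a small Python sketch: a checksum taken when a page is modified stops matching once a single bit is later set without a WAL record. The toy 16-byte page, the position of the flag byte, and the use of zlib's CRC-32 are all assumptions for illustration; the real page layout and checksum may differ.&lt;br /&gt;

```python
import zlib


def page_crc(page):
    # CRC-32 over the whole page image; a stand-in for whatever
    # checksum would actually be written to WAL.
    return zlib.crc32(bytes(page))


# A toy 16-byte page: ten bytes of tuple data plus a trailing flag area.
page = bytearray(b'tuple data' + bytes(6))
crc_in_wal = page_crc(page)

# Setting a hint bit changes the page without producing a WAL record.
page[-1] |= 0x01
crc_on_disk = page_crc(page)

print(crc_in_wal == crc_on_disk)
```

The comparison fails even though the page is perfectly valid, which is why a mismatch during recovery cannot by itself be treated as proof of a partial write.&lt;br /&gt;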
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Write full pages during file system write and not when the page is modified in the buffer cache&lt;br /&gt;
|This allows most full page writes to happen in the background writer.  It might cause problems for applying WAL on recovery into a partially-written page, but later the full page will be replaced from WAL.&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CAGvK12UST-tPhyLrSLuSpwFxZbAO79yYrhV2xaLmS2MkUxNUVQ@mail.gmail.com Page Checksums + Double Writes]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce WAL traffic so only modified values are written rather than entire rows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-03/msg01589.php &amp;lt;nowiki&amp;gt;Reduction in WAL for UPDATEs&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow WAL information to recover corrupted pg_controldata&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] pg_resetxlog -r flag&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Find a way to reduce rotational delay when repeatedly writing last WAL page&lt;br /&gt;
|Currently, once the last WAL page has been fsynced, the disk platter must complete a full rotation before that page can be fsynced again. One idea is to write the WAL to different offsets, which might reduce the rotational delay. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2002-11/msg00483.php &amp;lt;nowiki&amp;gt;500 tpsQL + WAL log implementation&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Speed WAL recovery by allowing more than one page to be prefetched&lt;br /&gt;
|This should be done using the same infrastructure used for prefetching in general, to avoid introducing complex, error-prone code in WAL replay. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php &amp;lt;nowiki&amp;gt;Slow PITR restore&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php &amp;lt;nowiki&amp;gt;Re: [GENERAL] Slow PITR restore&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg01279.php &amp;lt;nowiki&amp;gt;Read-ahead and parallelism in redo recovery&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve WAL concurrency by increasing lock granularity&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00556.php &amp;lt;nowiki&amp;gt;Reworking WAL locking&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Be more aggressive about creating WAL files&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01325.php &amp;lt;nowiki&amp;gt;Re: PANIC caused by open_sync on Linux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-07/msg01075.php &amp;lt;nowiki&amp;gt;PreallocXlogFiles&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2005-04/msg00556.php &amp;lt;nowiki&amp;gt;WAL/PITR additional items&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have resource managers report the duration of their status changes&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg01468.php &amp;lt;nowiki&amp;gt;Recovery of Multi-stage WAL actions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move pgfoundry's xlogdump to /contrib and have it rely more closely on the WAL backend code&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-11/msg00035.php &amp;lt;nowiki&amp;gt;xlogdump&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Close deleted WAL files held open in *nix by long-lived read-only backends&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg01754.php &amp;lt;nowiki&amp;gt;Deleted WAL files held open by backends in Linux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00060.php &amp;lt;nowiki&amp;gt;Re: Deleted WAL files held open by backends in Linux&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Optimizer / Executor ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve selectivity functions for geometric operators}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider increasing the default values of from_collapse_limit, join_collapse_limit, and/or geqo_threshold&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4136ffa0905210551u22eeb31bn5655dbe7c9a3aed5@mail.gmail.com from_collapse_limit vs. geqo_threshold]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve ability to display optimizer analysis using OPTIMIZER_DEBUG}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Log statements where the optimizer row estimates were dramatically different from the number of rows actually found?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider compressed annealing to search for query plans&lt;br /&gt;
|This might replace GEQO.&lt;br /&gt;
* http://archives.postgresql.org/message-id/15658.1241278636%40sss.pgh.pa.us&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve use of expression indexes for ORDER BY &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg01553.php &amp;lt;nowiki&amp;gt;Resjunk sort columns, Heikki's index-only quals patch, and bug #5000&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Modify the planner to better estimate caching effects&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2010-11/msg00117.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow shared buffer cache contents to affect index cost computations&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg01140.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== Hashing ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using a hash for joining to a large IN (VALUES ...) list&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-05/msg00450.php &amp;lt;nowiki&amp;gt;Planning large IN lists&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow single batch hash joins to preserve outer pathkeys&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php Re: Potential Join Performance Issue]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;quot;lazy&amp;quot; hash tables - look up only the tuples that are actually requested&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid building the same hash table more than once during the same query&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid hashing for distinct and then re-hashing for hash join&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4136ffa0902191346g62081081v8607f0b92c206f0a@mail.gmail.com Re: Fixing Grittner's planner issues]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-04/msg00153.php a few crazy ideas about hash joins]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Background Writer ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider having the background writer update the transaction status hint bits before writing out the page&lt;br /&gt;
|Implementing this requires the background writer to have access to system catalogs and the transaction status log.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider adding buffers the background writer finds reusable to the free list &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php &amp;lt;nowiki&amp;gt;Background LRU Writer/free list&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+U5nMKtvyDcV4zTr7bq7t6cA2nBfLxCJ8tQgVBnc5ddRPO+Bg@mail.gmail.com our buffer replacement strategy is kind of lame]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Automatically tune bgwriter_delay based on activity rather than using a fixed interval&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php &amp;lt;nowiki&amp;gt;Background LRU Writer/free list&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CA+U5nMKtvyDcV4zTr7bq7t6cA2nBfLxCJ8tQgVBnc5ddRPO+Bg@mail.gmail.com our buffer replacement strategy is kind of lame]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider whether increasing BM_MAX_USAGE_COUNT improves performance&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-06/msg01007.php &amp;lt;nowiki&amp;gt;Bgwriter LRU cleaning: we've been going at this all wrong&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Test to see if calling PreallocXlogFiles() from the background writer will help with WAL segment creation latency&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-06/msg00340.php &amp;lt;nowiki&amp;gt;Re: Load Distributed Checkpoints, final patch&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Concurrent Use of Resources ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Do async I/O for faster random read-ahead of data&lt;br /&gt;
|Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php &amp;lt;nowiki&amp;gt;Asynchronous I/O Support&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-09/msg00255.php &amp;lt;nowiki&amp;gt;Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00027.php &amp;lt;nowiki&amp;gt;There's random access and then there's random access&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-01/msg00170.php &amp;lt;nowiki&amp;gt;Bitmap index scan preread using posix_fadvise (Was: There's random access and then there's random access)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
The above patch was applied in 8.4, but how to handle plain indexscans effectively still remains to be worked out.&lt;br /&gt;
* [http://archives.postgresql.org//pgsql-hackers/2009-01/msg00806.php Problems with the patch submitted for posix_fadvise in index scans]&lt;br /&gt;
}}&lt;br /&gt;
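&lt;br /&gt;
As a rough illustration of the read-ahead idea in the item above, the following Python sketch issues posix_fadvise hints for several blocks before reading any of them. It is not PostgreSQL code: the function name, the block list, and the 8k block size are assumptions made for the example, and the hint calls are skipped on platforms where Python does not expose os.posix_fadvise.&lt;br /&gt;
&lt;br /&gt;
```python
import os

BLOCK_SIZE = 8192  # assumed page size, for illustration only


def read_blocks_with_prefetch(path, block_numbers):
    """Hint the kernel about every wanted block, then read them in order."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Issue all hints first so the kernel can overlap the disk requests.
        if hasattr(os, "posix_fadvise"):
            for blkno in block_numbers:
                os.posix_fadvise(fd, blkno * BLOCK_SIZE, BLOCK_SIZE,
                                 os.POSIX_FADV_WILLNEED)
        # The reads themselves are still ordinary synchronous reads.
        return [os.pread(fd, BLOCK_SIZE, blkno * BLOCK_SIZE)
                for blkno in block_numbers]
    finally:
        os.close(fd)
```
&lt;br /&gt;
Issuing every hint before the first read is the point of the sketch: the kernel may then overlap the requests instead of servicing them one at a time.&lt;br /&gt;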
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Experiment with multi-threaded backend for better I/O utilization&lt;br /&gt;
|This would allow a single query to make use of multiple I/O channels simultaneously.  One idea is to create a background reader that can pre-fetch sequential and index scan pages needed by other backends. This could be expanded to allow concurrent reads from multiple devices in a partitioned table.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-performance/2011-02/msg00123.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Experiment with multi-threaded backend for better CPU utilization&lt;br /&gt;
|This would allow several CPUs to be used for a single query, such as for sorting or query execution.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00945.php &amp;lt;nowiki&amp;gt;Multi CPU Queries - Feedback and/or suggestions wanted!&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|SMP scalability improvements&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php &amp;lt;nowiki&amp;gt;Straightforward changes for increased SMP scalability&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00206.php &amp;lt;nowiki&amp;gt;Re: Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php &amp;lt;nowiki&amp;gt;Re: Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== TOAST ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow user configuration of TOAST thresholds&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php &amp;lt;nowiki&amp;gt;Re: Proposed adjustments in MaxTupleSize and toastthresholds&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php &amp;lt;nowiki&amp;gt;pg_lzcompress strategy parameters&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce unnecessary cases of deTOASTing&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-09/msg00895.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Eliminate more detoast copies for packed varlenas&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce costs of repeat de-TOASTing of values&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-06/msg01096.php &amp;lt;nowiki&amp;gt;WIP patch: reducing overhead for repeat de-TOASTing&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Monitoring ==&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Expand pg_stat_activity for easier integration with monitoring tools&lt;br /&gt;
* http://archives.postgresql.org/message-id/4DFA13A5.2060200@2ndQuadrant.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add column to pg_stat_activity that shows the progress of long-running commands like CREATE INDEX and VACUUM&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-04/msg00203.php &amp;lt;nowiki&amp;gt;EXPLAIN progress info&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* It would also be useful to track the progress of CLUSTER and VACUUM FULL this way&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Have pg_stat_activity display query strings in the correct client encoding&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-01/msg00131.php &amp;lt;nowiki&amp;gt;pg_stats queries versus per-database encodings&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Expose pg_controldata via an SQL interface&lt;br /&gt;
|Helpful for monitoring replicated databases&lt;br /&gt;
* http://archives.postgresql.org/message-id/4B901D73.8030003@agliodbs.com&lt;br /&gt;
* [http://archives.postgresql.org/message-id/4B959D7A.6010907@joeconway.com initial patch]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous Performance ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Use mmap() rather than SYSV for shared buffers?&lt;br /&gt;
|This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead. We could also consider mmap() for writing WAL.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00750.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00756.php&lt;br /&gt;
}}&lt;br /&gt;
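&lt;br /&gt;
To make the anonymous-mmap idea in the item above concrete, here is a minimal Python sketch in which a parent and a forked child share one anonymous mapping. It only illustrates the operating-system mechanism on Unix-like systems; the function name and the 8k segment size are assumptions, and nothing here reflects how the server allocates its buffers.&lt;br /&gt;
&lt;br /&gt;
```python
import mmap
import os

SEGMENT_SIZE = 8192  # one hypothetical 8k buffer


def demo_anonymous_shared_mapping():
    """Share an anonymous mapping between a parent and a forked child."""
    # fileno -1 requests an anonymous mapping; on Unix it is MAP_SHARED by
    # default, so no file backs it and no file I/O is generated.
    shared = mmap.mmap(-1, SEGMENT_SIZE)
    pid = os.fork()
    if pid == 0:
        # Child: write into the shared region and exit immediately.
        shared[0:5] = b"hello"
        os._exit(0)
    os.waitpid(pid, 0)
    # Parent: the child's write is visible because the pages are shared.
    return bytes(shared[0:5])
```
&lt;br /&gt;
The mapping has to be created before the fork so that the child inherits it, which is one reason this approach constrains how processes can be started.&lt;br /&gt;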
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rather than consider mmap()-ing in 8k pages, consider mmap()-ing entire files into a backend?&lt;br /&gt;
|Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping.  Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings.  Another problem is that there is no way to _prevent_ I/O to disk from the dirty shared buffers so changes could hit disk before WAL is written.&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01239.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider ways of storing rows more compactly on disk:&lt;br /&gt;
* Reduce the row header size?&lt;br /&gt;
* Consider reducing on-disk varlena length from four bytes to two because a heap row cannot be more than 64k in length}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider transaction start/end performance improvements&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php &amp;lt;nowiki&amp;gt;Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php &amp;lt;nowiki&amp;gt;Re: Reducing Transaction Start/End Contention&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow configuration of backend priorities via the operating system&lt;br /&gt;
|Though backend priorities make priority inversion during lock waits possible, research shows that this is not a huge problem.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2007-02/msg00493.php &amp;lt;nowiki&amp;gt;Priorities for users or queries?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider increasing the minimum allowed number of shared buffers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-bugs/2008-02/msg00157.php &amp;lt;nowiki&amp;gt;Re: [PATCH] Don't bail with legitimate -N/-B options&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider if CommandCounterIncrement() can avoid its AcceptInvalidationMessages() call&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-committers/2007-11/msg00585.php &amp;lt;nowiki&amp;gt;pgsql: Avoid incrementing the CommandCounter when&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php &amp;lt;nowiki&amp;gt;Re: TB-sized databases&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider not storing a NULL bitmap on disk if all the NULLs are trailing&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-12/msg00624.php &amp;lt;nowiki&amp;gt;Proposal for Null Bitmap Optimization(for Trailing NULLs)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-12/msg00109.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Sort large UPDATE/DELETEs so they are done in heap order&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg01119.php &amp;lt;nowiki&amp;gt;Possible future performance improvement: sort updates/deletes by ctid&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
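&lt;br /&gt;
The intuition behind the item above can be shown with a toy Python sketch. A ctid is modelled here as a (block, offset) pair, and the two helper names are hypothetical; the sketch only counts how often consecutive row visits move to a different heap block and says nothing about how the executor would actually do this.&lt;br /&gt;
&lt;br /&gt;
```python
def heap_order(ctids):
    """Sort (block, offset) pairs so each heap block is visited once, in order."""
    return sorted(ctids)


def count_block_switches(ctids):
    """Count how often consecutive row visits move to a different block."""
    blocks = [blk for blk, _ in ctids]
    return sum(1 for a, b in zip(blocks, blocks[1:]) if a != b)
```
&lt;br /&gt;
For the target rows (7,1), (2,3), (7,2), (2,1), (5,4), visiting them as given switches blocks four times, while visiting them in heap order switches blocks twice.&lt;br /&gt;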
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow one transaction to see tuples using the snapshot of another transaction&lt;br /&gt;
|This would assist multiple backends in working together. &lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00400.php &amp;lt;nowiki&amp;gt;Transaction Snapshot Cloning&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00135.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00260.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg00466.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-08/msg00684.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider decreasing the I/O caused by updating tuple hint bits&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00847.php &amp;lt;nowiki&amp;gt;Hint Bits and Write I/O&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-07/msg00199.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Hint Bits and Write I/O&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-10/msg00695.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00792.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-01/msg01063.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01408.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01453.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid the requirement of freezing pages that are infrequently modified &lt;br /&gt;
|If all rows on a page are visible, a bit could be set in the visibility map (once the visibility map is 100% reliable) so that the page does not need to be frozen, avoiding a page rewrite.&lt;br /&gt;
* http://archives.postgresql.org/message-id/4BF701CF.2090205@agliodbs.com&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-06/msg00082.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Avoid reading in b-tree pages when replaying vacuum records in hot standby mode&lt;br /&gt;
* [http://archives.postgresql.org/message-id/1272571938.4161.14739.camel@ebony &amp;lt;nowiki&amp;gt;Hot Standby tuning for btree_xlog_vacuum()&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Restructure truncation logic to be more resistant to failure&lt;br /&gt;
|This also involves not writing dirty buffers for a truncated or dropped relation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-08/msg01032.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider adding logic to increase large tables by more than 8k&lt;br /&gt;
|This would reduce file system fragmentation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2011-03/msg00337.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous Other ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Deal with encoding issues for filenames in the server filesystem&lt;br /&gt;
* {{MessageLink|20090413184335.39BE.52131E4D@oss.ntt.co.jp|a proposed patch here}}&lt;br /&gt;
* {{MessageLink|8484.1244655656@sss.pgh.pa.us|some issues about it here}}&lt;br /&gt;
* {{MessageLink|20100107103740.97A5.52131E4D@oss.ntt.co.jp|Windows-specific patch here}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Deal with encoding issues in the output of localeconv()&lt;br /&gt;
* [http://archives.postgresql.org/message-id/40c6d9160904210658y590377cfw6dbbecb53d2b8be0@mail.gmail.com bug report]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/49EF8DA0.90008@tpf.co.jp draft patch]&lt;br /&gt;
* [http://archives.postgresql.org/message-id/21710.1243620986@sss.pgh.pa.us review of patch]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide schema name and other fields available from SQL GET DIAGNOSTICS in error reports&lt;br /&gt;
* [http://archives.postgresql.org/message-id/dcc563d10810211907n3c59a920ia9eb7cd2a6d5ea58@mail.gmail.com &amp;lt;nowiki&amp;gt;How to get schema name which violates fk constraint&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-11/msg00846.php &amp;lt;nowiki&amp;gt;patch - Report the schema along table name in a referential failure error message&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* {{MessageLink|3191.1263306359@sss.pgh.pa.us|Re: NOT NULL violation and error-message}}&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-08/msg00213.php &amp;lt;nowiki&amp;gt;the case for machine-readable error fields&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
| Provide [http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html#LIBPQ-CONNECT-FALLBACK-APPLICATION-NAME fallback_application_name] in contrib/pgbench, oid2name, and dblink.&lt;br /&gt;
* {{MessageLink|w2g9837222c1004070216u3bc46b3ahbddfdffdbfb46212@mail.gmail.com|fallback_application_name and pgbench}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add 64-bit support to /contrib/pgbench&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-07/msg00153.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00705.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Source Code ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add use of 'const' for variables in source tree&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-11/msg00473.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemEasy&lt;br /&gt;
|Remove warnings created by -Wcast-align}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Move platform-specific ps status display info from ps_status.c to ports}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add optional CRC checksum to heap and index pages&lt;br /&gt;
|One difficulty is how to prevent hint bit changes from affecting the computed CRC checksum.&lt;br /&gt;
* http://archives.postgresql.org/message-id/19934.1226601952%40sss.pgh.pa.us&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00002.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg01028.php &amp;lt;nowiki&amp;gt;double-buffering page writes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00524.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01101.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2009-12/msg00011.php &amp;lt;nowiki&amp;gt;Re: Block-level CRC checks&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00249.php&lt;br /&gt;
* http://archives.postgresql.org/message-id/20111221215913.GA4536@fetter.org&lt;br /&gt;
* http://archives.postgresql.org/message-id/CA+U5nMJzQyxcObkpNAf1SYTX-gO_Mom3O9JXHnGpxRo1kXJ7ww@mail.gmail.com&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-01/msg00128.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-01/msg00113.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-02/msg00172.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-03/msg00001.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2012-03/msg00188.php&lt;br /&gt;
}}&lt;br /&gt;
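&lt;br /&gt;
The hint-bit difficulty mentioned in the item above can be illustrated with a small Python sketch. It shows only one conceivable approach, treating the bytes that hold hint bits as zero while computing the checksum; it is not taken from the linked threads, and the function name, the offsets, and the page layout are all hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import zlib


def page_checksum(page, hint_offsets):
    """CRC-32 of a page with the bytes at hint_offsets treated as zero."""
    scratch = bytearray(page)
    for off in hint_offsets:
        scratch[off] = 0  # ignore whole hint bytes (a simplification)
    return zlib.crc32(bytes(scratch))
```
&lt;br /&gt;
With this scheme a change confined to the masked bytes leaves the checksum unchanged, while a change anywhere else alters it. The cost is that corruption inside the masked bytes goes undetected.&lt;br /&gt;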
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider a faster CRC32 algorithm&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow cross-compiling by generating the zic database on the target system}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve NLS maintenance of libpgport messages linked into applications}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Use UTF8 encoding for NLS messages so all server encodings can read them properly}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow creation of universal binaries for Darwin&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00884.php &amp;lt;nowiki&amp;gt;Getting to universal binaries for Darwin&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider GnuTLS if OpenSSL license becomes a problem&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-02/msg00892.php&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2006-05/msg00040.php &amp;lt;nowiki&amp;gt;[PATCH] Add support for GnuTLS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-12/msg01213.php &amp;lt;nowiki&amp;gt;TODO: GNU TLS&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider making NAMEDATALEN more configurable in future releases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Research use of signals and sleep wake ups&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php &amp;lt;nowiki&amp;gt;Restartable signals 'n all that&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow C++ code to more easily access backend code&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg00302.php &amp;lt;nowiki&amp;gt;Mostly Harmless: Welcoming our C++ friends&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider simplifying how memory context resets handle child contexts&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2007-08/msg00067.php &amp;lt;nowiki&amp;gt;Re: Memory leak in nodeAgg&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create three versions of libpgport to simplify client code&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-10/msg00154.php &amp;lt;nowiki&amp;gt;8.4 TODO item: make src/port support libpq and ecpg directly&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve detection of shared memory segments being used by others by checking the SysV shared memory field 'nattch'&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php &amp;lt;nowiki&amp;gt;postgresql in FreeBSD jails: proposal&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00673.php &amp;lt;nowiki&amp;gt;Re: postgresql in FreeBSD jails: proposal&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Implement the non-threaded Avahi service discovery protocol&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00939.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-02/msg00097.php &amp;lt;nowiki&amp;gt;Re: Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg01211.php &amp;lt;nowiki&amp;gt;Re: [PATCHES] Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-04/msg00001.php &amp;lt;nowiki&amp;gt;Re: [HACKERS] Avahi support for Postgresql&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Reduce data row alignment requirements on some 64-bit systems&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-10/msg00369.php &amp;lt;nowiki&amp;gt;[WIP] Reduce alignment requirements on 64-bit systems.&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Restructure TOAST internal storage format for greater flexibility&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-11/msg00049.php &amp;lt;nowiki&amp;gt;Re: PG_PAGE_LAYOUT_VERSION 5 - time for change&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Add regression tests for pg_dump/restore&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-02/msg01967.php &amp;lt;nowiki&amp;gt;&amp;quot;make install-check-pg_dump&amp;quot; target in src/regress&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Research different memory allocation methods for lists&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-04/msg01467.php &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Consider removing the attribute options cache&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg00039.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
| Restructure /contrib section&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-06/msg00705.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=== /contrib/pg_upgrade ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Handle large object comments&lt;br /&gt;
|This is difficult to do because the large object doesn't exist when --schema-only is loaded.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider using pg_depend for checking object usage in version.c&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|If reindex is necessary, allow it to be done in parallel with pg_dump custom format&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Migrate pg_statistic by dumping it out as a flat file, so analyze is not necessary&lt;br /&gt;
|pg_class.oid is not preserved so schema.tablename must be used.&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com pg_upgrade and statistics]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve testing, perhaps using the buildfarm&lt;br /&gt;
|The buildfarm has access to multiple versions of PostgreSQL.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Create machine-readable output of pg_controldata&lt;br /&gt;
|This would avoid having to parse its human-readable output. The problem is that we need pg_controldata output from both the old and new clusters, so we would need to support both formats.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Windows ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Remove configure.in check for link failure when cause is found}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Remove readdir() errno patch when runtime/mingwex/dirent.c rev 1.4 is released}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow psql to use readline once non-US code pages work with backslashes}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix problem with shared memory on the Win32 Terminal Server}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve signal handling&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2005-06/msg00027.php &amp;lt;nowiki&amp;gt;Simplify Win32 Signaling code&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Convert MSVC build system to remove most batch files&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2007-08/msg00961.php &amp;lt;nowiki&amp;gt;MSVC build system&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Support pgxs when using MSVC}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix MSVC NLS support, like for to_char()&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-02/msg00485.php &amp;lt;nowiki&amp;gt;NLS on MSVC  strikes back!&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-patches/2008-02/msg00038.php &amp;lt;nowiki&amp;gt;Fix for 8.3 MSVC locale (Was  [HACKERS] NLS on MSVC strikes back!)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Find a correct rint() substitute on Windows&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00808.php &amp;lt;nowiki&amp;gt;Minor bug in src/port/rint.c&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix global namespace issues when using multiple terminal server sessions&lt;br /&gt;
* [http://archives.postgresql.org/message-id/48F3BFCC.8030107@dunslane.net problems with Windows global namespace]}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change from the current autoconf/gmake build system to cmake&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-12/msg01869.php &amp;lt;nowiki&amp;gt;About CMake (was Re: [COMMITTERS] pgsql: Append major version number and for libraries soname major)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Improve consistency of path separator usage&lt;br /&gt;
* http://archives.postgresql.org/message-id/49C0BDC5.4010002@hagander.net&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Fix cross-compiling on Windows&lt;br /&gt;
* http://archives.postgresql.org/pgsql-bugs/2010-10/msg00110.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItemDone&lt;br /&gt;
|Allow multiple Postgres clusters running on the same machine to distinguish themselves in the event log&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-03/msg01297.php&lt;br /&gt;
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00574.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
=== Wire Protocol Changes ===&lt;br /&gt;
{{TodoSubsection}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow dynamic character set handling}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add decoded type, length, precision}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Mark result columns as known-not-null when possible&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-11/msg01029.php &amp;lt;nowiki&amp;gt;Adding nullable indicator to Describe&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide more control over planner treatment of statements being prepared}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Use compression?}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Update clients to use data types, typmod, schema.table.column names of result sets using new statement protocol}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Set protocol for wire format negotiation&lt;br /&gt;
* [http://archives.postgresql.org/message-id/CACMqXCKkGrGXxQhjHCKCe0B8hn6sTt-1sdgHZOSGQMxrusOsQA@mail.gmail.com GUC_REPORT for protocol tunables]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Make sure upgrading to a 4.1 protocol version will actually work smoothly&lt;br /&gt;
* [http://archives.postgresql.org/message-id/28307.1318255008@sss.pgh.pa.us Re: libpq, PQdescribePrepared -&amp;gt; PQftype, PQfmod, no PQnullable]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoEndSubsection}}&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Convert single quotes to apostrophes in the PDF documentation&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2007-12/msg00059.php &amp;lt;nowiki&amp;gt;SGML docs and pdf single-quotes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Provide a manpage for postgresql.conf&lt;br /&gt;
* {{messageLink|20080819194311.GH4428@alvh.no-ip.org|A smaller default postgresql.conf}}&lt;br /&gt;
* {{messageLink|200808211910.37524.peter_e@gmx.net|A smaller default postgresql.conf}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Change the manpage-generating toolchain to use the new XML-based docbook2x tools&lt;br /&gt;
* {{messageLink|200808211910.37524.peter_e@gmx.net|A smaller default postgresql.conf}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider changing documentation format from SGML to XML&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-docs/2006-12/msg00152.php &amp;lt;nowiki&amp;gt;Re: Authoring Tools WAS: Switching to XML&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* http://archives.postgresql.org/pgsql-docs/2011-04/msg00020.php&lt;br /&gt;
* http://wiki.postgresql.org/wiki/Switching_PostgreSQL_documentation_from_SGML_to_XML&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Document support for N&amp;lt;nowiki&amp;gt;' '&amp;lt;/nowiki&amp;gt; national character string literals, if it matches the SQL standard&lt;br /&gt;
* http://archives.postgresql.org/message-id/1275895438.1849.1.camel@fsopti579.F-Secure.com&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add diagrams to the documentation&lt;br /&gt;
* http://archives.postgresql.org/pgsql-docs/2010-07/msg00001.php&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Exotic Features ==&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add pre-parsing phase that converts non-ISO syntax to supported syntax&lt;br /&gt;
|This could allow SQL written for other databases to run without modification.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Allow plug-in modules to emulate features from other databases}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add features of Oracle-style packages&lt;br /&gt;
|A package would be a schema with session-local variables, public/private functions, and initialization functions.  It is also possible to implement these capabilities in any schema and not use a separate &amp;amp;quot;packages&amp;amp;quot; syntax at all.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-08/msg00384.php &amp;lt;nowiki&amp;gt;proposal for PL packages for 8.3.&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Consider allowing control of upper/lower case folding of unquoted identifiers&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php &amp;lt;nowiki&amp;gt;Bringing PostgreSQL torwards the standard regarding case folding&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php &amp;lt;nowiki&amp;gt;Re: [SQL] Case Preservation disregarding case sensitivity?&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php &amp;lt;nowiki&amp;gt;TODO Item: Consider allowing control of upper/lower case folding of unquoted,  identifiers&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-07/msg00415.php &amp;lt;nowiki&amp;gt;Identifier case folding notes&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Add autonomous transactions&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php &amp;lt;nowiki&amp;gt;autonomous transactions&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Give query progress indication&lt;br /&gt;
* [[Query progress indication]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Rethink our type system&lt;br /&gt;
* [[Rethinking datatypes]]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Features We Do ''Not'' Want ==&lt;br /&gt;
&lt;br /&gt;
The following features have been discussed ad nauseam on the PostgreSQL mailing lists and the consensus has been that the project is not interested in them.  As such, if you are going to bring them up as potential features, you will want to be familiar with all of the arguments against these features which have been made over the years.  If you decide to work on such features anyway, you should be aware that you face a higher-than-normal barrier to getting the Project to accept them.&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|All backends running as threads in a single process (not wanted)&lt;br /&gt;
|This eliminates the process protection we get from the current setup. Thread creation usually has about the same overhead as process creation on modern systems, so it seems unwise to use a pure threaded model, and MySQL and DB2 have demonstrated that threads introduce as many issues as they solve.  Threading specific operations such as I/O, seq scans, and connection management has been discussed and will probably be implemented to enable specific performance features.  Moving to a threaded engine would also require halting all other work on PostgreSQL for one to two years.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|&amp;quot;Oracle-style&amp;quot; optimizer hints (not wanted)&lt;br /&gt;
|Optimizer hints, as implemented in Oracle and other RDBMSes, are used to work around problems in the optimizer and introduce upgrade and maintenance issues.  We would rather have such problems reported and fixed.  We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. See [[OptimizerHintsDiscussion|Optimizer Hints Discussion]] for further information.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Embedded server (not wanted)&lt;br /&gt;
|While PostgreSQL clients run fine in limited-resource environments, the server requires multiple processes and a stable pool of resources to run reliably and efficiently. Stripping down the PostgreSQL server to run in the same process address space as the client application would add too much complexity and too many failure cases. Besides, there are several very mature embedded SQL databases already available.}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Obfuscated function source code (not wanted)&lt;br /&gt;
|Obfuscating function source code has minimal protective benefits because anyone with super-user access can find a way to view the code. At the same time, it would greatly complicate backups and other administrative tasks. To prevent non-super-users from viewing function source code, remove SELECT permission on pg_proc.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-general/2008-09/msg00668.php &amp;lt;nowiki&amp;gt;Obfuscated stored procedures (was Re: Oracle and Postgresql)&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{TodoItem&lt;br /&gt;
|Indeterminate behavior for the GROUP BY clause (not wanted)&lt;br /&gt;
|At least one other database product allows GROUP BY to specify only a subset of the result columns that would be needed to provide predictable results; for the remaining columns, the server is free to return any value from the group.  This is not viewed as a desirable feature.  PostgreSQL 9.1 allows result columns that are not referenced by GROUP BY if a primary key for the same table is referenced in GROUP BY.&lt;br /&gt;
* [http://archives.postgresql.org/pgsql-hackers/2010-03/msg00297.php &amp;lt;nowiki&amp;gt;Re: SQL compatibility reminder: MySQL vs PostgreSQL&amp;lt;/nowiki&amp;gt;]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Todo]]&lt;/div&gt;</description>
			<pubDate>Tue, 20 Mar 2012 16:26:14 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Todo</comments>		</item>
		<item>
			<title>PgCon 2012 Developer Meeting</title>
			<link>http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting</link>
			<guid>http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Proposed Agenda Items */ Include Kevin for Queuing agenda item, and remove question mark on Kevin for Materialized views&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A meeting of the most active PostgreSQL developers is being planned for Wednesday 16th May, 2012 near the University of Ottawa, prior to pgCon 2012. In order to keep the numbers manageable, this meeting is '''by invitation only'''. Unfortunately it is quite possible that we've overlooked important code developers during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org). &lt;br /&gt;
&lt;br /&gt;
Please note that this year the attendee numbers have been cut to try to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.2 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies, unlike in previous years.&lt;br /&gt;
&lt;br /&gt;
This is a PostgreSQL Community event. Room and refreshments/food sponsored by EnterpriseDB. Other companies sponsored attendance for their developers.&lt;br /&gt;
 &lt;br /&gt;
== Time &amp;amp; Location ==&lt;br /&gt;
&lt;br /&gt;
The meeting will be from 9AM to 5PM, and will be in the &amp;quot;Red Experience&amp;quot; room at:&lt;br /&gt;
&lt;br /&gt;
 Novotel Ottawa&lt;br /&gt;
 33 Nicholas Street&lt;br /&gt;
 Ottawa&lt;br /&gt;
 Ontario&lt;br /&gt;
 K1N 9M7&lt;br /&gt;
 &lt;br /&gt;
Food and drink will be provided throughout the day, including breakfast from 8AM.&lt;br /&gt;
&lt;br /&gt;
[http://maps.google.ca/maps?f=q&amp;amp;source=s_q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=novotel+ottawa&amp;amp;aq=&amp;amp;sll=49.891235,-97.15369&amp;amp;sspn=36.237851,79.013672&amp;amp;ie=UTF8&amp;amp;hq=novotel+ottawa&amp;amp;hnear=&amp;amp;ll=45.421528,-75.683699&amp;amp;spn=0.036869,0.077162&amp;amp;z=14&amp;amp;iwloc=A&amp;amp;layer=c&amp;amp;cbll=45.425741,-75.689638&amp;amp;panoid=Z4FUGnkZkdHAOkIxyjjS9Q&amp;amp;cbp=12,25.83,,0,-0.6 View on Google Maps]&lt;br /&gt;
&lt;br /&gt;
== Attendees ==&lt;br /&gt;
&lt;br /&gt;
The following people have RSVPed to the meeting (in alphabetical order, by surname):&lt;br /&gt;
&lt;br /&gt;
* Oleg Bartunov&lt;br /&gt;
* Josh Berkus (Secretary)&lt;br /&gt;
* Jeff Davis&lt;br /&gt;
* Andrew Dunstan&lt;br /&gt;
* Dimitri Fontaine&lt;br /&gt;
* Stephen Frost&lt;br /&gt;
* Peter Geoghegan&lt;br /&gt;
* Kevin Grittner&lt;br /&gt;
* Robert Haas&lt;br /&gt;
* Magnus Hagander&lt;br /&gt;
* Hitoshi Harada&lt;br /&gt;
* KaiGai Kohei&lt;br /&gt;
* Tom Lane&lt;br /&gt;
* Bruce Momjian&lt;br /&gt;
* Dave Page (Chair)&lt;br /&gt;
* Simon Riggs&lt;br /&gt;
* Teodor Sigaev&lt;br /&gt;
* Greg Smith&lt;br /&gt;
&lt;br /&gt;
== Proposed Agenda Items ==&lt;br /&gt;
&lt;br /&gt;
Please list proposed agenda items here:&lt;br /&gt;
&lt;br /&gt;
* Queuing [Dimitri, Kevin]&lt;br /&gt;
* Materialized views [Dimitri, Kevin]&lt;br /&gt;
* Partitioning and Segment Exclusion [Dimitri]&lt;br /&gt;
* Row-level Access Control and SELinux [KaiGai]&lt;br /&gt;
** Security label on user tables&lt;br /&gt;
** Dynamically expandable enum data types&lt;br /&gt;
** Enforcement of triggers by extension&lt;br /&gt;
* Enhancement of FDW at v9.3 [KaiGai]&lt;br /&gt;
** Writable foreign tables&lt;br /&gt;
** Operations to be pushed down (Join, Aggregate, Sort, ...)&lt;br /&gt;
** Inheritance of foreign/regular tables&lt;br /&gt;
** Constraint (PK/FK) &amp;amp; Trigger support.&lt;br /&gt;
* GPU Acceleration [KaiGai]&lt;br /&gt;
&lt;br /&gt;
== Agenda ==&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;4&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
!Time&lt;br /&gt;
!Item&lt;br /&gt;
!Presenter&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:00&lt;br /&gt;
|Breakfast&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|08:45 - 09:00&lt;br /&gt;
|Welcome and introductions&lt;br /&gt;
|Dave Page&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|10:30 - 10:45&lt;br /&gt;
|Coffee break&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|12:30 - 13:30&lt;br /&gt;
|Lunch	&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|15:00 - 15:15&lt;br /&gt;
|Tea break&lt;br /&gt;
|&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|16:45 - 17:00&lt;br /&gt;
|Any other business/group photo&lt;br /&gt;
|Dave Page&lt;br /&gt;
|- style=&amp;quot;font-style:italic;background-color:lightgray;&amp;quot;&lt;br /&gt;
|17:00&lt;br /&gt;
|Finish&lt;br /&gt;
|	&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Minutes==&lt;/div&gt;</description>
			<pubDate>Fri, 02 Mar 2012 17:58:48 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:PgCon_2012_Developer_Meeting</comments>		</item>
		<item>
			<title>User:Kgrittn</title>
			<link>http://wiki.postgresql.org/wiki/User:Kgrittn</link>
			<guid>http://wiki.postgresql.org/wiki/User:Kgrittn</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Serializable transaction isolation level */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about Kevin Grittner and his activities.&lt;br /&gt;
&lt;br /&gt;
[[image:Kevin-Grittner.jpg]]&lt;br /&gt;
&lt;br /&gt;
== Bio ==&lt;br /&gt;
&lt;br /&gt;
Kevin is currently employed as a database administrator with the [http://www.wicourts.gov/about/organization/offices/ccap.htm Consolidated Court Automation Programs (CCAP)] at the [http://www.wicourts.gov Wisconsin Court System]. He's been making a living in the computer industry since 1972. He started a consulting company in 1980 after working in data control, operations, programming, systems analysis, and design. He was finally lured back to employee status in his current position after more than a quarter century of consulting.&lt;br /&gt;
&lt;br /&gt;
Kevin has experience with many types of applications in many types of organizations, businesses, and government. In 1984 he was the architect and primary author of the PROBER database and development system, used by thousands of corrections agencies, health departments, hospitals, and fire departments around the world. (The last known use was a statewide health department which converted off the product in 2006.) He has also developed application development platforms used by clients to speed development, improve standardization, and provide portability.&lt;br /&gt;
&lt;br /&gt;
== Current Work In Process ==&lt;br /&gt;
&lt;br /&gt;
=== Declarative materialized views ===&lt;br /&gt;
&lt;br /&gt;
For the 9.3 release.&lt;br /&gt;
&lt;br /&gt;
== Possible Future Work ==&lt;br /&gt;
&lt;br /&gt;
=== Rewrite tsearch parser to use regular expressions ===&lt;br /&gt;
&lt;br /&gt;
In reviewing a patch to fix some performance problems in the current parser, I became interested in the possibility of rewriting the current state machine implementation with a regular expression implementation.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/200912102005.16560.andres@anarazel.de Re: tsearch parser inefficiency if text includes urls or emails - new version]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B210D9E020000250002D344@gw.wicourts.gov tsearch parser overhaul]&lt;br /&gt;
&lt;br /&gt;
=== Deleted WAL files held open by backends in Linux ===&lt;br /&gt;
&lt;br /&gt;
I wasted some time tracking down an oddity which is more of an annoyance in system administration than a real problem, but might look at cleaning it up to save others the bother of investigating the issue.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us Deleted WAL files held open by backends in Linux]&lt;br /&gt;
&lt;br /&gt;
=== Temporal data improvements ===&lt;br /&gt;
&lt;br /&gt;
Temporal data handling is weak in SQL in general, and the PostgreSQL implementation seems skewed toward scientific or engineering applications, leaving it weaker than some databases on business applications.  (One example would be clean handling of monthly payment schedules.)  Enhancements to this area might be interesting.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4AE944BD.90809@comcast.net Proposal - temporal contrib module]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/48692c2d0911171231h6ab16a64yc4db35a6e26909e0@mail.gmail.com Re: Timezones (in 8.5?)]&lt;br /&gt;
&lt;br /&gt;
=== LSB script compliance ===&lt;br /&gt;
&lt;br /&gt;
I came up with a fairly LSB-compliant script for Linux; however, the community feels that most of the logic handled in the shell in this script should be moved into pg_ctl.  There's a pretty long and winding thread on the topic.  The last version of the script might be useful to identify what issues need to be covered in the current pg_ctl code.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A8C41EC.3080708@agliodbs.com We should Axe /contrib/start-scripts]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A9581A5020000250002A37A@gw.wicourts.gov Linux LSB init script]&lt;br /&gt;
&lt;br /&gt;
=== TOAST improvements ===&lt;br /&gt;
&lt;br /&gt;
There have been a few posts to the lists about specific use cases where TOAST defaults are far from optimal.  Allowing some tuning here might be helpful.&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A6088D50200002500028940@gw.wicourts.gov Higher TOAST compression]&lt;br /&gt;
&lt;br /&gt;
=== Literal and NULL handling anomalies ===&lt;br /&gt;
&lt;br /&gt;
There are a few corner cases where differences between standard and PostgreSQL behaviors with string literals or NULLs astonish those new to PostgreSQL.  These aren't easily solved, but might be worth the effort.&lt;br /&gt;
&lt;br /&gt;
=== README files ===&lt;br /&gt;
&lt;br /&gt;
Language is not always as readable as it could be, and some files still largely read like proposals for features which are now implemented.&lt;/div&gt;</description>
			<pubDate>Wed, 29 Feb 2012 17:26:04 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/User_talk:Kgrittn</comments>		</item>
		<item>
			<title>Replication, Clustering, and Connection Pooling</title>
			<link>http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling</link>
			<guid>http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Change obsolete references to http://pgpool.projects.postgresql.org/ to new location at http://www.pgpool.net/&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
There are many approaches available to scale PostgreSQL beyond running on a single server.  An outline of the terminology and basic technologies involved is at [http://www.postgresql.org/docs/current/interactive/high-availability.html High Availability and Load Balancing].  There is a [http://momjian.us/main/writings/pgsql/replication.pdf presentation] covering some of these solutions.&lt;br /&gt;
&lt;br /&gt;
There is no one-size-fits-all replication software.  You have to understand your requirements and how various approaches fit into that.  For example, here are two extremes in the replication problem space:&lt;br /&gt;
&lt;br /&gt;
* You have a few servers connected to a local network you want to always keep current for failover and load-balancing purposes.  Here you would be considering solutions that are synchronous, eager, and therefore conflict-free.&lt;br /&gt;
* Your users take a local copy of the database with them on laptops when they leave the office, make changes while they are away, and need to merge those with the main database when they return.  Here you'd want an asynchronous, lazy replication approach, and will be forced to consider how to handle conflicts in cases where the same record has been modified both on the master server and on a local copy.&lt;br /&gt;
&lt;br /&gt;
These are both database replication problems, but the best way to solve them is very different.  And as you can see from these examples, replication has a lot of specific terminology that you'll have to understand to figure out what class of solution makes sense for your requirements.  A great source for this background is the&lt;br /&gt;
[http://www.postgres-r.org/documentation/terms Postgres-R Terms and Definitions for Database Replication].  The main theoretical topic it doesn't mention is how to resolve conflicts in lazy replication cases like the laptop situation, which involves voting and similar schemes.&lt;br /&gt;
&lt;br /&gt;
==Features in the Core of PostgreSQL==&lt;br /&gt;
The PostgreSQL core team considered replication and clustering technology outside the scope of the main project's focus, but this changed in Spring 2008; see the [http://archives.postgresql.org/pgsql-hackers/2008-05/msg00913.php Core Team's statement].&lt;br /&gt;
&lt;br /&gt;
*[[Hot Standby]]/[[Streaming Replication]] is available as of PostgreSQL 9.0 and provides asynchronous binary replication to one or more standbys.  Standbys may also become hot standbys, meaning they can be queried as a read-only database.  This is the fastest type of replication available, as WAL data is sent immediately rather than waiting for a whole segment to be produced and shipped.&lt;br /&gt;
&lt;br /&gt;
*[[Warm Standby]]/Log Shipping is a HA solution which 'replicates' a database cluster to an archive or a warm (can be brought up quickly, but not available for querying) standby server.  Overhead is very low and it's easy to set up.  This is a simple and appropriate solution if all you care about is continuous backup and short failover times.&lt;br /&gt;
&lt;br /&gt;
==Comparison matrix==&lt;br /&gt;
&lt;br /&gt;
This page is being overhauled at [[Clustering]]&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;font-size: 85%; border: gray solid 1px; border-collapse: collapse; text-align: center; width: 100%; table-layout: fixed;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;background: #ececec&amp;quot;&lt;br /&gt;
! Program&lt;br /&gt;
! License&lt;br /&gt;
! Maturity&lt;br /&gt;
! Replication Method&lt;br /&gt;
! Sync&lt;br /&gt;
! Connection Pooling&lt;br /&gt;
! Load Balancing&lt;br /&gt;
! Query Partitioning&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://pgcluster.projects.postgresql.org/ PGCluster]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | See version details on site&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Master-Master&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Synchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ddffdd&amp;quot; | Yes&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | '''pgpool-I'''&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Stable&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Statement-Based Middleware&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Synchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ddffdd&amp;quot; | Yes&lt;br /&gt;
| bgcolor=&amp;quot;#ddffdd&amp;quot; | Yes&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://www.pgpool.net/ pgpool-II]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Recent release&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Statement-Based Middleware&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Synchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ddffdd&amp;quot; | Yes&lt;br /&gt;
| bgcolor=&amp;quot;#ddffdd&amp;quot; | Yes&lt;br /&gt;
| bgcolor=&amp;quot;#ddffdd&amp;quot; | Yes&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://slony.info/ slony-I]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Stable&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Master-Slave&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Asynchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://bucardo.org/ Bucardo]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Stable&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Master-Master, Master-Slave&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Asynchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://wiki.postgresql.org/wiki/Skytools Londiste]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Stable&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Master-Slave&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Asynchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://www.commandprompt.com/products/mammothreplicator/ Mammoth]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | BSD&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Stable&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Master-Slave&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Asynchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;text-align:block;&amp;quot; bgcolor=&amp;quot;#ececec&amp;quot; | [http://www.rubyrep.org/ rubyrep]&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | MIT&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Recent release&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Master-Master, Master-Slave&lt;br /&gt;
| bgcolor=&amp;quot;#ffffaa&amp;quot; | Asynchronous&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
| bgcolor=&amp;quot;#ffaaaa&amp;quot; | No&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Replication==&lt;br /&gt;
&lt;br /&gt;
Aside from Warm Standby, mentioned above...&lt;br /&gt;
&lt;br /&gt;
*Slony-I: Seems good, but it is single master only. The master is a single point of failure, and there is no good failover system for electing a new master or having a failed master rejoin the cluster, although with reasonable system administration you can implement a failover system yourself. Slave databases are mostly for safety or for parallelizing queries for performance. It suffers from O(N^2) communications (N = cluster size), but you can cascade the replication to reduce load on the master; if you were implementing a large replication cluster, this would probably be a good idea.  Slony is powerful, trigger-based, and highly configurable.&lt;br /&gt;
&lt;br /&gt;
* PGCluster: Does synchronous multimaster replication (incidentally, it is not the same as PGCluster-II, a shared-disk solution). It has two single points of failure: the load balancer and the data replicator. The project has historically looked a bit dead, but they just released a new version and http://pgfoundry.org/projects/pgcluster is up to date (at least the downloads page). One major downside to PGCluster is that it uses a modified version of PostgreSQL, and it usually lags a few releases behind.&lt;br /&gt;
&lt;br /&gt;
* [http://www.pgpool.net/ pgpool] 1/2 is a reasonable solution. It uses statement-level replication, which has some downsides but is good for certain things. pgpool 2 has a distributed table mechanism that is worth a look. You might want to look here if you have an extremely high ratio of reads to writes but need to service a huge transaction volume. It supports load balancing and replication by implementing a proxy that duplicates all updates to all slaves. It can partition data by doing this, and it can semi-intelligently route queries to the appropriate servers.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Mammoth Replicator&amp;quot; - BSD - http://www.commandprompt.com/products/mammothreplicator/ - Former proprietary solution, now open source. Uses a central logging process to distribute data changes amongst nodes. Essentially a fork of Postgres, as the changes are written directly into the backend. &lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Bucardo&amp;quot; - BSD License - http://bucardo.org/ - Trigger-based, asynchronous, multi-master or master-slave, written using plperl.&lt;br /&gt;
&lt;br /&gt;
* Cybertec, an Austrian company, offers a proprietary packaging of PGCluster. They simply call it PostgreSQL Multimaster-Replication, see http://www.cybertec.at.&lt;br /&gt;
&lt;br /&gt;
* [[Londiste_Tutorial|Londiste]], a part of [[Skytools]] (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools) which is a collection of replication tools from the Skype people. Purports to be simpler to use than Slony.&lt;br /&gt;
&lt;br /&gt;
* [http://www.continuent.com/index.php?option=com_content&amp;amp;task=view&amp;amp;id=212&amp;amp;Itemid=169 Continuent uni/cluster], proprietary and the related Sequoia (jdbc, formerly known as c-jdbc)&lt;br /&gt;
&lt;br /&gt;
* [http://www.postgres-r.org Postgres-R] is still in development. It features eager and thus conflict-free, but async multi-master replication.&lt;br /&gt;
&lt;br /&gt;
* [http://symmetricds.codehaus.org/ SymmetricDS] is an open-source, web-enabled, database independent, data synchronization software application. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outages. Supports several relational databases, including PostgreSQL. Licensed under Lesser GPL (LGPL).&lt;br /&gt;
&lt;br /&gt;
* DRBD (http://www.drbd.org/), a device driver that replicates disk blocks to other nodes. This works for failover only, not for scaling reads. Easy migration of devices if combined with an NFS export.&lt;br /&gt;
&lt;br /&gt;
* [http://sourceforge.net/projects/daffodilreplica/ Daffodil Replicator]. Supports several relational databases, including PostgreSQL. Licensed under GPL.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;RubyRep&amp;quot; - MIT License - http://www.rubyrep.org/ - Ruby based, asynchronous, multi-master replication system, which supports Postgres and MySQL.&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;pg_comparator&amp;quot; - BSD License - http://pgfoundry.org/projects/pg-comparator/ - Perl-based, table-level async master-slave &amp;quot;diff&amp;quot; and &amp;quot;patch&amp;quot; method of replication.  Low configuration overhead.&lt;br /&gt;
&lt;br /&gt;
===Inactive projects===&lt;br /&gt;
* Slony-II&lt;br /&gt;
* PGReplication&lt;br /&gt;
&lt;br /&gt;
==Clustering==&lt;br /&gt;
&lt;br /&gt;
* [http://www.greenplum.com/index.php?page=greenplum-database Greenplum Database] (formerly Bizgres MPP), proprietary. Not so much a replication solution as a way to parallelize queries, and targeted at the data warehousing crowd. Similar to ExtenDB, but tightly integrated with PostgreSQL.&lt;br /&gt;
&lt;br /&gt;
*[http://www.enterprisedb.com/products/gridsql.do GridSQL for EnterpriseDB Advanced Server] (formerly ExtenDB) &lt;br /&gt;
&lt;br /&gt;
*sequoia (jdbc, formerly known as c-jdbc)&lt;br /&gt;
&lt;br /&gt;
* [[PL/Proxy]] - database partitioning system implemented as PL language.&lt;br /&gt;
&lt;br /&gt;
*[http://db.cs.yale.edu/hadoopdb/hadoopdb.html HadoopDB] - A MapReduce layer put in front of a cluster of postgres back end servers.   Shared-nothing clustering.&lt;br /&gt;
&lt;br /&gt;
==Connection Pooling and Acceleration==&lt;br /&gt;
&lt;br /&gt;
Connection pooling programs let you reduce database-related overhead when it's the sheer number of physical connections dragging performance down.  This is particularly important on Windows, where system limitations prevent a large number of connections; see &amp;quot;I cannot run with more than about 125 connections at once&amp;quot; in the [http://www.postgresql.org/docs/faqs.FAQ_windows.html Windows FAQ].  It's also vital for web applications where the number of connections can get very large.&lt;br /&gt;
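&lt;br /&gt;
As a rough way to judge whether the number of connections is a concern, the configured limit can be compared with the number of sessions currently connected. This is a minimal sketch using the standard max_connections setting and the pg_stat_activity view; what counts as too many connections depends on the platform and workload.&lt;br /&gt;
 -- configured connection limit for this server&lt;br /&gt;
 show max_connections;&lt;br /&gt;
 -- number of sessions currently connected&lt;br /&gt;
 select count(*) from pg_stat_activity;&lt;br /&gt;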
&lt;br /&gt;
Some programs that implement connection pooling are:&lt;br /&gt;
* [[PgBouncer]]&lt;br /&gt;
* [http://www.pgpool.net/ pgpool]&lt;br /&gt;
&lt;br /&gt;
Some people also or alternately use [http://www.danga.com/memcached/ memcached] in various ways to reduce the work the database handles directly by caching popular data.&lt;br /&gt;
&lt;br /&gt;
==Credits==&lt;br /&gt;
&lt;br /&gt;
Sources for the initial information on this page include:&lt;br /&gt;
*[http://archives.postgresql.org/pgsql-performance/2007-06/msg00264.php replication thread]&lt;br /&gt;
*[http://archives.postgresql.org/pgsql-general/2007-08/msg00085.php pgpool2 vs sequoia]&lt;br /&gt;
*[http://archives.postgresql.org/pgsql-hackers/2006-10/msg00810.php Postgresql Caching]&lt;br /&gt;
&lt;br /&gt;
An existing page covering this topic in German is at http://burger-ag.de/postgresql_replikation.whtml. It translates pretty well through [http://babelfish.altavista.com/ Babelfish].&lt;br /&gt;
&lt;br /&gt;
Additional sources located but not yet integrated into this page:&lt;br /&gt;
* [http://bristlecone.continuent.org/uploads/bristlecone/HomePage/PG_East-Scale-Out-Benchmarks_FINAL2.pdf Portable Scale-Out Benchmarks for PostgreSQL] by Robert Hodges&lt;br /&gt;
* [http://www.fastware.com.au/docs/PostgreSQL_HighAvailability.pdf High Availability and PostgreSQL] by Gavin Sherry&lt;br /&gt;
&lt;br /&gt;
[[Category:Replication]][[Category:Administration]][[Category:Performance]][[Category:Clustering]]&lt;/div&gt;</description>
			<pubDate>Mon, 19 Dec 2011 17:58:04 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Replication,_Clustering,_and_Connection_Pooling</comments>		</item>
		<item>
			<title>SSI/fr</title>
			<link>http://wiki.postgresql.org/wiki/SSI/fr</link>
			<guid>http://wiki.postgresql.org/wiki/SSI/fr</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Rapport de Dépôt */ Correct typo.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Languages}}&lt;br /&gt;
&lt;br /&gt;
Documentation of Serializable Snapshot Isolation (SSI) in PostgreSQL, compared to plain Snapshot Isolation (SI). These correspond to the SERIALIZABLE and REPEATABLE READ transaction isolation levels, respectively, in PostgreSQL beginning with version 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing when no concurrent transactions are running, it will either do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document shows problems which can occur with certain combinations of transactions at the REPEATABLE READ transaction isolation level, and how they are avoided at the SERIALIZABLE transaction isolation level beginning with PostgreSQL version 9.1.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the application programmer or database administrator. For details on the implementation of SSI, see the [[Serializable]] Wiki page. For more information on how to use this isolation level, see [http://docs.postgresql.fr/current/transaction-iso.html#XACT-SERIALIZABLE the current PostgreSQL documentation].&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
In environments which avoid using blocking locks to protect integrity, it will be common for the database to be configured (in postgresql.conf) with:&lt;br /&gt;
 default_transaction_isolation = 'serializable'&lt;br /&gt;
For this reason, all examples were run with this setting, which keeps them uncluttered: each transaction starts with a simple begin rather than explicitly declaring its isolation level.&lt;br /&gt;
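&lt;br /&gt;
Where changing postgresql.conf is not an option, the isolation level can also be requested explicitly. The following is a minimal sketch using standard PostgreSQL commands; it shows a per-transaction form and a per-session form, and the show command is included only to confirm which level is in effect.&lt;br /&gt;
 -- request the level for a single transaction&lt;br /&gt;
 begin transaction isolation level serializable;&lt;br /&gt;
 show transaction_isolation;&lt;br /&gt;
 rollback;&lt;br /&gt;
 -- or make it the default for the rest of the current session&lt;br /&gt;
 set default_transaction_isolation = 'serializable';&lt;br /&gt;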
&lt;br /&gt;
=== Simple Write Skew ===&lt;br /&gt;
&lt;br /&gt;
When two concurrent transactions each determine what they write by reading data which overlaps what the other is writing, you can get a state which could not occur if either had run before the other. This is known as ''write skew'', and is the simplest form of serialization anomaly against which SSI protects you.&lt;br /&gt;
&lt;br /&gt;
When there is write skew in SSI, both transactions proceed until one commits. The first committer wins, and the other transaction is rolled back. The &amp;quot;first committer wins&amp;quot; rule ensures that progress is made and that the rolled-back transaction can be retried immediately.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Black and White ====&lt;br /&gt;
&lt;br /&gt;
In this case there are rows with a couleur (color) column containing 'blanc' (white) or 'noir' (black). Two users concurrently try to convert all rows to a single color, but in opposite directions: one wants to change all the white rows to black, and the other all the black rows to white.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table points&lt;br /&gt;
   (&lt;br /&gt;
     id int not null primary key,&lt;br /&gt;
     couleur text not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into points&lt;br /&gt;
   with x(id) as (select generate_series(1,10))&lt;br /&gt;
   select id, case when id % 2 = 1 then 'noir'&lt;br /&gt;
     else 'blanc' end from x;&lt;br /&gt;
{|&lt;br /&gt;
|+ Black and White Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'noir'&lt;br /&gt;
   where couleur = 'blanc';&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'blanc'&lt;br /&gt;
   where couleur = 'noir';&lt;br /&gt;
At this point one of the two transactions is doomed to fail.&lt;br /&gt;
 commit;&lt;br /&gt;
First committer wins.&lt;br /&gt;
 select * from points order by id;&lt;br /&gt;
&lt;br /&gt;
  id | couleur&lt;br /&gt;
 ----+-------&lt;br /&gt;
   1 | blanc&lt;br /&gt;
   2 | blanc&lt;br /&gt;
   3 | blanc&lt;br /&gt;
   4 | blanc&lt;br /&gt;
   5 | blanc&lt;br /&gt;
   6 | blanc&lt;br /&gt;
   7 | blanc&lt;br /&gt;
   8 | blanc&lt;br /&gt;
   9 | blanc&lt;br /&gt;
  10 | blanc&lt;br /&gt;
 (10 rows)&lt;br /&gt;
This one ran as if by itself.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
A serialization failure. We roll back and try again.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'noir'&lt;br /&gt;
   where couleur = 'blanc';&lt;br /&gt;
 commit;&lt;br /&gt;
There is no concurrent transaction to interfere.&lt;br /&gt;
 select * from points order by id;&lt;br /&gt;
&lt;br /&gt;
  id | couleur&lt;br /&gt;
 ----+-------&lt;br /&gt;
   1 | noir&lt;br /&gt;
   2 | noir&lt;br /&gt;
   3 | noir&lt;br /&gt;
   4 | noir&lt;br /&gt;
   5 | noir&lt;br /&gt;
   6 | noir&lt;br /&gt;
   7 | noir&lt;br /&gt;
   8 | noir&lt;br /&gt;
   9 | noir&lt;br /&gt;
  10 | noir&lt;br /&gt;
 (10 rows)&lt;br /&gt;
The transaction ran by itself, after the other.&lt;br /&gt;
|}&lt;br /&gt;
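&lt;br /&gt;
For comparison, the same steps can be tried at the REPEATABLE READ level. This is a sketch that assumes the points table has just been set up as above; the session labels in the comments are only for illustration. Because the two updates touch disjoint sets of rows, both commits are expected to succeed, leaving the colors swapped rather than uniform, a result that neither serial order of the two transactions could produce.&lt;br /&gt;
 -- session 1&lt;br /&gt;
 begin transaction isolation level repeatable read;&lt;br /&gt;
 update points set couleur = 'noir'&lt;br /&gt;
   where couleur = 'blanc';&lt;br /&gt;
 -- session 2&lt;br /&gt;
 begin transaction isolation level repeatable read;&lt;br /&gt;
 update points set couleur = 'blanc'&lt;br /&gt;
   where couleur = 'noir';&lt;br /&gt;
 commit;&lt;br /&gt;
 -- session 1&lt;br /&gt;
 commit;&lt;br /&gt;
 -- either session: both colors are still present&lt;br /&gt;
 select couleur, count(*) from points&lt;br /&gt;
   group by couleur;&lt;br /&gt;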
&lt;br /&gt;
----&lt;br /&gt;
==== Intersecting Data ====&lt;br /&gt;
&lt;br /&gt;
This example is taken from the PostgreSQL documentation. Two concurrent transactions read data, and each uses it to update the set read by the other. A simple, if somewhat contrived, example of write skew.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 CREATE TABLE mytab&lt;br /&gt;
 (&lt;br /&gt;
   class int NOT NULL,&lt;br /&gt;
   value int NOT NULL&lt;br /&gt;
 );&lt;br /&gt;
 INSERT INTO mytab VALUES&lt;br /&gt;
 (1, 10), (1, 20), (2, 100), (2, 200);&lt;br /&gt;
{|&lt;br /&gt;
|+ Intersecting Data Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 BEGIN;&lt;br /&gt;
 SELECT SUM(value) FROM mytab WHERE class = 1;&lt;br /&gt;
&lt;br /&gt;
  sum&lt;br /&gt;
 -----&lt;br /&gt;
   30&lt;br /&gt;
 (1 row)&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO mytab VALUES (2, 30);&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 BEGIN;&lt;br /&gt;
 SELECT SUM(value) FROM mytab WHERE class = 2;&lt;br /&gt;
&lt;br /&gt;
  sum&lt;br /&gt;
 -----&lt;br /&gt;
  300&lt;br /&gt;
 (1 row)&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO mytab VALUES (1, 300);&lt;br /&gt;
Each transaction has modified what the other transaction would have read. If both were allowed to commit, serializable behavior would be broken, because had they been run one at a time, one of the transactions would have seen the INSERT the other committed. We wait for one of the transactions to commit before rolling anything back, though, to ensure that progress is made and to prevent thrashing.&lt;br /&gt;
 COMMIT;&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 COMMIT;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
So now we roll back the failed transaction and retry it from the beginning.&lt;br /&gt;
 ROLLBACK;&lt;br /&gt;
 BEGIN;&lt;br /&gt;
 SELECT SUM(value) FROM mytab WHERE class = 1;&lt;br /&gt;
&lt;br /&gt;
  sum&lt;br /&gt;
 -----&lt;br /&gt;
  330&lt;br /&gt;
 (1 row)&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO mytab VALUES (2, 330);&lt;br /&gt;
 COMMIT;&lt;br /&gt;
This succeeds, and the result is consistent with a serial execution of the transactions.&lt;br /&gt;
 SELECT * FROM mytab;&lt;br /&gt;
&lt;br /&gt;
  class | value&lt;br /&gt;
 -------+-------&lt;br /&gt;
      1 |    10&lt;br /&gt;
      1 |    20&lt;br /&gt;
      2 |   100&lt;br /&gt;
      2 |   200&lt;br /&gt;
      1 |   300&lt;br /&gt;
      2 |   330&lt;br /&gt;
 (6 rows)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Overdraft Protection ====&lt;br /&gt;
&lt;br /&gt;
The hypothetical case is a bank which allows customers to withdraw money up to the total of what they have in all their accounts. The bank will later automatically transfer funds as needed to close the day with a positive balance in each account. Within a single transaction, it checks that the total of all accounts exceeds the amount requested.&lt;br /&gt;
&lt;br /&gt;
Someone tries to get clever and trick the bank by submitting $900 withdrawals against two accounts, each with a $500 balance, at the same time. At the REPEATABLE READ transaction isolation level this could work; but if the SERIALIZABLE transaction isolation level is used, SSI will detect a &amp;quot;dangerous structure&amp;quot; in the read/write pattern and reject one of the transactions.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
&lt;br /&gt;
 create table compte&lt;br /&gt;
   (&lt;br /&gt;
     nom text not null,&lt;br /&gt;
     type text not null,&lt;br /&gt;
     solde money not null default '0.00'::money,&lt;br /&gt;
     primary key (nom, type)&lt;br /&gt;
   );&lt;br /&gt;
 insert into compte values&lt;br /&gt;
   ('kevin','epargne', 500),&lt;br /&gt;
   ('kevin','courant', 500);&lt;br /&gt;
 &lt;br /&gt;
{|&lt;br /&gt;
|+ Overdraft Protection Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 select type, solde from compte&lt;br /&gt;
   where nom = 'kevin';&lt;br /&gt;
&lt;br /&gt;
    type    | solde&lt;br /&gt;
 -----------+---------&lt;br /&gt;
  epargne   | $500.00&lt;br /&gt;
  courant   | $500.00&lt;br /&gt;
 (2 rows)&lt;br /&gt;
The total is $1000, so a $900 withdrawal is allowed.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 select type, solde from compte&lt;br /&gt;
   where nom = 'kevin';&lt;br /&gt;
&lt;br /&gt;
    type    | solde&lt;br /&gt;
 -----------+---------&lt;br /&gt;
  epargne   | $500.00&lt;br /&gt;
  courant   | $500.00&lt;br /&gt;
 (2 rows)&lt;br /&gt;
The total is $1000, so a $900 withdrawal is allowed.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 update compte&lt;br /&gt;
   set solde = solde - 900::money&lt;br /&gt;
   where nom = 'kevin' and type = 'epargne';&lt;br /&gt;
So far everything is fine.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 update compte&lt;br /&gt;
   set solde = solde - 900::money&lt;br /&gt;
   where nom = 'kevin' and type = 'courant';&lt;br /&gt;
Now we have a problem. This cannot co-exist with the other transaction's activity. We don't roll back yet, because the transaction would fail with the same conflicts if it were retried. The first to commit will win, and the other will fail when it tries to continue after that.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
This one committed first. Its work is persisted.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
This transaction failed to withdraw the money.&lt;br /&gt;
Now we roll back and retry the transaction.&lt;br /&gt;
&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 select type, solde from compte&lt;br /&gt;
   where nom = 'kevin';&lt;br /&gt;
&lt;br /&gt;
    type    | solde&lt;br /&gt;
 -----------+----------&lt;br /&gt;
  epargne   | -$400.00&lt;br /&gt;
  courant   |  $500.00&lt;br /&gt;
 (2 rows)&lt;br /&gt;
We see that there is a net balance of $100. This request for $900 will be rejected by the application.&lt;br /&gt;
|}&lt;br /&gt;
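&lt;br /&gt;
The check that the application performs before each withdrawal could be sketched as a single query. This is only an illustration under stated assumptions: the column alias withdrawal_allowed is hypothetical, and the application is assumed to issue the update only when the query returns true.&lt;br /&gt;
 begin;&lt;br /&gt;
 -- does the total over all of kevin's accounts exceed the amount requested?&lt;br /&gt;
 select sum(solde) &amp;gt; 900::money as withdrawal_allowed&lt;br /&gt;
   from compte&lt;br /&gt;
   where nom = 'kevin';&lt;br /&gt;
 -- only if the check returned true&lt;br /&gt;
 update compte&lt;br /&gt;
   set solde = solde - 900::money&lt;br /&gt;
   where nom = 'kevin' and type = 'epargne';&lt;br /&gt;
 commit;&lt;br /&gt;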
&lt;br /&gt;
=== Three or More Transactions ===&lt;br /&gt;
&lt;br /&gt;
Serialization anomalies can result from more complex access patterns, involving three or more transactions.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Primary Colors ====&lt;br /&gt;
&lt;br /&gt;
This is similar to the earlier two-color (black and white) example, except that we use the three primary colors (in the code, red is 'rouge', yellow is 'jaune', and blue is 'blue'). One transaction tries to change red to yellow, the next yellow to blue, and the third blue to red. If these ran one at a time, you would end up with two of the three colors, depending on the order of execution. If any two of them run concurrently, the one trying to read the rows updated by the other will appear to have run first, since it will not see the other transaction's work, so there is no problem in that case. Whether the remaining transaction runs before or after that, the results are consistent with some serial order of execution.&lt;br /&gt;
&lt;br /&gt;
If all three run at the same time, there is a cycle in the apparent order of execution. A Repeatable Read transaction would not detect this, and the table would still have three colors. A Serializable transaction will detect the problem and roll back one of the transactions with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table points&lt;br /&gt;
   (&lt;br /&gt;
     id int not null primary key,&lt;br /&gt;
     couleur text not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into points&lt;br /&gt;
   with x(id) as (select generate_series(1,9000))&lt;br /&gt;
   select id, case when id % 3 = 1 then 'rouge'&lt;br /&gt;
     when id % 3 = 2 then 'jaune'&lt;br /&gt;
     else 'blue' end from x;&lt;br /&gt;
 create index points_couleur on points (couleur);&lt;br /&gt;
 analyze points;&lt;br /&gt;
{|&lt;br /&gt;
|+ Primary Colors Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
! session 3&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'jaune'&lt;br /&gt;
   where couleur = 'rouge';&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'blue'&lt;br /&gt;
   where couleur = 'jaune';&lt;br /&gt;
|-&lt;br /&gt;
|  ||  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'rouge'&lt;br /&gt;
   where couleur = 'blue';&lt;br /&gt;
At this point at least one of the three transactions is doomed to fail. To ensure that progress is made, we wait until one commits. The commit will succeed, which not only ensures progress, but also that a retry of a failed transaction will not fail ''on the same combination of transactions''.&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 commit;&lt;br /&gt;
First commit wins. Session 2 must fail at this point, because during commit processing it was determined to have the best chance of succeeding if retried immediately.&lt;br /&gt;
 select couleur, count(*) from points&lt;br /&gt;
   group by couleur&lt;br /&gt;
   order by couleur;&lt;br /&gt;
&lt;br /&gt;
  couleur  | count&lt;br /&gt;
 ----------+-------&lt;br /&gt;
  blue     |  3000&lt;br /&gt;
  jaune    |  6000&lt;br /&gt;
 (2 rows)&lt;br /&gt;
This appears to have run before the other updates.&lt;br /&gt;
|-&lt;br /&gt;
|  ||  ||&lt;br /&gt;
 commit;&lt;br /&gt;
This works if attempted at this point. If session 2 did more work first, this transaction might also need to be rolled back and retried.&lt;br /&gt;
 select couleur, count(*) from points&lt;br /&gt;
   group by couleur&lt;br /&gt;
   order by couleur;&lt;br /&gt;
&lt;br /&gt;
  couleur  | count&lt;br /&gt;
 ----------+-------&lt;br /&gt;
  jaune    |  6000&lt;br /&gt;
  rouge    |  3000&lt;br /&gt;
 (2 rows)&lt;br /&gt;
It appears to have run after the session 1 transaction.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
A serialization failure. We roll back and try again.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 update points set couleur = 'blue'&lt;br /&gt;
   where couleur = 'jaune';&lt;br /&gt;
 commit;&lt;br /&gt;
A retry will succeed.&lt;br /&gt;
 select couleur, count(*) from points&lt;br /&gt;
   group by couleur&lt;br /&gt;
   order by couleur;&lt;br /&gt;
&lt;br /&gt;
  couleur | count&lt;br /&gt;
 ---------+-------&lt;br /&gt;
  blue    |  6000&lt;br /&gt;
  rouge   |  3000&lt;br /&gt;
 (2 rows)&lt;br /&gt;
It appears to have run last, which it actually did.&lt;br /&gt;
|}&lt;br /&gt;
An interesting point is that if session 2 had attempted to commit after session 1 and before session 3, it would still have failed, and a retry would still have succeeded, but the fate of session 3's transaction is not deterministic. It might have succeeded, or it might have gotten a serialization failure and needed to be retried.&lt;br /&gt;
&lt;br /&gt;
This is because the predicate locking used as part of conflict detection is based on the pages and rows actually accessed, and there is a random factor used when inserting index entries which have equal keys, in order to reduce contention; so even with identical sequences of events it is possible to see differences in where serialization failures occur. This is why it is important, when relying on serializable transactions to manage concurrency, to have a generalized framework for identifying serialization failures and retrying transactions from the beginning.&lt;br /&gt;
&lt;br /&gt;
It is also worth noting that if session 2 had committed the retry of its transaction before session 3 committed its transaction, any subsequent query which viewed rows updated from yellow to blue (and committed) would, deterministically, cause session 3's transaction to fail, because those rows would not be rows that session 3 would see as blue and update to red. For session 3's transaction to succeed, it must be considered to have run before session 2's committed transaction. Therefore, exposing a state in which the work of session 2's transaction is visible but the work of session 3's transaction is not means that session 3's transaction must fail. The act of ''observing'' a recently modified state of the database can cause serialization failures. This will be explored further in other examples.&lt;br /&gt;
&lt;br /&gt;
=== Enforcing Business Rules in Triggers ===&lt;br /&gt;
&lt;br /&gt;
If all transactions are serializable, business rules can be enforced in triggers without the problems associated with other transaction isolation levels. Where a declarative constraint works, it will generally be faster, easier to implement and maintain, and less prone to bugs, so triggers should only be used in this way where a declarative constraint won't work.&lt;br /&gt;
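&lt;br /&gt;
As an illustration of the declarative route, some rules that look as if they need a trigger can be expressed with a unique index on an expression. This is a minimal sketch on a hypothetical table t_demo (the table and index names are assumptions, not part of the examples on this page); note that such an index cannot be created on a table which already contains rows that violate the rule.&lt;br /&gt;
 create table t_demo (id int primary key, val text not null);&lt;br /&gt;
 -- uniqueness on the first six characters of val&lt;br /&gt;
 create unique index t_demo_val6 on t_demo ((left(val, 6)));&lt;br /&gt;
 insert into t_demo values (1, 'this old dog');&lt;br /&gt;
 -- rejected with a unique violation: same first six characters&lt;br /&gt;
 insert into t_demo values (2, 'this old cat');&lt;br /&gt;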
&lt;br /&gt;
----&lt;br /&gt;
==== Unique-Like Constraints ====&lt;br /&gt;
&lt;br /&gt;
Suppose you want something similar to a unique constraint, but a little more complicated. For this example, we want uniqueness in the first six characters of the text column.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table t (id int not null, val text not null);&lt;br /&gt;
 with x (n) as (select generate_series(1,10000))&lt;br /&gt;
   insert into t select x.n, md5(x.n::text) from x;&lt;br /&gt;
 alter table t add primary key(id);&lt;br /&gt;
 create index t_val on t (val);&lt;br /&gt;
 vacuum analyze t;&lt;br /&gt;
 create function t_func()&lt;br /&gt;
   returns trigger&lt;br /&gt;
   language plpgsql as $$&lt;br /&gt;
 declare&lt;br /&gt;
   st text;&lt;br /&gt;
 begin&lt;br /&gt;
   st := substring(new.val from 1 for 6);&lt;br /&gt;
   if tg_op = 'UPDATE' and substring(old.val from 1 for 6) = st then&lt;br /&gt;
     return new;&lt;br /&gt;
   end if;&lt;br /&gt;
   if exists (select * from t where val between st and st || 'z') then&lt;br /&gt;
     raise exception 't.val pas unique sur les six premiers caractères: &amp;quot;%&amp;quot;', st;&lt;br /&gt;
   end if;&lt;br /&gt;
   return new;&lt;br /&gt;
 end;&lt;br /&gt;
 $$;&lt;br /&gt;
 create trigger t_trig&lt;br /&gt;
   before insert or update on t&lt;br /&gt;
   for each row execute procedure t_func();&lt;br /&gt;
&lt;br /&gt;
To check that the trigger enforces the business rule when there is no concurrency problem, on a single connection:&lt;br /&gt;
&lt;br /&gt;
 insert into t values (-1, 'this old dog');&lt;br /&gt;
 insert into t values (-2, 'this old cat');&lt;br /&gt;
&lt;br /&gt;
 ERROR:  t.val pas unique sur les six premiers caractères: &amp;quot;this o&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Now let's try it with two concurrent sessions.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ Unique-Like Constraint Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 insert into t values (-3, 'the river flows');&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 insert into t values (-4, 'the right stuff');&lt;br /&gt;
This works for the moment, because the other transaction's work is not visible to this transaction, but both transactions cannot commit without violating the business rule.&lt;br /&gt;
 commit;&lt;br /&gt;
First committer wins. This transaction's work is safe.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
A commit here would fail, as would any other statement we tried to run in this doomed transaction.&lt;br /&gt;
 select * from t where id &amp;lt; 0;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Canceled on identification as a pivot,&lt;br /&gt;
          during conflict out checking.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
&lt;br /&gt;
Because this is a serialization failure, the transaction should be retried.&lt;br /&gt;
&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 insert into t values (-3, 'the river flows');&lt;br /&gt;
&lt;br /&gt;
On the retry, we get an error that is more meaningful to the user.&lt;br /&gt;
&lt;br /&gt;
 ERROR:  t.val pas unique sur les six premiers caractères: &amp;quot;the ri&amp;quot;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Foreign-Key-Like Constraints ====&lt;br /&gt;
&lt;br /&gt;
Sometimes two tables must have a relationship very similar to a foreign key relationship, but there are additional criteria which make a foreign key insufficient to cover the necessary integrity checking. In this example a project table contains a reference to the key of a person table in its project_manager column, but not just ''any'' person will do; the person specified must be a project manager.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table person&lt;br /&gt;
   (&lt;br /&gt;
     person_id int not null primary key,&lt;br /&gt;
     person_name text not null,&lt;br /&gt;
     is_project_manager boolean not null&lt;br /&gt;
   );&lt;br /&gt;
 create table project&lt;br /&gt;
   (&lt;br /&gt;
     project_id int not null primary key,&lt;br /&gt;
     project_name text not null,&lt;br /&gt;
     project_manager int not null&lt;br /&gt;
   );&lt;br /&gt;
 create index project_manager&lt;br /&gt;
   on project (project_manager);&lt;br /&gt;
 &lt;br /&gt;
 create function person_func()&lt;br /&gt;
   returns trigger&lt;br /&gt;
   language plpgsql as $$&lt;br /&gt;
 begin&lt;br /&gt;
   if tg_op = 'DELETE' and old.is_project_manager then&lt;br /&gt;
     if exists (select * from project&lt;br /&gt;
                where project_manager = old.person_id) then&lt;br /&gt;
       raise exception&lt;br /&gt;
         'une personne ne peut être supprimée tant qu''elle est responsable d''un projet';&lt;br /&gt;
     end if;&lt;br /&gt;
   end if;&lt;br /&gt;
   if tg_op = 'UPDATE' then&lt;br /&gt;
     if new.person_id is distinct from old.person_id then&lt;br /&gt;
       raise exception 'il est interdit de modifier person_id';&lt;br /&gt;
     end if;&lt;br /&gt;
     if old.is_project_manager and not new.is_project_manager then&lt;br /&gt;
       if exists (select * from project&lt;br /&gt;
                  where project_manager = old.person_id) then&lt;br /&gt;
         raise exception&lt;br /&gt;
           'une personne doit rester gestionnaire de projet tant qu''elle est responsable d''un projet';&lt;br /&gt;
       end if;&lt;br /&gt;
     end if;&lt;br /&gt;
   end if;&lt;br /&gt;
   if tg_op = 'DELETE' then&lt;br /&gt;
     return old;&lt;br /&gt;
   else&lt;br /&gt;
     return new;&lt;br /&gt;
   end if;&lt;br /&gt;
 end;&lt;br /&gt;
 $$;&lt;br /&gt;
 create trigger person_trig&lt;br /&gt;
   before update or delete on person&lt;br /&gt;
   for each row execute procedure person_func();&lt;br /&gt;
 &lt;br /&gt;
 create function project_func()&lt;br /&gt;
   returns trigger&lt;br /&gt;
   language plpgsql as $$&lt;br /&gt;
 begin&lt;br /&gt;
   if tg_op = 'INSERT'&lt;br /&gt;
   or (tg_op = 'UPDATE' and new.project_manager &amp;lt;&amp;gt; old.project_manager) then&lt;br /&gt;
     if not exists (select * from person&lt;br /&gt;
                      where person_id = new.project_manager&lt;br /&gt;
                        and is_project_manager) then&lt;br /&gt;
       raise exception&lt;br /&gt;
         'project_manager doit être défini en tant que gestionnaire de projet dans la table person';&lt;br /&gt;
     end if;&lt;br /&gt;
   end if;&lt;br /&gt;
   return new;&lt;br /&gt;
 end;&lt;br /&gt;
 $$;&lt;br /&gt;
 create trigger project_trig&lt;br /&gt;
   before insert or update on project&lt;br /&gt;
   for each row execute procedure project_func();&lt;br /&gt;
 &lt;br /&gt;
 insert into person values (1, 'Kevin Grittner', true);&lt;br /&gt;
 insert into person values (2, 'Peter Parker', true);&lt;br /&gt;
 insert into project values (101, 'parallel processing', 1);&lt;br /&gt;
{|&lt;br /&gt;
|+ Foreign-Key-Like Constraint Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
A person is updated to no longer be a project manager.&lt;br /&gt;
 begin;&lt;br /&gt;
 update person&lt;br /&gt;
   set is_project_manager = false&lt;br /&gt;
   where person_id = 2;&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
At the same time, a project is updated to make that person its manager.&lt;br /&gt;
 begin;&lt;br /&gt;
 update project&lt;br /&gt;
   set project_manager = 2&lt;br /&gt;
   where project_id = 101;&lt;br /&gt;
Both cannot commit. First committer wins.&lt;br /&gt;
 commit;&lt;br /&gt;
The assignment of the person to the project commits first, so the other transaction must now fail. If the other transaction had run at a different isolation level, both transactions would have committed, resulting in a violation of the business rules.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
A serialization failure.  We roll back and try again.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 update person&lt;br /&gt;
   set is_project_manager = false&lt;br /&gt;
   where person_id = 2;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  une personne doit rester gestionnaire de &lt;br /&gt;
         projet tant qu'elle est responsable d'un projet&lt;br /&gt;
On the second attempt, we get a meaningful message.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Read Only Transactions ===&lt;br /&gt;
&lt;br /&gt;
While a read only transaction cannot contribute to an anomaly which persists in the database, under the Repeatable Read isolation level, which is implemented with plain snapshot isolation (SI), it can &amp;quot;see&amp;quot; a state which is not consistent with any serial (one-at-a-time) execution of the transactions. A Serializable transaction implemented with SSI will never see such transient anomalies.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== Deposit Report ====&lt;br /&gt;
&lt;br /&gt;
A general class of problems involving read only transactions is batch processing, where a control table determines which batch is currently the target of inserts. A batch is closed by updating the control table, at which point the batch is considered &amp;quot;locked&amp;quot; against further changes, and processing of that batch occurs.&lt;br /&gt;
&lt;br /&gt;
A concrete example of this kind of processing is receipting. Receipts might be added to a batch identified by a deposit date or, if more than one deposit per day is possible, an abstract receipt batch number. At some point during the day, while the bank is still open, the batch is closed, a report of the money received is printed, and the money is taken to the bank for deposit.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table control&lt;br /&gt;
   (&lt;br /&gt;
     deposit_no int not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into control values (1);&lt;br /&gt;
 create table receipt&lt;br /&gt;
   (&lt;br /&gt;
     receipt_no serial primary key,&lt;br /&gt;
     deposit_no int not null,&lt;br /&gt;
     payee text not null,&lt;br /&gt;
     amount money not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values ((select deposit_no from control), 'Crosby', '100');&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values ((select deposit_no from control), 'Stills', '200');&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values ((select deposit_no from control), 'Nash', '300');&lt;br /&gt;
{|&lt;br /&gt;
|+ Deposit Report Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
At the receipting counter, another receipt is added to the current batch.&lt;br /&gt;
 begin;  -- T1&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values&lt;br /&gt;
   (&lt;br /&gt;
     (select deposit_no from control),&lt;br /&gt;
     'Young', '100'&lt;br /&gt;
   );&lt;br /&gt;
This transaction can see its own insert, but it is not visible to other transactions until it commits.&lt;br /&gt;
  select * from receipt;&lt;br /&gt;
&lt;br /&gt;
  receipt_no | deposit_no | payee  | amount  &lt;br /&gt;
 ------------+------------+--------+---------&lt;br /&gt;
           1 |          1 | Crosby | $100.00&lt;br /&gt;
           2 |          1 | Stills | $200.00&lt;br /&gt;
           3 |          1 | Nash   | $300.00&lt;br /&gt;
           4 |          1 | Young  | $100.00&lt;br /&gt;
 (4 rows)&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
At about the same time, a supervisor clicks a button to close the receipt batch.&lt;br /&gt;
 begin;  -- T2&lt;br /&gt;
 select deposit_no from control;&lt;br /&gt;
&lt;br /&gt;
  deposit_no &lt;br /&gt;
 ------------&lt;br /&gt;
           1&lt;br /&gt;
 (1 row)&lt;br /&gt;
The application notes the receipt batch which is about to be closed, increments the batch number, and saves it to the control table.&lt;br /&gt;
 update control set deposit_no = 2;&lt;br /&gt;
 commit;&lt;br /&gt;
T1, the transaction inserting the last receipt for the old batch, has not yet committed, even though the batch has been closed. If T1 commits before anyone looks at the contents of the closed batch, everything is fine. So far we have no problem; the receipt &amp;quot;appears&amp;quot; to have been added before the batch was closed. The behavior is consistent with a one-at-a-time execution of the transactions: T1 -&amp;gt; T2.&lt;br /&gt;
&lt;br /&gt;
For purposes of demonstration, we'll run the deposit report before the last receipt is committed.&lt;br /&gt;
 begin;  -- T3&lt;br /&gt;
 select * from receipt where deposit_no = 1;&lt;br /&gt;
&lt;br /&gt;
  receipt_no | deposit_no | payee  | amount  &lt;br /&gt;
 ------------+------------+--------+---------&lt;br /&gt;
           1 |          1 | Crosby | $100.00&lt;br /&gt;
           2 |          1 | Stills | $200.00&lt;br /&gt;
           3 |          1 | Nash   | $300.00&lt;br /&gt;
 (3 rows)&lt;br /&gt;
Now we have a problem. T3 was started knowing that T2 had committed, so T3 must be considered to have run after T2. (This would also be true if T3 had been started independently and had read the control table, seeing the new deposit_no.) But T3 cannot see the work of T1, so T1 appears to have run after T3. We therefore have a cycle: T1 -&amp;gt; T2 -&amp;gt; T3 -&amp;gt; T1. And this would be a problem in practical terms; the batch is supposed to be closed and immutable, but a change will show up late -- perhaps after the trip to the bank.&lt;br /&gt;
&lt;br /&gt;
At the REPEATABLE READ isolation level this would proceed without any error, and the anomaly would go undetected. At the SERIALIZABLE isolation level one of the transactions is rolled back to protect the integrity of the system. Because cancelling T3 would only lead to the same error on retry while T1 was still active, PostgreSQL cancels T1 instead, so that an immediate retry can succeed.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
OK, let's retry.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;  -- T1 retry&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values&lt;br /&gt;
   (&lt;br /&gt;
     (select deposit_no from control),&lt;br /&gt;
     'Young', '100'&lt;br /&gt;
   );&lt;br /&gt;
&lt;br /&gt;
What does the receipt table look like now?&lt;br /&gt;
&lt;br /&gt;
 select * from receipt;&lt;br /&gt;
&lt;br /&gt;
  receipt_no | deposit_no | payee  | amount  &lt;br /&gt;
 ------------+------------+--------+---------&lt;br /&gt;
           1 |          1 | Crosby | $100.00&lt;br /&gt;
           2 |          1 | Stills | $200.00&lt;br /&gt;
           3 |          1 | Nash   | $300.00&lt;br /&gt;
           5 |          2 | Young  | $100.00&lt;br /&gt;
 (4 rows)&lt;br /&gt;
&lt;br /&gt;
The receipt now falls into the new batch, making the deposit report from T3 correct!&lt;br /&gt;
&lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
No problem now.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 commit;&lt;br /&gt;
This commit would have succeeded at any point after the SELECT in T3.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
[[Category:Français]]&lt;/div&gt;</description>
			<pubDate>Sun, 27 Nov 2011 00:07:43 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:SSI/fr</comments>		</item>
		<item>
			<title>SSI</title>
			<link>http://wiki.postgresql.org/wiki/SSI</link>
			<guid>http://wiki.postgresql.org/wiki/SSI</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Deposit Report */ Fix typo.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Languages}}&lt;br /&gt;
&lt;br /&gt;
This page documents Serializable Snapshot Isolation (SSI) in PostgreSQL and compares it to plain Snapshot Isolation (SI).  These correspond to the SERIALIZABLE and REPEATABLE READ transaction isolation levels, respectively, in PostgreSQL beginning with version 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document shows problems which can occur with certain combinations of transactions at the REPEATABLE READ transaction isolation level, and how they are avoided at the SERIALIZABLE transaction isolation level beginning with PostgreSQL version 9.1.&lt;br /&gt;
 &lt;br /&gt;
This document is oriented toward the application programmer or database administrator.  For internals of the SSI implementation, please see the [[Serializable]] Wiki page.  For more information about how to use this isolation level, see [http://www.postgresql.org/docs/current/interactive/transaction-iso.html#XACT-SERIALIZABLE the current PostgreSQL documentation].&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
In environments which avoid blocking-based integrity protection by counting on SSI, it will be common for the database to be configured (in postgresql.conf) with:&lt;br /&gt;
 default_transaction_isolation = 'serializable'&lt;br /&gt;
Because of this, all examples were tested with this setting, and clutter is avoided by using a simple &amp;quot;begin&amp;quot; without explicitly declaring the transaction isolation level for each transaction.&lt;br /&gt;
&lt;br /&gt;
=== Simple Write Skew ===&lt;br /&gt;
&lt;br /&gt;
When two concurrent transactions each determine what they are writing based on reading a data set which overlaps what the other is writing, you can get a state which could not occur if either had run before the other.  This is known as ''write skew'', and is the simplest form of serialization anomaly against which SSI protects you.&lt;br /&gt;
&lt;br /&gt;
When there is write skew in SSI, both transactions proceed until one transaction commits.  The first committer wins and the other transaction is rolled back.  The &amp;quot;first committer wins&amp;quot; rule ensures that there is progress and that the transaction which is rolled back can immediately be retried.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Black and White ====&lt;br /&gt;
&lt;br /&gt;
In this case there are rows with a color column containing 'black' or 'white'.  Two users concurrently try to make all rows contain matching color values, but their attempts go in opposite directions.  One is trying to update all white rows to black and the other is trying to update all black rows to white.&lt;br /&gt;
 &lt;br /&gt;
If these updates are run serially, all colors will match.  If they are run concurrently in REPEATABLE READ mode, the values will be switched, which is not consistent with any serial order of runs.  If they are run concurrently in SERIALIZABLE mode, SSI will notice the write skew and roll back one of the transactions.&lt;br /&gt;
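&lt;br /&gt;
The switched-colors outcome can be reproduced without a database. The following is a minimal Python sketch, assuming a deliberately simplified model of snapshot isolation in which each transaction reads from a private copy of the table taken when it starts and writes only the rows it matched. The function names are hypothetical and are not part of PostgreSQL.&lt;br /&gt;

```python
def make_dots():
    # Same starting data as the example: odd ids black, even ids white.
    return {i: ("black" if i % 2 == 1 else "white") for i in range(1, 11)}


def recolor(snapshot, old, new):
    # Write set of "update dots set color = new where color = old",
    # computed from the rows visible in the given snapshot.
    return {i: new for i, color in snapshot.items() if color == old}


def run_concurrently(table):
    # Both transactions take their snapshot before either one commits.
    snap1, snap2 = dict(table), dict(table)
    writes1 = recolor(snap1, "white", "black")
    writes2 = recolor(snap2, "black", "white")
    # The write sets are disjoint, so plain snapshot isolation lets both commit.
    result = dict(table)
    result.update(writes1)
    result.update(writes2)
    return result


def run_serially(table):
    # The second transaction sees everything the first one committed.
    result = dict(table)
    result.update(recolor(result, "white", "black"))
    result.update(recolor(result, "black", "white"))
    return result


if __name__ == "__main__":
    start = make_dots()
    print(sorted(set(run_serially(start).values())))      # a single color
    print(sorted(set(run_concurrently(start).values())))  # still two colors
```

In this model the serial run ends with one color, while the concurrent run merely swaps the colors, which is the state that no serial order can produce.&lt;br /&gt;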
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table dots&lt;br /&gt;
   (&lt;br /&gt;
     id int not null primary key,&lt;br /&gt;
     color text not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into dots&lt;br /&gt;
   with x(id) as (select generate_series(1,10))&lt;br /&gt;
   select id, case when id % 2 = 1 then 'black'&lt;br /&gt;
     else 'white' end from x;&lt;br /&gt;
{|&lt;br /&gt;
|+ Black and White Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'black'&lt;br /&gt;
   where color = 'white';&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'white'&lt;br /&gt;
   where color = 'black';&lt;br /&gt;
At this point one transaction or the other is doomed to fail.&lt;br /&gt;
 commit;&lt;br /&gt;
First commit wins.&lt;br /&gt;
 select * from dots order by id;&lt;br /&gt;
&lt;br /&gt;
  id | color&lt;br /&gt;
 ----+-------&lt;br /&gt;
   1 | white&lt;br /&gt;
   2 | white&lt;br /&gt;
   3 | white&lt;br /&gt;
   4 | white&lt;br /&gt;
   5 | white&lt;br /&gt;
   6 | white&lt;br /&gt;
   7 | white&lt;br /&gt;
   8 | white&lt;br /&gt;
   9 | white&lt;br /&gt;
  10 | white&lt;br /&gt;
 (10 rows)&lt;br /&gt;
This one ran as if by itself.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
A serialization failure.  We roll back and try again.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'black'&lt;br /&gt;
   where color = 'white';&lt;br /&gt;
 commit;&lt;br /&gt;
No concurrent transaction to interfere.&lt;br /&gt;
 select * from dots order by id;&lt;br /&gt;
&lt;br /&gt;
  id | color&lt;br /&gt;
 ----+-------&lt;br /&gt;
   1 | black&lt;br /&gt;
   2 | black&lt;br /&gt;
   3 | black&lt;br /&gt;
   4 | black&lt;br /&gt;
   5 | black&lt;br /&gt;
   6 | black&lt;br /&gt;
   7 | black&lt;br /&gt;
   8 | black&lt;br /&gt;
   9 | black&lt;br /&gt;
  10 | black&lt;br /&gt;
 (10 rows)&lt;br /&gt;
This transaction ran by itself, after the other.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Intersecting Data ====&lt;br /&gt;
&lt;br /&gt;
This example is taken from the PostgreSQL documentation.  Two concurrent transactions read data, and each uses it to update the range read by the other.  This is a simple, though somewhat contrived, example of write skew.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 CREATE TABLE mytab&lt;br /&gt;
 (&lt;br /&gt;
   class int NOT NULL,&lt;br /&gt;
   value int NOT NULL&lt;br /&gt;
 );&lt;br /&gt;
 INSERT INTO mytab VALUES&lt;br /&gt;
 (1, 10), (1, 20), (2, 100), (2, 200);&lt;br /&gt;
{|&lt;br /&gt;
|+ Intersecting Data Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 BEGIN;&lt;br /&gt;
 SELECT SUM(value) FROM mytab WHERE class = 1;&lt;br /&gt;
&lt;br /&gt;
  sum&lt;br /&gt;
 -----&lt;br /&gt;
   30&lt;br /&gt;
 (1 row)&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO mytab VALUES (2, 30);&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 BEGIN;&lt;br /&gt;
 SELECT SUM(value) FROM mytab WHERE class = 2;&lt;br /&gt;
&lt;br /&gt;
  sum&lt;br /&gt;
 -----&lt;br /&gt;
  300&lt;br /&gt;
 (1 row)&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO mytab VALUES (1, 300);&lt;br /&gt;
Each transaction has modified what the other transaction would have read.  If both were allowed to commit, this would break serializable behavior, because if they were run one at a time, one of the transactions would have seen the INSERT the other committed.  We wait for a successful COMMIT of one of the transactions before we roll anything back, though, to ensure progress and prevent thrashing.&lt;br /&gt;
 COMMIT;&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 COMMIT;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
So now we roll back the failed transaction and retry it from the beginning.&lt;br /&gt;
 ROLLBACK;&lt;br /&gt;
 BEGIN;&lt;br /&gt;
 SELECT SUM(value) FROM mytab WHERE class = 1;&lt;br /&gt;
&lt;br /&gt;
  sum&lt;br /&gt;
 -----&lt;br /&gt;
  330&lt;br /&gt;
 (1 row)&lt;br /&gt;
&lt;br /&gt;
 INSERT INTO mytab VALUES (2, 330);&lt;br /&gt;
 COMMIT;&lt;br /&gt;
This succeeds, leaving an end result consistent with a serial execution of the transactions.&lt;br /&gt;
 SELECT * FROM mytab;&lt;br /&gt;
&lt;br /&gt;
  class | value&lt;br /&gt;
 -------+-------&lt;br /&gt;
      1 |    10&lt;br /&gt;
      1 |    20&lt;br /&gt;
      2 |   100&lt;br /&gt;
      2 |   200&lt;br /&gt;
      1 |   300&lt;br /&gt;
      2 |   330&lt;br /&gt;
 (6 rows)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Overdraft Protection ====&lt;br /&gt;
&lt;br /&gt;
The hypothetical case is that there is a bank which allows depositors to withdraw money up to the total of what they have in all accounts.  The bank will later automatically transfer funds as needed to close the day with a non-negative balance in each account.  Within a single transaction the application checks that the total of all accounts exceeds the amount requested.&lt;br /&gt;
&lt;br /&gt;
Someone's trying to get clever and trick the bank by submitting $900 withdrawals to two accounts with $500 balances simultaneously.  At the REPEATABLE READ transaction isolation level, that could work; but if the SERIALIZABLE transaction isolation level is used, SSI will detect a &amp;quot;dangerous structure&amp;quot; in the read/write pattern and reject one of the transactions.&lt;br /&gt;
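&lt;br /&gt;
The arithmetic behind the trick can be sketched in a few lines of Python. This is only an illustration under a simplified model in which each transaction checks the rule against its own snapshot of the balances; the function names are hypothetical, and amounts are plain integers rather than the money type.&lt;br /&gt;

```python
def withdraw(snapshot, account, amount):
    # Application rule: the total across all of the depositor's accounts,
    # as seen in this snapshot, must exceed the amount requested.
    if amount >= sum(snapshot.values()):
        return None  # rejected by the application
    return {account: snapshot[account] - amount}


def both_under_snapshot_isolation(balances, amount):
    # Each transaction reads the same starting balances.
    w1 = withdraw(dict(balances), "saving", amount)
    w2 = withdraw(dict(balances), "checking", amount)
    result = dict(balances)
    for writes in (w1, w2):
        if writes is not None:
            result.update(writes)
    return result


def one_after_the_other(balances, amount):
    # The second withdrawal sees the result of the first.
    result = dict(balances)
    for account in ("saving", "checking"):
        writes = withdraw(dict(result), account, amount)
        if writes is not None:
            result.update(writes)
    return result


if __name__ == "__main__":
    start = {"saving": 500, "checking": 500}
    print(both_under_snapshot_isolation(start, 900))
    print(one_after_the_other(start, 900))
```

With two 500 balances and two 900 requests, the snapshot model lets both withdrawals through and leaves both accounts negative, while the one-at-a-time run rejects the second request.&lt;br /&gt;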
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
&lt;br /&gt;
 create table account&lt;br /&gt;
   (&lt;br /&gt;
     name text not null,&lt;br /&gt;
     type text not null,&lt;br /&gt;
     balance money not null default '0.00'::money,&lt;br /&gt;
     primary key (name, type)&lt;br /&gt;
   );&lt;br /&gt;
 insert into account values&lt;br /&gt;
   ('kevin','saving', 500),&lt;br /&gt;
   ('kevin','checking', 500);&lt;br /&gt;
 &lt;br /&gt;
{|&lt;br /&gt;
|+ Overdraft Protection Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 select type, balance from account&lt;br /&gt;
   where name = 'kevin';&lt;br /&gt;
&lt;br /&gt;
    type   | balance&lt;br /&gt;
 ----------+---------&lt;br /&gt;
  saving   | $500.00&lt;br /&gt;
  checking | $500.00&lt;br /&gt;
 (2 rows)&lt;br /&gt;
The total is $1000, so a $900 withdrawal is OK.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 select type, balance from account&lt;br /&gt;
   where name = 'kevin';&lt;br /&gt;
&lt;br /&gt;
    type   | balance&lt;br /&gt;
 ----------+---------&lt;br /&gt;
  saving   | $500.00&lt;br /&gt;
  checking | $500.00&lt;br /&gt;
 (2 rows)&lt;br /&gt;
The total is $1000, so a $900 withdrawal is OK.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 update account&lt;br /&gt;
   set balance = balance - 900::money&lt;br /&gt;
   where name = 'kevin' and type = 'saving';&lt;br /&gt;
So far everything's OK.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 update account&lt;br /&gt;
   set balance = balance - 900::money&lt;br /&gt;
   where name = 'kevin' and type = 'checking';&lt;br /&gt;
Now we have a problem.  This can't co-exist with the other transaction's activity.  We don't cancel yet, because the transaction would fail on the same conflicts if retried.  The first committer will win, and the other transaction will fail when it tries to continue after that.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
This one happened to commit first.  Its work is persisted.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
This transaction failed to withdraw the money.&lt;br /&gt;
Now we roll back and retry the transaction.&lt;br /&gt;
&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 select type, balance from account&lt;br /&gt;
   where name = 'kevin';&lt;br /&gt;
&lt;br /&gt;
    type   | balance&lt;br /&gt;
 ----------+----------&lt;br /&gt;
  saving   | -$400.00&lt;br /&gt;
  checking |  $500.00&lt;br /&gt;
 (2 rows)&lt;br /&gt;
We see they have a net of $100.  This request for $900 will be rejected by the application.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Three or More Transactions ===&lt;br /&gt;
&lt;br /&gt;
Serialization anomalies can result from more complex patterns of access, involving three or more transactions.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Primary Colors ====&lt;br /&gt;
&lt;br /&gt;
This is similar to the &amp;quot;Black and White&amp;quot; write skew example, except that we're using the three primary colors.  One transaction is trying to update red to yellow, the next is trying to update yellow to blue, and the third is trying to update blue to red.  If these were executed one at a time, you would be left with either one or two colors in the table, depending on the order of execution.  If any two are executed concurrently, the one trying to read the rows being updated by the other will appear to execute first, since it won't see the work of the other transaction, so there is no problem there.  Whether the other transaction is run before that or after that, the results are consistent with some serial order of execution.&lt;br /&gt;
&lt;br /&gt;
If all three are run concurrently, there is a cycle in the apparent order of execution.  A Repeatable Read transaction would not detect this, and the table would still have three colors.  A Serializable transaction will detect the problem and roll one of the transactions back with a serialization failure.&lt;br /&gt;
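&lt;br /&gt;
The cycle can be made explicit with a small Python sketch. It assumes that an edge from A to B means &amp;quot;A must appear to have run before B&amp;quot;, and the transaction labels T1, T2 and T3 (for sessions 1, 2 and 3) are hypothetical. It illustrates the reasoning only; it is not a description of how PostgreSQL detects the problem.&lt;br /&gt;

```python
def has_cycle(edges):
    # edges maps each transaction to the transactions it must precede.
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a transaction already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


# T2 does not see the yellow rows written by T1, so T2 appears to run first;
# likewise T3 appears to run before T2, and T1 before T3.
all_three = {"T2": ["T1"], "T3": ["T2"], "T1": ["T3"]}

# With only sessions 1 and 2 running, a single ordering constraint remains.
two_only = {"T2": ["T1"]}

if __name__ == "__main__":
    print(has_cycle(all_three))
    print(has_cycle(two_only))
```

Any two of the transactions alone leave a consistent order, which matches the observation above that running only two concurrently causes no problem.&lt;br /&gt;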
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table dots&lt;br /&gt;
   (&lt;br /&gt;
     id int not null primary key,&lt;br /&gt;
     color text not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into dots&lt;br /&gt;
   with x(id) as (select generate_series(1,9000))&lt;br /&gt;
   select id, case when id % 3 = 1 then 'red'&lt;br /&gt;
     when id % 3 = 2 then 'yellow'&lt;br /&gt;
     else 'blue' end from x;&lt;br /&gt;
 create index dots_color on dots (color);&lt;br /&gt;
 analyze dots;&lt;br /&gt;
{|&lt;br /&gt;
|+ Primary Colors Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
! session 3&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'yellow'&lt;br /&gt;
   where color = 'red';&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'blue'&lt;br /&gt;
   where color = 'yellow';&lt;br /&gt;
|-&lt;br /&gt;
|  ||  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'red'&lt;br /&gt;
   where color = 'blue';&lt;br /&gt;
At this point at least one of these three transactions is doomed to fail.  To ensure progress, we wait until one of them commits.  The commit will succeed, which will not only ensure that progress is made but that an immediate retry of a failed transaction won't fail again ''on the same combination of transactions''.&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 commit;&lt;br /&gt;
First commit wins.  Session 2 is bound to fail at this point, because during this commit its transaction was chosen as the one with the better chance of succeeding if retried immediately.&lt;br /&gt;
 select color, count(*) from dots&lt;br /&gt;
   group by color&lt;br /&gt;
   order by color;&lt;br /&gt;
&lt;br /&gt;
  color  | count&lt;br /&gt;
 --------+-------&lt;br /&gt;
  blue   |  3000&lt;br /&gt;
  yellow |  6000&lt;br /&gt;
 (2 rows)&lt;br /&gt;
This appears to have run before the other updates.&lt;br /&gt;
|-&lt;br /&gt;
|  ||  ||&lt;br /&gt;
 commit;&lt;br /&gt;
This works if attempted at this point.  If session 2 does more work first, this transaction might also need to be cancelled and retried.&lt;br /&gt;
 select color, count(*) from dots&lt;br /&gt;
   group by color&lt;br /&gt;
   order by color;&lt;br /&gt;
&lt;br /&gt;
  color  | count&lt;br /&gt;
 --------+-------&lt;br /&gt;
  red    |  3000&lt;br /&gt;
  yellow |  6000&lt;br /&gt;
 (2 rows)&lt;br /&gt;
This appears to have run after the transaction on session 1.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Cancelled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
A serialization failure.  We roll back and try again.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 update dots set color = 'blue'&lt;br /&gt;
   where color = 'yellow';&lt;br /&gt;
 commit;&lt;br /&gt;
Things are OK on retry.&lt;br /&gt;
 select color, count(*) from dots&lt;br /&gt;
   group by color&lt;br /&gt;
   order by color;&lt;br /&gt;
&lt;br /&gt;
  color | count&lt;br /&gt;
 -------+-------&lt;br /&gt;
  blue  |  6000&lt;br /&gt;
  red   |  3000&lt;br /&gt;
 (2 rows)&lt;br /&gt;
This appears to have run last, which it did.&lt;br /&gt;
|}&lt;br /&gt;
An interesting point is that if session 2 attempted to commit after session 1 and before session 3, it would still have failed, and a retry would still have succeeded, but the fate of the transaction on session 3 is not deterministic.  It might have succeeded or it might have gotten a serialization failure and required a retry.&lt;br /&gt;
&lt;br /&gt;
This is because the predicate locking used as part of conflict detection works based on pages and tuples actually accessed, and there is a random factor used in inserting index entries which have equal keys, in order to minimize contention; so even with identical run sequences it is possible to see differences in where serialization failures occur.  That is why it is important, when relying on serializable transactions for managing concurrency, to have some generalized technique for identifying serialization failures and retrying transactions from the beginning.&lt;br /&gt;
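&lt;br /&gt;
One possible shape for such a generalized technique is sketched below in Python. It assumes the application can obtain the SQLSTATE of a failed statement; PostgreSQL reports serialization failures with SQLSTATE 40001. The DatabaseError class and run_with_retry helper are hypothetical stand-ins, not part of any real driver.&lt;br /&gt;

```python
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE reported for serialization failures


class DatabaseError(Exception):
    # Hypothetical stand-in for a driver's exception type; real drivers
    # expose the SQLSTATE under their own attribute names.
    def __init__(self, sqlstate, message=""):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5, delay=0.0):
    # transaction is a callable that performs begin ... commit and rolls
    # back on error; it is re-run from the beginning on each attempt.
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Only serialization failures are retried; anything else, or
            # running out of attempts, is passed on to the caller.
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            time.sleep(delay)
```

The important design point is that the whole transaction is repeated, including its reads, so that decisions are made again against the new database state.&lt;br /&gt;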
&lt;br /&gt;
It is also worth noting that if session 2 committed the retry transaction before session 3 committed its transaction, any subsequent query which viewed rows successfully updated from yellow to blue by session 2 would deterministically doom the transaction on session 3, because these would not be rows which session 3 would see as blue and update to red.  For the transaction on session 3 to successfully commit, it must be considered to have run before the successful transaction on session 2, so exposing a state in which the work of the transaction on session 2 is visible, but not the work of the transaction on session 3, means that the transaction on session 3 must fail.  The act of ''observing'' a recently modified database state can cause serialization failures.  This will be further explored in other examples.&lt;br /&gt;
&lt;br /&gt;
=== Enforcing Business Rules in Triggers ===&lt;br /&gt;
&lt;br /&gt;
If all transactions are serializable, business rules can be enforced in triggers without the problems associated with other transaction isolation levels.  Where a declarative constraint works, it will generally be faster, easier to implement and maintain, and less prone to bugs -- so triggers should only be used this way where a declarative constraint won't work.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== Unique-Like Constraints ====&lt;br /&gt;
&lt;br /&gt;
Say you want something similar to a unique constraint, but it's a little more complicated.  For this example, we want uniqueness in the first six characters of the text column.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table t (id int not null, val text not null);&lt;br /&gt;
 with x (n) as (select generate_series(1,10000))&lt;br /&gt;
   insert into t select x.n, md5(x.n::text) from x;&lt;br /&gt;
 alter table t add primary key(id);&lt;br /&gt;
 create index t_val on t (val);&lt;br /&gt;
 vacuum analyze t;&lt;br /&gt;
 create function t_func()&lt;br /&gt;
   returns trigger&lt;br /&gt;
   language plpgsql as $$&lt;br /&gt;
 declare&lt;br /&gt;
   st text;&lt;br /&gt;
 begin&lt;br /&gt;
   st := substring(new.val from 1 for 6);&lt;br /&gt;
   if tg_op = 'UPDATE' and substring(old.val from 1 for 6) = st then&lt;br /&gt;
     return new;&lt;br /&gt;
   end if;&lt;br /&gt;
   if exists (select * from t where val between st and st || 'z') then&lt;br /&gt;
     raise exception 't.val not unique on first six characters: &amp;quot;%&amp;quot;', st;&lt;br /&gt;
   end if;&lt;br /&gt;
   return new;&lt;br /&gt;
 end;&lt;br /&gt;
 $$;&lt;br /&gt;
 create trigger t_trig&lt;br /&gt;
   before insert or update on t&lt;br /&gt;
   for each row execute procedure t_func();&lt;br /&gt;
&lt;br /&gt;
To confirm that the trigger is enforcing the business rule when there are no concurrency issues, on a single connection:&lt;br /&gt;
&lt;br /&gt;
 insert into t values (-1, 'this old dog');&lt;br /&gt;
 insert into t values (-2, 'this old cat');&lt;br /&gt;
&lt;br /&gt;
 ERROR:  t.val not unique on first six characters: &amp;quot;this o&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Now we try in two concurrent sessions.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ Unique-Like Constraint Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
 begin;&lt;br /&gt;
 insert into t values (-3, 'the river flows');&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 begin;&lt;br /&gt;
 insert into t values (-4, 'the right stuff');&lt;br /&gt;
This works for the moment, because the work of the other transaction is not visible to this transaction, but both transactions may not commit without violating the business rule.&lt;br /&gt;
 commit;&lt;br /&gt;
The first committer wins.  This transaction is safe.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
A commit here would fail, but so would an attempt to run any other statement within this doomed transaction.&lt;br /&gt;
 select * from t where id &amp;lt; 0;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Canceled on identification as a pivot,&lt;br /&gt;
          during conflict out checking.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
&lt;br /&gt;
Since this is a serialization failure, the transaction should be retried.&lt;br /&gt;
&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 insert into t values (-3, 'the river flows');&lt;br /&gt;
&lt;br /&gt;
On retry we get an error which is more helpful to the user.&lt;br /&gt;
&lt;br /&gt;
 ERROR:  t.val not unique on first six characters: &amp;quot;the ri&amp;quot;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==== FK-Like Constraints ====&lt;br /&gt;
&lt;br /&gt;
Sometimes two tables must have a relationship very similar to a foreign key relationship, but there are extra criteria which make a foreign key insufficient to completely cover the necessary integrity checking.  In this example a project table contains a reference to a person table's key in a project_manager column, but not just ''any'' person will do; the person specified must be flagged as a project manager.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table person&lt;br /&gt;
   (&lt;br /&gt;
     person_id int not null primary key,&lt;br /&gt;
     person_name text not null,&lt;br /&gt;
     is_project_manager boolean not null&lt;br /&gt;
   );&lt;br /&gt;
 create table project&lt;br /&gt;
   (&lt;br /&gt;
     project_id int not null primary key,&lt;br /&gt;
     project_name text not null,&lt;br /&gt;
     project_manager int not null&lt;br /&gt;
   );&lt;br /&gt;
 create index project_manager&lt;br /&gt;
   on project (project_manager);&lt;br /&gt;
 &lt;br /&gt;
 create function person_func()&lt;br /&gt;
   returns trigger&lt;br /&gt;
   language plpgsql as $$&lt;br /&gt;
 begin&lt;br /&gt;
   if tg_op = 'DELETE' and old.is_project_manager then&lt;br /&gt;
     if exists (select * from project&lt;br /&gt;
                where project_manager = old.person_id) then&lt;br /&gt;
       raise exception&lt;br /&gt;
         'person cannot be deleted while manager of any project';&lt;br /&gt;
     end if;&lt;br /&gt;
   end if;&lt;br /&gt;
   if tg_op = 'UPDATE' then&lt;br /&gt;
     if new.person_id is distinct from old.person_id then&lt;br /&gt;
       raise exception 'change to person_id is not allowed';&lt;br /&gt;
     end if;&lt;br /&gt;
     if old.is_project_manager and not new.is_project_manager then&lt;br /&gt;
       if exists (select * from project&lt;br /&gt;
                  where project_manager = old.person_id) then&lt;br /&gt;
         raise exception&lt;br /&gt;
           'person must remain a project manager while managing any projects';&lt;br /&gt;
       end if;&lt;br /&gt;
     end if;&lt;br /&gt;
   end if;&lt;br /&gt;
   if tg_op = 'DELETE' then&lt;br /&gt;
     return old;&lt;br /&gt;
   else&lt;br /&gt;
     return new;&lt;br /&gt;
   end if;&lt;br /&gt;
 end;&lt;br /&gt;
 $$;&lt;br /&gt;
 create trigger person_trig&lt;br /&gt;
   before update or delete on person&lt;br /&gt;
   for each row execute procedure person_func();&lt;br /&gt;
 &lt;br /&gt;
 create function project_func()&lt;br /&gt;
   returns trigger&lt;br /&gt;
   language plpgsql as $$&lt;br /&gt;
 begin&lt;br /&gt;
   if tg_op = 'INSERT'&lt;br /&gt;
   or (tg_op = 'UPDATE' and new.project_manager &amp;lt;&amp;gt; old.project_manager) then&lt;br /&gt;
     if not exists (select * from person&lt;br /&gt;
                      where person_id = new.project_manager&lt;br /&gt;
                        and is_project_manager) then&lt;br /&gt;
       raise exception&lt;br /&gt;
         'project_manager must be defined as a project manager in the person table';&lt;br /&gt;
     end if;&lt;br /&gt;
   end if;&lt;br /&gt;
   return new;&lt;br /&gt;
 end;&lt;br /&gt;
 $$;&lt;br /&gt;
 create trigger project_trig&lt;br /&gt;
   before insert or update on project&lt;br /&gt;
   for each row execute procedure project_func();&lt;br /&gt;
 &lt;br /&gt;
 insert into person values (1, 'Kevin Grittner', true);&lt;br /&gt;
 insert into person values (2, 'Peter Parker', true);&lt;br /&gt;
 insert into project values (101, 'parallel processing', 1);&lt;br /&gt;
{|&lt;br /&gt;
|+ FK-Like Constraints Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
One person is being updated to no longer be a project manager.&lt;br /&gt;
 begin;&lt;br /&gt;
 update person&lt;br /&gt;
   set is_project_manager = false&lt;br /&gt;
   where person_id = 2;&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
At the same time, a project is being updated to make that person the manager of that project.&lt;br /&gt;
 begin;&lt;br /&gt;
 update project&lt;br /&gt;
   set project_manager = 2&lt;br /&gt;
   where project_id = 101;&lt;br /&gt;
These can't both be committed.  The first commit will win.&lt;br /&gt;
 commit;&lt;br /&gt;
The assignment of the person to the project commits first, so the other transaction is now doomed to fail.  If either transaction had run at a different isolation level, both transactions could have committed, resulting in a violation of the business rules.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Canceled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
A serialization failure.  We roll back and try again.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;&lt;br /&gt;
 update person&lt;br /&gt;
   set is_project_manager = false&lt;br /&gt;
   where person_id = 2;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  person must remain a project manager&lt;br /&gt;
         while managing any projects&lt;br /&gt;
On the retry we get a meaningful message.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Read Only Transactions ===&lt;br /&gt;
&lt;br /&gt;
While a Read Only transaction cannot contribute to an anomaly which persists in the database, under Repeatable Read transaction isolation it can ''see'' a state which is not consistent with any serial (one-at-a-time) execution of transactions.  A Serializable transaction implemented with SSI will never see such transient anomalies.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==== Deposit Report ====&lt;br /&gt;
&lt;br /&gt;
A general class of problems involving read only transactions is batch processing, where one table controls which batch is currently the target of inserts.  A batch is closed by updating the control table, at which point the batch is considered &amp;quot;locked&amp;quot; against further change, and processing of that batch occurs.&lt;br /&gt;
&lt;br /&gt;
A particular example of this which occurs in real-world bookkeeping is receipting.  Receipts might be added to a batch identified by the deposit date, or (if more than one deposit per day is possible) an abstract receipt batch number.  At some point during the day, while the bank is still open, the batch is closed, a report is printed of the money received, and the money is taken to the bank for deposit.&lt;br /&gt;
&lt;br /&gt;
The example can be set up with these statements:&lt;br /&gt;
 create table control&lt;br /&gt;
   (&lt;br /&gt;
     deposit_no int not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into control values (1);&lt;br /&gt;
 create table receipt&lt;br /&gt;
   (&lt;br /&gt;
     receipt_no serial primary key,&lt;br /&gt;
     deposit_no int not null,&lt;br /&gt;
     payee text not null,&lt;br /&gt;
     amount money not null&lt;br /&gt;
   );&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values ((select deposit_no from control), 'Crosby', '100');&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values ((select deposit_no from control), 'Stills', '200');&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values ((select deposit_no from control), 'Nash', '300');&lt;br /&gt;
{|&lt;br /&gt;
|+ Deposit Report Example&lt;br /&gt;
! session 1&lt;br /&gt;
! session 2&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
At a receipting counter, another receipt is added to the current batch.&lt;br /&gt;
 begin;  -- T1&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values&lt;br /&gt;
   (&lt;br /&gt;
     (select deposit_no from control),&lt;br /&gt;
     'Young', '100'&lt;br /&gt;
   );&lt;br /&gt;
This transaction can see its own insert, but it's not visible to other transactions until commit.&lt;br /&gt;
  select * from receipt;&lt;br /&gt;
&lt;br /&gt;
  receipt_no | deposit_no | payee  | amount  &lt;br /&gt;
 ------------+------------+--------+---------&lt;br /&gt;
           1 |          1 | Crosby | $100.00&lt;br /&gt;
           2 |          1 | Stills | $200.00&lt;br /&gt;
           3 |          1 | Nash   | $300.00&lt;br /&gt;
           4 |          1 | Young  | $100.00&lt;br /&gt;
 (4 rows)&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
At about the same time, a supervisor clicks a button to close the receipt batch.&lt;br /&gt;
 begin;  -- T2&lt;br /&gt;
 select deposit_no from control;&lt;br /&gt;
&lt;br /&gt;
  deposit_no &lt;br /&gt;
 ------------&lt;br /&gt;
           1&lt;br /&gt;
 (1 row)&lt;br /&gt;
The application notes the receipting batch which is about to be closed, increments the batch number, and saves that to the control table.&lt;br /&gt;
 update control set deposit_no = 2;&lt;br /&gt;
 commit;&lt;br /&gt;
T1, the transaction inserting the last receipt for the old batch, hasn't committed yet even though the batch has been closed.  If T1 commits before anyone looks at the contents of the closed batch, everything is OK.  So far we don't have a problem; the receipt ''appears'' to have been added before the batch was closed.  We have behavior which is consistent with some one-at-a-time execution of the transactions: T1 -&amp;gt; T2.&lt;br /&gt;
&lt;br /&gt;
For purposes of demonstration, we'll have the deposit report start before that last receipt commits.&lt;br /&gt;
 begin;  -- T3&lt;br /&gt;
 select * from receipt where deposit_no = 1;&lt;br /&gt;
&lt;br /&gt;
  receipt_no | deposit_no | payee  | amount  &lt;br /&gt;
 ------------+------------+--------+---------&lt;br /&gt;
           1 |          1 | Crosby | $100.00&lt;br /&gt;
           2 |          1 | Stills | $200.00&lt;br /&gt;
           3 |          1 | Nash   | $300.00&lt;br /&gt;
 (3 rows)&lt;br /&gt;
Now we have a problem.  T3 was started with the knowledge that T2 committed successfully, so T3 must be considered to have run after T2.  (This could also be true if T3 had run independently and selected from the control table, seeing the updated deposit_no.)  But T3 cannot see the work of T1, so T1 appears to have run after T3.  So we have a cycle T1 -&amp;gt; T2 -&amp;gt; T3 -&amp;gt; T1.  And this would be a problem in practical terms; the batch is supposed to be closed and immutable, but a change will pop up late -- perhaps after the trip to the bank.&lt;br /&gt;
&lt;br /&gt;
Under the REPEATABLE READ isolation level this would silently proceed without the anomaly being noticed.  Under the SERIALIZABLE isolation level one of the transactions will be rolled back to protect the integrity of the system.  Since a rollback and retry of T3 would hit the same error if T1 was still active, PostgreSQL will cancel T1, so that an immediate retry will succeed.&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
 ERROR:  could not serialize access&lt;br /&gt;
         due to read/write dependencies&lt;br /&gt;
         among transactions&lt;br /&gt;
 DETAIL:  Canceled on identification&lt;br /&gt;
          as a pivot, during commit attempt.&lt;br /&gt;
 HINT:  The transaction might succeed if retried.&lt;br /&gt;
OK, let's retry.&lt;br /&gt;
 rollback;&lt;br /&gt;
 begin;  -- T1 retry&lt;br /&gt;
 insert into receipt&lt;br /&gt;
   (deposit_no, payee, amount)&lt;br /&gt;
   values&lt;br /&gt;
   (&lt;br /&gt;
     (select deposit_no from control),&lt;br /&gt;
     'Young', '100'&lt;br /&gt;
   );&lt;br /&gt;
&lt;br /&gt;
What does the receipt table look like now?&lt;br /&gt;
&lt;br /&gt;
 select * from receipt;&lt;br /&gt;
&lt;br /&gt;
  receipt_no | deposit_no | payee  | amount  &lt;br /&gt;
 ------------+------------+--------+---------&lt;br /&gt;
           1 |          1 | Crosby | $100.00&lt;br /&gt;
           2 |          1 | Stills | $200.00&lt;br /&gt;
           3 |          1 | Nash   | $300.00&lt;br /&gt;
           5 |          2 | Young  | $100.00&lt;br /&gt;
 (4 rows)&lt;br /&gt;
&lt;br /&gt;
The receipt now falls into the next batch, making the deposit report in T3 correct!&lt;br /&gt;
&lt;br /&gt;
 commit;&lt;br /&gt;
&lt;br /&gt;
No problem now.&lt;br /&gt;
|-&lt;br /&gt;
|  ||&lt;br /&gt;
 commit;&lt;br /&gt;
This would have been OK anytime after T3's SELECT.&lt;br /&gt;
|}&lt;/div&gt;</description>
			<pubDate>Sun, 27 Nov 2011 00:06:42 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:SSI</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* SSI Algorithm */ Fix references to T1 in proof; they should have been Tin.  Subscript the zero in T0.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
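&lt;br /&gt;
The cycle test can be sketched in a few lines.  This is only an illustration of the idea, not how PostgreSQL works internally (SSI deliberately avoids tracking the full graph); the find_cycle function and the edge representation are assumptions made for this example.  Each edge is a pair meaning that the first transaction appears to have executed before the second.&lt;br /&gt;
&lt;br /&gt;
```python
def find_cycle(edges):
    """Return a list of transactions forming a cycle, or None.

    edges is an iterable of (before, after) pairs meaning that "before"
    appears to have executed ahead of "after".
    """
    graph = {}
    for before, after in edges:
        graph.setdefault(before, []).append(after)
        graph.setdefault(after, [])

    done = set()  # nodes whose descendants are fully explored

    def visit(node, path):
        if node in path:
            # Reached a node already on the current path: that is a cycle.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        for nxt in graph[node]:
            found = visit(nxt, path + [node])
            if found:
                return found
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None
```
&lt;br /&gt;
If find_cycle returns a list, no serial ordering of those transactions can explain the results; if it returns None, sorting the transactions so that every edge points forward gives one valid apparent order of execution.&lt;br /&gt;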
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using Serializable Snapshot Isolation (SSI), based on the work of Cahill, et al. Fundamentally, this allows snapshot isolation to run as it has, while monitoring for conditions which could create a serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot; of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----&amp;lt;sub&amp;gt;''rw''&amp;lt;/sub&amp;gt;---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----&amp;lt;sub&amp;gt;''rw''&amp;lt;/sub&amp;gt;---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure is part of a cycle, so not every one corresponds to an actual serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, there can only be an anomaly if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt; that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt; might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;).&lt;br /&gt;
#* The dependency between T&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; can't be a rw-conflict, because T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; was read-only, so it must be a ww- or wr-dependency.  Those can only occur if T&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; started.&lt;br /&gt;
#* Because T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle, it must commit before T&amp;lt;sub&amp;gt;0&amp;lt;/sub&amp;gt; commits -- and thus before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; starts.&lt;br /&gt;
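&lt;br /&gt;
The dangerous structure test, together with the first optimization, can be sketched as follows.  This is a toy model for illustration only and makes no claim about the actual data structures in PostgreSQL; the dangerous_structures function and its arguments are hypothetical, and the read-only optimization is not modeled.  Each rw-conflict is a (reader, writer) pair, and the commit order lists only transactions which have already committed.&lt;br /&gt;
&lt;br /&gt;
```python
def dangerous_structures(rw_conflicts, commit_order):
    """Return (t_in, t_pivot, t_out) triples that would call for a rollback.

    rw_conflicts: set of (reader, writer) pairs; the reader appears to run first.
    commit_order: list of committed transactions, earliest first;
                  transactions still in progress are absent.
    """
    position = {txn: i for i, txn in enumerate(commit_order)}
    not_yet = len(commit_order)  # sorts after every committed transaction

    found = []
    for t_in, t_pivot in sorted(rw_conflicts):
        for reader, t_out in sorted(rw_conflicts):
            if reader != t_pivot:
                continue  # the second edge must start at the pivot
            out_pos = position.get(t_out, not_yet)
            # First optimization: only act once T_out has committed, and
            # only if no other member of the structure committed earlier.
            if out_pos == not_yet:
                continue
            if out_pos > position.get(t_pivot, not_yet):
                continue
            if out_pos > position.get(t_in, not_yet):
                continue
            found.append((t_in, t_pivot, t_out))
    return found
```
&lt;br /&gt;
Note that T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; may be the same transaction, as in simple write skew between two transactions: once one of them commits, the other is the pivot of a dangerous structure.&lt;br /&gt;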
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI cannot eliminate existing anomalies in these areas.&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction cannot fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
* Modifying a heap tuple creates a rw-conflict with any transaction that holds a SIREAD lock on that tuple, or on the page or relation that contains it.&lt;br /&gt;
* Inserting a new tuple creates a rw-conflict with any transaction holding a SIREAD lock on the entire relation. It doesn't conflict with page-level locks, because page-level locks are only used to aggregate tuple locks. Unlike index page locks, they don't lock &amp;quot;gaps&amp;quot; on the page.&lt;br /&gt;
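&lt;br /&gt;
The last two points can be restated as a small lookup.  This is a simplified model for illustration, not the real lock manager: the conflicting_holders function and the dictionary of lock targets are assumptions made for this sketch, and index locks are left out entirely.&lt;br /&gt;
&lt;br /&gt;
```python
def conflicting_holders(siread_locks, relation, page=None, tuple_id=None, is_insert=False):
    """Return the transactions whose SIREAD locks conflict with a heap write.

    siread_locks maps a lock target to the set of holding transactions:
      (relation,)                 relation-level lock
      (relation, page)            page-level lock
      (relation, page, tuple_id)  tuple-level lock
    """
    targets = [(relation,)]
    if not is_insert:
        # Updates and deletes also conflict with locks on the tuple itself
        # and on the page containing it; inserts only check the relation.
        targets.append((relation, page))
        targets.append((relation, page, tuple_id))
    holders = set()
    for target in targets:
        holders |= siread_locks.get(target, set())
    return holders
```
&lt;br /&gt;
Every transaction returned is the reader in a new rw-conflict, with the writing transaction as the other end of the edge.&lt;br /&gt;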
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to get a working implementation of a serializable isolation level which allowed no anomalies as quickly as possible, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
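&lt;br /&gt;
The promotion and coverage rules above can be illustrated with a minimal Python sketch.  It is not PostgreSQL code: the class name PredicateLocks, its methods, and the max_tuple_locks_per_page threshold are hypothetical, and it models the locks of a single transaction with plain sets rather than shared memory structures.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import defaultdict


class PredicateLocks:
    """Toy SIREAD lock set for one transaction (hypothetical, not PostgreSQL code)."""

    def __init__(self, max_tuple_locks_per_page=2):
        # Assumed promotion threshold, chosen only for illustration.
        self.max_tuple_locks_per_page = max_tuple_locks_per_page
        self.relations = set()            # {rel}
        self.pages = set()                # {(rel, page)}
        self.tuples = defaultdict(set)    # {(rel, page): {offset, ...}}

    def lock_relation(self, rel):
        self.relations.add(rel)
        # Acquiring a coarser lock releases every finer-grained lock it covers.
        self.pages = {p for p in self.pages if p[0] != rel}
        for key in [k for k in self.tuples if k[0] == rel]:
            del self.tuples[key]

    def lock_page(self, rel, page):
        if rel in self.relations:
            return  # ignored: the relation lock already covers this page
        self.pages.add((rel, page))
        self.tuples.pop((rel, page), None)

    def lock_tuple(self, rel, page, offset):
        if rel in self.relations or (rel, page) in self.pages:
            return  # ignored: a coarser lock already covers this tuple
        held = self.tuples[(rel, page)]
        held.add(offset)
        if len(held) == self.max_tuple_locks_per_page + 1:
            self.lock_page(rel, page)  # promote to page granularity

    def count(self):
        tuple_locks = sum(len(s) for s in self.tuples.values())
        return len(self.relations) + len(self.pages) + tuple_locks
```
&lt;br /&gt;
With the assumed threshold of two, a third tuple lock on the same page collapses the three tuple locks into one page lock, and later tuple requests on that page are ignored.&lt;br /&gt;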
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires python and several python packages (including twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
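&lt;br /&gt;
As a rough illustration of the bookkeeping in the item above, the following Python sketch tracks the oldest xmin and triggers a clean-up when its count reaches zero.  The class SerializableXminTracker and its members are hypothetical.  As a simplification it keeps a count for every xmin rather than only the oldest, and it compares xmin values as plain integers, ignoring transaction ID wraparound.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


class SerializableXminTracker:
    """Toy tracker of the oldest snapshot xmin among serializable transactions."""

    def __init__(self):
        self.active = Counter()   # xmin: number of active serializable transactions
        self.oldest = None        # oldest xmin currently in use, or None
        self.cleanups = []        # new oldest xmin recorded at each clean-up (illustrative)

    def begin(self, xmin):
        self.active[xmin] += 1
        self.oldest = min(self.active)

    def end(self, xmin):
        self.active[xmin] -= 1
        if self.active[xmin] == 0:
            del self.active[xmin]
            if xmin == self.oldest:
                # The count for the oldest xmin hit zero: find the new
                # oldest xmin and run a clean-up based on it.
                self.oldest = min(self.active) if self.active else None
                self.cleanups.append(self.oldest)
```
&lt;br /&gt;
In this sketch, ending one of two transactions that share the oldest xmin changes nothing; only when the last of them ends does the oldest xmin advance and a clean-up get recorded.&lt;br /&gt;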
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 23:22:00 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* SSI Algorithm */ Pretty up the ASCII-art diagram of a dangerous structure a little.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
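&lt;br /&gt;
A minimal Python sketch of the automatic retry described above follows.  SQLSTATE 40001 is the standard code for a serialization failure; everything else here is an assumption made for illustration: DatabaseError stands in for whatever exception a real driver raises, and run_with_retry and max_attempts are hypothetical names.&lt;br /&gt;
&lt;br /&gt;
```python
SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization_failure


class DatabaseError(Exception):
    """Hypothetical driver error carrying an SQLSTATE code."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() again after each serialization failure.

    Any other error, or a serialization failure on the last attempt,
    is passed on to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            if err.sqlstate != SERIALIZATION_FAILURE:
                raise  # not a serialization failure: do not retry
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
```
&lt;br /&gt;
The callable passed in is assumed to run the whole transaction from start to commit, so that each retry begins with a fresh snapshot.&lt;br /&gt;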
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
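&lt;br /&gt;
The cycle condition above can be checked on a small example with a Python sketch.  The function has_cycle and its edge representation are hypothetical and only illustrate the definition; SSI itself does not build or search the full dependency graph.  Each edge (A, B) means that A appears to have executed before B.&lt;br /&gt;
&lt;br /&gt;
```python
def has_cycle(edges):
    """Return True if the 'appears to execute before' graph contains a cycle.

    edges is an iterable of (earlier, later) pairs of transaction names.
    """
    graph = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)
        graph.setdefault(later, set())

    state = {}  # node: "visiting" while on the current path, "done" afterwards

    def visit(node):
        if state.get(node) == "visiting":
            return True   # reached a node already on the current path: a cycle
        if state.get(node) == "done":
            return False  # already fully explored without finding a cycle
        state[node] = "visiting"
        found = any(visit(nxt) for nxt in graph[node])
        state[node] = "done"
        return found

    return any(visit(node) for node in graph)
```
&lt;br /&gt;
For example, two concurrent transactions which each read what the other writes give the edges ("T1", "T2") and ("T2", "T1"), so has_cycle reports a cycle and no serial order can explain the results.&lt;br /&gt;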
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using&lt;br /&gt;
Serializable Snapshot Isolation (SSI), based on the work of Cahill,&lt;br /&gt;
et al. Fundamentally, this allows snapshot isolation to run as it&lt;br /&gt;
has, while monitoring for conditions which could create a serialization&lt;br /&gt;
anomaly. &lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot; of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----&amp;lt;sub&amp;gt;''rw''&amp;lt;/sub&amp;gt;---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----&amp;lt;sub&amp;gt;''rw''&amp;lt;/sub&amp;gt;---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization failure.&lt;br /&gt;
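&lt;br /&gt;
As a rough sketch of what watching for the dangerous structure means, the Python below scans a set of rw-conflicts for any transaction that has both an incoming and an outgoing edge. The pair format and the function name are assumptions for illustration; the real implementation keeps conflict lists in shared memory and checks them incrementally.&lt;br /&gt;
&lt;br /&gt;
```python
def find_dangerous_structures(rw_conflicts):
    """Return sorted (t_in, t_pivot, t_out) triples.

    rw_conflicts holds (reader, writer) pairs: the reader could not see
    a write made by a concurrent writer, so the reader appears to have
    executed first.
    """
    structures = []
    for t_in, pivot in rw_conflicts:
        for reader, t_out in rw_conflicts:
            if reader == pivot:  # pivot has an edge in and an edge out
                structures.append((t_in, pivot, t_out))
    return sorted(structures)


# Three transactions in a row: T2 is the pivot.
print(find_dangerous_structures({("T1", "T2"), ("T2", "T3")}))
# Write skew: T_in and T_out are the same transaction.
print(find_dangerous_structures({("T1", "T2"), ("T2", "T1")}))
# A single rw-conflict is harmless on its own.
print(find_dangerous_structures({("T1", "T2")}))
```
&lt;br /&gt;
Because only the two adjacent edges are examined, a triple found this way may or may not be part of a real cycle, which is where the false positives mentioned above come from.&lt;br /&gt;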
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# if T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, there can only be an anomaly if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T0 might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;).&lt;br /&gt;
#* The dependency between T0 and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; can't be a rw-conflict, because T1 was read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T1 started.&lt;br /&gt;
#* Because T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T1 starts.&lt;br /&gt;
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI can not eliminate existing anomalies in these areas.&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction can not fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
* Modifying a heap tuple creates a rw-conflict with any transaction that holds a SIREAD lock on that tuple, or on the page or relation that contains it.&lt;br /&gt;
* Inserting a new tuple creates a rw-conflict with any transaction holding a SIREAD lock on the entire relation. It doesn't conflict with page-level locks, because page-level locks are only used to aggregate tuple locks. Unlike index page locks, they don't lock &amp;quot;gaps&amp;quot; on the page.&lt;br /&gt;
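&lt;br /&gt;
Since a cancelled transaction is expected to succeed on retry, client code usually wraps serializable transactions in a retry loop keyed on SQLSTATE 40001. The sketch below shows the shape of such a loop; SerializationFailure is a hypothetical stand-in for whatever error a real database driver raises, and the transaction is faked so the example runs without a database.&lt;br /&gt;
&lt;br /&gt;
```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() again from the start after each 40001 failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure


# Demo: a fake transaction which is cancelled twice and then commits.
attempts = []


def transfer():
    attempts.append(1)
    if len(attempts) in (1, 2):
        raise SerializationFailure("could not serialize access")
    return "committed"


print(run_with_retry(transfer), len(attempts))
```
&lt;br /&gt;
The whole transaction is re-run, including its reads, because decisions based on the earlier snapshot may no longer hold.&lt;br /&gt;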
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to get a working implementation of a serializable isolation level which allowed no anomalies as quickly as possible, even though it had many false positives and very poor performance, and then to optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that the patch cause no performance regression for those not using serializable transaction isolation.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
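&lt;br /&gt;
The promotion and covering rules above might look like the following for a single transaction. This is a simplified sketch: the class, its method names, and the fixed per-page threshold are assumptions for illustration, whereas the real structures live in shared memory and base promotion on configured limits.&lt;br /&gt;
&lt;br /&gt;
```python
class SireadLocks:
    """SIREAD locks of one transaction at tuple, page and relation level."""

    def __init__(self, max_tuples_per_page=3):
        self.max_tuples_per_page = max_tuples_per_page  # hypothetical limit
        self.relations = set()  # {rel}
        self.pages = set()      # {(rel, page)}
        self.tuples = set()     # {(rel, page, offset)}

    def lock_tuple(self, rel, page, offset):
        # A coarser lock already covers this tuple: ignore the request.
        if rel in self.relations or (rel, page) in self.pages:
            return
        self.tuples.add((rel, page, offset))
        on_page = [t for t in self.tuples if t[:2] == (rel, page)]
        if len(on_page) == self.max_tuples_per_page:
            self.lock_page(rel, page)  # promote to a single page lock

    def lock_page(self, rel, page):
        if rel in self.relations:
            return
        self.pages.add((rel, page))
        # Release the finer-grained locks which the page lock now covers.
        self.tuples = {t for t in self.tuples if t[:2] != (rel, page)}

    def lock_relation(self, rel):
        self.relations.add(rel)
        self.pages = {p for p in self.pages if p[0] != rel}
        self.tuples = {t for t in self.tuples if t[0] != rel}


locks = SireadLocks(max_tuples_per_page=3)
for offset in (1, 2, 3):
    locks.lock_tuple("accounts", 7, offset)
print(locks.pages, locks.tuples)    # one page lock, no tuple locks left
locks.lock_tuple("accounts", 7, 4)  # ignored: the page is already locked
locks.lock_relation("accounts")
print(locks.relations, locks.pages, locks.tuples)
```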
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
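&lt;br /&gt;
The correctness rule can be illustrated with a toy model which remembers each scanned key range and reports a rw-conflict when a later insert by another transaction lands inside one. This is an assumption-laden sketch: an index AM would typically approximate these ranges with locks on physical index pages rather than storing the ranges themselves, and the class and method names here are invented for the example.&lt;br /&gt;
&lt;br /&gt;
```python
from operator import le


class IndexRangeLocks:
    """Record scanned key ranges and flag inserts which fall inside them."""

    def __init__(self):
        self.scans = []  # (transaction, low, high), bounds inclusive

    def scan(self, txn, low, high):
        self.scans.append((txn, low, high))

    def insert(self, writer, key):
        """Return the readers which get a rw-conflict out to writer."""
        return sorted({txn for txn, low, high in self.scans
                       if txn != writer and le(low, key) and le(key, high)})


index = IndexRangeLocks()
index.scan("T1", 10, 20)
index.scan("T2", 30, 40)
print(index.insert("T3", 15))  # key falls inside the range T1 scanned
print(index.insert("T3", 25))  # key falls in no scanned range
print(index.insert("T1", 15))  # a transaction never conflicts with itself
```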
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires python and several python packages (including twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
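&lt;br /&gt;
The oldest-xmin bookkeeping mentioned in the list above can be sketched as follows. This is a simplification for illustration only: it keeps a count for every xmin rather than just the oldest one, ignores transaction ID wraparound, and records clean-up requests in a list instead of freeing anything. All names are invented for the example.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


class SerializableXminTracker:
    """Track the oldest snapshot xmin among active serializable transactions."""

    def __init__(self):
        self.xmins = Counter()  # snapshot xmin: number of active transactions
        self.cleanups = []      # new oldest xmin at each clean-up (demo only)

    def oldest(self):
        return min(self.xmins) if self.xmins else None

    def begin(self, xmin):
        self.xmins[xmin] += 1

    def end(self, xmin):
        was_oldest = xmin == self.oldest()
        self.xmins[xmin] -= 1
        if self.xmins[xmin] == 0:
            del self.xmins[xmin]
            if was_oldest:
                # Count hit zero: find the new oldest xmin and clean up
                # whatever is only needed by older snapshots.
                self.cleanups.append(self.oldest())


tracker = SerializableXminTracker()
for xmin in (100, 100, 105):
    tracker.begin(xmin)
tracker.end(100)  # another transaction with xmin 100 is still active
tracker.end(100)  # count hits zero, so the new oldest xmin is 105
print(tracker.oldest(), tracker.cleanups)
```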
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
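&lt;br /&gt;
The shared memory question can be made concrete with a toy free list. The sketch counts how many distinct slots are ever handed out when returned entries go to the front or to the end of the list; the workload (one entry in use at a time) and all names are assumptions, and it says nothing about real cache or paging behavior.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def slots_touched(push_front, total_slots=8, cycles=20):
    """Allocate and free one slot per cycle; count distinct slots used."""
    free = deque(range(total_slots))
    touched = set()
    for _ in range(cycles):
        slot = free.popleft()      # allocate from the front of the list
        touched.add(slot)
        if push_front:
            free.appendleft(slot)  # return to the front: reused next time
        else:
            free.append(slot)      # return to the end: cycles through all
    return len(touched)


print(slots_touched(push_front=True), slots_touched(push_front=False))
```
&lt;br /&gt;
With this workload, returning entries to the front keeps reusing one slot, while returning them to the end eventually touches every slot.&lt;br /&gt;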
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 23:14:41 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Index AM implementations */ Remove empty lines between points to prevent a separate unordered list per point.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well-known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions, although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
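&lt;br /&gt;
The cycle test described above can be sketched in a few lines.  This is only an illustration, not code from PostgreSQL: the function name, the transaction labels and the edge lists are all assumptions made for the example.  Each edge points from the transaction that appears to run first to the one that appears to run later.&lt;br /&gt;
&lt;br /&gt;
```python
def has_cycle(edges):
    """Return True if the 'appears to run before' graph contains a cycle."""
    graph = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)
        graph.setdefault(later, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Hypothetical write skew: T1 reads y and writes x while T2 reads x and writes y.
write_skew = [
    ("T1", "T2"),  # rw-conflict: T1 read the version of y that T2 replaced
    ("T2", "T1"),  # rw-conflict: T2 read the version of x that T1 replaced
]
# Hypothetical harmless case: T2 simply read what T1 had committed.
serial = [("T1", "T2")]
```
&lt;br /&gt;
With these inputs, has_cycle(write_skew) is true, so no serial order can explain the results, while has_cycle(serial) is false.&lt;br /&gt;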
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using&lt;br /&gt;
Serializable Snapshot Isolation (SSI), based on the work of Cahill,&lt;br /&gt;
et al. Fundamentally, this allows snapshot isolation to run as it&lt;br /&gt;
has, while monitoring for conditions which could create a serialization&lt;br /&gt;
anomaly. &lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation&lt;br /&gt;
anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot;&lt;br /&gt;
of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization anomaly.&lt;br /&gt;
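&lt;br /&gt;
As a rough sketch of what watching for the dangerous structure means, the following looks for every pivot in a list of rw-conflicts.  The function name and the representation of a conflict as a (reader, writer) pair are assumptions for illustration; the real implementation keeps this information in shared memory structures and checks it incrementally.&lt;br /&gt;
&lt;br /&gt;
```python
def find_dangerous_structures(rw_conflicts):
    """Return (t_in, pivot, t_out) triples found in (reader, writer) pairs.

    A pair (reader, writer) means the reader appears to run before the writer.
    """
    conflicts_in = {}   # writer: set of readers with an rw-conflict into it
    conflicts_out = {}  # reader: set of writers it has an rw-conflict out to
    for reader, writer in rw_conflicts:
        conflicts_out.setdefault(reader, set()).add(writer)
        conflicts_in.setdefault(writer, set()).add(reader)

    found = []
    # A pivot has at least one conflict in and at least one conflict out.
    for pivot in sorted(set(conflicts_in).intersection(conflicts_out)):
        for t_in in sorted(conflicts_in[pivot]):
            for t_out in sorted(conflicts_out[pivot]):
                found.append((t_in, pivot, t_out))
    return found
```
&lt;br /&gt;
Note that the transactions on either side of the pivot may be the same transaction, as in a two-transaction write skew, where each transaction is a pivot for the other.&lt;br /&gt;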
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# if T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, there can only be an anomaly if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T0 might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;).&lt;br /&gt;
#* The dependency between T0 and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; can't be a rw-conflict, because T1 was read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T1 started.&lt;br /&gt;
#* Because T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T1 starts.&lt;br /&gt;
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI can not eliminate existing anomalies in these areas.&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction can not fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
* Modifying a heap tuple creates a rw-conflict with any transaction that holds a SIREAD lock on that tuple, or on the page or relation that contains it.&lt;br /&gt;
* Inserting a new tuple creates a rw-conflict with any transaction holding a SIREAD lock on the entire relation. It doesn't conflict with page-level locks, because page-level locks are only used to aggregate tuple locks. Unlike index page locks, they don't lock &amp;quot;gaps&amp;quot; on the page.&lt;br /&gt;
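&lt;br /&gt;
Because a serialization failure is reported with SQLSTATE 40001, client code can retry such transactions automatically.  A minimal sketch of such a retry loop follows.  The DatabaseError class is a stand-in invented for this example so that it runs without a database; a real driver has its own exception type and its own way of exposing the SQLSTATE.&lt;br /&gt;
&lt;br /&gt;
```python
SERIALIZATION_FAILURE = "40001"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying an SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() again after each serialization failure.

    transaction is assumed to begin, run and commit one complete
    SERIALIZABLE transaction, rolling back before it raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Any other error, or running out of attempts, is passed on.
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
```
&lt;br /&gt;
The whole transaction is retried from the start, since the data it read on the failed attempt may no longer be current.&lt;br /&gt;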
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to try for the fastest possible implementation of a serializable isolation level which allowed no anomalies, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
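&lt;br /&gt;
The promotion and covering rules above can be modelled with a toy lock set for a single transaction.  Everything here is an assumption for illustration: the class name, the fixed thresholds and the use of Python sets bear no relation to the shared memory structures actually used.&lt;br /&gt;
&lt;br /&gt;
```python
class PredicateLocks:
    """Toy SIREAD lock set: tuple locks promote to page, then to relation."""

    def __init__(self, max_tuples_per_page=2, max_pages_per_relation=2):
        # Hypothetical thresholds; locks arrive one at a time, so a limit
        # is first exceeded at exactly limit + 1.
        self.max_tuples_per_page = max_tuples_per_page
        self.max_pages_per_relation = max_pages_per_relation
        self.relations = set()  # rel
        self.pages = set()      # (rel, page)
        self.tuples = set()     # (rel, page, tup)

    def lock_tuple(self, rel, page, tup):
        if rel in self.relations or (rel, page) in self.pages:
            return  # already covered by a coarser lock: ignore
        self.tuples.add((rel, page, tup))
        on_page = [t for t in self.tuples if t[:2] == (rel, page)]
        if len(on_page) == self.max_tuples_per_page + 1:
            self.lock_page(rel, page)

    def lock_page(self, rel, page):
        if rel in self.relations:
            return  # relation lock already covers the page
        self.pages.add((rel, page))
        # Release the finer-grained locks the page lock now covers.
        self.tuples = {t for t in self.tuples if t[:2] != (rel, page)}
        in_rel = [p for p in self.pages if p[0] == rel]
        if len(in_rel) == self.max_pages_per_relation + 1:
            self.lock_relation(rel)

    def lock_relation(self, rel):
        self.relations.add(rel)
        self.pages = {p for p in self.pages if p[0] != rel}
        self.tuples = {t for t in self.tuples if t[0] != rel}
```
&lt;br /&gt;
With the default thresholds, a third tuple lock on one page replaces the three tuple locks with a single page lock, and a third page lock on one relation replaces all of them with a relation lock.  Coarser locks use less memory but raise the chance of false positives.&lt;br /&gt;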
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
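&lt;br /&gt;
Conceptually, the correctness rule says that an insert conflicts with every other transaction whose earlier scan would now return the new entry.  The sketch below expresses only that logical rule, under the assumption of integer keys and inclusive scan ranges; the names are hypothetical, and an index AM models the gaps with locks on physical objects rather than by remembering ranges.&lt;br /&gt;
&lt;br /&gt;
```python
def conflicting_readers(scans, inserted_key, writer):
    """Return readers whose earlier range scan would now see inserted_key.

    scans is a list of (reader, low, high) tuples with inclusive integer
    bounds; each hit is an rw-conflict from that reader to the writer.
    """
    return sorted({
        reader
        for reader, low, high in scans
        if reader != writer and inserted_key in range(low, high + 1)
    })


# Hypothetical scans already performed by two serializable transactions.
scans = [("T1", 10, 20), ("T2", 30, 40)]
```
&lt;br /&gt;
Here an insert of key 15 by a third transaction conflicts with T1 only, an insert of key 25 conflicts with nobody, and a transaction never conflicts with its own scans.&lt;br /&gt;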
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires python and several python packages (including twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
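&lt;br /&gt;
The xmin bookkeeping just described might be sketched as follows.  The class and attribute names are hypothetical, and the sketch assumes that a transaction starting later never has an older xmin than the one currently tracked.&lt;br /&gt;
&lt;br /&gt;
```python
class SerializableXminTracker:
    """Track the oldest snapshot xmin among active serializable transactions."""

    def __init__(self):
        self.xmin = None    # oldest xmin, or None when nothing is active
        self.count = 0      # active transactions sharing that xmin
        self.active = {}    # transaction id: its snapshot xmin
        self.cleanups = []  # xmin values at which a clean-up was run

    def start(self, txn, xmin):
        self.active[txn] = xmin
        if self.count == 0:
            self.xmin, self.count = xmin, 1
        elif xmin == self.xmin:
            self.count += 1
        # Otherwise xmin is assumed to be newer, so the oldest is unchanged.

    def finish(self, txn):
        xmin = self.active.pop(txn)
        if xmin != self.xmin:
            return
        self.count -= 1
        if self.count == 0:
            # Find the new oldest xmin and clean up based on it.
            self.xmin = min(self.active.values(), default=None)
            self.count = sum(1 for x in self.active.values() if x == self.xmin)
            self.cleanups.append(self.xmin)
```
&lt;br /&gt;
Keeping a count means the comparatively expensive search for a new oldest xmin only happens when the last transaction using the current one finishes.&lt;br /&gt;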
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 23:06:26 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* PostgreSQL Implementation */ Eliminate empty lines so points are a single unordered list, rather than a separate list for each point.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
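&lt;br /&gt;
The cycle test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from PostgreSQL: the function name, the transaction labels, and the edge-list representation (each pair meaning that the first transaction appears to execute before the second) are all assumptions made for the example.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative sketch only; names and data layout are hypothetical.
def has_cycle(edges):
    """Return True if the (before, after) pairs form a directed cycle."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, []).append(after)
        graph.setdefault(after, [])
    state = {}  # node name mapped to "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return False
        if state.get(node) == "visiting":
            return True  # back edge: we returned to a node on the current path
        state[node] = "visiting"
        found = any(visit(nxt) for nxt in graph[node])
        state[node] = "done"
        return found

    return any(visit(node) for node in list(graph))


# Write skew: each transaction reads what the other one writes.
write_skew = [("A", "B"), ("B", "A")]
# A simple chain has an obvious serial order: A, B, C.
chain = [("A", "B"), ("B", "C")]
```
&lt;br /&gt;
With these inputs, the write skew edge list contains a cycle, so no serial order can explain the results, while the chain does not.&lt;br /&gt;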
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using&lt;br /&gt;
Serializable Snapshot Isolation (SSI), based on the work of Cahill,&lt;br /&gt;
et al. Fundamentally, this allows snapshot isolation to run as it&lt;br /&gt;
has, while monitoring for conditions which could create a serialization&lt;br /&gt;
anomaly. &lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation&lt;br /&gt;
anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot;&lt;br /&gt;
of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization failure.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&lt;sub&gt;in&lt;/sub&gt; is read-only, there can only be an anomaly if T&lt;sub&gt;out&lt;/sub&gt; committed before T&lt;sub&gt;in&lt;/sub&gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&lt;sub&gt;in&lt;/sub&gt; in the serial order. (T0 might be the same as T&lt;sub&gt;out&lt;/sub&gt;).&lt;br /&gt;
#* The dependency between T0 and T&lt;sub&gt;in&lt;/sub&gt; can't be a rw-conflict, because T&lt;sub&gt;in&lt;/sub&gt; was read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&lt;sub&gt;in&lt;/sub&gt; started.&lt;br /&gt;
#* Because T&lt;sub&gt;out&lt;/sub&gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&lt;sub&gt;in&lt;/sub&gt; starts.&lt;br /&gt;
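&lt;br /&gt;
As a rough illustration of the dangerous structure test combined with the first optimization above (T&lt;sub&gt;out&lt;/sub&gt; must commit first), consider the following Python sketch. It is not the PostgreSQL code: the function name, the set of (reader, writer) pairs, and the commit sequence dictionary are hypothetical, and the read-only optimization is deliberately left out.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative sketch only; all names and data structures are hypothetical.
def find_dangerous_structures(rw_conflicts, commit_seq):
    """Find (t_in, t_pivot, t_out) triples where t_out committed first.

    rw_conflicts: set of (reader, writer) pairs between concurrent
                  transactions, i.e. reader appears to run before writer.
    commit_seq:   dict mapping each committed transaction to its commit
                  sequence number; uncommitted transactions are absent.
    """
    found = []
    for t_in, t_pivot in rw_conflicts:
        for reader, t_out in rw_conflicts:
            if reader != t_pivot or t_out not in commit_seq:
                continue  # not adjacent edges, or T_out has not committed
            out_seq = commit_seq[t_out]
            others = [commit_seq[t] for t in (t_in, t_pivot) if t in commit_seq]
            # Report only if no other member of the structure committed earlier.
            if min([out_seq] + others) == out_seq:
                found.append((t_in, t_pivot, t_out))
    return sorted(found)


conflicts = {("T1", "T2"), ("T2", "T3")}
```
&lt;br /&gt;
In this sketch, a structure is reported for the conflicts above once T3 has committed ahead of T1 and T2, but not when T1 committed before T3, and not while T3 is still running. Note that T&lt;sub&gt;in&lt;/sub&gt; and T&lt;sub&gt;out&lt;/sub&gt; may be the same transaction, as in a two-transaction write skew.&lt;br /&gt;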
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI can not eliminate existing anomalies in these areas.&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction can not fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
* Modifying a heap tuple creates a rw-conflict with any transaction that holds a SIREAD lock on that tuple, or on the page or relation that contains it.&lt;br /&gt;
* Inserting a new tuple creates a rw-conflict with any transaction holding a SIREAD lock on the entire relation. It doesn't conflict with page-level locks, because page-level locks are only used to aggregate tuple locks. Unlike index page locks, they don't lock &amp;quot;gaps&amp;quot; on the page.&lt;br /&gt;
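&lt;br /&gt;
Because a serialization failure is reported with SQLSTATE 40001, client code can retry the whole transaction automatically. The Python sketch below shows one possible shape for such a retry loop. The DatabaseError class is a hypothetical stand-in for whatever exception a real database driver raises, and run_with_retry and its max_attempts parameter are likewise invented for this example.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative sketch only; DatabaseError is a hypothetical stand-in for a
# driver exception that exposes the SQLSTATE of the failed statement.
SERIALIZATION_FAILURE = "40001"


class DatabaseError(Exception):
    def __init__(self, sqlstate):
        super().__init__("SQLSTATE " + sqlstate)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() until it succeeds, retrying only on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            # Other errors, or running out of attempts, are passed to the caller.
            if err.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise


calls = []


def flaky_transaction():
    """Simulated transaction body: fails twice with 40001, then succeeds."""
    calls.append(1)
    if len(calls) != 3:
        raise DatabaseError(SERIALIZATION_FAILURE)
    return "committed"


result = run_with_retry(flaky_transaction)
```
&lt;br /&gt;
The callable passed in should begin a new transaction on each call, since the failed attempt has already been rolled back.&lt;br /&gt;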
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to try for the fastest possible implementation of a serializable isolation level which allowed no anomalies, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
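&lt;br /&gt;
The granularity rules above can be modelled with a small Python class. This is a toy model under stated assumptions, not the shared memory structures used by PostgreSQL: the class name, the per-page promotion threshold, and the use of plain sets for a single transaction's locks are all hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative toy model only; names and the promotion threshold are hypothetical.
class PredicateLocks:
    """SIREAD locks held by one transaction at tuple, page and relation level."""

    def __init__(self, max_tuples_per_page=2):
        self.max_tuples_per_page = max_tuples_per_page
        self.relations = set()  # relation names
        self.pages = set()      # (relation, page) pairs
        self.tuples = set()     # (relation, page, offset) triples

    def lock_relation(self, rel):
        self.relations.add(rel)
        # A coarser lock releases the finer-grained locks it covers.
        self.pages = {p for p in self.pages if p[0] != rel}
        self.tuples = {t for t in self.tuples if t[0] != rel}

    def lock_page(self, rel, page):
        if rel in self.relations:
            return  # already covered by the relation lock
        self.pages.add((rel, page))
        self.tuples = {t for t in self.tuples if t[:2] != (rel, page)}

    def lock_tuple(self, rel, page, offset):
        if rel in self.relations or (rel, page) in self.pages:
            return  # already covered by a coarser lock
        self.tuples.add((rel, page, offset))
        on_page = [t for t in self.tuples if t[:2] == (rel, page)]
        # Promote once the count leaves the allowed range 0..max_tuples_per_page.
        if len(on_page) not in range(self.max_tuples_per_page + 1):
            self.lock_page(rel, page)

    def covers(self, rel, page, offset):
        """Would modifying this existing heap tuple conflict with a held lock?"""
        return (rel in self.relations
                or (rel, page) in self.pages
                or (rel, page, offset) in self.tuples)


locks = PredicateLocks(max_tuples_per_page=2)
for offset in (1, 2, 3):
    locks.lock_tuple("accounts", 7, offset)
```
&lt;br /&gt;
After the third tuple lock on the same page, the three tuple locks are replaced by one page lock. This keeps memory use bounded, at the cost of possible false positive conflicts for other tuples on that page.&lt;br /&gt;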
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires python and several python packages (including twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
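&lt;br /&gt;
The oldest-xmin bookkeeping mentioned in the list above can be sketched in Python as follows.  This is an illustration only, not the PostgreSQL code: the class OldestXminTracker and its methods are hypothetical, and for simplicity it keeps a count for every xmin rather than only for the oldest one.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


class OldestXminTracker:
    """Toy model: track the oldest snapshot xmin among active transactions."""

    def __init__(self):
        self.xmins = Counter()   # snapshot xmin -- number of active transactions
        self.oldest = None       # oldest xmin currently in use, or None
        self.cleanups = []       # horizon recorded at each clean-up (illustration)

    def begin(self, xmin):
        """Register a serializable transaction whose snapshot has this xmin."""
        self.xmins[xmin] += 1
        self.oldest = xmin if self.oldest is None else min(self.oldest, xmin)

    def end(self, xmin):
        """Unregister a transaction; clean up when the oldest xmin is released."""
        self.xmins[xmin] -= 1
        if self.xmins[xmin] == 0:
            del self.xmins[xmin]
            if xmin == self.oldest:
                # The count for the oldest xmin hit zero: find the new oldest
                # xmin and run a clean-up based on it.
                self.oldest = min(self.xmins) if self.xmins else None
                self.cleanups.append(self.oldest)
```
&lt;br /&gt;
In this sketch a clean-up is triggered only when the last transaction using the oldest xmin finishes, which mirrors the count-hits-zero rule described above.&lt;br /&gt;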
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 23:04:46 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* PostgreSQL Implementation */ Add two implementation points from Dan's README-SSI.patch.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions, although these anomalies occur only when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
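&lt;br /&gt;
As a rough illustration of the cycle test described above, the following Python sketch checks whether a set of &amp;quot;appears to execute before&amp;quot; edges contains a cycle.  It is not PostgreSQL code; the function name has_cycle and the representation of dependencies as (before, after) pairs are assumptions made for this example.&lt;br /&gt;
&lt;br /&gt;
```python
def has_cycle(edges):
    """Return True if the directed graph given by (before, after) pairs has a cycle."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
        graph.setdefault(after, set())

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True        # back edge: nxt is already on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Write skew: A appears to run before B, and B appears to run before A.
write_skew = [("A", "B"), ("B", "A")]
# A history consistent with the serial order A, B, C.
serial = [("A", "B"), ("B", "C")]
```
&lt;br /&gt;
Here has_cycle(write_skew) reports a cycle, so no serial order can explain the results, while has_cycle(serial) does not.&lt;br /&gt;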
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using&lt;br /&gt;
Serializable Snapshot Isolation (SSI), based on the work of Cahill,&lt;br /&gt;
et al. Fundamentally, this allows snapshot isolation to run as it&lt;br /&gt;
has, while monitoring for conditions which could create a serialization&lt;br /&gt;
anomaly. &lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation&lt;br /&gt;
anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot;&lt;br /&gt;
of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization failure.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&lt;sub&gt;in&lt;/sub&gt; is read-only, there can only be an anomaly if T&lt;sub&gt;out&lt;/sub&gt; committed before T&lt;sub&gt;in&lt;/sub&gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T0 might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;).&lt;br /&gt;
#* The dependency between T0 and T&lt;sub&gt;in&lt;/sub&gt; can't be a rw-conflict, because T&lt;sub&gt;in&lt;/sub&gt; is read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&lt;sub&gt;in&lt;/sub&gt; started.&lt;br /&gt;
#* Because T&lt;sub&gt;out&lt;/sub&gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&lt;sub&gt;in&lt;/sub&gt; starts.&lt;br /&gt;
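&lt;br /&gt;
The dangerous-structure test and the two optimizations above can be sketched in Python as follows.  This is a simplified model, not the PostgreSQL code: the Txn class, the is_dangerous function, and the use of a single logical clock for snapshot and commit sequence numbers are all assumptions made for this example.&lt;br /&gt;
&lt;br /&gt;
```python
from dataclasses import dataclass
from operator import lt          # lt(a, b) is the function form of "a is less than b"
from typing import Optional


@dataclass(frozen=True)
class Txn:
    name: str
    snapshot_seq: int                  # logical time the snapshot was taken
    commit_seq: Optional[int] = None   # logical commit time; None if not committed
    read_only: bool = False


def is_dangerous(t_in, t_pivot, t_out):
    """Decide whether rw-conflicts t_in to t_pivot to t_out call for a rollback."""
    # Optimization 1: T_out must have committed, and before T_pivot and T_in.
    if t_out.commit_seq is None:
        return False
    for other in (t_pivot, t_in):
        if other is t_out:
            continue                   # T_in and T_out may be the same transaction
        if other.commit_seq is not None and lt(other.commit_seq, t_out.commit_seq):
            return False
    # Optimization 2: a read-only T_in matters only if T_out committed
    # before T_in took its snapshot.
    if t_in.read_only and not lt(t_out.commit_seq, t_in.snapshot_seq):
        return False
    return True


t_out = Txn("out", snapshot_seq=1, commit_seq=5)
t_pivot = Txn("pivot", snapshot_seq=2)
t_in = Txn("in", snapshot_seq=3)
ro_in = Txn("ro", snapshot_seq=3, read_only=True)
```
&lt;br /&gt;
With these values, is_dangerous(t_in, t_pivot, t_out) holds, but is_dangerous(ro_in, t_pivot, t_out) does not, because the read-only transaction took its snapshot before T&lt;sub&gt;out&lt;/sub&gt; committed.&lt;br /&gt;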
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI can not eliminate existing anomalies in these areas.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction can not fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
&lt;br /&gt;
* Modifying a heap tuple creates a rw-conflict with any transaction that holds a SIREAD lock on that tuple, or on the page or relation that contains it.&lt;br /&gt;
&lt;br /&gt;
* Inserting a new tuple creates a rw-conflict with any transaction holding a SIREAD lock on the entire relation. It doesn't conflict with page-level locks, because page-level locks are only used to aggregate tuple locks. Unlike index page locks, they don't lock &amp;quot;gaps&amp;quot; on the page.&lt;br /&gt;
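&lt;br /&gt;
Because a rolled-back transaction fails with SQLSTATE 40001, as noted in the list above, a client can retry it automatically.  The Python sketch below shows one way such a retry loop might look.  The DatabaseError class is a hypothetical stand-in for whatever exception a real database driver raises, and run_with_retry and max_attempts are names invented for this example.&lt;br /&gt;
&lt;br /&gt;
```python
SERIALIZATION_FAILURE = "40001"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() again after each serialization failure.

    Any other error, or a serialization failure on the final attempt,
    is passed on to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
```
&lt;br /&gt;
The callable passed in is expected to run the whole transaction from its start, since a retry must repeat all of its reads as well as its writes.&lt;br /&gt;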
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to try for the fastest possible implementation of a serializable isolation level which allowed no anomalies, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
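&lt;br /&gt;
The promotion and covering rules in the list above can be illustrated with a small model.  This is only a sketch: the class name, the method names and the two per-page and per-relation thresholds are hypothetical and chosen for illustration, and the real structures live in shared memory rather than in a per-transaction set.&lt;br /&gt;

```python
# Illustrative model only; names and thresholds are hypothetical.
class PredicateLockSet:
    """SIREAD locks held by one transaction.

    A lock target is a tuple: (rel,), (rel, page) or (rel, page, tup).
    """

    def __init__(self, max_tuples_per_page=3, max_pages_per_relation=2):
        self.max_tuples_per_page = max_tuples_per_page
        self.max_pages_per_relation = max_pages_per_relation
        self.locks = set()

    def covered(self, target):
        """True if the target itself or a coarser lock above it is held."""
        return any(target[:n] in self.locks
                   for n in range(1, len(target) + 1))

    def acquire(self, target):
        if self.covered(target):
            return  # ignored: an equal or coarser lock already covers it
        # Acquiring a coarser lock releases the finer-grained locks it covers.
        self.locks = {t for t in self.locks if t[:len(target)] != target}
        self.locks.add(target)
        self._promote(target)

    def _promote(self, target):
        if len(target) == 1:
            return  # the relation level is the coarsest granularity
        parent = target[:-1]
        siblings = [t for t in self.locks
                    if len(t) == len(target) and t[:-1] == parent]
        limit = (self.max_tuples_per_page if len(target) == 3
                 else self.max_pages_per_relation)
        # Locks arrive one at a time, so the count reaches the limit exactly.
        if len(siblings) == limit:
            self.acquire(parent)


held = PredicateLockSet()
held.acquire(("orders", 1, 1))
held.acquire(("orders", 1, 2))
held.acquire(("orders", 1, 3))   # third tuple on page 1: promoted to a page lock
held.acquire(("orders", 1, 9))   # ignored, page 1 is already locked
```

With the default thresholds used here, the example ends with a single page-level lock on page 1 of the hypothetical "orders" relation.&lt;br /&gt;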
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't satisfy the WHERE clause of a partial index.  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires Python and several Python packages (including Twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, together with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
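&lt;br /&gt;
The oldest-xmin bookkeeping mentioned in the list above (an oldest snapshot xmin plus a count of the active transactions sharing it) can be sketched roughly as follows.  The class and attribute names are hypothetical, the clean-up is reduced to recording the new xmin, and the sketch assumes that a newly started transaction never has an xmin older than the one currently tracked.&lt;br /&gt;

```python
# Illustrative model only; names are hypothetical and clean-up is stubbed.
class OldestXminTracker:
    def __init__(self):
        self.active = {}         # transaction id to its snapshot xmin
        self.oldest_xmin = None
        self.oldest_count = 0
        self.cleanups = []       # xmin values handed to the stubbed clean-up

    def begin(self, xid, xmin):
        """Register a serializable transaction and its snapshot xmin."""
        self.active[xid] = xmin
        if self.oldest_count == 0:
            self.oldest_xmin, self.oldest_count = xmin, 1
        elif xmin == self.oldest_xmin:
            self.oldest_count += 1

    def end(self, xid):
        """Drop a finished transaction; clean up when the count hits zero."""
        xmin = self.active.pop(xid)
        if xmin != self.oldest_xmin:
            return
        self.oldest_count -= 1
        if self.oldest_count == 0:
            self._find_new_oldest()

    def _find_new_oldest(self):
        xmins = list(self.active.values())
        self.oldest_xmin = min(xmins, default=None)
        self.oldest_count = xmins.count(self.oldest_xmin)
        self.cleanups.append(self.oldest_xmin)


tracker = OldestXminTracker()
tracker.begin(1, 100)
tracker.begin(2, 100)
tracker.begin(3, 105)
tracker.end(1)   # one transaction still uses xmin 100, so no clean-up yet
tracker.end(2)   # count hits zero: the new oldest xmin is 105
```

Only the transition of the count to zero triggers a scan of the remaining transactions, which keeps the common begin and end paths cheap.&lt;br /&gt;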
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 22:57:10 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Publications */ Terminate unclosed &amp;lt;span&amp;gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
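&lt;br /&gt;
As a rough illustration of the paragraph above, the dependency graph can be modelled as a mapping from each transaction to the transactions which must appear to run after it, and an anomaly then corresponds to a cycle.  The function name and the two sample graphs are hypothetical; the write skew case assumes two transactions which each read a row that the other one updates.&lt;br /&gt;

```python
# Illustrative sketch; the function name and sample graphs are hypothetical.
def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    edges maps each transaction to the transactions that must appear
    to execute after it.
    """
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Reached a node on the current path again: report the loop.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for successor in edges.get(node, ()):
            cycle = visit(successor)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(edges):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Write skew: A appears to run before B, and B appears to run before A.
write_skew = {"A": ["B"], "B": ["A"]}
# A chain of dependencies with no cycle is equivalent to the order A, B, C.
serial_ok = {"A": ["B"], "B": ["C"]}
```

For the write skew graph the search reports the cycle A, B, A, so no serial order exists; for the second graph it finds none.&lt;br /&gt;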
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using&lt;br /&gt;
Serializable Snapshot Isolation (SSI), based on the work of Cahill,&lt;br /&gt;
et al. Fundamentally, this allows snapshot isolation to run as it&lt;br /&gt;
has, while monitoring for conditions which could create a serialization&lt;br /&gt;
anomaly. &lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation&lt;br /&gt;
anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot;&lt;br /&gt;
of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization failure.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&lt;sub&gt;in&lt;/sub&gt; is read-only, there can only be an anomaly if T&lt;sub&gt;out&lt;/sub&gt; committed before T&lt;sub&gt;in&lt;/sub&gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&lt;sub&gt;in&lt;/sub&gt; in the serial order. (T0 might be the same as T&lt;sub&gt;out&lt;/sub&gt;.)&lt;br /&gt;
#* The dependency between T0 and T&lt;sub&gt;in&lt;/sub&gt; can't be a rw-conflict, because T&lt;sub&gt;in&lt;/sub&gt; is read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&lt;sub&gt;in&lt;/sub&gt; started.&lt;br /&gt;
#* Because T&lt;sub&gt;out&lt;/sub&gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&lt;sub&gt;in&lt;/sub&gt; starts.&lt;br /&gt;
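&lt;br /&gt;
The dangerous structure test and the two commit-order rules above can be sketched as follows.  This is a simplified model under stated assumptions, not the PostgreSQL code: the Txn class, the integer logical clock used for snapshot and commit times, and the function names are all hypothetical.&lt;br /&gt;

```python
# Illustrative model only; all names and the logical clock are hypothetical.
from operator import lt


class Txn:
    def __init__(self, name, snapshot_seq, commit_seq=None, read_only=False):
        self.name = name
        self.snapshot_seq = snapshot_seq   # logical time of the snapshot
        self.commit_seq = commit_seq       # logical commit time, None if active
        self.read_only = read_only


def committed_before(txn, seq):
    """True if txn has committed, and did so before logical time seq.

    A seq of None stands for an event which has not happened yet.
    """
    if txn.commit_seq is None:
        return False
    return seq is None or lt(txn.commit_seq, seq)


def is_dangerous(t_in, t_pivot, t_out):
    """Apply both rules to rw-conflicts from t_in to t_pivot to t_out."""
    # Rule 1: t_out must commit before the other transactions involved.
    if not committed_before(t_out, t_pivot.commit_seq):
        return False
    if t_in is not t_out and not committed_before(t_out, t_in.commit_seq):
        return False
    # Rule 2: a read-only t_in matters only if t_out committed before
    # t_in took its snapshot.
    if t_in.read_only and not committed_before(t_out, t_in.snapshot_seq):
        return False
    return True


def dangerous_structures(rw_conflicts):
    """rw_conflicts is a list of (reader, writer) pairs of Txn objects."""
    found = []
    for t_in, pivot in rw_conflicts:
        for reader, t_out in rw_conflicts:
            if reader is pivot and is_dangerous(t_in, pivot, t_out):
                found.append((t_in.name, pivot.name, t_out.name))
    return found


t3 = Txn("T3", snapshot_seq=2, commit_seq=3)
t2 = Txn("T2", snapshot_seq=1)                       # still active
late = Txn("T1", snapshot_seq=4, read_only=True)     # snapshot after T3 commit
early = Txn("T1", snapshot_seq=2, read_only=True)    # snapshot before T3 commit

flagged = dangerous_structures([(late, t2), (t2, t3)])
ignored = dangerous_structures([(early, t2), (t2, t3)])
```

In this example the read-only transaction whose snapshot follows the commit of T3 completes a dangerous structure, while the one whose snapshot precedes that commit does not, which is the case the second optimization is meant to exempt.&lt;br /&gt;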
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI cannot eliminate existing anomalies in these areas.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction cannot fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
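&lt;br /&gt;
Because a rolled back transaction reports SQLSTATE 40001, client code can rerun it automatically.  A minimal sketch of such a retry loop is shown below; the SerializationFailure class is a hypothetical stand-in for whatever error a database driver raises for SQLSTATE 40001, and run_with_retry and flaky_transfer are illustrative names, not part of any PostgreSQL interface.&lt;br /&gt;

```python
# Illustrative sketch; no real database driver is used here.
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5):
    """Run transaction() and rerun it from the start on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure


attempts = []


def flaky_transfer():
    """Simulated transaction which is rolled back on its first two runs."""
    attempts.append(1)
    if len(attempts) != 3:
        raise SerializationFailure()
    return "committed"


result = run_with_retry(flaky_transfer)
```

The whole transaction is rerun from its beginning, since values read during the failed attempt may no longer be valid.&lt;br /&gt;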
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to try for the fastest possible implementation of a serializable isolation level which allowed no anomalies, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
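&lt;br /&gt;
The promotion and covering rules above can be illustrated with a toy model.  This is only a sketch under stated assumptions, not the PostgreSQL shared-memory structures: the PredicateLocks class and its promote_at threshold are hypothetical, and locks are modelled as plain Python tuples held by a single transaction.&lt;br /&gt;

```python
class PredicateLocks:
    """Toy SIREAD lock set for one transaction (hypothetical structure)."""

    def __init__(self, promote_at=3):
        # Locks are ("rel", r), ("page", r, p) or ("tuple", r, p, t).
        self.locks = set()
        self.promote_at = promote_at  # hypothetical promotion threshold

    def covered(self, rel, page=None):
        """True if a relation or page lock already covers the target."""
        if ("rel", rel) in self.locks:
            return True
        return page is not None and ("page", rel, page) in self.locks

    def lock_relation(self, rel):
        # Acquiring the coarser lock releases all finer locks it covers.
        self.locks = {lock for lock in self.locks if lock[1] != rel}
        self.locks.add(("rel", rel))

    def lock_page(self, rel, page):
        if self.covered(rel):
            return  # ignored: the relation is already locked
        self.locks = {lock for lock in self.locks
                      if lock[:3] != ("tuple", rel, page)}
        self.locks.add(("page", rel, page))
        pages = [lock for lock in self.locks if lock[:2] == ("page", rel)]
        if len(pages) == self.promote_at:
            self.lock_relation(rel)

    def lock_tuple(self, rel, page, tup):
        if self.covered(rel, page):
            return  # ignored: a page or relation lock already covers it
        self.locks.add(("tuple", rel, page, tup))
        tuples = [lock for lock in self.locks
                  if lock[:3] == ("tuple", rel, page)]
        if len(tuples) == self.promote_at:
            self.lock_page(rel, page)


locks = PredicateLocks(promote_at=3)
for t in (1, 2, 3):
    locks.lock_tuple("accounts", 7, t)
print(sorted(locks.locks))  # prints: [('page', 'accounts', 7)]
```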
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires Python and several Python packages (including Twisted).  Due to package dependencies and licensing issues, the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
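&lt;br /&gt;
The xmin bookkeeping described in the list above (the oldest snapshot xmin among serializable transactions, plus a count of the active transactions using it) can be sketched as follows.  This is a simplified illustration, not the PostgreSQL code: the SerializableXminTracker class and its method names are hypothetical, the clean-up is only recorded rather than performed, and the sketch assumes a new snapshot never has an xmin older than the current oldest one.&lt;br /&gt;

```python
class SerializableXminTracker:
    """Track the oldest snapshot xmin among active serializable transactions."""

    def __init__(self):
        self.active = {}       # transaction name : snapshot xmin
        self.oldest_xmin = None
        self.oldest_count = 0  # active transactions using oldest_xmin
        self.cleanups = []     # new oldest xmin at each clean-up run

    def begin(self, name, xmin):
        # Assumption: xmin is never older than the current oldest xmin.
        self.active[name] = xmin
        if self.oldest_xmin is None:
            self.oldest_xmin, self.oldest_count = xmin, 1
        elif xmin == self.oldest_xmin:
            self.oldest_count += 1

    def finish(self, name):
        xmin = self.active.pop(name)
        if xmin != self.oldest_xmin:
            return
        self.oldest_count -= 1
        if self.oldest_count == 0:
            self._advance()

    def _advance(self):
        # The count hit zero: find the new oldest xmin, then clean up.
        remaining = list(self.active.values())
        self.oldest_xmin = min(remaining) if remaining else None
        self.oldest_count = remaining.count(self.oldest_xmin)
        self.cleanups.append(self.oldest_xmin)


tracker = SerializableXminTracker()
tracker.begin("T1", 100)
tracker.begin("T2", 100)
tracker.begin("T3", 105)
tracker.finish("T1")     # one transaction still uses xmin 100
tracker.finish("T2")     # count reaches zero, oldest xmin advances
print(tracker.oldest_xmin, tracker.cleanups)  # prints: 105 [105]
```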
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 22:49:42 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;Create anchors for footnotes and turn references into links (superscripted where appropriate).&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published, and used in many database products, for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[[#SQL92|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
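&lt;br /&gt;
As a rough illustration of the cycle test described above, the following Python sketch treats each apparent-order dependency as a directed edge and looks for a cycle with a depth-first search. The function name, the edge representation, and the two example histories are assumptions made for illustration; nothing here is taken from the PostgreSQL source.&lt;br /&gt;
&lt;br /&gt;
```python
def has_cycle(edges):
    """Return True if the directed dependency graph contains a cycle.

    edges is an iterable of (before, after) pairs, meaning that "before"
    appears to have executed ahead of "after" in the apparent serial order.
    """
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
        graph.setdefault(after, set())

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the DFS stack, finished
    state = dict.fromkeys(graph, WHITE)

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:
                return True        # back edge: nxt is an ancestor of node
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


# Write skew (hypothetical history): each transaction reads what the other
# writes, so there is an rw-conflict in each direction.
write_skew = [("T1", "T2"), ("T2", "T1")]
# A plain chain of dependencies is consistent with the order T1, T2, T3.
chain = [("T1", "T2"), ("T2", "T3")]

print(has_cycle(write_skew))   # True: no serial order fits both edges
print(has_cycle(chain))        # False
```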
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using Serializable Snapshot Isolation (SSI), based on the work of Cahill, et al. Fundamentally, this allows snapshot isolation to run as it always has, while monitoring for conditions which could create a serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation[[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]] that each snapshot isolation anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot; of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization failure.&lt;br /&gt;
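&lt;br /&gt;
To make the dangerous-structure test concrete, here is a minimal Python sketch that records rw-conflicts as (reader, writer) pairs and reports every transaction with at least one rw-conflict in and one out. The function name and data layout are hypothetical and far simpler than the shared-memory structures PostgreSQL uses.&lt;br /&gt;
&lt;br /&gt;
```python
def find_pivots(rw_conflicts):
    """Map each pivot transaction to (set of T_in, set of T_out).

    rw_conflicts is an iterable of (reader, writer) pairs: the reader
    appears to have executed before the concurrent writer.
    """
    conflicts_in = {}    # writer: readers with an rw-conflict into it
    conflicts_out = {}   # reader: writers it has an rw-conflict out to
    for reader, writer in rw_conflicts:
        conflicts_in.setdefault(writer, set()).add(reader)
        conflicts_out.setdefault(reader, set()).add(writer)

    # A pivot has at least one rw-conflict in and at least one out.
    return {
        txn: (conflicts_in[txn], conflicts_out[txn])
        for txn in conflicts_in
        if txn in conflicts_out
    }


# Write skew: both transactions are pivots, and T_in equals T_out.
print(sorted(find_pivots([("T1", "T2"), ("T2", "T1")])))

# Two adjacent rw-conflicts with no cycle: T2 is still reported as a pivot,
# which is how a false positive can arise.
print(find_pivots([("T1", "T2"), ("T2", "T3")]))
```
&lt;br /&gt;
The second call shows why tracking only rw-conflicts is cheaper but less precise: the structure is flagged even though the history is equivalent to running T1, T2, T3 in that order.&lt;br /&gt;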
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [[#Cahill2009|&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;]]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, there can only be an anomaly if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T0 might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;).&lt;br /&gt;
#* The dependency between T0 and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; can't be a rw-conflict, because T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; was read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; started.&lt;br /&gt;
#* Because T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; starts.&lt;br /&gt;
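&lt;br /&gt;
The two optimizations above can be sketched as a single predicate over a candidate dangerous structure. This is an assumption-laden simplification: transactions are plain dictionaries, snapshot and commit points are hypothetical integer sequence numbers, and the function name is invented for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from operator import lt   # lt(a, b) is the functional form of "a is less than b"


def needs_rollback(t_in, t_pivot, t_out):
    """Apply both commit-order rules to a T_in, T_pivot, T_out structure.

    Each transaction is a dict with the (hypothetical) keys:
      "snapshot":  sequence number at which its snapshot was taken
      "commit":    sequence number of its commit, or None if still running
      "read_only": True if the transaction performs no writes
    """
    out_commit = t_out["commit"]
    if out_commit is None:
        return False            # T_out has not committed: nothing to do yet

    # Rule 1: T_out must be the first of the three to commit.
    for other in (t_pivot, t_in):
        if other is t_out:
            continue            # in write skew T_in and T_out coincide
        commit = other["commit"]
        if commit is not None and not lt(out_commit, commit):
            return False

    # Rule 2: a read-only T_in only matters if T_out committed before
    # T_in took its snapshot.
    if t_in["read_only"] and not lt(out_commit, t_in["snapshot"]):
        return False
    return True


pivot = {"snapshot": 1, "commit": None, "read_only": False}
t_out = {"snapshot": 2, "commit": 5, "read_only": False}
early_reader = {"snapshot": 3, "commit": None, "read_only": True}
late_reader = {"snapshot": 7, "commit": None, "read_only": True}

print(needs_rollback(early_reader, pivot, t_out))   # False: snapshot precedes T_out's commit
print(needs_rollback(late_reader, pivot, t_out))    # True
```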
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI cannot eliminate existing anomalies in these areas.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction cannot fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
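&lt;br /&gt;
Because a cancelled transaction fails with SQLSTATE 40001 and is chosen so that an immediate retry can make progress, client code can wrap each transaction in a retry loop. The sketch below shows one possible shape for such a loop. It uses no real database driver: the exception class, the function names, and the simulated transaction are all stand-ins invented for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5):
    """Run transaction(), restarting it from the top on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise      # give up and let the caller see the failure


# Simulated transaction body that is cancelled twice before succeeding.
attempts = []


def transfer():
    attempts.append(1)
    if len(attempts) != 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


print(run_with_retry(transfer), len(attempts))   # committed 3
```
&lt;br /&gt;
The whole transaction is re-run, including any reads, since decisions made from the earlier snapshot may no longer be valid.&lt;br /&gt;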
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to get a working serializable isolation level which allowed no anomalies as quickly as possible, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper[[#Foundations2007|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]], along with the locking papers referenced from that and the Cahill papers[[#CahillEtAl2008|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]][[#Cahill2009|&amp;lt;sup&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt;&amp;lt;/sup&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
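&lt;br /&gt;
The promotion and covering rules above can be modelled with a toy lock set for a single transaction. This is only a sketch under stated assumptions: the class name, the fixed promotion thresholds, and the dictionary layout are invented here, whereas the real implementation sizes its structures from shared memory reserved at start-up.&lt;br /&gt;
&lt;br /&gt;
```python
class PredicateLocks:
    """Toy SIREAD lock set for one transaction, with granularity promotion."""

    def __init__(self, max_tuples_per_page=3, max_pages_per_relation=2):
        # Hypothetical thresholds at which finer locks are promoted.
        self.max_tuples_per_page = max_tuples_per_page
        self.max_pages_per_relation = max_pages_per_relation
        self.relations = set()   # relations locked as a whole
        self.pages = {}          # relation: set of locked page numbers
        self.tuples = {}         # (relation, page): set of locked tuple offsets

    def lock_tuple(self, rel, page, offset):
        if rel in self.relations or page in self.pages.get(rel, ()):
            return               # already covered by a coarser lock: ignore
        held = self.tuples.setdefault((rel, page), set())
        held.add(offset)
        if len(held) == self.max_tuples_per_page:
            self.lock_page(rel, page)      # promote to a page lock

    def lock_page(self, rel, page):
        if rel in self.relations:
            return               # relation lock already covers the page
        self.tuples.pop((rel, page), None) # release finer locks it covers
        held = self.pages.setdefault(rel, set())
        held.add(page)
        if len(held) == self.max_pages_per_relation:
            self.lock_relation(rel)        # promote to a relation lock

    def lock_relation(self, rel):
        self.pages.pop(rel, None)
        for key in [k for k in self.tuples if k[0] == rel]:
            del self.tuples[key]
        self.relations.add(rel)


locks = PredicateLocks()
for offset in (1, 2, 3):
    locks.lock_tuple("accounts", 0, offset)   # third tuple triggers promotion
locks.lock_tuple("accounts", 0, 4)            # ignored: page 0 is locked
locks.lock_tuple("accounts", 1, 1)
print(locks.pages, locks.tuples)

locks.lock_page("accounts", 1)                # second page triggers promotion
print(locks.relations, locks.pages, locks.tuples)
```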
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
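&lt;br /&gt;
The correctness rule can be pictured with a deliberately simplified model in which a scan remembers the key range it covered, gaps included, and a later insert conflicts when its key falls inside any remembered range. Integer keys, Python range objects, and the function name are assumptions made for illustration; an index AM would lock physical index structures instead of logical ranges.&lt;br /&gt;
&lt;br /&gt;
```python
def insert_conflicts(scanned_ranges, new_key):
    """Return True if inserting new_key must raise an rw-conflict.

    scanned_ranges holds the key ranges covered by earlier index scans of
    a concurrent serializable transaction, including the gaps between and
    around the entries that existed at scan time.
    """
    return any(new_key in key_range for key_range in scanned_ranges)


# A reader scanned keys 10 through 20; only 12 and 18 existed at the time.
scanned = [range(10, 21)]

print(insert_conflicts(scanned, 15))   # True: a new neighbor inside the gap
print(insert_conflicts(scanned, 30))   # False: outside the scanned range
```
&lt;br /&gt;
A coarser model, such as treating the whole index as one range, would still be correct but would report more false positives.&lt;br /&gt;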
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires python and several python packages (including twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, and of the coarser covering locks with counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
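&lt;br /&gt;
The oldest-xmin bookkeeping from the list above (track the oldest snapshot xmin plus a count of active transactions using it, and clean up when the count reaches zero) can be sketched as follows. The class and attribute names are hypothetical, and the sketch assumes that a newer snapshot never has an xmin older than the current oldest one.&lt;br /&gt;
&lt;br /&gt;
```python
class SerializableXminTracker:
    """Track the oldest snapshot xmin among active serializable transactions."""

    def __init__(self):
        self.active = {}          # transaction name: snapshot xmin
        self.oldest_xmin = None
        self.oldest_count = 0     # active transactions using oldest_xmin
        self.cleanups = []        # xmin values that clean-up was run against

    def begin(self, txn, xmin):
        # Assumption: xmin is never older than the current oldest_xmin.
        self.active[txn] = xmin
        if self.oldest_xmin is None or xmin == self.oldest_xmin:
            self.oldest_xmin = xmin
            self.oldest_count += 1

    def end(self, txn):
        xmin = self.active.pop(txn)
        if xmin != self.oldest_xmin:
            return
        self.oldest_count -= 1
        if self.oldest_count == 0:
            self._advance()

    def _advance(self):
        """Find the new oldest xmin and run clean-up based on it."""
        if self.active:
            values = list(self.active.values())
            self.oldest_xmin = min(values)
            self.oldest_count = values.count(self.oldest_xmin)
        else:
            self.oldest_xmin = None   # nothing active: everything can go
        self.cleanups.append(self.oldest_xmin)


tracker = SerializableXminTracker()
tracker.begin("A", 100)
tracker.begin("B", 100)
tracker.begin("C", 105)
tracker.end("A")           # count drops to 1, no clean-up yet
tracker.end("B")           # count hits zero: advance to 105 and clean up
tracker.end("C")           # no serializable transactions remain
print(tracker.cleanups)    # [105, None]
```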
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;CahillEtAl2008&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Cahill2009&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;Foundations2007&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;SQL92&amp;quot;&amp;gt;&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&amp;lt;/span&amp;gt;&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 22:47:31 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* PostgreSQL Implementation */ Minor cleanup.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. Snapshot isolation has well-known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions, although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published, and used in many database products, for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008&amp;lt;nowiki&amp;gt;[1][2]&amp;lt;/nowiki&amp;gt;.  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[4].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
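&lt;br /&gt;
As a rough illustration of the cycle test described above, the following Python sketch treats each dependency as an edge meaning &amp;quot;appears to have executed before&amp;quot; and checks the resulting graph for a cycle.  The function name and the transaction labels are invented for this example; it is only a model of the idea, not how PostgreSQL detects anomalies.&lt;br /&gt;
&lt;br /&gt;
```python
def has_cycle(edges):
    """Return True if the (before, after) pairs form a directed cycle."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
        graph.setdefault(after, set())

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True        # reached a node already on the current path
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Write skew: T1 reads what T2 writes and T2 reads what T1 writes,
# so each appears to have executed before the other.
write_skew = [("T1", "T2"), ("T2", "T1")]
# A chain of dependencies with no way back matches the serial order T1, T2, T3.
serial_chain = [("T1", "T2"), ("T2", "T3")]

print(has_cycle(write_skew))    # True
print(has_cycle(serial_chain))  # False
```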
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using Serializable Snapshot Isolation (SSI), based on the work of Cahill, et al. Fundamentally, this allows snapshot isolation to run as it always has, while monitoring for conditions which could create a serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation [2] that each snapshot isolation anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot; of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [2]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, there can only be an anomaly if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T0 might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;.)&lt;br /&gt;
#* The dependency between T0 and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; can't be a rw-conflict, because T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; started.&lt;br /&gt;
#* Because T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; starts.&lt;br /&gt;
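&lt;br /&gt;
The dangerous-structure test and the two optimizations above can be sketched in Python as follows.  This is a simplified model under stated assumptions, not the PostgreSQL code: the function and parameter names are invented, and commits and snapshots are assumed to be stamped from one shared event counter so that any two events can be ordered.&lt;br /&gt;
&lt;br /&gt;
```python
def earlier(a, b):
    """True if event number a comes strictly before event number b."""
    return a != b and min(a, b) == a


def dangerous_structures(rw_conflicts, commit_seq, snapshot_seq, read_only):
    """Yield (t_in, t_pivot, t_out) triples that would call for a rollback.

    rw_conflicts: list of (reader, writer) pairs between concurrent transactions
    commit_seq:   transaction -- commit event number (absent while still running)
    snapshot_seq: transaction -- snapshot event number
    read_only:    set of read-only transactions
    """
    for t_in, t_pivot in rw_conflicts:
        for reader, t_out in rw_conflicts:
            if reader != t_pivot:
                continue
            out_commit = commit_seq.get(t_out)
            if out_commit is None:
                continue  # T_out has not committed, so no anomaly is possible yet
            # Optimization 1: T_out must be the first of the three to commit.
            others = [commit_seq.get(t) for t in (t_in, t_pivot)]
            if any(c is not None and earlier(c, out_commit) for c in others):
                continue
            # Optimization 2: a read-only T_in only matters if T_out
            # committed before T_in took its snapshot.
            if t_in in read_only and not earlier(out_commit, snapshot_seq[t_in]):
                continue
            yield (t_in, t_pivot, t_out)


rw = [("T_in", "T_pivot"), ("T_pivot", "T_out")]
commits = {"T_out": 3}                      # T_pivot and T_in are still running

# T_in takes its snapshot (event 4) after T_out commits (event 3).
late_snapshot = {"T_pivot": 0, "T_out": 1, "T_in": 4}
found = list(dangerous_structures(rw, commits, late_snapshot, {"T_in"}))

# T_in takes its snapshot (event 2) before T_out commits (event 3).
early_snapshot = {"T_pivot": 0, "T_out": 1, "T_in": 2}
skipped = list(dangerous_structures(rw, commits, early_snapshot, {"T_in"}))

print(found)    # [('T_in', 'T_pivot', 'T_out')]
print(skipped)  # []
```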
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
Notable aspects of the PostgreSQL implementation of SSI include:&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI cannot eliminate existing anomalies in these areas.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections) can use the REPEATABLE READ transaction isolation level.  This level retains its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction cannot fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
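&lt;br /&gt;
Because a rolled-back transaction fails with SQLSTATE 40001, client code can retry it automatically.  The Python sketch below shows one possible shape for such a retry loop.  The DatabaseError class is a hypothetical stand-in for whatever exception a real driver raises, and run_with_retry and max_attempts are names invented for this example.&lt;br /&gt;
&lt;br /&gt;
```python
SERIALIZATION_FAILURE = "40001"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying an SQLSTATE."""

    def __init__(self, sqlstate, message=""):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() until it succeeds, retrying only on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Any other error, or the last allowed attempt, is passed on.
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


calls = []


def flaky():
    """Pretend transaction that is rolled back on its first two attempts."""
    calls.append(1)
    if len(calls) != 3:
        raise DatabaseError(SERIALIZATION_FAILURE, "could not serialize access")
    return "committed"


result = run_with_retry(flaky)
print(result, len(calls))  # committed 3
```

The callable passed in should redo the whole transaction from the beginning, including any decisions based on data it read, since that data may have changed.&lt;br /&gt;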
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 has settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to get to a working implementation of a serializable isolation level which allowed no anomalies as quickly as possible, even though it had many false positives and very poor performance, and then to optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that the patch cause no performance regression for those not using serializable transaction isolation.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[1], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;, along with the locking papers referenced from that and the Cahill papers&amp;lt;nowiki&amp;gt;[1][2]&amp;lt;/nowiki&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
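&lt;br /&gt;
The coverage and promotion rules in the list above can be modelled with a small Python sketch.  It is an illustration only: the class, its single promote_at threshold, and the tuple representation of lock targets are all assumptions made for this example, and the real structures live in shared memory.&lt;br /&gt;
&lt;br /&gt;
```python
class PredicateLocks:
    """SIREAD lock targets held by one transaction (illustrative model).

    A target is a tuple: (rel,), (rel, page) or (rel, page, tup).
    promote_at is a made-up threshold, not a PostgreSQL setting.
    """

    def __init__(self, promote_at=3):
        self.promote_at = promote_at
        self.held = set()

    def covered(self, target):
        """True if the target itself or a coarser lock containing it is held."""
        return any(target[:n] in self.held for n in range(1, len(target) + 1))

    def acquire(self, target):
        if self.covered(target):
            return  # an equal or coarser lock already covers this target
        # Acquiring a lock releases every finer-grained lock it covers.
        self.held = {t for t in self.held if t[:len(target)] != target}
        self.held.add(target)
        # Promote once enough sibling locks pile up under one parent.
        if len(target) != 1:
            parent = target[:-1]
            siblings = [t for t in self.held
                        if len(t) == len(target) and t[:-1] == parent]
            if len(siblings) == self.promote_at:
                self.acquire(parent)


locks = PredicateLocks(promote_at=3)
for tup in (1, 2, 3):
    locks.acquire(("orders", 1, tup))     # third tuple lock promotes to the page
after_promotion = set(locks.held)

locks.acquire(("orders", 1, 9))           # ignored: the page lock covers it
after_ignored = set(locks.held)

locks.acquire(("orders",))                # relation lock releases the page lock
after_relation = set(locks.held)

print(after_promotion)  # {('orders', 1)}
print(after_relation)   # {('orders',)}
```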
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires Python and several Python packages (including Twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
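&lt;br /&gt;
The xmin bookkeeping mentioned in the list above (tracking the oldest snapshot xmin among serializable transactions, plus a count of the transactions using it) can be modelled roughly as follows.  This Python sketch is an assumption-laden illustration: the class name and the cleanup callback are invented, and it assumes that the xmin values of new snapshots never decrease.&lt;br /&gt;
&lt;br /&gt;
```python
class OldestXminTracker:
    """Track the oldest snapshot xmin among active serializable transactions.

    Assumes xmin values passed to start() never decrease, so a new
    transaction can match the current oldest xmin but never undercut it.
    """

    def __init__(self, cleanup):
        self.cleanup = cleanup   # hypothetical callback: cleanup(new_oldest_xmin)
        self.active = {}         # transaction label -- snapshot xmin
        self.oldest_xmin = None
        self.oldest_count = 0

    def start(self, txn, xmin):
        self.active[txn] = xmin
        if self.oldest_xmin is None:
            self.oldest_xmin, self.oldest_count = xmin, 1
        elif xmin == self.oldest_xmin:
            self.oldest_count += 1

    def finish(self, txn):
        xmin = self.active.pop(txn)
        if xmin != self.oldest_xmin:
            return
        self.oldest_count -= 1
        if self.oldest_count == 0:
            # The last user of the oldest xmin is gone: find the new oldest
            # (None when nothing is active) and run a clean-up based on it.
            xmins = list(self.active.values())
            self.oldest_xmin = min(xmins, default=None)
            self.oldest_count = xmins.count(self.oldest_xmin)
            self.cleanup(self.oldest_xmin)


cleaned = []
tracker = OldestXminTracker(cleaned.append)
tracker.start("A", 100)
tracker.start("B", 100)
tracker.start("C", 105)
tracker.finish("A")      # B still uses xmin 100, so no clean-up yet
tracker.finish("B")      # count hits zero: new oldest is 105
tracker.finish("C")      # nothing left active
print(cleaned)           # [105, None]
```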
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that during point-in-time recovery (PITR) or on a hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 21:56:26 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* SSI Algorithm */ Minor spacing adjustment to new code.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well-known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions, although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008&amp;lt;nowiki&amp;gt;[1][2]&amp;lt;/nowiki&amp;gt;.  This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[4].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
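&lt;br /&gt;
The cycle condition above can be illustrated with a small Python sketch.  It is only a model for explanation, not anything taken from the PostgreSQL source: the function name &amp;quot;has_cycle&amp;quot; and the dictionary representation of the &amp;quot;appears to execute before&amp;quot; edges are assumptions made for this example.&lt;br /&gt;

```python
def has_cycle(edges):
    """edges maps each transaction to the transactions it appears to precede."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a transaction already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


# Write skew: A reads what B writes and B reads what A writes, so the
# rw-conflicts make A appear to precede B and B appear to precede A.
write_skew = {"A": ["B"], "B": ["A"]}
# A committed before B started: only a wr-dependency, consistent with commit order.
serial = {"A": ["B"], "B": []}
print(has_cycle(write_skew), has_cycle(serial))
```

The first graph contains a cycle, so no serial order can explain its results; the second graph is consistent with running A and then B.&lt;br /&gt;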
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using&lt;br /&gt;
Serializable Snapshot Isolation (SSI), based on the work of Cahill,&lt;br /&gt;
et al. Fundamentally, this allows snapshot isolation to run as it&lt;br /&gt;
has, while monitoring for conditions which could create a serialization&lt;br /&gt;
anomaly. &lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation [2] that each snapshot isolation&lt;br /&gt;
anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot;&lt;br /&gt;
of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
::T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure corresponds to an actual serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [2]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&lt;sub&gt;in&lt;/sub&gt; is read-only, there can only be an anomaly if T&lt;sub&gt;out&lt;/sub&gt; committed before T&lt;sub&gt;in&lt;/sub&gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&lt;sub&gt;in&lt;/sub&gt; in the serial order. (T0 might be the same as T&lt;sub&gt;out&lt;/sub&gt;).&lt;br /&gt;
#* The dependency between T0 and T&lt;sub&gt;in&lt;/sub&gt; can't be a rw-conflict, because T&lt;sub&gt;in&lt;/sub&gt; is read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&lt;sub&gt;in&lt;/sub&gt; started.&lt;br /&gt;
#* Because T&lt;sub&gt;out&lt;/sub&gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&lt;sub&gt;in&lt;/sub&gt; starts.&lt;br /&gt;
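&lt;br /&gt;
The dangerous-structure test and the two optimizations above can be sketched in Python.  This is a simplified model under stated assumptions, not the PostgreSQL implementation: the class &amp;quot;Txn&amp;quot;, the functions &amp;quot;before&amp;quot;, &amp;quot;is_dangerous&amp;quot; and &amp;quot;find_dangerous_structures&amp;quot;, and the use of unique integer sequence numbers for snapshot and commit times are all invented for the example.&lt;br /&gt;

```python
class Txn:
    """Hypothetical model of a serializable transaction, not a PostgreSQL structure."""

    def __init__(self, name, snapshot_seq, commit_seq=None, read_only=False):
        self.name = name
        self.snapshot_seq = snapshot_seq  # logical time the snapshot was taken
        self.commit_seq = commit_seq      # logical commit time, None while active
        self.read_only = read_only


def before(a, b):
    """True if logical time a is strictly earlier than logical time b."""
    return a != b and min(a, b) == a


def is_dangerous(t_in, t_pivot, t_out):
    """Apply both optimizations to one T_in -rw- T_pivot -rw- T_out structure."""
    if t_out.commit_seq is None:
        return False  # T_out has not committed, so nothing needs to be done yet
    for other in (t_in, t_pivot):
        if other is not t_out and other.commit_seq is not None:
            if before(other.commit_seq, t_out.commit_seq):
                return False  # T_out was not the first of the three to commit
    if t_in.read_only and not before(t_out.commit_seq, t_in.snapshot_seq):
        return False  # read-only T_in took its snapshot before T_out committed
    return True


def find_dangerous_structures(rw_conflicts):
    """rw_conflicts is a list of (reader, writer) pairs of Txn objects."""
    found = []
    for t_in, pivot_a in rw_conflicts:
        for pivot_b, t_out in rw_conflicts:
            if pivot_a is pivot_b and is_dangerous(t_in, pivot_a, t_out):
                found.append((t_in.name, pivot_a.name, t_out.name))
    return found


# Write skew: T1 and T2 each read what the other writes; T1 commits first.
t1 = Txn("T1", snapshot_seq=1, commit_seq=3)
t2 = Txn("T2", snapshot_seq=2)
print(find_dangerous_structures([(t1, t2), (t2, t1)]))
```

In this example the only structure reported has T2 as the pivot, with T1 acting as both T&lt;sub&gt;in&lt;/sub&gt; and T&lt;sub&gt;out&lt;/sub&gt;; the mirror-image structure is not reported because its T&lt;sub&gt;out&lt;/sub&gt; (T2) has not committed.&lt;br /&gt;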
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
The implementation of serializable transactions for PostgreSQL is accomplished through Serializable Snapshot Isolation (SSI), based on the work of Cahill, et al[1][2].  Fundamentally, this allows snapshot isolation to run as it has, with monitoring for conditions which could create a serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI can not eliminate existing anomalies in these areas.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level will retain its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction can not fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
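&lt;br /&gt;
Because a serialization failure is reported with a well-defined SQLSTATE, client code can retry the whole transaction automatically.  The following Python sketch shows the general shape of such a retry loop.  It does not use any real database driver: the exception class &amp;quot;SerializationFailure&amp;quot;, the function &amp;quot;run_with_retry&amp;quot; and the simulated transaction are all hypothetical stand-ins, and a real application would catch whatever error its driver raises for SQLSTATE 40001.&lt;br /&gt;

```python
class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001 (hypothetical class)."""
    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() until it succeeds; assumes max_attempts is at least 1."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return transaction()
        except SerializationFailure as error:
            # The whole transaction must be rerun from the start, not just
            # the statement that failed.
            last_error = error
    raise last_error


attempts = []


def simulated_transaction():
    """Pretend to fail twice with a serialization failure, then commit."""
    attempts.append(1)
    if len(attempts) != 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


result = run_with_retry(simulated_transaction)
print(result, len(attempts))
```

Any other error is deliberately not caught here, since only SQLSTATE 40001 indicates that an immediate retry of the same transaction is appropriate.&lt;br /&gt;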
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to try for the fastest possible implementation of a serializable isolation level which allowed no anomalies, even though it had many false positives and very poor performance, and then optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[1], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;, along with the locking papers referenced from that and the Cahill papers&amp;lt;nowiki&amp;gt;[1][2]&amp;lt;/nowiki&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
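&lt;br /&gt;
The coverage and promotion rules above can be modeled with a short Python sketch.  It is a toy illustration only: the class &amp;quot;PredicateLocks&amp;quot;, its method names, and the fixed per-page promotion threshold are assumptions made for this example and do not describe the actual shared-memory structures or promotion policy.&lt;br /&gt;

```python
class PredicateLocks:
    """Hypothetical SIREAD lock set for one transaction (illustrative only)."""

    def __init__(self, max_tuple_locks_per_page=2):
        self.max_tuple_locks_per_page = max_tuple_locks_per_page
        self.relations = set()  # relation names
        self.pages = set()      # (relation, page) pairs
        self.tuples = set()     # (relation, page, offset) triples

    def lock_relation(self, rel):
        self.relations.add(rel)
        # A coarser lock releases all finer-grained locks it covers.
        self.pages = {p for p in self.pages if p[0] != rel}
        self.tuples = {t for t in self.tuples if t[0] != rel}

    def lock_page(self, rel, page):
        if rel in self.relations:
            return  # already covered by the relation lock, so ignored
        self.pages.add((rel, page))
        self.tuples = {t for t in self.tuples if t[:2] != (rel, page)}

    def lock_tuple(self, rel, page, offset):
        if rel in self.relations or (rel, page) in self.pages:
            return  # already covered by a coarser lock, so ignored
        self.tuples.add((rel, page, offset))
        on_page = [t for t in self.tuples if t[:2] == (rel, page)]
        # Locks are added one at a time, so the limit is exceeded by exactly one.
        if len(on_page) == self.max_tuple_locks_per_page + 1:
            self.lock_page(rel, page)  # promote to a single page lock


locks = PredicateLocks(max_tuple_locks_per_page=2)
for offset in (1, 2, 3):
    locks.lock_tuple("accounts", 7, offset)
print(sorted(locks.pages), sorted(locks.tuples))
```

After the third tuple lock on the same page, the three tuple locks are replaced by one page lock; a later tuple lock on that page would be ignored.&lt;br /&gt;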
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires python and several python packages (including twisted).  Due to package dependencies and licensing issues the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, along with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
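&lt;br /&gt;
The oldest-xmin bookkeeping mentioned in the list above (track the oldest snapshot xmin plus a count of the active transactions using it, and clean up when the count reaches zero) can be illustrated with a small Python model.  This is only a sketch: the class and attribute names are hypothetical, and it assumes that snapshot xmin values never move backward as new transactions start.&lt;br /&gt;

```python
class XminTracker:
    """Toy model: oldest snapshot xmin among active serializable transactions."""

    def __init__(self):
        self.active = []         # xmin of every active serializable transaction
        self.oldest_xmin = None  # None while no serializable transaction is active
        self.oldest_count = 0    # how many active transactions use oldest_xmin
        self.cleanups = []       # records the new oldest xmin at each clean-up

    def start(self, xmin):
        # Assumption: a new snapshot never has an xmin older than the current oldest.
        self.active.append(xmin)
        if self.oldest_xmin is None:
            self.oldest_xmin, self.oldest_count = xmin, 1
        elif xmin == self.oldest_xmin:
            self.oldest_count += 1

    def finish(self, xmin):
        self.active.remove(xmin)
        if xmin != self.oldest_xmin:
            return
        self.oldest_count -= 1
        if self.oldest_count == 0:
            # Find the new oldest xmin, then run clean-up based on it.
            if self.active:
                self.oldest_xmin = min(self.active)
                self.oldest_count = self.active.count(self.oldest_xmin)
            else:
                self.oldest_xmin = None
            self.cleanups.append(self.oldest_xmin)


tracker = XminTracker()
for xmin in (100, 100, 105):
    tracker.start(xmin)
tracker.finish(100)   # one transaction still uses xmin 100: no clean-up yet
tracker.finish(100)   # count hits zero: xmin 105 becomes the oldest
```

The point of keeping the count is that the common case (a transaction finishing while others share the same xmin) costs only a decrement; the search for a new oldest xmin happens only when the count reaches zero.&lt;br /&gt;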
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR recovery or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 21:46:47 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* SSI Algorithm */ Fill in from Dan's README-SSI patch, with some Wiki formatting.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well-known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions, although these anomalies occur only when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database.  Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
[[Image:Serialization-Anomalies-in-Snapshot-Isolation.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
=== Serializable Isolation Implementation Strategies ===&lt;br /&gt;
&lt;br /&gt;
Techniques for implementing full serializable isolation have been published and in use in many database products for decades.  The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions.  A cycle in a graph of blocking indicates a deadlock, requiring a rollback.  Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.&lt;br /&gt;
&lt;br /&gt;
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008&amp;lt;nowiki&amp;gt;[1][2]&amp;lt;/nowiki&amp;gt;.  This technique, known as Serializable Snapshot Isolation (SSI) has many of the advantages of snapshot isolation.  In particular, reads don't block anything and writes don't block reads.  Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur.  It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur.  In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.&lt;br /&gt;
&lt;br /&gt;
=== Apparent Serial Order of Execution ===&lt;br /&gt;
&lt;br /&gt;
One way to understand when snapshot anomalies can occur, and to visualize the difference between the serializable implementations described above, is to consider that among transactions executing at the serializable transaction isolation level, the results are required to be consistent with ''some'' serial (one-at-a-time) execution of the transactions[4].  How is that order determined in each?&lt;br /&gt;
&lt;br /&gt;
In S2PL, each transaction locks any data it accesses. It holds the locks until committing, preventing other transactions from making conflicting accesses to the same data in the interim. Some transactions may have to be rolled back to prevent deadlock. But successful transactions can always be viewed as having occurred sequentially, in the order they committed.&lt;br /&gt;
&lt;br /&gt;
With snapshot isolation, reads never block writes, nor vice versa, so more concurrency is possible. The order in which transactions appear to have executed is determined by something more subtle than in S2PL: read/write dependencies. If a transaction reads data, it appears to execute after the transaction that wrote the data it is reading.  Similarly, if it updates data, it appears to execute after the transaction that wrote the previous version. These dependencies, which we call &amp;quot;wr-dependencies&amp;quot; and &amp;quot;ww-dependencies&amp;quot;, are consistent with the commit order, because the first transaction must have committed before the second starts. However, there can also be dependencies between two ''concurrent'' transactions, i.e. where one was running when the other acquired its snapshot.  These &amp;quot;rw-conflicts&amp;quot; occur when one transaction attempts to read data which is not visible to it because the transaction which wrote it (or will later write it) is concurrent. The reading transaction appears to have executed first, regardless of the actual sequence of transaction starts or commits, because it sees a database state prior to that in which the other transaction leaves it.&lt;br /&gt;
&lt;br /&gt;
Anomalies occur when a cycle is created in the graph of dependencies: when a dependency or series of dependencies causes transaction A to appear to have executed before transaction B, but another series of dependencies causes B to appear before A. If that's the case, then the results can't be consistent with any serial execution of the transactions.&lt;br /&gt;
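&lt;br /&gt;
As a rough illustration of the cycle condition, the following Python sketch treats each dependency as an edge from the transaction that appears to execute first to the one that appears to execute later, and checks the resulting graph for a cycle.  The function name and transaction labels are hypothetical, and the sketch is for explanation only: SSI does not build or search this full graph.&lt;br /&gt;

```python
def has_cycle(edges):
    """Return True if the directed graph of (earlier, later) pairs has a cycle."""
    graph = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)
        graph.setdefault(later, set())

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True        # reached a node already on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


# Write skew: each transaction reads data the other writes (two rw-conflicts),
# so A appears to run before B and B appears to run before A.
write_skew = [("A", "B"), ("B", "A")]
# A plain chain of dependencies is consistent with the serial order A, B, C.
chain = [("A", "B"), ("B", "C")]
```

Here has_cycle(write_skew) is true, so no serial order can explain the results, while has_cycle(chain) is false.&lt;br /&gt;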
&lt;br /&gt;
=== SSI Algorithm ===&lt;br /&gt;
&lt;br /&gt;
Serializable transactions in PostgreSQL are implemented using Serializable Snapshot Isolation (SSI), based on the work of Cahill, et al. Fundamentally, this allows snapshot isolation to run as it has, while monitoring for conditions which could create a serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
SSI is based on the observation [2] that each snapshot isolation anomaly corresponds to a cycle that contains a &amp;quot;dangerous structure&amp;quot; of two adjacent rw-conflict edges:&lt;br /&gt;
&lt;br /&gt;
T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; ----''rw''---&amp;gt; T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SSI works by watching for this dangerous structure, and rolling back a transaction when needed to prevent any anomaly. This means it only needs to track rw-conflicts between concurrent transactions, not wr- and ww-dependencies. It also means there is a risk of false positives, because not every dangerous structure is part of an actual cycle, and so not every one corresponds to a real serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation uses two additional optimizations:&lt;br /&gt;
&lt;br /&gt;
# T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle (see proof of Theorem 2.1 of [2]). We only roll back a transaction if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; commits before T&amp;lt;sub&amp;gt;pivot&amp;lt;/sub&amp;gt; and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt;.&lt;br /&gt;
# If T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, there can only be an anomaly if T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; takes its snapshot. This optimization is an original one. Proof:&lt;br /&gt;
#* Because there is a cycle, there must be some transaction T0 that precedes T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; in the serial order. (T0 might be the same as T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt;).&lt;br /&gt;
#* The dependency between T0 and T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; can't be a rw-conflict, because T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; is read-only, so it must be a ww- or wr-dependency.  Those can only occur if T0 committed before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; started.&lt;br /&gt;
#* Because T&amp;lt;sub&amp;gt;out&amp;lt;/sub&amp;gt; must commit before any other transaction in the cycle, it must commit before T0 commits -- and thus before T&amp;lt;sub&amp;gt;in&amp;lt;/sub&amp;gt; starts.&lt;br /&gt;
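&lt;br /&gt;
The dangerous-structure check combined with the first optimization above can be sketched in Python as follows.  This is a simplified model under stated assumptions, not the PostgreSQL code: the function name, the representation of rw-conflicts as (reader, writer) pairs, and the commit sequence dictionary are all hypothetical, and the read-only optimization is not modelled.&lt;br /&gt;

```python
def find_pivots(rw_conflicts, commit_seq):
    """Find dangerous structures T_in -rw- T_pivot -rw- T_out.

    rw_conflicts: iterable of (reader, writer) pairs; the reader appears
                  to have executed before the writer.
    commit_seq:   dict mapping each committed transaction to its commit
                  sequence number; transactions still running are absent.
    A structure is reported only when T_out has committed, and did so no
    later than T_in and T_pivot (T_in may be the same transaction as T_out).
    """
    conflicts = list(rw_conflicts)
    found = []
    for t_in, pivot in conflicts:
        for reader, t_out in conflicts:
            if reader != pivot:
                continue   # this edge does not leave the candidate pivot
            if t_out not in commit_seq:
                continue   # T_out has not committed, so nothing to do yet
            out_seq = commit_seq[t_out]
            others = [commit_seq[t] for t in (t_in, pivot) if t in commit_seq]
            if all(out_seq == min(out_seq, seq) for seq in others):
                found.append((t_in, pivot, t_out))
    return found


conflicts = [("T1", "T2"), ("T2", "T3")]
# T3 committed first while T1 and T2 are still running: a rollback is needed.
risky = find_pivots(conflicts, {"T3": 1})
# T2 (the pivot) committed before T3: no anomaly is possible from this structure.
safe = find_pivots(conflicts, {"T2": 1, "T3": 2})
```

In the first call the structure (T1, T2, T3) is reported; in the second nothing is reported, because the transaction on the conflict out side did not commit first.&lt;br /&gt;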
&lt;br /&gt;
=== PostgreSQL Implementation ===&lt;br /&gt;
&lt;br /&gt;
The implementation of serializable transactions for PostgreSQL is accomplished through Serializable Snapshot Isolation (SSI), based on the work of Cahill, et al[1][2].  Fundamentally, this allows snapshot isolation to run as it has, with monitoring for conditions which could create a serialization anomaly.&lt;br /&gt;
&lt;br /&gt;
* Since this technique is based on Snapshot Isolation (SI), those areas in PostgreSQL which don't use SI can't be brought under SSI.  This includes system tables, temporary tables, sequences, hint bit rewrites, etc.  SSI can not eliminate existing anomalies in these areas.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI.  If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.&lt;br /&gt;
&lt;br /&gt;
* If all transactions are run at the SERIALIZABLE transaction isolation level, business rules can be enforced in triggers or application code without ever having a need to acquire an explicit lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.&lt;br /&gt;
&lt;br /&gt;
* Those who want to continue to use snapshot isolation without the additional protections of SSI (and the associated costs of enforcing those protections), can use the REPEATABLE READ transaction isolation level.  This level will retain its legacy behavior, which is identical to the old SERIALIZABLE implementation and fully consistent with the standard's requirements for the REPEATABLE READ transaction isolation level.&lt;br /&gt;
&lt;br /&gt;
* Performance under this SSI implementation will be significantly improved if transactions which don't modify permanent tables are declared to be READ ONLY before they begin reading data.&lt;br /&gt;
&lt;br /&gt;
* Performance under SSI will tend to degrade more rapidly with a large number of active database transactions than under less strict isolation levels.  Limiting the number of active transactions through use of a connection pool or similar techniques may be necessary to maintain good performance.&lt;br /&gt;
&lt;br /&gt;
* Any transaction which must be rolled back to prevent serialization anomalies will fail with SQLSTATE 40001, which has a standard meaning of &amp;quot;serialization failure&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* This SSI implementation makes an effort to choose the transaction to be cancelled such that an immediate retry of the transaction can not fail due to conflicts with exactly the same transactions.  Pursuant to this goal, no transaction is cancelled until one of the other transactions in the set of conflicts which could generate an anomaly has successfully committed.  This is conceptually similar to how write conflicts are handled.&lt;br /&gt;
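&lt;br /&gt;
Because a serialization failure is reported with SQLSTATE 40001 and an immediate retry is expected to be able to succeed, applications typically wrap each transaction in a retry loop.  The following Python sketch shows the shape of such a loop.  The exception class is a hypothetical stand-in for whatever error the database driver raises for SQLSTATE 40001, and the callable stands in for code that begins a SERIALIZABLE transaction, does its work, and commits.&lt;br /&gt;

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() until it succeeds or max_attempts is reached.

    Only serialization failures are retried; any other exception propagates
    immediately, and the last serialization failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise


# Demonstration with a fake transaction that fails twice before succeeding.
attempts = []


def flaky_transaction():
    attempts.append(1)
    if len(attempts) != 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


result = run_with_retry(flaky_transaction)
```

The whole transaction is re-run from the start on each attempt, since any data it read before the failure may no longer be valid.&lt;br /&gt;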
&lt;br /&gt;
== Current Status ==&lt;br /&gt;
&lt;br /&gt;
'''Accepted as a feature for PostgreSQL 9.1!'''&lt;br /&gt;
&lt;br /&gt;
Many thanks to Joe, Heikki, Jeff, and Anssi for posing questions and making suggestions which have led to improvements in the patch!  Thanks to Markus for providing dtester at a critical juncture, which allowed progress to continue, and Heikki for developing the src/test/isolation code to move the dcheck tests into the main PostgreSQL testing framework.  Also, thanks to the many who have participated in discussions along the way.&lt;br /&gt;
&lt;br /&gt;
There are some features which should be considered for 9.2 once 9.1 is settled down; most notably integration with hot standby and fine-grained support for index AMs other than btree.  Most other proposed work is related to possible performance improvements, which should each be carefully benchmarked before being accepted.  At the top of that list is better optimization of ''de facto'' read only transactions -- those which aren't flagged as read only, but which don't actually do any writes to permanent database tables.&lt;br /&gt;
&lt;br /&gt;
== Development Path ==&lt;br /&gt;
&lt;br /&gt;
In general, the approach taken was to first build, as quickly as possible, an implementation of a serializable isolation level which allowed no anomalies, even though it had many false positives and very poor performance, and then to optimize until the rollback rate and overall performance were within a range which allows practical application.  No existing isolation level was removed, since not everyone will want to pay the performance price for true serializable behavior.  An important goal was that for those not using serializable transaction isolation, the patch doesn't cause performance regression.&lt;br /&gt;
&lt;br /&gt;
=== Credits ===&lt;br /&gt;
&lt;br /&gt;
'''Feature Authors''': [[User:Kgrittn|&amp;lt;span title=&amp;quot;different title&amp;quot;&amp;gt;Kevin Grittner&amp;lt;/span&amp;gt;]] and [http://drkp.net/ Dan R. K. Ports].&lt;br /&gt;
&lt;br /&gt;
'''Testing Support Authors''': Markus Wanner (dtester used during most of development) and Heikki Linnakangas (testing support consistent with other PostgreSQL regression testing, so that we had a testing suite suitable for commit).&lt;br /&gt;
&lt;br /&gt;
'''Reviewers''': Joe Conway (warning elimination, bug chasing, and style comments), Jeff Davis (general review and found problems with GiST support and lack of 2PC support), Anssi Kääriäinen (found problems with conditional indexes and performance issue with sequential scans during testing with production data), YAMAMOTO Takashi (found numerous bugs during long and heavy testing), and Heikki Linnakangas (general review and many useful observations and suggestions, plus general improvements during commit process).&lt;br /&gt;
&lt;br /&gt;
'''Committers''': Joe Conway (initial comment and name changes), Heikki Linnakangas (the bulk of the patch and most follow-up fixes), and Robert Haas (some follow-up fixes).&lt;br /&gt;
&lt;br /&gt;
'''Thanks''' to all those who participated in the on-list discussions and offered advice and support off-list.  There were so many who contributed in this way it would be practically impossible to generate an accurate list, but Robert Haas stands out for offering great advice on an overall development strategy.&lt;br /&gt;
&lt;br /&gt;
'''Special thanks''' to Emmanuel Cecchet for pointing out the ACM SIGMOD paper in which this technique was originally published[1], and to all those at the University of Sydney who contributed to the development of this innovative technique.  This is what turned the discussion from wrangling over how best to document existing behavior toward changing it.&lt;br /&gt;
&lt;br /&gt;
=== Source Code Management ===&lt;br /&gt;
&lt;br /&gt;
A &amp;quot;serializable&amp;quot; git branch has been set up at this location:&lt;br /&gt;
&lt;br /&gt;
git://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/git/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
ssh://git@git.postgresql.org/users/kgrittn/postgres.git&lt;br /&gt;
&lt;br /&gt;
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=shortlog;h=refs/heads/serializable&lt;br /&gt;
&lt;br /&gt;
=== Predicate Locking ===&lt;br /&gt;
&lt;br /&gt;
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range.  PostgreSQL didn't have predicate locking, so it needed to be added.  Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures.  Coarse granularities can cause some false positive indications of conflict.  The number of false positives can be influenced by plan choice.&lt;br /&gt;
&lt;br /&gt;
==== Implementation overview ====&lt;br /&gt;
&lt;br /&gt;
New RAM structures, inspired by those used to track traditional locks in PostgreSQL, but tailored to the needs of SIREAD predicate locking, will be used.  These will refer to physical objects actually accessed in the course of executing the query, to model the predicates through inference.  Anyone interested in this subject should review the Hellerstein, Stonebraker and Hamilton paper&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt;, along with the locking papers referenced from that and the Cahill papers&amp;lt;nowiki&amp;gt;[1][2]&amp;lt;/nowiki&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Because the SIREAD locks don't block, traditional locking techniques must be modified.  Intent locking (locking higher level objects before locking lower level objects) doesn't work with non-blocking &amp;quot;locks&amp;quot; (which are, in some respects, more like flags than locks).&lt;br /&gt;
&lt;br /&gt;
A configurable amount of shared memory is reserved at postmaster start-up to track predicate locks.  This size cannot be changed without a restart.&lt;br /&gt;
* To prevent resource exhaustion, multiple fine-grained locks may be promoted to a single coarser-grained lock as needed.&lt;br /&gt;
* An attempt to acquire an SIREAD lock on a tuple when the same transaction already holds an SIREAD lock on the page or the relation will be ignored.  Likewise, an attempt to lock a page when the relation is locked will be ignored, and the acquisition of a coarser lock will result in the automatic release of all finer-grained locks it covers.&lt;br /&gt;
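&lt;br /&gt;
The covering and promotion rules above can be modelled with a small Python class.  This is a toy sketch for one transaction only: the class name, the method names, and the per-page threshold are hypothetical and do not correspond to PostgreSQL settings or source code.&lt;br /&gt;

```python
class PredicateLocks:
    """Toy model of one transaction's SIREAD locks at three granularities."""

    def __init__(self, max_tuples_per_page=2):
        # Hypothetical promotion threshold; not a PostgreSQL setting name.
        self.max_tuples_per_page = max_tuples_per_page
        self.relations = set()   # {relation}
        self.pages = set()       # {(relation, page)}
        self.tuples = set()      # {(relation, page, offset)}

    def lock_relation(self, rel):
        self.relations.add(rel)
        # Acquiring a coarser lock releases every finer lock it covers.
        self.pages = {p for p in self.pages if p[0] != rel}
        self.tuples = {t for t in self.tuples if t[0] != rel}

    def lock_page(self, rel, page):
        if rel in self.relations:
            return   # already covered by the relation lock
        self.pages.add((rel, page))
        self.tuples = {t for t in self.tuples if t[:2] != (rel, page)}

    def lock_tuple(self, rel, page, offset):
        if rel in self.relations or (rel, page) in self.pages:
            return   # already covered by a coarser lock
        self.tuples.add((rel, page, offset))
        on_page = [t for t in self.tuples if t[:2] == (rel, page)]
        if len(on_page) == self.max_tuples_per_page + 1:
            self.lock_page(rel, page)   # promote to a single page lock


locks = PredicateLocks(max_tuples_per_page=2)
locks.lock_tuple("orders", 1, 1)
locks.lock_tuple("orders", 1, 2)
locks.lock_tuple("orders", 1, 3)   # third tuple on the page triggers promotion
locks.lock_tuple("orders", 1, 4)   # ignored: the page lock already covers it
```

After these calls the transaction holds a single page lock on page 1 of the relation and no tuple locks, which trades some precision (more possible false positives) for a bounded lock count.&lt;br /&gt;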
&lt;br /&gt;
==== Heap locking ====&lt;br /&gt;
&lt;br /&gt;
Predicate locks will be acquired for the heap based on the following:&lt;br /&gt;
* For a table scan, the entire relation will be locked.&lt;br /&gt;
* Each tuple read which is visible to the reading transaction will be locked, whether or not it meets selection criteria; except that there is no need to acquire an SIREAD lock on a tuple when the transaction already holds a write lock on any tuple representing the row, since a rw-dependency would also create a ww-dependency which has more aggressive enforcement and will thus prevent any anomaly.&lt;br /&gt;
&lt;br /&gt;
==== Default index locking ====&lt;br /&gt;
&lt;br /&gt;
There is a new ampredlocks flag in pg_am which should be set to false for any index which doesn't handle the predicate locking internally; indexes flagged this way will be predicate locked at the index relation level.  Such a lock will conflict with any insert into the index, but will not conflict, for example, with deletes, HOT updates, or inserts which don't match the WHERE clause on an index (if present).  This will allow correct behavior at the serializable transaction isolation level for new index types with minimal initial effort; but adding the predicate locking calls and changing the flag will improve performance in high contention workloads involving serializable transactions.&lt;br /&gt;
&lt;br /&gt;
==== Index AM implementations ====&lt;br /&gt;
&lt;br /&gt;
Since predicate locks only exist to detect writes which conflict with earlier reads, and heap tuple locks are acquired to cover all heap tuples actually read, including those read through indexes, the index tuples which were actually scanned are not of interest in themselves; we only care about their &amp;quot;new neighbors&amp;quot; -- later inserts into the index which ''would'' have been included in the scan had they existed at the time.  Conceptually, we want to lock the ''gaps'' between and surrounding index entries within the scanned range.&lt;br /&gt;
&lt;br /&gt;
''Correctness'' requires that any insert into an index generate a rw-conflict with a concurrent serializable transaction if, after that insert, re-execution of any index scan of the other transaction would access the heap for a row not accessed during the previous execution.  Note that a non-HOT update which expires an old index entry covered by the scan and adds a new entry for the modified row's new tuple ''need not'' generate a conflict, although an update which &amp;quot;moves&amp;quot; a row into the scan ''must'' generate a conflict.  While correctness allows false positives, they should be minimized for performance reasons.&lt;br /&gt;
&lt;br /&gt;
Several optimizations are possible:&lt;br /&gt;
&lt;br /&gt;
* An index scan which is just finding the right position for an index insertion or deletion need not acquire a predicate lock.&lt;br /&gt;
&lt;br /&gt;
* An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no &amp;quot;between or around&amp;quot; gaps to cover.&lt;br /&gt;
&lt;br /&gt;
* As long as built-in foreign key enforcement continues to use its current &amp;quot;special tricks&amp;quot; to deal with MVCC issues, predicate locks should not be needed for scans done by enforcement code.&lt;br /&gt;
&lt;br /&gt;
* If a search determines that no rows can be found regardless of index contents because the search conditions are contradictory (e.g., x = 1 AND x = 2), then no predicate lock is needed.&lt;br /&gt;
&lt;br /&gt;
Other index AM implementation considerations:&lt;br /&gt;
&lt;br /&gt;
* If a btree search discovers that no root page has yet been created, a predicate lock on the index relation is required; otherwise btree searches must get to the leaf level to determine which tuples match, so predicate locks go there.&lt;br /&gt;
&lt;br /&gt;
* GiST searches can determine that there are no matches at any level of the index, so there must be a predicate lock at each index level during a GiST search.  An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist.&lt;br /&gt;
&lt;br /&gt;
* The effects of page splits, overflows, consolidations, and removals must be carefully reviewed to ensure that predicate locks aren't &amp;quot;lost&amp;quot; during those operations, or kept with pages which could get re-used for different parts of the index.&lt;br /&gt;
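&lt;br /&gt;
The gap idea described in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: keys are plain integers, each scan is recorded as a closed key range, and the names ScannedRange and rw_conflicts_for_insert are hypothetical. A real index AM attaches predicate locks to index pages or to the index relation rather than to key ranges.&lt;br /&gt;
&lt;br /&gt;
```python
from operator import le


class ScannedRange:
    """Key range covered by one serializable reader's index scan (hypothetical)."""

    def __init__(self, reader, low, high):
        self.reader = reader
        self.low = low
        self.high = high

    def covers(self, key):
        # True when low is at most key and key is at most high.
        return le(self.low, key) and le(key, self.high)


def rw_conflicts_for_insert(scans, writer, key):
    """Return the readers whose earlier scans would have included the new key."""
    return sorted({s.reader for s in scans
                   if s.reader != writer and s.covers(key)})


scans = [ScannedRange("T1", 10, 20), ScannedRange("T2", 30, 40)]
print(rw_conflicts_for_insert(scans, "T3", 15))  # key falls inside T1's scanned range
print(rw_conflicts_for_insert(scans, "T3", 25))  # key falls in no scanned range
```
&lt;br /&gt;
An insert that lands outside every scanned range produces no conflict, which matches the rule that entries the scan could never have returned are not of interest.&lt;br /&gt;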
&lt;br /&gt;
=== Testing ===&lt;br /&gt;
&lt;br /&gt;
For this development effort to succeed, it was absolutely necessary to have some client application which allowed execution of test scripts with specific interleaving of statements run against multiple backends.  The dtester module from Markus Wanner was used for this during most of development.  It requires Python and several Python packages (including Twisted).  Due to package dependencies and licensing issues, the dtester module was not appropriate for commit to the PostgreSQL code base.&lt;br /&gt;
&lt;br /&gt;
Heikki Linnakangas developed a testing framework based on existing regression test code which has been committed to src/test/isolation.  Besides being compatible with other PostgreSQL testing, it runs faster than dtester.  It doesn't provide a nice display of the results by statement ordering permutation, but that can be added if needed by filtering the current output.&lt;br /&gt;
&lt;br /&gt;
Like many other proposed features and optimizations, this area could benefit from a &amp;quot;performance test farm&amp;quot; so that serializable performance can be better compared to other isolation levels, and so the performance impact of future enhancements can be determined.&lt;br /&gt;
&lt;br /&gt;
=== Documentation ===&lt;br /&gt;
&lt;br /&gt;
A README-SSI file was created, largely drawn from this Wiki page.&lt;br /&gt;
&lt;br /&gt;
Someone with update rights to Wikipedia should probably update references there which will be outdated with this feature:&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/Snapshot_isolation&lt;br /&gt;
* http://en.wikipedia.org/wiki/Isolation_%28database_systems%29&lt;br /&gt;
&lt;br /&gt;
== Innovations ==&lt;br /&gt;
&lt;br /&gt;
The PostgreSQL implementation of Serializable Snapshot Isolation differs from what is described in the cited papers for several reasons:&lt;br /&gt;
# PostgreSQL didn't have any existing predicate locking.  It had to be added from scratch.&lt;br /&gt;
# The existing in-memory lock structures were not suitable for tracking SIREAD locks.&lt;br /&gt;
#* The database products used for the prototype implementations for the papers used update-in-place with a rollback log for their MVCC implementations, while PostgreSQL leaves the old version of a row in place and adds a new tuple to represent the row at a new location.&lt;br /&gt;
#* In PostgreSQL, tuple level locks are not held in RAM for any length of time; lock information is written to the tuples involved in the transactions.&lt;br /&gt;
#* In PostgreSQL, existing lock structures have pointers to memory which is related to a connection.  SIREAD locks need to persist past the end of the originating transaction and even the connection which ran it.&lt;br /&gt;
#* PostgreSQL needs to be able to tolerate a large number of transactions executing while one long-running transaction stays open -- the in-RAM techniques discussed in the papers wouldn't support that.&lt;br /&gt;
# Unlike the database products used for the prototypes described in the papers, PostgreSQL didn't already have a true serializable isolation level distinct from snapshot isolation.&lt;br /&gt;
# PostgreSQL supports subtransactions -- an issue not mentioned in the papers.&lt;br /&gt;
# PostgreSQL doesn't assign a transaction number to a database transaction until and unless necessary.&lt;br /&gt;
# PostgreSQL has pluggable data types with user-definable operators, as well as pluggable index types, not all of which are based around data types which support ordering.&lt;br /&gt;
# Some possible optimizations became apparent during development and testing.&lt;br /&gt;
&lt;br /&gt;
Differences from the implementation described in the papers are listed below.&lt;br /&gt;
&lt;br /&gt;
* New structures needed to be created in shared memory to track the proper information for serializable transactions and their SIREAD locks.&lt;br /&gt;
&lt;br /&gt;
* Because PostgreSQL does not have the same concept of an &amp;quot;oldest transaction ID&amp;quot; for all serializable transactions as assumed in the Cahill thesis, we track the oldest snapshot xmin among serializable transactions, and a count of how many active transactions use that xmin.  When the count hits zero we find the new oldest xmin and run a clean-up based on that.&lt;br /&gt;
&lt;br /&gt;
* Predicate locking in PostgreSQL will start at the tuple level when possible, with automatic conversion of multiple fine-grained locks to coarser granularity as needed to avoid resource exhaustion.  The amount of memory used for these structures will be configurable, to balance RAM usage against SIREAD lock granularity.&lt;br /&gt;
&lt;br /&gt;
* A process-local copy of the locks held by a process, together with the coarser covering locks and their counts, is kept to support granularity promotion decisions with low CPU and locking overhead.&lt;br /&gt;
&lt;br /&gt;
* Conflicts are identified by looking for predicate locks when tuples are written and looking at the MVCC information when tuples are read.  There is no matching between two RAM-based locks.&lt;br /&gt;
&lt;br /&gt;
* Because write locks are stored in the heap tuples rather than a RAM-based lock table, the optimization described in the Cahill thesis which eliminates an SIREAD lock where there is a write lock is implemented by the following:&lt;br /&gt;
*# When checking a heap write for conflicts against existing predicate locks, a tuple lock on the tuple being written is removed.&lt;br /&gt;
*# When acquiring a predicate lock on a heap tuple, we return quickly without doing anything if it is a tuple written by the reading transaction.&lt;br /&gt;
&lt;br /&gt;
* Rather than using conflictIn and conflictOut pointers which use NULL to indicate no conflict and a self-reference to indicate multiple conflicts or conflicts with committed transactions, we use a list of rw-conflicts.  With the more complete information, false positives are reduced and we have sufficient data for more aggressive clean-up and other optimizations.&lt;br /&gt;
** We can avoid ever rolling back a transaction until and unless there is a pivot where a transaction on the conflict ''out'' side of the pivot committed before either of the other transactions.&lt;br /&gt;
** We can avoid ever rolling back a transaction when the transaction on the conflict ''in'' side of the pivot is explicitly or implicitly READ ONLY unless the transaction on the conflict ''out'' side of the pivot committed before the READ ONLY transaction acquired its snapshot.  (An implicit READ ONLY transaction is one which committed without writing, even though it was not explicitly declared to be READ ONLY.)&lt;br /&gt;
** We can more aggressively clean up conflicts, predicate locks, and SSI transaction information.&lt;br /&gt;
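&lt;br /&gt;
The two rollback rules above can be sketched as follows. This is a minimal model, not the PostgreSQL code: the Txn class, the helper names, and the use of one shared logical clock for snapshot_seq and commit_seq are all assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from dataclasses import dataclass, field
from operator import lt
from typing import Optional


@dataclass(eq=False)
class Txn:
    name: str
    snapshot_seq: int                  # hypothetical logical clock value at snapshot time
    commit_seq: Optional[int] = None   # None while the transaction is still active
    read_only: bool = False
    in_conflicts: list = field(default_factory=list)   # readers with a rw-conflict in
    out_conflicts: list = field(default_factory=list)  # writers with a rw-conflict out


def add_rw_conflict(reader, writer):
    """Record that reader read data which writer later modified."""
    reader.out_conflicts.append(writer)
    writer.in_conflicts.append(reader)


def committed_before(a, b):
    """True if a has committed and b either is still active or committed later."""
    if a.commit_seq is None:
        return False
    return b.commit_seq is None or lt(a.commit_seq, b.commit_seq)


def is_dangerous(t_in, pivot, t_out):
    # Rule 1: the out side must have committed before both other transactions.
    if not committed_before(t_out, pivot):
        return False
    if t_in is not t_out and not committed_before(t_out, t_in):
        return False
    # Rule 2: a READ ONLY in side only matters if the out side committed
    # before the READ ONLY transaction acquired its snapshot.
    if t_in.read_only:
        return lt(t_out.commit_seq, t_in.snapshot_seq)
    return True


def dangerous_structures(pivot):
    """List (in, pivot, out) name triples which would force a rollback."""
    return [(t_in.name, pivot.name, t_out.name)
            for t_in in pivot.in_conflicts
            for t_out in pivot.out_conflicts
            if is_dangerous(t_in, pivot, t_out)]
```
&lt;br /&gt;
Keeping full conflict lists, rather than a single pointer per direction, is what lets a check like this look at the commit order of each specific pair of transactions.&lt;br /&gt;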
&lt;br /&gt;
* Allow a READ ONLY transaction to &amp;quot;opt out&amp;quot; of SSI if there are no READ WRITE transactions which could cause the READ ONLY transaction to ever become part of a &amp;quot;dangerous structure&amp;quot; of overlapping transaction dependencies.&lt;br /&gt;
&lt;br /&gt;
* Allow the user to request that a READ ONLY transaction ''wait'' until the conditions are right for it to start in the &amp;quot;opt out&amp;quot; state described above.  We add a DEFERRABLE state to transactions, which is specified and maintained in a way similar to READ ONLY.  It is ignored for transactions which are not SERIALIZABLE ''and'' READ ONLY.&lt;br /&gt;
&lt;br /&gt;
* When a transaction must be rolled back, we pick among the active transactions such that an immediate retry will not fail again on conflicts with the same transactions.&lt;br /&gt;
&lt;br /&gt;
* We use the PostgreSQL SLRU system to hold summarized information about older committed transactions to put an upper bound on RAM used.  Beyond that limit, information spills to disk.  Performance can degrade in a pessimal situation, but it should be tolerable, and transactions won't need to be cancelled or blocked from starting.&lt;br /&gt;
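&lt;br /&gt;
The granularity promotion mentioned in the list above, driven by process-local counts, might look roughly like the sketch below. The class name, the threshold parameter, and the single tuple-to-page step are assumptions for illustration; coarser levels would repeat the same pattern.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import defaultdict


class LocalPredicateLocks:
    """Process-local view of SIREAD locks, used to decide promotions (hypothetical)."""

    def __init__(self, max_tuple_locks_per_page=3):
        self.max_per_page = max_tuple_locks_per_page
        self.tuple_locks = defaultdict(set)  # (relation, page) mapped to tuple offsets
        self.page_locks = set()              # (relation, page) pairs locked as a whole

    def lock_tuple(self, relation, page, offset):
        key = (relation, page)
        if key in self.page_locks:
            return "covered"      # a coarser lock already covers this tuple
        held = self.tuple_locks[key]
        held.add(offset)
        if len(held) == self.max_per_page:
            # Trade the fine-grained locks for one page lock to save memory.
            del self.tuple_locks[key]
            self.page_locks.add(key)
            return "promoted"
        return "tuple"


locks = LocalPredicateLocks(max_tuple_locks_per_page=3)
for offset in (1, 2, 3, 4):
    print(offset, locks.lock_tuple("orders", 7, offset))
```
&lt;br /&gt;
A lower threshold saves memory at the cost of coarser locks and therefore more false positive conflicts, which is the trade-off the configurable memory limit is meant to expose.&lt;br /&gt;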
&lt;br /&gt;
== R&amp;amp;D Issues ==&lt;br /&gt;
&lt;br /&gt;
This is intended to be the place to record specific issues which need more detailed review or analysis.&lt;br /&gt;
&lt;br /&gt;
* '''WAL file replay'''.  While serializable implementations using S2PL can guarantee that the write-ahead log contains commits in a sequence consistent with some serial execution of serializable transactions, SSI cannot make that guarantee.  While the WAL replay is no less consistent than under snapshot isolation, it is possible that under PITR or hot standby a database could reach a readable state where some transactions appear before other transactions which would have had to precede them to maintain serializable consistency.  In essence, if we do nothing, WAL replay will be at snapshot isolation even for serializable transactions.  Is this OK?  If not, how do we address it?&lt;br /&gt;
&lt;br /&gt;
* '''External replication'''.  Look at how this impacts external replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.  This is related to the &amp;quot;WAL file replay&amp;quot; issue.&lt;br /&gt;
&lt;br /&gt;
* '''UNIQUE btree search for equality on all columns'''.  Since a search of a UNIQUE index using equality tests on all columns will lock the heap tuple if an entry is found, it appears that there is no need to get a predicate lock on the index in that case.  A predicate lock ''is'' still needed for such a search if a matching index entry which points to a visible tuple is ''not'' found.&lt;br /&gt;
&lt;br /&gt;
* '''Minimize touching of shared memory'''.  Should lists in shared memory push entries which have just been returned to the ''front'' of the available list, so they will be popped back off soon and some memory might never be touched, or should we keep adding returned items to the ''end'' of the available list?&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4A0019EE.EE98.0025.0@wicourts.gov &amp;quot;Serializable Isolation without blocking&amp;quot; - discusses paper in ACM SIGMOD on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B2788EA020000250002D51C@gw.wicourts.gov &amp;quot;Update on true serializable techniques in MVCC&amp;quot; - discusses Cahill Doctoral Thesis on SSI]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B389C79020000250002D987@gw.wicourts.gov &amp;quot;Serializable implementation&amp;quot; - discusses Wisconsin Court System plans]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4B3B88F4020000250002DAE1@gw.wicourts.gov &amp;quot;A third lock method&amp;quot; - discusses development path: rough prototype to refine toward production]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/1262718843.5908.183.camel@monkey-cat.sm.truviso.com &amp;quot;true serializability and predicate locking&amp;quot; - discusses GiST and GIN issues]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4BF43DF702000025000318BE@gw.wicourts.gov WIP patch for serializable transactions with predicate locking]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-09/msg00022.php &amp;quot;serializable&amp;quot; in comments and names]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4C8F5DB202000025000356A0@gw.wicourts.gov Serializable Snapshot Isolation]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/message-id/4CFB574702000025000382FD@gw.wicourts.gov serializable read only deferrable]&lt;br /&gt;
&lt;br /&gt;
[http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php SSI memory mitigation &amp;amp; false positive degradation]&lt;br /&gt;
&lt;br /&gt;
== Presentations ==&lt;br /&gt;
&lt;br /&gt;
From PostgreSQL Conference U.S. East 2010:&lt;br /&gt;
[[media:Transaction-Isolation-in-PostgreSQL.odp|Current Transaction Isolation in PostgreSQL and future directions]]&lt;br /&gt;
&lt;br /&gt;
From PGCon 2011: &lt;br /&gt;
[http://drkp.net/drkp/papers/ssi-pgcon11-slides.pdf Serializable Snapshot Isolation: Making ISOLATION LEVEL SERIALIZABLE Provide Serializable Isolation]&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[1]&amp;lt;/nowiki&amp;gt; [http://doi.acm.org/10.1145/1376616.1376690 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.]  (This paper is listed mostly for context; the subsequent paper covers the same ground and more.)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[2]&amp;lt;/nowiki&amp;gt; [http://hdl.handle.net/2123/5353 Michael James Cahill. 2009. Serializable Isolation for Snapshot Databases. Sydney Digital Theses. University of Sydney, School of Information Technologies.]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[3]&amp;lt;/nowiki&amp;gt; [http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141–259.]&lt;br /&gt;
Of particular interest:&lt;br /&gt;
* 6.1 A Note on ACID&lt;br /&gt;
* 6.2 A Brief Review of Serializability&lt;br /&gt;
* 6.3 Locking and Latching&lt;br /&gt;
* 6.3.1 Transaction Isolation Levels&lt;br /&gt;
* 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;[4]&amp;lt;/nowiki&amp;gt; [http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt SQL-92]&lt;br /&gt;
Search for ''serial execution'' to find the relevant section.&lt;/div&gt;</description>
			<pubDate>Fri, 25 Nov 2011 21:38:31 GMT</pubDate>			<dc:creator>Kgrittn</dc:creator>			<comments>http://wiki.postgresql.org/wiki/Talk:Serializable</comments>		</item>
		<item>
			<title>Serializable</title>
			<link>http://wiki.postgresql.org/wiki/Serializable</link>
			<guid>http://wiki.postgresql.org/wiki/Serializable</guid>
			<description>&lt;p&gt;Kgrittn:&amp;#32;/* Overview */ Add SSI Algorithm section.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Information about the SSI implementation for the SERIALIZABLE transaction isolation level in PostgreSQL, new in release 9.1.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with a serialization failure.&lt;br /&gt;
&lt;br /&gt;
This document is oriented toward the techniques used to implement the feature in PostgreSQL.  For information oriented toward application programmers and database administrators, see the [[SSI]] Wiki page.&lt;br /&gt;
&lt;br /&gt;
=== Serializable and Snapshot Transaction Isolation Levels ===&lt;br /&gt;
&lt;br /&gt;
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions.  Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly.  The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.&lt;br /&gt;
&lt;br /&gt;
Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these 