Todo
From PostgreSQL wiki
This list contains known PostgreSQL bugs and feature requests and we hope it is complete. If you would like to work on an item, please read the Developer FAQ first. There is also a development information page.
- "-" marks ordinary, incomplete items
- "[E]" marks items that are easier to implement
- "[D]" marks changes that are done, and will appear in the PostgreSQL 9.5 release.
For help on editing this list, please see Talk:Todo. Please do not add items here without discussion on the mailing list.
Development Process
WARNING for Developers: Unfortunately this list does not contain all the information necessary for someone to start coding a feature. Some of these items might have become unnecessary since they were added; others might be desirable but the implementation might be unclear. When selecting items listed below, be prepared to first discuss the value of the feature. Do not assume that you can select one, code it, and then expect it to be committed. Always discuss design on the Hackers list before starting to code. The flow should be:
Desirability -> Design -> Implement -> Test -> Review -> Commit
Administration
Allow administrators to cancel multi-statement idle transactions
- This allows locks to be released, but it is complex to report the cancellation back to the client.
Check for unreferenced table files created by transactions that were in-progress when the server terminated abruptly
Set proper permissions on non-system schemas during db creation
- Currently all schemas are owned by the super-user because they are copied from the template1 database. However, since all objects are inherited from the template database, it is not clear that setting schemas to the db owner is correct.
Allow log_min_messages to be specified on a per-module basis
- This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. Another idea is to allow separate configuration files for each module, or allow arbitrary SET commands to be passed to them. See also Logging Brainstorm.
Simplify creation of partitioned tables
- This would allow creation of partitioned tables without requiring creation of triggers or rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection. Options could include range and hash partition selection. See also Table partitioning
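For context, a minimal sketch of the manual setup this item would automate, using the trigger-plus-CHECK-constraint approach (table, column, and trigger names are illustrative):

```sql
-- Parent table plus one range "partition" child with a CHECK constraint
-- used for constraint-exclusion partition selection.
CREATE TABLE measurements (logdate date, value int);

CREATE TABLE measurements_2014 (
    CHECK (logdate >= '2014-01-01' AND logdate < '2015-01-01')
) INHERITS (measurements);

-- Routing trigger: redirect INSERTs on the parent into the right child.
CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= '2014-01-01' AND NEW.logdate < '2015-01-01' THEN
        INSERT INTO measurements_2014 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- suppress the insert into the parent itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trg
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();
```

All of this boilerplate (and its UPDATE/DELETE counterparts) is what a built-in partitioning syntax would replace.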
Implement the SQL-standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles
Provide a way to query the log collector subprocess to determine the name of the currently active log file
Allow simpler reporting of the unix domain socket directory and allow easier configuration of its default location
Configuration files
Allow Kerberos to disable stripping of realms so we can check the username@realm against multiple realms
Allow synchronous_standby_names to be disabled after communication failure with all synchronous standby servers exceeds some timeout
- This also requires successful execution of a synchronous notification command.
Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
- Currently it displays zero.
- [E]
If pg_hba.conf has changed since last config reload and we reject an authentication attempt, emit a HINT to the user and in the log
Tablespaces
Allow a database in tablespace t1 with tables created in tablespace t2 to be used as a template for a new database created with default tablespace t2
- Currently all objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If you mix default tablespace tables and tablespace-specified tables in the same directory, creating a new database from such a mixed directory would create a new database with tables that had incorrect explicit tablespaces. To fix this would require modifying pg_class in the newly copied database, which we don't currently do.
Allow reporting of which objects are in which tablespaces
- This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.
Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original
Statistics Collector
Allow statistics last vacuum/analyze execution times to be displayed without requiring track_counts to be enabled
Testing pgstat via pg_regress is tricky and inefficient. Consider making a dedicated pgstat test-suite.
SSL
Allow SSL key file permission checks to be optionally disabled when sharing SSL keys with other applications
Allow SSL CRL files to be re-read during configuration file reload, rather than requiring a server restart
- Unlike SSL certificate files, CRL (Certificate Revocation List) files are updated frequently
- Alternatively or additionally, supporting OCSP (Online Certificate Status Protocol) would provide real-time revocation discovery without reloading
Point-In-Time Recovery (PITR)
Standby server mode
Prevent variables inherited from the server environment from being used for making streaming replication connections.
Data Types
Domains
Dates and Times
Allow TIMESTAMP WITH TIME ZONE to store the original timezone information, either zone name or offset from UTC
- If the TIMESTAMP value is stored with a time zone name, interval computations should adjust based on the time zone rules.
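To illustrate the current behavior this item would change (values illustrative): the input's zone offset is discarded on storage, and the value is normalized and displayed in the session's TimeZone setting.

```sql
-- Today the '+05:30' below is used only to convert the input to UTC;
-- the original offset itself is not stored anywhere.
SET TimeZone = 'US/Eastern';
SELECT TIMESTAMP WITH TIME ZONE '2014-07-01 12:00:00+05:30';
-- displayed as the equivalent US/Eastern time; the input zone is lost
```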
Arrays
Binary Data
MONEY Data Type
Text Search
Improve handling of dash and plus signs in email address user names, and perhaps improve URL parsing
XML
Allow reliable XML operation with non-UTF8 server encodings (xpath(), in particular, is known to not work)
XML Canonical: Convert XML documents to canonical form to compare them. libxml2 has support for this.
Add pretty-printed XML output option
- Parse a document and serialize it back in some indented form. libxml2 might support this.
Allow XML shredding
- In some cases shredding could be a better option (if there is no need to keep XML docs entirely, e.g. if we have already developed tools that understand only relational data). This would be a separate module that implements the annotated schema decomposition technique, similar to DB2 and SQL Server functionality.
xpath_table needs to be implemented/implementable to get rid of contrib/xml2 [10]
xpath_table is pretty broken anyway [11]
Improve handling of PIs and DTDs in xmlconcat() [14]
Functions
Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs
Prevent malicious functions from being executed with the permissions of unsuspecting users
- Index functions are safe, so VACUUM and ANALYZE are safe too. Triggers, CHECK and DEFAULT expressions, and rules are still vulnerable.
- [E]
Add an ereport wrapper callable from SQL. Note that you'll need to use the anyelement hack to deal with the need for a return type, e.g. ereport(text, anyelement) returns anyelement.
- [D]
Add a built-in array_agg(anyarray) or similar, that can aggregate 1-dimensional arrays into a 2-dimensional array.
- [E]
Add a built-in array_cat_agg (naming to be bikeshedded) that concatenates input 1-dim arrays into a single long 1-dim array
Character Formatting
Throw an error from to_char() instead of printing a string of "#" when a number doesn't fit in the desired output format.
- discussed in "to_char, support for EEEE format"
Allow to_char() on interval values to accumulate the highest unit requested
- Some special format flag would be required to request such accumulation. Such functionality could also be added to EXTRACT. Prevent accumulation that crosses the month/day boundary because of the uneven number of days in a month.
- to_char(INTERVAL '1 hour 5 minutes', 'MI') => 65
- to_char(INTERVAL '43 hours 20 minutes', 'MI') => 2600
- to_char(INTERVAL '43 hours 20 minutes', 'WK:DD:HR:MI') => 0:1:19:20
- to_char(INTERVAL '3 years 5 months','MM') => 41
Multi-Language Support
Add a cares-about-collation column to pg_proc, so that unresolved-collation errors can be thrown at parse time
Change memory allocation for multi-byte functions so memory is allocated inside conversion functions
- Currently we preallocate memory based on worst-case usage.
Add ability to use case-insensitive regular expressions on multi-byte characters
- Currently it works for UTF-8, but not other multi-byte encodings
Improve encoding of connection startup messages sent to the client
- Currently some authentication error messages are sent in the server encoding
Views and Rules
Allow VIEW/RULE recompilation when the underlying tables change
- This is both difficult and controversial.
Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups
SQL Commands
Improve type determination of unknown (NULL or quoted literal) result columns for UNION/INTERSECT/EXCEPT
Allow prepared transactions with temporary tables created and dropped in the same transaction, and when an ON COMMIT DELETE ROWS temporary table is accessed
Add SQL-standard MERGE/REPLACE/UPSERT command
- MERGE is typically used to merge two tables. An UPSERT command does an INSERT, or in the event of a would-be duplicate violation, an UPDATE. See UPSERT for notes on the implementation details.
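A sketch of the SQL-standard MERGE syntax this item asks for (table and column names are illustrative; this is not accepted by current PostgreSQL):

```sql
-- Merge new balances into an accounts table: update matching rows,
-- insert rows that do not exist yet.
MERGE INTO accounts a
USING new_balances n ON a.id = n.id
WHEN MATCHED THEN
    UPDATE SET balance = n.balance
WHEN NOT MATCHED THEN
    INSERT (id, balance) VALUES (n.id, n.balance);
```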
Add NOVICE output level for helpful messages
- For example, have it warn about unjoined tables. This could also control automatic sequence/index creation messages.
Allow the count returned by SELECT, etc to be represented as an int64 to allow a higher range of values
Add DEFAULT .. AS OWNER so permission checks are done as the table owner
- This would be useful for SERIAL nextval() calls and CHECK constraints.
Add comments on system tables/columns using the information in catalogs.sgml
- Ideally the information would be pulled from the SGML file automatically.
Allow: COPY ( EXECUTE prepare_label [(...)] ) TO 'local_file.out'
- "query" is defined to accept only SELECT and VALUES but EXECUTE has merit as well.
- Consider, in psql, an idiom of: PREPARE long_query AS SELECT [...]; \copy (EXECUTE long_query) TO ...
CREATE
Move NOT NULL constraint information to pg_constraint
- Currently NOT NULL constraints are stored in pg_attribute without any designation of their origins, e.g. primary keys. One manifest problem is that dropping a PRIMARY KEY constraint does not remove the NOT NULL constraint designation. Another issue is that we should probably force NOT NULL to be propagated from parent tables to children, just as CHECK constraints are. (But then does dropping PRIMARY KEY affect children?)
Consider analyzing temporary tables when they are first used in a query
- Autovacuum cannot analyze or vacuum temporary tables.
UPDATE
ALTER
Allow moving system tables to other tablespaces, where possible
- Currently non-global system tables must be in the default database tablespace. Global system tables can never be moved.
CLUSTER
Automatically maintain clustering on a table
- This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization. Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too. A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.
COPY
Allow COPY to report error lines and continue
- This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure.
Allow COPY to handle other number formats
- E.g. the German notation. Best would be something like WITH DECIMAL ','.
GRANT/REVOKE
DECLARE CURSOR
INSERT
SHOW/SET
ANALYZE
Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified percentage
Window Functions
See TODO items for window functions.
Support creation of user-defined window functions
- We have the ability to create new window functions written in C. Is it worth the effort to create an API that would let them be written in PL/pgSQL, etc.?
Implement full support for window framing clauses
- In addition to the clauses already described as done in the latest documentation, these clauses are not implemented yet:
- RANGE BETWEEN ... PRECEDING/FOLLOWING
- EXCLUDE
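As an example of the first missing clause, a value-offset RANGE frame is standard SQL but would currently be rejected (table and column names are illustrative):

```sql
-- Average over prices within 5 units of the current row's price,
-- rather than within 5 rows (which ROWS BETWEEN already supports).
SELECT price,
       avg(price) OVER (ORDER BY price
                        RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING)
FROM quotes;
```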
Investigate tuplestore performance issues
- The tuplestore_in_memory() thing is just a band-aid, we ought to try to solve it properly. tuplestore_advance seems like a weak spot as well.
Integrity Constraints
Keys
Referential Integrity
Fix problem when cascading referential triggers make changes on cascaded tables, seeing the tables in an intermediate state
Check Constraints
Server-Side Languages
Implement stored procedures
- This might involve the control of transaction state and the return of multiple result sets
SQL-Language Functions
PL/pgSQL
Allow listing of record column names, and access to record columns via variables, e.g. columns := r.(*), tval2 := r.(colname)
Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables
- Because a row is not scalar, do not allow assignment from NULL-valued scalars.
PL/Perl
PL/Python
Create a new restricted execution class that will allow passing function arguments in as locals. Passing them as globals means functions cannot be called recursively.
PL/Tcl
Clients
pg_ctl
psql
Move psql backslash database information into the backend, use mnemonic commands?
- This would allow non-psql clients to pull the same information out of the database as psql.
Make psql's \d commands distinguish default privileges from no privileges
- ACL displays were visibly different for the two cases before we "improved" them by using array_to_string.
Add a \set variable to control whether \s displays line numbers
- Another option is to add \# which lists line numbers, and allows command execution.
Add option to wrap column values at whitespace boundaries, rather than chopping them at a fixed width.
- Currently, "wrapped" format chops values into fixed widths. Perhaps the word wrapping could use the same algorithm documented in the W3C specification.
Support the ReST table output format
- Details about the ReST format: http://docutils.sourceforge.net/rst.html#reference-documentation
- [E]
When psql -f filename sees the PGDMP marker, abort with an informative error like "this is a PostgreSQL custom-format dump, restore with pg_restore"
pg_dump / pg_restore
- [E]
Add full object name to the tag field, e.g. for operators we need '=(integer, integer)' instead of just '='.
- [E]
Modify pg_dump to create skeleton views for reload (which are then updated via CREATE OR REPLACE VIEW) when views have circular dependencies. This should eliminate the need for the CREATE RULE "_RETURN" hack currently used to address this issue. See the mailing-list thread for additional information.
Avoid using platform-dependent locale names in pg_dumpall output
- Using native locale names puts roadblocks in the way of porting a dump to another platform. One possible solution is to get CREATE DATABASE to accept some agreed-on set of locale names and fix them up to meet the platform's requirements.
Remove support for dumping from pre-7.3 servers
- In 7.3 and later, we can get accurate dependency information from the server. pg_dump still contains a lot of crufty code to try to deal with the lack of dependency info in older servers, but the usefulness of maintaining that code is diminishing.
Refactor handling of database attributes between pg_dump and pg_dumpall
- Currently only pg_dumpall emits database attributes, such as ALTER DATABASE SET commands and database-level GRANTs. Many people wish that pg_dump would do that. One proposal is to let pg_dump issue such commands if the -C switch was used, but it's unclear whether that will satisfy the demand.
Change pg_dump so that a comment on the dumped database is applied to the loaded database, even if the database has a different name.
- This will require new backend syntax, perhaps COMMENT ON CURRENT DATABASE. This is related to the previous item.
ecpg
Docs
- Document differences between ecpg and the SQL standard and information about the Informix-compatibility module.
- [E]
Fix small memory leaks in ecpg
- Memory leaks in a short-running application like ecpg are not really a problem, but they make debugging more complicated
libpq
Prevent PQfnumber() from lowercasing unquoted column names
- PQfnumber() should never have been doing lowercasing, but historically it has so we need a way to prevent it
Triggers
Improve storage of deferred trigger queue
- Right now all deferred trigger information is stored in backend memory. This could exhaust memory for very large trigger queues. This item involves dumping large queues into files, or doing some kind of join to process all the triggers, some bulk operation, or a bitmap.
Allow triggers to be disabled in only the current session.
- This is currently possible by starting a multi-statement transaction, modifying the system tables, performing the desired SQL, restoring the system tables, and committing the transaction. ALTER TABLE ... TRIGGER requires a table lock so it is not ideal for this usage.
With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY
- If the dump is known to be valid, allow foreign keys to be added without revalidating the data.
When statement-level triggers are defined on a parent table, have them fire only on the parent table, and fire child table triggers only where appropriate
Inheritance
Honor UNIQUE INDEX on base column in INSERTs/UPDATEs on inherited table, e.g. INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail
- The main difficulty with this item is the problem of creating an index that can span multiple tables.
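To illustrate the current behavior (table and index names are illustrative): a unique index on a parent table does not constrain rows in child tables, so the duplicate below is silently accepted.

```sql
CREATE TABLE parent (id int);
CREATE UNIQUE INDEX parent_id_idx ON parent (id);
CREATE TABLE child () INHERITS (parent);

INSERT INTO parent VALUES (1);
INSERT INTO child  VALUES (1);  -- succeeds today; this item asks for it to fail
SELECT count(*) FROM parent;    -- both rows are visible through the parent
```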
Determine whether ALTER TABLE / SET SCHEMA should work on inheritance hierarchies (and thus support ONLY). If yes, implement it.
- ALTER TABLE variants sometimes support recursion and sometimes do not, but this is poorly documented or not documented at all, and the ONLY marker would then be silently ignored. Clarify the documentation, and reject ONLY where it is not supported.
Indexes
Prevent index uniqueness checks when UPDATE does not modify the column
- Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE. However, HOT already short-circuits this in common cases, so more work might not be helpful.
Allow the creation of on-disk bitmap indexes which can be quickly combined with other bitmap indexes
- Such indexes could be more compact if there are only a few distinct values. Such indexes can also be compressed. Keeping such indexes updated can be costly.
- Re: Bitmap index AM
- Bitmap index thoughts
- Stream bitmaps
- Re: Bitmapscan changes - Requesting further feedback
- Updated bitmap index patch
- Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
- Bitmap Indexes: request for feedback
- http://archives.postgresql.org/message-id/800923.27831.qm@web29010.mail.ird.yahoo.com
Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics
- Re: Simple join optimized badly?
- Stats for multi-column indexes
- Cross-column statistics revisited
- Multi-Dimensional Histograms
- http://archives.postgresql.org/pgsql-hackers/2010-12/msg00913.php
- http://archives.postgresql.org/pgsql-hackers/2010-12/msg02179.php
- http://archives.postgresql.org/pgsql-hackers/2011-01/msg00459.php
- http://archives.postgresql.org/pgsql-hackers/2011-02/msg02054.php
- http://archives.postgresql.org/pgsql-hackers/2011-04/msg01731.php
- http://archives.postgresql.org/pgsql-hackers/2011-03/msg00894.php
- http://archives.postgresql.org/pgsql-hackers/2011-09/msg00679.php
Consider smaller indexes that record a range of values per heap page, rather than having one index entry for every heap row
- This is useful if the heap is clustered by the indexed values.
Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY
- This is difficult because you must upgrade to an exclusive table lock to replace the existing index file. CREATE INDEX CONCURRENTLY does not have this complication. This would allow index compaction without downtime.
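Until such a command exists, a common manual workaround is the following sketch (index names are illustrative; this does not work for indexes that back constraints, and DROP INDEX CONCURRENTLY requires 9.2 or later):

```sql
-- Build a replacement index without blocking writes, then swap it in.
CREATE INDEX CONCURRENTLY orders_customer_idx_new ON orders (customer_id);
DROP INDEX CONCURRENTLY orders_customer_idx;
ALTER INDEX orders_customer_idx_new RENAME TO orders_customer_idx;
```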
Allow multiple indexes to be created concurrently, ideally via a single heap scan
- pg_restore allows parallel index builds, but it is done via subprocesses, and there is no SQL interface for this. Cluster could definitely benefit from this.
Consider using "effective_io_concurrency" for index scans
- Currently only bitmap scans use this, which might be fine because most multi-row index scans use bitmap scans.
GIST
Hash
Sorting
Allow sorts of skinny tuples to use even more available memory.
- Now that it is not limited by MaxAllocSize, don't limit by INT_MAX either.
- http://www.postgresql.org/message-id/CA+U5nMKkRMin1pV8VMpS6_n7hcOWSG0kZS3oFL9JOa8DV6vJyQ@mail.gmail.com
Fsync
Cache Usage
Provide a way to calculate an "estimated COUNT(*)"
- Perhaps by using the optimizer's cardinality estimates or random sampling.
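The optimizer's cardinality estimate mentioned here is already exposed in the catalogs, so a rough estimate is available today; its accuracy depends on how recently the table was vacuumed or analyzed (table name is illustrative):

```sql
-- Planner's row estimate for a table, maintained by VACUUM and ANALYZE.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mytable';
```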
Consider automatic caching of statements at various levels:
- Parsed query tree
- Query execute plan
- Query results
Consider allowing higher priority queries to have referenced buffer cache pages stay in memory longer
Vacuum
Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer
Consider having single-page pruning update the visibility map
- https://commitfest.postgresql.org/action/patch_view?id=75
- Re: visibility maps and heap_prune
Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM
Auto-vacuum
Prevent long-lived temporary tables from causing frozen-xid advancement starvation
- The problem is that autovacuum cannot vacuum them to set frozen xids; only the session that created them can do that.
Locking
Fix problem when multiple subtransactions of the same outer transaction hold different types of locks, and one subtransaction aborts
Startup Time Improvements
Experiment with multi-threaded backend for backend creation
- This would prevent the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading. Also explore the idea of a single session using multiple threads to execute a statement faster.
Write-Ahead Log
Eliminate need to write full pages to WAL before page modification
- Currently, to protect against partial disk page writes, we write full page images to WAL before they are modified so we can correct any partial page writes during recovery. These pages can also be eliminated from point-in-time archive files.
- Re: Index Scans become Seq Scans after VACUUM ANALYSE
- http://archives.postgresql.org/pgsql-hackers/2011-05/msg01191.php
- WIP double writes
- double writes
- Double-write with Fast Checksums
- double writes using "double-write buffer" approach
- http://archives.postgresql.org/pgsql-hackers/2012-10/msg01463.php
When full page writes are off, write CRC to WAL and check file system blocks on recovery
- If CRC check fails during recovery, remember the page in case a later CRC for that page properly matches. The difficulty is that hint bits are not WAL logged, meaning a valid page might not match the earlier CRC.
Write full pages during file system write and not when the page is modified in the buffer cache
- This allows most full page writes to happen in the background writer. It might cause problems for applying WAL on recovery into a partially-written page, but later the full page will be replaced from WAL.
Find a way to reduce rotational delay when repeatedly writing last WAL page
- Currently fsync of WAL requires the disk platter to perform a full rotation to fsync again. One idea is to write the WAL to different offsets that might reduce the rotational delay.
Speed WAL recovery by allowing more than one page to be prefetched
- This should be done utilizing the same infrastructure used for prefetching in general to avoid introducing complex error-prone code in WAL replay.
Optimizer / Executor
Consider increasing the default values of from_collapse_limit, join_collapse_limit, and/or geqo_threshold
Log statements where the optimizer row estimates were dramatically different from the number of rows actually found?
Hashing
Background Writer
Consider having the background writer update the transaction status hint bits before writing out the page
- Implementing this requires the background writer to have access to system catalogs and the transaction status log.
Test to see if calling PreallocXlogFiles() from the background writer will help with WAL segment creation latency
Concurrent Use of Resources
Do async I/O for faster random read-ahead of data
- Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously.
- Asynchronous I/O Support
- Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1
- There's random access and then there's random access
- Bitmap index scan preread using posix_fadvise (Was: There's random access and then there's random access)
- The above patch is already applied as of 8.4, but it still remains to figure out how to handle plain indexscans effectively.
Experiment with multi-threaded backend for better I/O utilization
- This would allow a single query to make use of multiple I/O channels simultaneously. One idea is to create a background reader that can pre-fetch sequential and index scan pages needed by other backends. This could be expanded to allow concurrent reads from multiple devices in a partitioned table.
Experiment with multi-threaded backend for better CPU utilization
- This would allow several CPUs to be used for a single query, such as for sorting or query execution.
TOAST
Monitoring
Add column to pg_stat_activity that shows the progress of long-running commands like CREATE INDEX and VACUUM
- EXPLAIN progress info
- The CLUSTER/VACUUM FULL implementation would also be useful to track this way
Miscellaneous Performance
Use mmap() rather than shared memory for shared buffers?
- This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead. We could also consider mmap() for writing WAL.
Rather than consider mmap()-ing in 8k pages, consider mmap()'ing entire files into a backend?
- Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping. Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings. Another problem is that there is no way to _prevent_ I/O to disk from the dirty shared buffers so changes could hit disk before WAL is written.
Consider ways of storing rows more compactly on disk:
- Reduce the row header size?
- Consider reducing on-disk varlena length from four bytes to two because a heap row cannot be more than 64k in length
Allow configuration of backend priorities via the operating system
- Though backend priorities make priority inversion during lock waits possible, research shows that this is not a huge problem.
Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation
Consider decreasing the I/O caused by updating tuple hint bits
- Hint Bits and Write I/O
- Re: [HACKERS] Hint Bits and Write I/O
- http://archives.postgresql.org/pgsql-hackers/2010-10/msg00695.php
- http://archives.postgresql.org/pgsql-hackers/2010-11/msg00792.php
- http://archives.postgresql.org/pgsql-hackers/2011-01/msg01063.php
- http://archives.postgresql.org/pgsql-hackers/2011-03/msg01408.php
- http://archives.postgresql.org/pgsql-hackers/2011-03/msg01453.php
Avoid the requirement of freezing pages that are infrequently modified
- If all rows on a page are visible, it is possible to set a bit in the visibility map (once the visibility map is 100% reliable) and not need to freeze the page, avoiding a page rewrite
- http://archives.postgresql.org/message-id/4BF701CF.2090205@agliodbs.com
- http://archives.postgresql.org/pgsql-hackers/2010-06/msg00082.php
- http://www.postgresql.org/message-id/20130523175148.GA29374@alap2.anarazel.de
- http://www.postgresql.org/message-id/CA+TgmoaEmnoLZmVbb8gvY69NA8zw9BWpiZ9+TLz-LnaBOZi7JA@mail.gmail.com
- http://www.postgresql.org/message-id/51A7553E.5070601@vmware.com
Restructure truncation logic to be more resistant to failure
- This also involves not writing dirty buffers for a truncated or dropped relation
Consider adding logic to increase large tables by more than 8k
- This would reduce file system fragmentation
Miscellaneous Other
Source Code
Improve detection of shared memory segments being used by others by checking the SysV shared memory field 'nattch'
/contrib/pg_upgrade
Handle large object comments
- This is difficult to do because the large object doesn't exist when --schema-only is loaded.
Migrate pg_statistic by dumping it out as a flat file, so analyze is not necessary
- pg_class.oid is not preserved so schema.tablename must be used.
Improve testing, perhaps using the buildfarm
- The buildfarm has access to multiple versions of PostgreSQL.
Create machine-readable output of pg_controldata
- This would avoid parsing its output. The problem is we need pg_controldata output from both the old and new clusters so we would need to support both formats.
Desired changes that would prevent upgrades with pg_upgrade
- 32-bit page checksums
- Add metapage to GiST indexes
- Clean up hstore's internal representation
- Remove tuple infomask bit HEAP_MOVED_OFF and HEAP_MOVED_IN
- fix char() index trailing space handling
- Use non-collation-aware comparisons for GIN opclasses
Windows
Wire Protocol Changes
Let the client indicate character encoding of database names, user names, passwords, and of pre-auth error messages returned by the server
Update clients to use data types, typmod, schema.table.column names of result sets using new statement protocol
Allow negotiation of encryption, STARTTLS style, rather than forcing client to decide on SSL or !SSL before connecting
Documentation
- [E]
Add contrib functions to the index
- Add the functions and GUCs in the contrib modules to the documentation index: per list discussion
Exotic Features
Add pre-parsing phase that converts non-ISO syntax to supported syntax
- This could allow SQL written for other databases to run without modification.
Add features of Oracle-style packages
- A package would be a schema with session-local variables, public/private functions, and initialization functions. It is also possible to implement these capabilities in any schema and not use a separate "packages" syntax at all.
Features We Do Not Want
The following features have been discussed ad nauseam on the PostgreSQL mailing lists, and the consensus has been that the project is not interested in them. As such, if you are going to bring them up as potential features, you will want to be familiar with all of the arguments against them which have been made over the years. If you decide to work on such features anyway, you should be aware that you face a higher-than-normal barrier to get the Project to accept them.
All backends running as threads in a single process (not wanted)
- This eliminates the process protection we get from the current setup. Thread creation is usually the same overhead as process creation on modern systems, so it seems unwise to use a pure threaded model, and MySQL and DB2 have demonstrated that threads introduce as many issues as they solve. Threading specific operations such as I/O, seq scans, and connection management has been discussed and will probably be implemented to enable specific performance features. Moving to a threaded engine would also require halting all other work on PostgreSQL for one to two years.
"Oracle-style" optimizer hints (not wanted)
- Optimizer hints, as implemented in Oracle and other RDBMSes, are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have such problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. See Optimizer Hints Discussion for further information.
Embedded server (not wanted)
- While PostgreSQL clients run fine in limited-resource environments, the server requires multiple processes and a stable pool of resources to run reliably and efficiently. Stripping down the PostgreSQL server to run in the same process address space as the client application would add too much complexity and too many failure cases. Besides, there are several very mature embedded SQL databases already available.
Obfuscated function source code (not wanted)
- Obfuscating function source code has minimal protective benefits because anyone with super-user access can find a way to view the code. At the same time, it would greatly complicate backups and other administrative tasks. To prevent non-super-users from viewing function source code, remove SELECT permission on pg_proc.
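The workaround mentioned above, as a sketch (run as a superuser; note that this hides source for all functions at once, and psql commands such as \df will stop working for non-superusers):

```sql
-- Prevent non-superusers from reading function bodies (pg_proc.prosrc).
-- Superusers bypass permission checks and can still read everything.
REVOKE SELECT ON pg_catalog.pg_proc FROM PUBLIC;
```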
Indeterminate behavior for the GROUP BY clause (not wanted)
- At least one other database product allows specification of a subset of the result columns which GROUP BY would need to be able to provide predictable results; the server is free to return any value from the group. This is not viewed as a desirable feature. PostgreSQL 9.1 allows result columns that are not referenced by GROUP BY if a primary key for the same table is referenced in GROUP BY.