User:Simon
From PostgreSQL wiki
Simon's Work in Progress: Prioritised Todo
This list contains all known PostgreSQL bugs and feature requests. If you would like to work on an item, please read the Developer FAQ first. There is also a development information page.
- - marks ordinary, incomplete items
- [E] - marks items that are easier to implement
- [D] - marks changes that are done, and will appear in the next major release
For help on editing this list, please see Talk:Todo. Please do not add items here without discussion on the mailing list.
Administration
- Allow administrators to cancel multi-statement idle transactions
- This allows locks to be released, but it is complex to report the cancellation back to the client.
- Check for unreferenced table files created by transactions that were in-progress when the server terminated abruptly
- Set proper permissions on non-system schemas during db creation
- Currently all schemas are owned by the super-user because they are copied from the template1 database. However, since all objects are inherited from the template database, it is not clear that setting schemas to the db owner is correct.
- Allow log_min_messages to be specified on a per-module basis
- This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. Another idea is to allow separate configuration files for each module, or allow arbitrary SET commands to be passed to them. See also Logging Brainstorm.
- Simplify ability to create partitioned tables
- This would allow creation of partitioned tables without requiring creation of triggers or rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection. Options could include range and hash partition selection. See also Table partitioning
- Allow auto-selection of partitioned tables for min/max() operations
- There was a patch on -hackers from July 2009, but it has not been merged: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php
- [D] Allow more complex user/database default GUC settings
- Currently ALTER USER and ALTER DATABASE support per-user and per-database defaults. Consider adding per-user-and-database defaults so things like search_path can be defaulted for a specific user connecting to a specific database.
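Since this item is marked done, the implemented form can be sketched as follows (role, database, and schema names are illustrative):

```sql
-- Default search_path applied only when role 'simon' connects to
-- database 'sales'; other databases and roles are unaffected.
ALTER ROLE simon IN DATABASE sales SET search_path = myschema, public;
```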
- Allow custom variable classes that can restrict who can set the values
- The common cases (POSTMASTER, SIGHUP, and SUSET) are already handled to some extent as of 8.4. Should we mark this DONE?
- Implement the SQL standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles
- Provide a way to query the log collector subprocess to determine what the currently active log file is
- Allow the client to authenticate the server in a Unix-domain socket connection, e.g., using SO_PEERCRED
- [D] Allow server-side enforcement of password policies
- Password checks might include password complexity or non-reuse of passwords. This facility will require the client to send password creation/changes to the server in plain-text, not MD5.
- Allow custom daemons to be automatically stopped/started along with the postmaster
- This allows easier administration of daemons like user job schedulers or replication-related daemons.
Configuration files
- Allow pg_hba.conf to specify host names along with IP addresses
- Host name lookup could occur when the postmaster reads the pg_hba.conf file, or when the backend starts. Another solution would be to reverse lookup the connection IP and check that hostname against the host names in pg_hba.conf. We could also then check that the host name maps to the IP address.
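A hypothetical pg_hba.conf entry under this proposal might replace the CIDR address with a host name (all names illustrative):

```
# TYPE  DATABASE  USER     ADDRESS              METHOD
host    mydb      myuser   client.example.com   md5
```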
- Allow Kerberos to disable stripping of realms so we can check the username@realm against multiple realms
Tablespaces
- Allow a database in tablespace t1 with tables created in tablespace t2 to be used as a template for a new database created with default tablespace t2
- Currently all objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If you mix default tablespace tables and tablespace-specified tables in the same directory, creating a new database from such a mixed directory would create a new database with tables that had incorrect explicit tablespaces. To fix this would require modifying pg_class in the newly copied database, which we don't currently do.
- Allow reporting of which objects are in which tablespaces
- This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.
- Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original
Statistics Collector
- Allow statistics last vacuum/analyze execution times to be displayed without requiring track_counts to be enabled
Point-In-Time Recovery (PITR)
- [E] Create dump tool for write-ahead logs for use in determining transaction id for point-in-time recovery
- This is useful for checking PITR recovery.
- [E] Expose pg_controldata via SQL interface
- Helpful for monitoring replicated databases; initial patch
SSL
- Allow SSL key file permission checks to be optionally disabled when sharing SSL keys with other applications
- Allow SSL CRL files to be re-read during configuration file reload, rather than requiring a server restart
- Unlike SSL CRT files, CRL (Certificate Revocation List) files are updated frequently. Alternatively or additionally, supporting OCSP (Online Certificate Status Protocol) would provide real-time revocation discovery without reloading
Data Types
- Add JSON (JavaScript Object Notation) data type
- This would behave similarly to the XML data type, which is stored as text but allows element lookup and conversion functions.
Domains
Dates and Times
- Allow TIMESTAMP WITH TIME ZONE to store the original timezone information, either zone name or offset from UTC
- If the TIMESTAMP value is stored with a time zone name, interval computations should adjust based on the time zone rules.
- Improve timestamptz subtraction to be DST-aware
- Currently subtracting one timestamp from another across a daylight saving time adjustment can return '1 day 1 hour', but adding that interval back to the first timestamp returns a time one hour in the future. This is caused by the adjustment of '25 hours' to '1 day 1 hour', because '1 day' means the same clock time on the next day, even when daylight saving adjustments are involved.
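A sketch of the behavior described above, using the 2009 US fall-back transition (exact interval display may vary):

```sql
SET timezone = 'America/New_York';
-- 25 elapsed hours are reported as '1 day 01:00:00':
SELECT timestamptz '2009-11-02 00:00' - timestamptz '2009-11-01 00:00';
-- Adding '1 day 01:00:00' back does not return the original endpoint:
-- '1 day' means the same wall-clock time on the next day, so the result
-- lands at 2009-11-02 01:00, one hour past 2009-11-02 00:00.
SELECT timestamptz '2009-11-01 00:00' + interval '1 day 1 hour';
```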
- [D] Revise the src/timezone/tznames abbreviation files:
- to add missing abbreviations
- to find abbreviations that can be safely promoted to the Default list
- BUG #4377: casting result of timeofday() to timestamp fails in some timezones
Arrays
Binary Data
- Allow read/write into TOAST values like large objects
- This requires the TOAST column to be stored EXTERNAL.
MONEY Data Type
Text Search
XML
- Inline ORDER BY for XMLAGG. Example: "... XMLAGG(XMLELEMENT(...) ORDER BY col1) ..." (should be made to work with all aggregate functions)
- XML Canonical: Convert XML documents to canonical form to compare them. libxml2 has support for this.
- Pretty-printing XML: Parse a document and serialize it back in some indented form. libxml2 might support this.
- In some cases shredding could be better option (if there is no need in keeping XML docs entirely; if we have already developed tools that understand only relational data; etc) -- it would be a separate module that implements annotated schema decomposition technique, similar to DB2 and SQL Server functionality.
- xpath_table needs to be implemented/implementable to get rid of contrib/xml2 [9]
Functions
- Throw an error from to_char() instead of printing a string of "#" when a number doesn't fit in the desired output format.
- discussed in "to_char, support for EEEE format"
- Allow to_char() on interval values to accumulate the highest unit requested
- Some special format flag would be required to request such accumulation. Such functionality could also be added to EXTRACT. Prevent accumulation that crosses the month/day boundary because of the uneven number of days in a month.
- to_char(INTERVAL '1 hour 5 minutes', 'MI') => 65
- to_char(INTERVAL '43 hours 20 minutes', 'MI' ) => 2600
- to_char(INTERVAL '43 hours 20 minutes', 'WK:DD:HR:MI') => 0:1:19:20
- to_char(INTERVAL '3 years 5 months','MM') => 41
- Allow SQL-language functions to reference parameters by parameter name
- Currently SQL-language functions can only refer to dollar parameters, e.g. $1
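For illustration, the current positional style versus the proposed named style (function and parameter names are made up):

```sql
-- Works today: positional reference.
CREATE FUNCTION add_tax(numeric) RETURNS numeric
  AS 'SELECT $1 * 1.19' LANGUAGE sql;
-- The proposal would additionally allow something like:
--   CREATE FUNCTION add_tax(amount numeric) RETURNS numeric
--     AS 'SELECT amount * 1.19' LANGUAGE sql;
```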
- Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs
- Add missing operators for geometric data types
- Some geometric types do not have the full suite of geometric operators, e.g. box @> point
- Prevent malicious functions from being executed with the permissions of unsuspecting users
- Index functions are safe, so VACUUM and ANALYZE are safe too. Triggers, CHECK and DEFAULT expressions, and rules are still vulnerable.
Multi-Language Support
- Allow more fine-grained collation selection; add CREATE COLLATION.
- Right now the collation is fixed at database creation time.
- Re: Patch for collation using ICU
- FW: Win32 unicode vs ICU
- Re: FW: Win32 unicode vs ICU
- Proof of concept COLLATE support with patch
- For review: Initial support for COLLATE
- Proposed COLLATE implementation
- TODO item: locale per database patch (new iteration)
- Re: FW: Win32 unicode vs ICU
- Re: Fixed length data types issue
- http://archives.postgresql.org/pgsql-hackers/2008-07/msg00557.php
- Todo:Collate
- Todo:ICU
- http://archives.postgresql.org/pgsql-hackers/2008-08/msg01362.php
- http://archives.postgresql.org/pgsql-hackers/2008-09/msg00012.php
- http://archives.postgresql.org/pgsql-hackers/2008-10/msg00868.php
- Unicode collation algorithm
- Set client encoding based on the client operating system encoding
- Currently client_encoding is set in postgresql.conf, which defaults to the server encoding.
- Change memory allocation for multi-byte functions so memory is allocated inside conversion functions
- Currently we preallocate memory based on worst-case usage.
- Add ability to use case-insensitive regular expressions on multi-byte characters
- ILIKE already works with multi-byte characters
- Improve encoding of connection startup messages sent to the client
- Currently some authentication error messages are sent in the server encoding
Views / Rules
- Automatically create rules on views so they are updateable, per SQL:2008
- We can only auto-create rules for simple views. For more complex cases users will still have to write rules manually.
- Allow VIEW/RULE recompilation when the underlying tables change
- This is both difficult and controversial.
- Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups
- Add the ability to automatically create materialized views
- Right now materialized views require the user to create triggers on the main table to keep the summary table current. SQL syntax should be able to manage the triggers and summary table automatically. A more sophisticated implementation would automatically retrieve from the summary table when the main table is referenced, if possible. See Materialized Views for implementation details.
SQL Commands
- [E] Fix TRUNCATE ... RESTART IDENTITY so its effect on sequences is rolled back on transaction abort
- Allow finer control over the caching of prepared query plans
- Currently anonymous (un-named) queries prepared via the wire protocol are replanned every time bind parameters are supplied --- allow SQL PREPARE to do the same. Also, allow control over replanning prepared queries either manually or automatically when statistics for execute parameters differ dramatically from those used during planning.
- Allow prepared transactions with temporary tables created and dropped in the same transaction, and when an ON COMMIT DELETE ROWS temporary table is accessed
- Add SQL-standard MERGE/REPLACE/UPSERT command
- MERGE is typically used to merge two tables. REPLACE or UPSERT command does UPDATE, or on failure, INSERT. See SQL MERGE for notes on the implementation details.
- Enable standard_conforming_strings by default in 9.1?
- When this is done, backslash-quote should be prohibited in non-E'' strings because of possible confusion over how such strings treat backslashes. Basically, '' is always safe for a literal single quote, while \' might or might not be based on the backslash handling rules.
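The quoting rules at issue, assuming standard_conforming_strings is on:

```sql
SELECT 'it''s';    -- doubled quote: always a safe literal single quote
SELECT E'it\'s';   -- E'' strings keep backslash-escape handling
SELECT 'a\b';      -- backslash is a plain character here, yielding a\b;
                   -- under the old setting it would have been an escape
```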
- Allow the count returned by SELECT, etc to be represented as an int64 to allow a higher range of values
- Add DEFAULT .. AS OWNER so permission checks are done as the table owner
- This would be useful for SERIAL nextval() calls and CHECK constraints.
- Add column to pg_stat_activity that shows the progress of long-running commands like CREATE INDEX and VACUUM
- Allow INSERT/UPDATE/DELETE ... RETURNING inside a SELECT 'FROM' clause or target list
- Actually it would be saner to allow this in WITH
- Add comments on system tables/columns using the information in catalogs.sgml
- Ideally the information would be pulled from the SGML file automatically.
- Support LATERAL subqueries
- Lateral subqueries can reference columns of tables defined outside the subquery at the same level, i.e. laterally. For example, a LATERAL subquery in a FROM clause could reference tables defined in the same FROM clause. Currently only the columns of tables defined above subqueries are recognized.
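A sketch of the SQL-standard form this item targets (table and column names are invented); the subquery references f.id from the same FROM clause:

```sql
SELECT f.id, s.total
FROM   foo f,
       LATERAL (SELECT sum(b.amount) AS total
                FROM   bar b
                WHERE  b.foo_id = f.id) s;
```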
- [D] Forbid COMMENT on columns of an index
- Postgres currently allows comments to be placed on the columns of an index, but pg_dump doesn't handle them and the column names themselves are implementation-dependent.
- Add support for functional dependencies
- This would allow omitting GROUP BY columns when grouping by the primary key.
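For example, assuming t.pk is t's primary key (names invented), the SQL standard permits:

```sql
SELECT t.pk, t.name, count(*)
FROM   t JOIN u ON u.t_pk = t.pk
GROUP BY t.pk;   -- today this must be written GROUP BY t.pk, t.name
```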
CREATE
- Move NOT NULL constraint information to pg_constraint
- Currently NOT NULL constraints are stored in pg_attribute without any designation of their origins, e.g. primary keys. One manifest problem is that dropping a PRIMARY KEY constraint does not remove the NOT NULL constraint designation. Another issue is that we should probably force NOT NULL to be propagated from parent tables to children, just as CHECK constraints are. (But then does dropping PRIMARY KEY affect children?)
- CREATE OR REPLACE FUNCTION might leave dependent objects depending on the function in inconsistent state
UPDATE
ALTER
- Allow moving system tables to other tablespaces, where possible
- Currently non-global system tables must be in the default database tablespace. Global system tables can never be moved.
CLUSTER
- Automatically maintain clustering on a table
- This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization. Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too. A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.
- [E] Add default clustering to system tables
- To do this, determine the ideal cluster index for each system table and set the cluster setting during initdb.
COPY
- Allow COPY to report error lines and continue
- This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure.
- Allow COPY on a newly-created table to skip WAL logging
- On crash recovery, the table involved in the COPY would be removed or have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. This currently is done if the table is created inside the same transaction block as the COPY because no other backends can see the table.
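The same-transaction case described above looks like this (table and file names are illustrative); the table is invisible to other backends until COMMIT:

```sql
BEGIN;
CREATE TABLE new_data (id integer, payload text);
COPY new_data FROM '/tmp/new_data.dat';  -- WAL can be skipped here
COMMIT;
```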
- Allow COPY in CSV mode to control whether a quoted zero-length string is treated as NULL
- Currently this is always treated as a zero-length string, which generates an error when loading into an integer column
- Allow COPY to handle other number formats, e.g. the German notation. Best would be something like WITH DECIMAL ','.
GRANT/REVOKE
- [D] Allow GRANT/REVOKE permissions to be applied to all schema objects with one command
- The proposed syntax is:
  GRANT SELECT ON ALL TABLES IN public TO phpuser;
  GRANT SELECT ON NEW TABLES IN public TO phpuser;
DECLARE CURSOR
INSERT
SHOW/SET
LISTEN/NOTIFY
- [D] Allow LISTEN/NOTIFY to store info in memory rather than tables
- Currently LISTEN/NOTIFY information is stored in pg_listener. Storing such information in memory would improve performance.
Window Functions
See TODO items for window functions.
- Support creation of user-defined window functions.
- We have the ability to create new window functions written in C. Is it worth the effort to create an API that would let them be written in PL/pgsql, etc?
- Implement full support for window framing clauses.
- The cases we support now are basically those where no row ever exits the frame as the current row advances. To do better requires some rethinking of the window aggregate support.
- Look at tuplestore performance issues.
- The tuplestore_in_memory() thing is just a band-aid, we ought to try to solve it properly. tuplestore_advance seems like a weak spot as well.
Integrity Constraints
Keys
Referential Integrity
- Fix problem when cascading referential triggers make changes on cascaded tables, seeing the tables in an intermediate state
Server-Side Languages
PL/pgSQL
- [D] Allow function parameters to be passed by name, get_employee_salary(12345 AS emp_id, 2001 AS tax_year)
- Allow listing of record column names, and access to record columns via variables, e.g. columns := r.(*), tval2 := r.(colname)
- Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables
- Because a row is not scalar, do not allow assignment from NULL-valued scalars.
PL/Perl
PL/Python
- Create a new restricted execution class that will allow passing function arguments in as locals. Passing them as globals means functions cannot be called recursively.
- Functions cache the input and output functions for their arguments, so the following will make PostgreSQL unhappy:
  create table users (first_name text, last_name text);
  create function user_name(user) returns text as 'mycode' language plpython;
  select user_name(user) from users;
  alter table users add column user_id integer;
  select user_name(user) from users;
  You have to drop and re-create the function(s) each time their arguments are modified (not nice), or not cache the input and output functions (slower?), or check whether the structure of the argument has been altered (is this possible, easy, quick?) and recreate the cache.
PL/Tcl
Clients
pg_ctl
- Allow pg_ctl to work properly with configuration files located outside the PGDATA directory
- pg_ctl can not read the pid file because it isn't located in the config directory but in the PGDATA directory. The solution is to allow pg_ctl to read and understand postgresql.conf to find the data_directory value.
- Have the postmaster write a random number to a file on startup that pg_ctl checks against the contents of a pg_ping response on its initial connection (without login)
- This will protect against connecting to an old instance of the postmaster in a different or deleted subdirectory.
- Modify pg_ctl behavior and exit codes to make it easier to write an LSB conforming init script
- It may be desirable to condition some of the changes on a command-line switch, to avoid breaking existing scripts. A Linux shell (sh) script is referenced which has been tested and seems to provide a high degree of conformance in multiple environments. Study of this script might suggest areas where pg_ctl could be modified to make writing an LSB conforming script easier; however, some aspects of that script would be unnecessary with other suggested changes to pg_ctl, and discussion on the lists did not reach consensus on support for all aspects of this script. Further discussion of particular changes is needed before beginning any work. These threads should be studied for other ideas on improvements:
psql
- Move psql backslash database information into the backend, use mnemonic commands?
- This would allow non-psql clients to pull the same information out of the database as psql.
- Add auto-expanded mode so expanded output is used if the row length is wider than the screen width.
- Consider using auto-expanded mode for backslash commands like \df+.
- Prevent tab completion of SET TRANSACTION from querying the database, since that query prevents the transaction isolation level from being set.
- Currently SET <tab> causes a database lookup to check all supported session variables. This query causes problems because setting the transaction isolation level must be the first statement of a transaction.
- Add a \set variable to control whether \s displays line numbers
- Another option is to add \# which lists line numbers, and allows command execution.
- Add option to wrap column values at whitespace boundaries, rather than chopping them at a fixed width.
- Currently, "wrapped" format chops values into fixed widths. Perhaps the word wrapping could use the same algorithm documented in the W3C specification.
- Add "auto" expanded mode that outputs in expanded format if "wrapped" mode can't wrap the output to the screen width
- Support the ReST table output format
- Details about the ReST format: http://docutils.sourceforge.net/rst.html#reference-documentation
pg_dump / pg_restore
- [E] Add full object name to the tag field. eg. for operators we need '=(integer, integer)', instead of just '='.
- Avoid using platform-dependent locale names in pg_dumpall output
- Using native locale names puts roadblocks in the way of porting a dump to another platform. One possible solution is to get CREATE DATABASE to accept some agreed-on set of locale names and fix them up to meet the platform's requirements.
- Add support for multiple pg_restore -t options, like pg_dump
- pg_restore's -t switch is less useful than pg_dump's in quite a few ways: no multiple switches, no pattern matching, no ability to pick up indexes and other dependent items for a selected table. It should be made to handle this switch just like pg_dump does.
- Allow pg_dump --clean to drop roles that own objects or have privileges
- tgl says: if this is about pg_dumpall, it's done as of 8.4. If it's really about pg_dump, what does it mean? pg_dump has no business dropping roles.
- Change pg_dump so that a comment on the dumped database is applied to the loaded database, even if the database has a different name. This will require new backend syntax, perhaps COMMENT ON CURRENT DATABASE.
- Allow pg_dump to utilize multiple CPUs and I/O channels by dumping multiple objects simultaneously
- The difficulty with this is getting multiple dump processes to produce a single dump output file. It also would require several sessions to share the same snapshot.
- Remove support for dumping from pre-7.3 servers
- In 7.3 and later, we can get accurate dependency information from the server. pg_dump still contains a lot of crufty code to try to deal with the lack of dependency info in older servers, but the value of maintaining that code is diminishing.
ecpg
- Docs
- Document differences between ecpg and the SQL standard and information about the Informix-compatibility module.
- [D] Implement SQLDA
- add sqlda support to ecpg in both native and compatibility mode
libpq
- Prevent PQfnumber() from lowercasing unquoted column names
- PQfnumber() should never have been doing lowercasing, but historically it has, so we need a way to prevent it
- Allow statement results to be automatically batched to the client
- Currently all statement results are transferred to the libpq client before libpq makes the results available to the application. This feature would allow the application to make use of the first result rows while the rest are transferred, or held on the server waiting for them to be requested by libpq. One complexity is that a statement like SELECT 1/col could error out mid-way through the result set.
Triggers
- Improve storage of deferred trigger queue
- Right now all deferred trigger information is stored in backend memory. This could exhaust memory for very large trigger queues. This item involves dumping large queues into files, or doing some kind of join to process all the triggers, some bulk operation, or a bitmap.
- Allow triggers to be disabled in only the current session.
- This is currently possible by starting a multi-statement transaction, modifying the system tables, performing the desired SQL, restoring the system tables, and committing the transaction. ALTER TABLE ... TRIGGER requires a table lock so it is not ideal for this usage.
- With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY
- If the dump is known to be valid, allow foreign keys to be added without revalidating the data.
- When statement-level triggers are defined on a parent table, have them fire only on the parent table, and fire child table triggers only where appropriate
- Allow AFTER triggers on system tables
- System tables are modified in many places in the backend without going through the executor and therefore not causing triggers to fire. To complete this item, the functions that modify system tables will have to fire triggers.
Inheritance
- Honor UNIQUE INDEX on base column in INSERTs/UPDATEs on inherited table, e.g. INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail
- The main difficulty with this item is the problem of creating an index that can span multiple tables.
- Determine whether ALTER TABLE / SET SCHEMA should work on inheritance hierarchies (and thus support ONLY). If yes, implement it.
- ALTER TABLE variants sometimes support recursion and sometimes not, but this is poorly documented or not documented at all, and the ONLY marker is then silently ignored. Clarify the documentation, and reject ONLY if it is not supported.
Indexes
- Prevent index uniqueness checks when UPDATE does not modify the column
- Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE. However, HOT already short-circuits this in common cases, so more work might not be helpful.
- Allow the creation of on-disk bitmap indexes which can be quickly combined with other bitmap indexes
- Such indexes could be more compact if there are only a few distinct values. Such indexes can also be compressed. Keeping such indexes updated can be costly.
- Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics
- Consider smaller indexes that record a range of values per heap page, rather than having one index entry for every heap row
- This is useful if the heap is clustered by the indexed values.
- Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY
- This is difficult because you must upgrade to an exclusive table lock to replace the existing index file. CREATE INDEX CONCURRENTLY does not have this complication. This would allow index compaction without downtime.
- Allow multiple indexes to be created concurrently, ideally via a single heap scan
- pg_restore allows parallel index builds, but it is done via subprocesses, and there is no SQL interface for this.
- Consider using "effective_io_concurrency" for index scans
- Currently only bitmap scans use this, which might be fine because most multi-row index scans use bitmap scans.
GIST
GIN
Hash
- [D] Pack hash index buckets onto disk pages more efficiently
- Currently only one hash bucket can be stored on a page. Ideally several hash buckets could be stored on a single page and greater granularity used for the hash algorithm. However, the binary searching within a hash page probably renders this issue moot.
Catalogs
Sorting
Fsync
- Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
- Ideally this requires a separate test program that can be run at initdb time or optionally later. Consider O_SYNC when O_DIRECT exists.
Cache Usage
- Speed up COUNT(*)
- We could use a fixed row count and a +/- count to follow MVCC visibility rules, or a single cached value could be used and invalidated if anyone modifies the table. Another idea is to get a count directly from a unique index, but for this to be faster than a sequential scan it must avoid access to the heap to obtain tuple visibility information.
- Provide a way to calculate an "estimated COUNT(*)"
- Perhaps by using the optimizer's cardinality estimates or random sampling.
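One rough estimate is already available from the planner statistics kept in pg_class (table name is illustrative; the value is only as fresh as the last VACUUM/ANALYZE):

```sql
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  relname = 'mytable';
```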
- Allow data to be pulled directly from indexes
- Currently indexes do not have enough tuple visibility information to allow data to be pulled from the index without also accessing the heap. Several approaches have been suggested:
- One way is to set a bit on index tuples to indicate whether a tuple is currently visible to all transactions when the first valid heap lookup happens. This bit would have to be cleared when a heap tuple is expired.
- Another idea is to maintain a bitmap of heap pages where all rows are visible to all backends, and allow index lookups to reference that bitmap to avoid heap lookups, perhaps the same bitmap we might add someday to determine which heap pages need vacuuming. Frequently accessed bitmaps would have to be stored in shared memory. One 8k page of bitmaps could track 512MB of heap pages.
- A third idea is for a heap scan to check whether all rows are visible and, if so, set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan, a counter could be set at the start and checked at the end --- if it is unchanged, the table has not been modified; any table change would increment the counter.
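The sizing claim above follows from simple arithmetic: an 8kB bitmap page holds 8192 * 8 = 65536 bits, one per heap page, and 65536 heap pages of 8kB each cover 512MB:

```sql
SELECT 8192 * 8 * 8192 AS heap_bytes_tracked;  -- 536870912 bytes = 512MB
```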
- Consider automatic caching of statements at various levels:
- Parsed query tree
- Query execute plan
- Query results
- Consider allowing higher priority queries to have referenced buffer cache pages stay in memory longer
Vacuum
- [D] Improve VACUUM FULL's speed when major data movement is needed
- For large table adjustments during VACUUM FULL, it would be faster to cluster or reindex rather than update the indexes piecemeal as it does now. Also, this behavior tends to bloat the indexes.
- Clean up VACUUM FULL's klugy transaction management
- VACUUM FULL marks its transaction committed before it's really done, which means a PANIC if it fails after that point. This needs to be split into two transactions.
- Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer
- Consider having single-page pruning update the visibility map
- https://commitfest.postgresql.org/action/patch_view?id=75
- http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php
- Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM
Auto-vacuum
- Prevent long-lived temporary tables from causing frozen-xid advancement starvation
- The problem is that autovacuum cannot vacuum them to set frozen xids; only the session that created them can do that.
Locking
- Fix problem when multiple subtransactions of the same outer transaction hold different types of locks, and one subtransaction aborts
- Allow UPDATEs on only non-referential integrity columns not to conflict with referential integrity locks
Startup Time Improvements
- Experiment with multi-threaded backend for backend creation
- This would prevent the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading. Also explore the idea of a single session using multiple threads to execute a statement faster.
Write-Ahead Log
- Eliminate need to write full pages to WAL before page modification
- Currently, to protect against partial disk page writes, we write full page images to WAL before they are modified so we can correct any partial page writes during recovery. These pages can also be eliminated from point-in-time archive files.
- When full page writes are off, write CRC to WAL and check file system blocks on recovery
- If CRC check fails during recovery, remember the page in case a later CRC for that page properly matches.
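The "remember the page" logic can be sketched as a small bookkeeping pass over the recovery stream: a failed CRC marks the page suspect, and a later matching CRC for the same page clears it, because the later consistent image proves the page was rewritten further on in the WAL. This is hypothetical logic, not PostgreSQL's actual recovery code:

```python
import zlib


def replay_with_crc(records):
    """Sketch of the proposed recovery-time CRC check.

    records is a list of (page_no, page_data, expected_crc) tuples in
    WAL order. A page whose CRC fails is remembered as a possible
    partial write; a later record whose CRC matches for the same page
    clears the suspicion. Pages still suspect at the end are reported.
    """
    suspect = set()
    for page_no, data, expected_crc in records:
        if zlib.crc32(data) != expected_crc:
            suspect.add(page_no)      # possible torn page
        else:
            suspect.discard(page_no)  # later consistent image seen
    return suspect
```

Only pages that remain suspect after the full replay would need to be treated as genuinely damaged.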
- Write full pages during file system write and not when the page is modified in the buffer cache
- This allows most full page writes to happen in the background writer. It might cause problems for applying WAL on recovery into a partially-written page, but later the full page will be replaced from WAL.
- Find a way to reduce rotational delay when repeatedly writing last WAL page
- Currently, each fsync of the WAL requires the disk platter to complete a full rotation before the same page can be fsync'd again. One idea is to write the WAL at different offsets to reduce this rotational delay.
- Allow WAL logging to be turned off for a table, but the table might be dropped or truncated during crash recovery
- Allow tables to bypass WAL writes and just fsync() dirty pages on commit. This should be implemented using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using non-default logging should not use referential integrity with default-logging tables. A table without dirty buffers during a crash could perhaps avoid the drop/truncate.
- Allow WAL logging to be turned off for a table, but the table would avoid being truncated/dropped
- To do this, only a single writer can modify the table, and writes must happen only on new pages so the new pages can be removed during crash recovery. Readers can continue accessing the table. Such tables probably cannot have indexes. One complexity is the handling of indexes on TOAST tables.
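Because all writes to such a table would land on newly-appended pages, crash recovery reduces to discarding everything past the last known-safe length. A minimal sketch of that idea, where the recorded safe length and its source are assumptions of this example:

```python
def recover_append_only(safe_len, current_len, page_size=8192):
    """Recovery rule for a WAL-bypassing, strictly append-only table.

    safe_len is the file length recorded at the last safe point (e.g. a
    checkpoint - the bookkeeping is hypothetical here). Since writers
    only ever add new pages past that point, recovery can simply
    truncate the file back to safe_len, discarding possibly-torn pages.
    """
    assert safe_len % page_size == 0, "safe point must be page-aligned"
    return min(current_len, safe_len)


# Two pages were appended after the safe point; recovery discards them.
assert recover_append_only(8192 * 10, 8192 * 12) == 8192 * 10
```

Readers never see the discarded pages because they were only reachable through the in-progress write.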
- Speed WAL recovery by allowing more than one page to be prefetched
- This should be done utilizing the same infrastructure used for prefetching in general to avoid introducing complex error-prone code in WAL replay.
Optimizer / Executor
- Consider increasing the default values of from_collapse_limit, join_collapse_limit, and/or geqo_threshold
- Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified percentage
- Log statements where the optimizer row estimates were dramatically different from the number of rows actually found?
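Both items reduce to the same predicate: flag a plan node when the estimate deviates from reality by more than a configured percentage. A sketch of that check, where the function name, the threshold parameter, and the choice of measuring deviation relative to the actual count are all assumptions of this example:

```python
def estimate_off_by(estimated, actual, threshold_pct):
    """Return True when the planner's row estimate differs from the
    actual row count by more than threshold_pct percent - the condition
    under which EXPLAIN ANALYZE would issue a NOTICE, or the statement
    would be logged. (Sketch; the exact ratio definition is assumed.)
    """
    if estimated == actual:
        return False
    base = max(actual, 1)  # avoid division by zero on empty results
    deviation = abs(estimated - actual) / base * 100.0
    return deviation > threshold_pct


assert estimate_off_by(10, 1000, threshold_pct=50)      # 99% under-estimate
assert not estimate_off_by(95, 100, threshold_pct=50)   # only 5% off
```

An open design question is whether the threshold should be absolute, relative, or both, since a 100x error on 10 rows is usually harmless while a 2x error on 10 million rows is not.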
Background Writer
- Consider having the background writer update the transaction status hint bits before writing out the page
- Implementing this requires the background writer to have access to system catalogs and the transaction status log.
- Test to see if calling PreallocXlogFiles() from the background writer will help with WAL segment creation latency
Concurrent Use of Resources
- Do async I/O for faster random read-ahead of data
- Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously. posix_fadvise()-based prefetching was applied as of 8.4, but it still remains to figure out how to handle plain index scans effectively.
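The prefetching mechanism itself is a kernel hint, not true async I/O: the backend tells the kernel which blocks it will want soon and continues working while they are read ahead. A sketch of that pattern using POSIX_FADV_WILLNEED, with the function name and block layout being assumptions of this example:

```python
import os
import tempfile


def prefetch_blocks(fd, block_nos, block_size=8192):
    """Hint the kernel to read the listed blocks ahead of time, the
    mechanism behind bitmap-scan prefetching. Extending this style of
    read-ahead to plain index scans is the open item. (Sketch; real
    backends issue the hints from inside the scan machinery.)
    """
    if not hasattr(os, "posix_fadvise"):  # not available on all platforms
        return
    for blk in sorted(block_nos):
        os.posix_fadvise(fd, blk * block_size, block_size,
                         os.POSIX_FADV_WILLNEED)


# Demo on a scratch file; the hint is advisory, so this is always safe.
with tempfile.TemporaryFile() as f:
    f.write(b"\0" * 8192 * 4)
    f.flush()
    prefetch_blocks(f.fileno(), [0, 2])
```

Because the call is purely advisory, issuing it for blocks already in cache costs almost nothing, which is why over-prefetching is relatively safe.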
- Experiment with multi-threaded backend for better I/O utilization
- This would allow a single query to make use of multiple I/O channels simultaneously. One idea is to create a background reader that can pre-fetch sequential and index scan pages needed by other backends. This could be expanded to allow concurrent reads from multiple devices in a partitioned table.
- Experiment with multi-threaded backend for better CPU utilization
- This would allow several CPUs to be used for a single query, such as for sorting or query execution.
TOAST
Miscellaneous Performance
- Use mmap() rather than SYSV shared memory, or use mmap() to write WAL files?
- This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead.
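For illustration, an anonymous shared mapping can be created without any SysV segment or backing file; Python's mmap module exposes the same primitive (passing -1 as the file descriptor is the anonymous form, equivalent to mapping /dev/zero):

```python
import mmap

# Anonymous mapping: no SysV shared memory segment and no backing file,
# so the kernel never schedules file I/O for it. This is the variant the
# item says is required to avoid I/O overhead.
region = mmap.mmap(-1, 8192)  # fd -1 requests anonymous memory

region[:5] = b"hello"
assert region[:5] == b"hello"

region.close()
```

The portability concern in the item is about platforms where anonymous MAP_SHARED semantics differ, not about the basic call shown here.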
- Consider mmap()'ing files into a backend?
- Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping. Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings. Another problem is that there is no way to _prevent_ I/O to disk from the dirty shared buffers so changes could hit disk before WAL is written.
- Consider ways of storing rows more compactly on disk:
- Reduce the row header size?
- Consider reducing on-disk varlena length from four bytes to two because a heap row cannot be more than 64k in length
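The saving is exactly two bytes per varlena datum, since a 16-bit length covers every value that can appear in a sub-64k heap row. A minimal sketch of the two on-disk forms (the little-endian layout here is illustrative, not PostgreSQL's actual encoding):

```python
import struct

payload = b"x" * 100

# Current form (illustrative): a 4-byte length word, counting itself,
# precedes the datum.
four_byte = struct.pack("<I", 4 + len(payload)) + payload

# Proposed form: a 2-byte length suffices because a heap row cannot
# exceed 64k, saving two bytes per varlena datum.
two_byte = struct.pack("<H", 2 + len(payload)) + payload

assert len(four_byte) - len(two_byte) == 2
```

On a table with many short text or numeric columns, the per-datum saving compounds into noticeably more tuples per page.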
- Allow configuration of backend priorities via the operating system
- Though backend priorities make priority inversion during lock waits possible, research shows that this is not a huge problem.
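On Unix-like systems the operating-system mechanism would simply be process niceness; a hypothetical per-backend setting would amount to a call like this at backend start (raising one's own niceness needs no privileges, only lowering it does):

```python
import os

# Lower this process's scheduling priority by one nice level - the kind
# of OS-level knob the item proposes exposing per backend. (Sketch; how
# and when PostgreSQL would invoke it is the open design question.)
if hasattr(os, "nice"):
    new_niceness = os.nice(1)  # returns the process's new nice value
    assert new_niceness >= 1
```

The priority-inversion risk mentioned above arises when a niced backend holds a lock that a high-priority backend is waiting on.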
- Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation
- Allow one transaction to see tuples using the snapshot of another transaction
- This would assist multiple backends in working together.
Miscellaneous Other
Source Code
- Add optional CRC checksum to heap and index pages
- One difficulty is how to prevent hint bit changes from affecting the computed CRC checksum.
- http://archives.postgresql.org/message-id/19934.1226601952%40sss.pgh.pa.us
- http://archives.postgresql.org/pgsql-hackers/2008-10/msg00002.php
- http://archives.postgresql.org/pgsql-hackers/2008-10/msg01028.php
- http://archives.postgresql.org/pgsql-hackers/2008-11/msg00524.php
- http://archives.postgresql.org/pgsql-hackers/2008-12/msg01101.php
- http://archives.postgresql.org/pgsql-hackers/2009-12/msg00011.php
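One approach discussed for the hint-bit difficulty is to mask the volatile bytes out of the page before computing the CRC, so an asynchronous hint-bit update cannot invalidate a checksum written earlier. A sketch of that idea, where the fixed byte offsets are hypothetical (real tuple hint bits live in each tuple's infomask and are not at fixed positions):

```python
import zlib


def page_checksum(page, hint_offsets):
    """CRC of a page with the hint-bit bytes masked out.

    hint_offsets lists byte positions that may change without WAL
    logging; zeroing them before checksumming keeps those updates from
    invalidating the stored CRC. (Sketch: real hint bits are per-tuple
    and locating them requires walking the page's line pointers.)
    """
    buf = bytearray(page)
    for off in hint_offsets:
        buf[off] = 0  # neutralize the volatile byte
    return zlib.crc32(bytes(buf))


page = bytearray(b"\0" * 128)
before = page_checksum(page, [12])
page[12] |= 0x04  # a hint bit flips after the checksum was taken
after = page_checksum(page, [12])
assert before == after  # masked CRC is unaffected by the hint-bit change
```

The cost is that corruption within the masked bytes goes undetected, which is the core trade-off debated in the threads above.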
- Improve detection of shared memory segments being used by others by checking the SysV shared memory field 'nattch'
- Fix system views like pg_stat_all_tables to use set-returning functions, rather than views of per-column functions
- [D] Allow table and index WITH options to be specified via hooks, for use with plugins like GiST index methods
Documentation
Windows
Wire Protocol Changes
Exotic Features
- Add pre-parsing phase that converts non-ISO syntax to supported syntax
- This could allow SQL written for other databases to run without modification.
- Add features of Oracle-style packages
- A package would be a schema with session-local variables, public/private functions, and initialization functions. It is also possible to implement these capabilities in any schema and not use a separate "packages" syntax at all.
- Consider allowing control of upper/lower case folding of unquoted identifiers
- Bringing PostgreSQL towards the standard regarding case folding
- Re: [SQL] Case Preservation disregarding case sensitivity?
- TODO Item: Consider allowing control of upper/lower case folding of unquoted identifiers
- Identifier case folding notes
- http://archives.postgresql.org/pgsql-hackers/2008-07/msg00415.php