VACUUM FULL

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
m
(ALTER TABLE .. SET DATA TYPE)
Line 48: Line 48:
 
ALTER TABLE your_table ALTER an_integer_column SET DATA TYPE integer;
 
ALTER TABLE your_table ALTER an_integer_column SET DATA TYPE integer;
 
</pre>
 
</pre>
 +
 +
This trick will not work in PostgreSQL versions 9.1 or later, as it detects that the change in data type is degenerate and so no rewrite is necessary.
  
 
=== <code>[http://www.postgresql.org/docs/current/static/sql-selectinto.html SELECT ... INTO]</code> ===
 
=== <code>[http://www.postgresql.org/docs/current/static/sql-selectinto.html SELECT ... INTO]</code> ===

Revision as of 22:28, 12 November 2012

Contents

VACUUM vs VACUUM FULL

The VACUUM command and associated autovacuum process are PostgreSQL's way of controlling MVCC bloat. The VACUUM command has two main forms of interest - ordinary VACUUM, and VACUUM FULL. These two commands are actually quite different and should not be confused.

VACUUM scans a table, marking tuples that are no longer needed as free space so that they can be overwritten by newly inserted or updated data. See Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT and the PostgreSQL documentation on MVCC for a detailed explanation of this. Note that you should rarely need to use the VACUUM command directly on a modern PostgreSQL database, as autovacuum should take care of it for you if properly set up.

VACUUM FULL, unlike VACUUM, touches data that has not been deleted. On pre-9.x versions of Postgres, it moves data into spaces earlier in the file that have been freed. Once it's created a free space at the end of the file, it truncates the file so that the OS knows that space is free and may be reused for other things. Moving in-use data around this way can have adverse side-effects, including taking heavy weight locks, increased i/o, and adding index bloat. On older systems, there are better ways to free space if you need to, and better ways to optimize tables (see below) so you should essentially never use VACUUM FULL on a pre-9.x system. Even on 9.x and above, the system is designed with the goal that you should never be running VACUUM FULL regularly.

For clarity, 9.0 changes VACUUM FULL. As covered in the documentation, the VACUUM FULL implementation has been changed to one that's similar to using CLUSTER in older versions. This gives a slightly different set of trade-offs from the older VACUUM FULL described here. While the potential to make the database slower via index bloating had been removed by this change, it's still something you may want to avoid doing, due to the locking and general performance overhead of a VACUUM FULL.

When (not) to use VACUUM FULL

Many people, either based on misguided advice on the 'net or on the assumption that it must be "better", periodically run VACUUM FULL on their tables. This is generally not recommended and in some cases can make your database slower, not faster.

VACUUM FULL is only needed when you have a table that is mostly dead rows - ie, the vast majority of its contents have been deleted. Even then, there's no point using VACUUM FULL unless you urgently need that disk space back for other things or you expect that the table will never again grow to its past size. It should not be used for table optimization or periodic maintenance, as it's counterproductive.

When you run VACUUM FULL on a table, that table is locked for the duration of the operation, so nothing else can work with the table. VACUUM FULL is much slower than a normal VACUUM, so the table may be unavailable for a while. More importantly, on pre-9.x systens, while VACUUM FULL compacts the table, it does not compact the indexes - and in fact may increase their size, thus slowing them down, causing more disk I/O when the indexes are used, and increasing the amount of memory they require. A REINDEX may be required after VACUUM FULL. See the main documentation's notes on VACUUM vs VACUUM FULL.

What to use instead

If you shouldn't use VACUUM FULL, what should you be using?

Autovacuum

If autovacuum is running frequently enough and aggressively enough, your tables should never grow ("bloat") due to unreclaimed dead rows, so you should never need to return "dead" space to the OS.

Autovacuum continues to improve dramatically with every PostgreSQL version, and is a very good reason to make sure you are running the latest version. For example, with 8.4 the free space map is now managed automatically, removing a no-longer-necessary tuning parameter and eliminating a major source of table bloat.

If autovacuum isn't doing enough to keep your tables and indexes bloat-free, tune it, don't supplement it with manual vacuuming and reindexing. You may need to increase your free space map settings (pre-8.4), tune autovacuum to run more frequently, and/or tell autovacuum to vacuum certain frequently-updated tables more aggressively than others.

VACUUM

Unless you need to return space to the OS so that other tables or other parts of the system can use that space, you should use VACUUM instead of VACUUM FULL.

If you need to manually VACUUM your tables at any time other than when running major admin or update tasks that rewrite large parts of your tables, you probably don't have autovacuum set up well enough.

CLUSTER

If you're trying to "optimise" your tables by packing them down and removing table bloat that's accumulated due to (say) insufficiently aggressive autovacuuming, or you're trying to return dead space in a table to the operating system, CLUSTER is usually your best bet. It runs a lot faster than VACUUM FULL and will compact and optimise the indexes as well as the table its self. However, you will need enough free space to hold all the in-use data from the table while CLUSTER runs.

ALTER TABLE .. SET DATA TYPE

The problem with CLUSTER is that it reorders the table following an index. If the table is not already approximately in that index' order, this will take a long time because it will have to do a scattered read of the table pages over and over as it looks for each tuple. A faster alternative is to request a full table rewrite without requiring a particular order. PostgreSQL versions prior to 9.0 do not offer any direct way to invoke this operation; however, you can use the following workaround. Choose any table column, and use ALTER TABLE to change its type to the same type. This is obviously going to cause no logical change to the table, but the server will have to rewrite the table, getting rid of dead tuples while at it.

For example, assuming the an_integer_column is of type INTEGER:

ALTER TABLE your_table ALTER an_integer_column SET DATA TYPE integer;

This trick will not work in PostgreSQL versions 9.1 or later, as it detects that the change in data type is degenerate and so no rewrite is necessary.

SELECT ... INTO

Sometimes it can be faster to use a SELECT ... INTO command to copy data from a bloated table into a new table, then re-create the indexes and finally rename the tables to replace the old one with the new one. It's rarely worth doing this instead of using CLUSTER, though, as CLUSTER does almost the same thing automatically and can rebuild indexes in parallel. The main reason you may want to use SELECT ... INTO instead of CLUSTER is if you don't want to sort the table.

TRUNCATE TABLE

If you've been using VACUUM FULL to free space from a table that's periodically completely emptied using DELETE FROM tablename; (without a WHERE clause), you can use TRUNCATE TABLE to replace those two steps with one much, much faster one.

Instead of:

DELETE FROM tablename;
VACUUM FULL tablename;

write:

TRUNCATE TABLE tablename;

Please make sure to read the caveats in the notes on the TRUNCATE TABLE documentation. If TRUNCATE TABLE isn't suitable for your needs, you can use DELETE followed by CLUSTER instead.

If using DELETE on a table that is the target of foreign key references, consider adding an index to the referencing columns. That will allow checks for foreign key enforcement to avoid a sequential scan on the referencing table, making DELETE from the referenced table vastly faster.

Recovering from index bloat caused by VACUUM FULL

If you have indexes badly bloated by regular use of VACUUM FULL, your best bet is usually going to be to use CLUSTER to rewrite the table and rebuild the indexes.

If you can't afford to have the table locked for that long, you can rebuild each index individually while queries continue to run on the table. PostgreSQL unfortunately doesn't have a REINDEX CONCURRENTLY command, but it can be simulated with appropriate use of CREATE INDEX ... CONCURRENTLY, ALTER INDEX ... RENAME and DROP INDEX to create new indexes, swap the old and new ones by renaming, and drop the old indexes. Note that since you can't drop some indexes, such as primary keys, this may not be a possible cleanup technique for all of them.

So, why does VACUUM FULL exist?

The reasons are mostly historical ones. In particular, it was more useful before the availability of a fully MVCC-safe CLUSTER, and severe table bloat was more common before autovacuum was included in mainline and enabled by default.

There is one remaining use for VACUUM FULL that there often isn't a better alternative for, though - recovering from disk-full situations. VACUUM FULL requires much less free space than CLUSTER to compact a table, so it can be useful if you've (almost) completely run out of space.

VACUUM FULL proposed for removal from PostgreSQL 8.5 (later renamed to PostgreSQL 9.0), and that article gives some additional support for the suggestions given here. The change proposed there was implemented. The few cases where the old VACUUM FULL could be helpful, such as recovering from low-disk space situations, did not outweigh the issues of the old code--which included an incompatibility with the new replication features in the 9.0 release.

Originally by --Ringerc 03:48, 26 November 2009 (UTC)

Personal tools