TOAST

From PostgreSQL wiki
Jump to navigationJump to search


TOAST is a mechanism PostgreSQL uses to keep physical data rows from exceeding the size of a data block (typically 8KB). Postgres does not support physical rows that cross block boundaries, so the block size is a hard upper limit on row size. To allow user tables to have rows wider than this, the TOAST mechanism breaks up wide field values into smaller pieces, which are stored "out of line" in a TOAST table associated with the user table.

Each table you create has its own associated (unique) TOAST table, which may or may not ever end up being used, depending on the size of rows you insert. All of this is transparent to the user, and enabled by default.

When a row that is to be stored is "too wide" (the threshold for that is 2KB by default), the TOAST mechanism first attempts to compress any wide field values. If that isn't enough to get the row under 2KB, it breaks up the wide field values into chunks that get stored in the associated TOAST table. Each original field value is replaced by a small pointer that shows where to find this "out of line" data in the TOAST table. TOAST will attempt to squeeze the user-table row down to 2KB in this way, but as long as it can get below 8KB, that's good enough and the row can be stored successfully.

All standard Postgres data types that could possibly have values wider than 2KB support being "TOASTed" in this way, and so do most potentially-wide extension data types.

You can view the current TOAST options for a table by opening psql and running

\d+ table_name

for example:

=> \d+ test_table_name
                       Table "name_space.test_table_name"
 Column |       Type        | Modifiers | Storage  | Stats target | Description 
--------+-------------------+-----------+----------+--------------+-------------
  foo   | character(100000) |           | extended |              | 

You can modify the storage type like this:

alter table test_blob alter column column_name set storage EXTENDED;

The default TOAST option is EXTENDED, which tells the system to first attempt to compress the data for that column. If that isn't enough, then it out-of-line's it to the TOAST table. There are other options that you can pick if you don't want compression or out-of-line storage. Be careful about disabling out-of-line storage entirely, because that will cause an attempt to store an oversize row to fail altogether. (Note that columns that are of non-TOASTable data types will show the storage option as "plain", and you can't change it.)

Total table size limit

One caveat here is that for each value that is moved "out of line" an OID is generated and used to track it in that table's associated TOAST table. You cannot have more than 2^32 (4 billion) out-of-line values in a single table, because there would have to be duplicated OIDs in its TOAST table. In practice, you want to have considerably less than that many TOASTed values per table, because as the OID space fills up the system might spend large amounts of time searching for the next free OID when it needs to generate a new out-of-line value.

The OIDs used for this purpose are generated from a global counter that wraps around every 4 billion values, so that from time to time an already-used value will be generated again. Postgres detects that, and tries again with the next OID. So a really slow insert could only happen if there is a very long run of used OID values, with no gaps, in a particular TOAST table. That's not terribly likely, in part because the use of a global counter tends to spread successive OIDs through different tables. But it can happen.

Notice that what counts is the total number of wide values in a table. If you have many wide columns then the number of rows you can have in the table might be much less than 4 billion. Partitioning your table is a possible workaround.

It's also worth emphasizing that only field values wider than 2KB will consume TOAST OIDs in this way. So in practice, reaching this limit would require many terabytes of data in a single table, especially if you have a wide range of value widths.

Column count limit

TOAST does not completely eliminate the physical block size as a restriction on the width of rows, because it operates by narrowing down individual fields, and it can't make those smaller than a pointer (18 bytes). So even complete toasting doesn't allow a row with more than about 450 wide columns. Also, fixed-width field types such as integers don't support being toasted, so having lots of columns of those types can cause problems with the row width limit too.

A possible workaround, if you don't mind working with a custom build of Postgres, is to use a larger block size. But that only goes so far. Usually, if you start running into this limitation, you should redesign your table schema. Consider combining columns of identical data types into an array, for example.