Improve the performance of ALTER TABLE SET LOGGED UNLOGGED statement

From PostgreSQL wiki
Jump to navigationJump to search

Introduction

Last year during the GSoC2014 I implemented the feature to allow an unlogged table to be changed to logged [1], but the desing chosen was to rewrite the entire heap in a new relfilenode with a new relpersistence because some troubles pointed here [2].

The project was successfully finished and got committed [3] into PostgreSQL to be released this year in the 9.5 version.

However this design lead us to performance problems with large relations because we need to rewrite the entire content of the relation twice, one into a new heap and other into the WAL, so this project will change the current desing of the mechanism of change an unlogged table to logged without the need to rewrite the entire heap, but just by removing the init forks and if the wal_level != minimal we'll write the contents to the WAL too.

Benefits to the PostgreSQL Community

The "unlogged" tables feature was introduced by 9.1 version, and provide better write performance than regular tables (logged), but are not crash-safe. Their contents are automatically discarded (cleared) if the server crashes. Also, their contents do not propagate to standby servers.

We already have a way to change an unlogged table to logged using "ALTER TABLE name SET LOGGED" developed last year during the GSoC2014, now during the GSoC2015 we'll redesing the internals to improve the I/O performance of this feature removing the need of rewrite the entire heap into a new relfilenode.

The are a good idea about the desing here [4], but I'll discuss the design with my mentor to implement this improvement.

Additional Goals

The main goal of this project is improve the performance of the ALTER TABLE name SET {LOGGED|UNLOGGED}, but we can expand this propostal to more related goals.

  • Allow unlogged materialized views
    • ALTER MATERIALIZED VIEW name SET { UNLOGGED | LOGGED }
  • Allow unlogged indexes on logged tables
    • ALTER INDEX name SET { UNLOGGED | LOGGED }

Deliverables

This project has just one deliverable at the end. The deliverable will be the improvement of the routines that transform an "unlogged" table to "logged" and "logged" to "unlogged", without the need to create a new "relfilenode" with a different "relpersistence".

Project Schedule

until May 25

  • create a website to the project (wiki.postgresql.org)
  • create a public repository to the project (github.com/fabriziomello)
  • read what has already been discussed by the community about the project [4]
  • learn about some PostgreSQL internals:
    • control data (src/include/catalog/pg_control.h)
    • storage (src/backend/storage/*)
  • discuss the additional goals with community

May 26 - June 21

  • implementation of the first prototype:
    • implement the change of unlogged table to logged without rewrite the entire heap when "wal_level = minimal"
    • at this point when "wal_level != minimal" we use the current implementation
  • write documentation and the test cases
  • submit this first prototype to the commitfest 2015/06 (https://commitfest.postgresql.org/5/)

June 22 - June 26

  • mentor review the work in progress

June 27 - August 17

  • do the adjustments based on the community feedback during the commitfest 2015/06
  • implementation of the second prototype:
    • when "wal_level != minimal" we'll remove the init fork (first prototype) and write relation pages to the WAL.
    • implement “ALTER MATERIALIZED VIEW .. SET LOGGED / UNLOGGED”
  • submit to the commitfest 2015/09 for final evaluation and maybe will be committed to 9.6 version (webpage don't created yet)

August 18 - August 21

  • do the adjustments based on the community feedback during the commitfest 2015/09 final mentor review

About the proponent

Fabrízio de Royes Mello

  • e-mail: fabriziomello at gmail dot com
  • twitter: @fabriziomello
  • github [5]
  • linkedin [6]

Currently I help people and teams to take the full potential of relational databases, especially PostgreSQL, helping teams to design the structure of the database (modeling), build physical architecture (database schema), programming (procedural languages), SQL (usage, tuning, best practices), optimization and orchestration of instances in production too. I perform a volunteer work for Brazilian Community of PostgreSQL (www.postgresql.org.br), supporting mailing lists, organizing events (pgbr.postgresql.org.br) and some admin tasks. And also I help a little the PostgreSQL Global Development Group (PGDG) in the implementation of some features and review of patches (git.postgresql.org).