GUCS Overhaul

From PostgreSQL wiki

Revision as of 19:14, 19 May 2012 by Boshomi (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

Problems

Currently, postgresql.conf and our set of GUC configurations suffer from several large problems:

  1. Most people have no idea how to set these.
  2. The current postgresql.conf file is a huge mess of 194 options, the vast majority of which most users will never touch.
  3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf, and settings.sgml), which are only synched with each other manually.
  4. We don't seem to be getting any closer to autotuning.

Goals

  • It's vitally important that any overhaul of the user side of the GUCS be completed in one version to minimize the pain involved.
  • The file format should be standardized in a way that makes it easy to parse and generate.
  • The internal data and pg_settings view should contain everything needed to output a new postgresql.conf that's functionally identical to the original
    • This will be more difficult in cases where there are include files involved.
  • By shifting from a model where postgresql.conf is document-formatted and hand-edited to one where it's machine generated, it becomes vastly easier to write simple utilities to manage these settings. Right now, the big "obstacle" to things like SET PERSISTENT is "how to we preserve the hand-edited comments in the file" -- and the answer is we don't.
  • The lack of an easy way to manage settings via port access to PostgreSQL is a significant inhibitor to adopting PostgreSQL in environments with large numbers of servers. Making this available would make for easier integration into tools like CPANEL, which in turn would expand PostgreSQL web hosting availability.

Gotchas

  • It's impossible to push all the settings into the database itself to eliminate the postgresql.conf altogether, as quite a few of the GUC parameters are needed before one can ever read the database; in particular the ones about file locations and shared memory sizing.
  • Tools that operate remotely have to consider that they may make a change that keeps the server from coming back up again. Suppose you change a parameter in a way that breaks the DB (e.g., set shared_buffers to a value larger than your kernel allows) and try to restart. Database doesn't start. If you only have access to the database via port 5432 once it's up, and you can't change the parameter back via editing the file directly, you're hosed.
    • To support the hosted server goal, perhaps a backup "known good" backup file can be generated and used if the primary doesn't work.
    • The other option being explored is writing a config file checker. It's unclear yet how that will know whether or not the shared memory block needed with the new config will be large enough or not.

Design Proposal

Update pg_settings (Done)

Expose some of the information available but not visible in pg_settings:

  • Default value. Needed to support pg_generate_conf, as well as being handy for tuning tool authors.
  • Setting as it appears in the source file. You can get the setting re-scaled usefully using current_setting(name), and that may be sufficient rather than making available yet another way.
  • Source for the setting, to distinguish between things sourced from the master file and those coming from include files. This is only needed to re-generate complicated postgresql.conf files that include other files.

This work is finished as of 2008-10-06.

Recommend a standard for distinguishing user and system generated comments

Right now the postgresql.conf has lines like this:

max_connections = 100           # (change requires restart)
# Note:  Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You might
# also need to raise shared_buffers to support more connections.

In the field admins end up adopting a couple of common idioms for dealing with this file in a way that they can distinguish their changes and comments about those changes from the text that was there originally. Presenting a standard way to deal with this could be done without actually changing any server code, just by providing a new postgresql.conf. Here is one suggestion for how to re-work this parameter into a new form where system generated comments are distinguished with #!; a sample of a good user comment for change control is included as well:

#! Note:  Increasing max_connections costs ~400 bytes of shared memory per
#! connection slot, plus lock space (see max_locks_per_transaction).  You might
#! also need to raise shared_buffers to support more connections. 
#! (change requires restart)
# 2008-09-16 Increased to 200
max_connections = 200

If this approach were adopted, it would make code to operate on the postgresql.conf file far easier to manage. A simple program could transform between various levels of verbosity in the file on admin request.

Write a comment conversion tool

Given a list of the stock comments that show up in a normal postgresql.conf, a simple script could remove those lines or portions of lines while keeping user commentary and changes in place. This would allow users with existing postgresql.conf files to adopt the new standard easily while preserving their current comments.

Add pg_settings_info

Note: the enum lists goal here has been already added to the pg_settings view as of the 8.4 development in process. However, that implementation is a simple string. It would be more orthogonal to the other things here if it were instead a text[]. Greg Smith has volunteered that make that change in the 8.4 timeline.

Add a new table (pg_settings_info) containing additional information not needed for GUC operation with some subset of the following fields:

  • The system comments currently in the sample postgresql.conf. This is the primary thing needed here to enable pg_generate_conf; the items below in this list are all optional.
  • Saved user comments
  • Subcategory
  • More documentation
    • Longer description, perhaps the whole documentation for the option?
    • Web link to the full documentation on this option?
      • It would be handy to have any system postgresql.conf, and perhaps each section, prefer to the appropriate section of the documentation. The URL there can probably be constructed based on knowing how the docs map rather than set explicitly.
  • Recommendations
  • Enum lists
  • It would be nice if looking at pg_settings could get you everything people like about the Annotated postgresql.conf.

Here's DDL for a minimal implementation that supports the goals below:

CREATE TABLE pg_settings_info (
    name text,
    system_comments text[],
    user_comments text[],
    UNIQUE(name)
);

Some remaining issues in this area:

  • One issue with reconstructing the original file even with all the information imported is that users may have a particular sequence of parameters they've setup in the config file, perhaps to group them differently than the defaults. The source file line number added in a recent patch allows handling that.
  • A sticky point here is that in the real world, people tend to put a header at the top or bottom of the file with notes about this host etc. that aren't really tied to any specific parameter.

Move the sample postgresql.conf comments into the system comments column

All the parameter-specific notes currently appearing in the postgresql.conf need to move into the pg_settings_info.system_comments area.

Write pg_generate_conf

pg_generate_conf would take the following switches:

  • -b , --basic — short conf file, listing only the 15-18 most commonly changed options
  • -a , --advanced — conf file listing all 196+ options
  • -t, --terse — conf file lists only category headings and actual settings, no comments
  • -n, --normal — conf file has category and subcategory settings, with short, descriptive comments
  • -v, --verbose — conf file lists full descriptions and recommendations in comments with each option
  • -c "option = value" set specific option to specific value in the file
  • -f "filename" — take options and values from file "filename"

The default would be "-b, -n", with specific settings for shared_buffers and wal_sync_method. The current postgresql.conf is a lot more like a "-a, -v" file.

This would help us in the following ways:

  • By having a generated postgresql.conf and an easy way to generate it, writing autoconfiguration scripts (as well as shortcuts like SET PERSISTENT) become vastly easier.
  • Most users would deal only with a limited set of 15-20 configuration variables.
  • The 3 places to maintain GUCS lists will be: the docs, guc.c, and the file used to populate pg_settings_info

pg_generate_conf should be able to read its own output as an input and generate a new file with the same parameters, but with a different verbosity level.

An alternate approach here would be to include some text-based documentation that is easy read on the database server itself, then refer to that (and the official documentation) inside the postgresql.conf

Remove the postgresql.conf.sample

  • Delete the postgresql.conf.sample file
  • initdb can now call pg_generate_conf to generate a postgresql.conf based on guc.c and command-line switches, rather than on postgresql.conf.sample

Related projects

Server side changes

There needs to be a way to modify the underlying settings and save that into a new machine-generated postgresql.conf file. Is implementing SET PERSISTENT sufficient for that?

Tuning wizard

Imagine a GUI or web app that connects to a server on port 5432, finds out some basic information about the server, and then gives something like this:

Parameter               Current Recommended     Change?
shared_buffers          32MB    1024MB          [X]
effective_cache_size    128MB   3000MB          [ ]
work_mem                1MB     16MB            [ ]

Saving parameter change history

Many IT shops was some sort of history as a parameter is updated. Something like this would fit most of those requirements:

# 2008-03-02 : 32MB : postgres : Database default
# 2008-05-02 : 512MB : pg_autotune : Wizard update
# 2008-05-15 : 1024MB : gsmith : Increased after benchmark tests
shared_buffers = 1024MB

The tuning wizard should label its comments in such a fashion. The hope is that if such a format were recommended and it was good, other people would adopt it as a de-facto standard for making manual edits as well.

Sources for tuning information

Personal tools