Corruption

From PostgreSQL wiki

Jump to: navigation, search

Contents

Index / table / database corruption

Almost all corruption issues are related to hardware problems, and most of the rest are due to documented-unsafe configurations like disabling fsync.

VITALLY IMPORTANT FIRST RESPONSE

Do you think you have one of those rare but nasty index or table corruption problems?

Before doing anything else:

Stop the postmaster and take a file system level copy of your database NOW!. Don't use pg_dump, instead make a copy of all the files in the data directory (the one that contains the "base", "pg_xlog", "pg_clog" etc folders).

If possible, put a copy of it on an external hard drive, DVD, or other storage you can disconnect from your computer so you don't accidentally modify your snapshot after making it.

People often destroy potentially recoverable data by trying to repair it. Make a file-system-level copy of your database before attempting any repair. It might save your data.

Your backup copy is also important evidence that may help figure out what caused the problem, allowing you to prevent it from happening again. Repair efforts usually destroy that evidence. Having the damaged data may be the only way to figure out why it was damaged and how to prevent the same thing happening again.

Causes

Remember that most corruption is caused by hardware issues:

  • RAID controllers with faulty / worn out battery backup, and an unexpected power loss
  • Hard disk drives with write-back cache enabled, and an unexpected power loss
  • Cheap SSDs with insufficient power-loss protection, and an unexpected power-loss
  • Defective RAM
  • Defective or overheating CPU(s)

Other causes can be:

  • Database systems configured with fsync=off and an OS crash or power loss
  • Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
  • PostgreSQL bugs
  • Operating system bugs
  • Admin error
    • directly modifying Postgre data-directory contents
    • faulty fail-over procedures like rsync'ing without pg_start_backup
    • and lots more

What next?

Read the Guide_to_reporting_problems and post to the pgsql-general mailing list with information about the problem. Remember that your hardware settings and postgresql configuration are vital information, as are any error messages, your Pg version, etc.

Christophe Pettus's FOSDEM presentation on database corruption is also well worth a read, but you don't get the spoken warnings that come with it. In particular, the recovery techniques described there are for recovering what you can from a DB, with the understanding that you'll have missing chunks of data, inconsistent FK relationships, reappearing deleted rows / visible uncommitted rows, and all sorts of problems depending on the repair technique used. It assumes you've already done the sensible initial investigation and fixed any less serious problems that might be causing your issue.

Personal tools