Changes To Norwegian Locale

From PostgreSQL wiki
Jump to navigationJump to search

Important information for users using the "Norwegian (Bokmål)" locale on Windows

The 9.4.1, 9.3.6, and 9.2.10 minor releases contain changes to the way Postgres handles the Norwegian (Bokmål) locale on Windows.

Who is affected by the change?

All database clusters on Windows that have been initialized with, or contain any databases using the "Norwegian (Bokmål)" locale.

What was the problem?

On Windows, the name of the "Norwegian (Bokmål)" locale contains a non-ASCII character (å). It is the only locale whose name is not pure ASCII. That non-ASCII character causes several problems:

1. Before a PostgreSQL backend process has set any locale (using the setlocale() system function), it is not clear what character encoding should be used to encode the non-ASCII character, in the call to set the locale. I.e. it's a chicken-and-egg problem.

2. The locale name is stored in the pg_database system catalog, which is shared between all the databases in the cluster. Different databases can use different character encodings, so it is not clear what encoding should be used for the values in pg_database.

3. The locale name is stored in the postgresql.conf configuration file. That file normally contains only ASCII characters, so again it is not clear what encoding should be used. When connected to a database that uses a different encoding than what was used in the configuration file, the locale related settings are incorrectly encoded in SHOW and in the pg_settings view, which can lead to "invalid byte sequence" errors.

These underlying problems can manifest in several ways:

1. Failures when issuing a CREATE DATABASE command.

2. Failures in restoring backups taken with pg_dumpall.

3. Failures in upgrading the cluster with pg_upgrade.

What was changed?

To fix these issues, it was decided to map the problematic "Norwegian (Bokmål)_Norway" locale name to an alias of the same locale that Windows also recognizes, "Norwegian_Norway".

Unfortunately, even though the code was changed to use the "Norwegian_Norway" alias, the old problematic name will still exist in the pg_database system catalog in old installations, and also in the postgresql.conf configuration file. These instances need to be converted manually.

Postgres 9.4.0 mapped the problematic name to "norwegian-bokmal", but that locale name turns out to work on only some versions of Windows. So we now recommend "Norwegian_Norway" instead. While an installation using "norwegian-bokmal" will work fine on its own terms, you might have problems moving the data to another Windows system later, so we recommend replacing that name too.

What do I need to do?

Before making any low-level changes to system tables, it's strongly recommended that you take a full physical backup of the database first.

After updating the binaries, perform the following steps on the existing database cluster:

1. Run these two commands as superuser, in any database of the cluster:

 UPDATE pg_database SET
   datcollate = 'Norwegian_Norway' || substr(datcollate, position('.' in datcollate))
 WHERE datcollate LIKE 'Norwegian (Bokm%' OR datcollate LIKE 'norwegian-bokmal%';
 
 UPDATE pg_database SET
   datctype = 'Norwegian_Norway' || substr(datctype, position('.' in datctype))
 WHERE datctype LIKE 'Norwegian (Bokm%' OR datctype LIKE 'norwegian-bokmal%';

You need only do this once, even if the cluster contains many databases.

2. Open postgresql.conf in a text editor, and find the settings lc_messages, lc_monetary, lc_numeric, and lc_time. Replace the string "Norwegian (Bokmål)_Norway" with "Norwegian_Norway". If you see "norwegian-bokmal", instead replace that with "Norwegian_Norway".

For example, if your postgresql.conf looks like this:

 ...
 # These settings are initialized by initdb, but they can be changed.
 lc_messages = 'Norwegian (Bokmål)_Norway.1252'			# locale for system error message
 					# strings
 lc_monetary = 'Norwegian (Bokmål)_Norway.1252'			# locale for monetary formatting
 lc_numeric = 'Norwegian (Bokmål)_Norway.1252'			# locale for number formatting
 lc_time = 'Norwegian (Bokmål)_Norway.1252'				# locale for time formatting

then change the settings to be like this:

 ...
 lc_messages = 'Norwegian_Norway.1252'			# locale for system error message
 					# strings
 lc_monetary = 'Norwegian_Norway.1252'			# locale for monetary formatting
 lc_numeric = 'Norwegian_Norway.1252'			# locale for number formatting
 lc_time = 'Norwegian_Norway.1252'				# locale for time formatting

You might see a number other than 1252 suffixed to the locale name; don't change the number.

3. Stop and restart the server, or use "pg_ctl reload", to ensure the new postgresql.conf settings are active.

4. Alternatively, pg_dump/restore, or pg_upgrade can be used to recreate the cluster from scratch, with the correct locale names. However, the incorrect names may cause pg_restore or pg_upgrade to fail, in which case the above method must be used.

What happens if I don't perform the manual changes?

The CREATE DATABASE command ceases to work, unless the "TEMPLATE=template0" option is used. You might also have problems with future dumps or upgrade operations.