StateOfICU

From PostgreSQL wiki

Intro

Collation determines how strings are compared and sorted. The simplest approach is to compare the strings byte by byte with memcmp, which is what the C locale does.
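As a rough illustration of why the choice of collation matters, the sketch below sorts the same words two ways in Python. The byte-wise key mimics a memcmp-style comparison of the UTF-8 encoding; the case-folding key is only a stand-in for a linguistic collation and is not what ICU or libc actually compute. The word list and the names bytewise_key and folded_key are made up for this example.

```python
# Sketch: byte-wise (memcmp-like) ordering versus a simple
# case-insensitive ordering. The case-folding key is an illustrative
# stand-in for a linguistic collation, not ICU or libc behavior.

words = ["banana", "Apple", "cherry", "apple", "Banana"]


def bytewise_key(s):
    # Comparing the UTF-8 bytes is equivalent to a memcmp-style comparison.
    return s.encode("utf-8")


def folded_key(s):
    # Ignore case; ties keep their input order because sorted() is stable.
    return s.casefold()


bytewise = sorted(words, key=bytewise_key)
folded = sorted(words, key=folded_key)

print(bytewise)  # uppercase letters sort before all lowercase letters
print(folded)    # words differing only in case end up next to each other
```

The same input produces two different orders, so anything that relies on the order, such as a sorted index, is tied to the collation that produced it.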

Collation affects the results of predicates, and it also determines the structure of an index, which depends on a consistent ordering.

Providers

More complex collations use a provider, which may be either ICU or glibc. A different provider, or a different version of the same provider, may produce a different collation order, which risks corrupting indexes (necessitating REINDEX).
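The corruption risk can be sketched with a toy model: a sorted Python list plays the role of an index, and a binary search plays the role of an index lookup. The two key functions stand in for an "old" and a "new" collation order; they are assumptions made for the illustration and do not reproduce any real ICU or glibc version. The names build_index and index_contains are hypothetical.

```python
# Toy model: a sorted list as an "index", binary search as the lookup.
# old_key and new_key stand in for two provider versions that order
# strings differently; neither reproduces real ICU or glibc behavior.


def old_key(s):
    # Byte-wise order of the UTF-8 encoding.
    return s.encode("utf-8")


def new_key(s):
    # Case-insensitive order, with the raw string as a tie-breaker.
    return (s.casefold(), s)


def build_index(values, key):
    # "CREATE INDEX" / "REINDEX": store the values in collation order.
    return sorted(values, key=key)


def index_contains(index, value, key):
    # Binary search that trusts the index to be ordered according to key.
    target = key(value)
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        k = key(index[mid])
        if k == target:
            return True
        if k < target:
            lo = mid + 1
        else:
            hi = mid
    return False


values = ["banana", "Apple", "cherry", "apple", "Banana"]
index = build_index(values, old_key)

found_before = index_contains(index, "Banana", old_key)    # same order as build
found_after = index_contains(index, "Banana", new_key)     # order changed underneath
rebuilt = build_index(values, new_key)                     # the REINDEX step
found_rebuilt = index_contains(rebuilt, "Banana", new_key)

print(found_before, found_after, found_rebuilt)
```

In this model the value is still stored in the list, but the lookup misses it once the comparison rule changes, and rebuilding the list under the new rule makes it findable again. That is the reason a change in collation order calls for REINDEX.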

Benefits of ICU

Risks

  • Unknown unknowns
  • Ordering differences
    • Though that can also happen with different libcs, or with different versions of any provider
  • ICU has its own bugs
  • Some "obsolete" locales are no longer recognized in newer versions of ICU
    • C
    • fr_FR@euro
    • de__PHONEBOOK

Done

TODO

  • Redefine iculocale=C/POSIX (and "C.anything"/"POSIX.anything"?) to mean memcmp/pg_ascii
  • Make LOCALE (and --locale) apply to ICU
    • The fact that LOCALE doesn't apply to ICU creates a situation that has been described as "maximally confusing"

Questions

Technical opinions about ICU?

  • Quality?
  • Performance?
  • User experience?

Direction: opinions about pushing users toward ICU to be the preferred collation provider?

Timing: opinions about the steps that have been taken or should be taken in version 16? Defaults?

Notes

  • ...