Gsoc08-collation

From PostgreSQL wiki

Proposal

Abstract

The current version of PostgreSQL supports only one collation per database cluster, set by initdb. This does not meet the requirements of some users developing multilingual applications.

The goal of this work is to implement collation at the database level and to lay the foundations for further national language support development. Users will be able to set the collation when creating a database or change the collation of an existing one, specifically through the commands CREATE DATABASE... COLLATE … and ALTER DATABASE … COLLATE …, in line with the ANSI standard. The work will also make it possible for users to create their own collations through the commands CREATE COLLATION … FROM … USING and DROP COLLATION, again in line with the ANSI standard.

Further information

This work will be used as my bachelor thesis. Learning PostgreSQL's internals better will be a great experience for me. I will continue working on this project as a master thesis and will add more functionality; the longer-term idea is to implement collation per column. The part of this work that I am applying for in Google Summer of Code 2008 will implement collation functionality at the database level and create a foundation for further multi-language support development. This will be a significant benefit for the open source community.

The initial part of my work has been completed and submitted as part of a patch contributed by Alexey Slynko. I am now at the stage of adding collation catalogs, which will be important for further multi-language support.

Users and developers have been asking for improved multi-language support. This requirement has already been added to the official PostgreSQL TODO list.

Implementation

Catalogs

  • a new catalog, pg_collation, will be defined
  • pg_collation will contain the SQL standard collations plus an optional default collation (when the default is set to something other than an SQL standard one)
  • pg_type, pg_attribute and pg_namespace will be extended with references to their default records in pg_collation

initdb

  • pg_collation will contain the records pre-defined by the SQL standard and, optionally, one non-standard record set when running initdb (the one using the system locale)
  • this record will be referenced by the relevant columns of pg_type, pg_attribute and pg_namespace and will be considered the default collation that is inherited
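
The catalog layout and the initdb behaviour described above can be pictured with a small toy model. This is only a sketch in Python, not PostgreSQL source: the field names (`oid`, `collname`, `is_standard`), the OID values and the two standard collation names are assumptions made for illustration, since the proposal does not specify them. For brevity it also keeps one default reference per catalog instead of one per catalog row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Collation:
    """One row of the proposed pg_collation catalog (columns are hypothetical)."""
    oid: int
    collname: str
    is_standard: bool  # True for records predefined by the SQL standard


def initdb(system_locale: Optional[str] = None) -> dict:
    """Build a toy cluster: pg_collation plus the references that point into it."""
    # Predefined records; the names are illustrative only.
    pg_collation = [
        Collation(1, "UCS_BASIC", True),
        Collation(2, "UNICODE", True),
    ]
    default = pg_collation[0]
    # Optional non-standard record taken from the system locale at initdb time.
    if system_locale is not None:
        default = Collation(3, system_locale, False)
        pg_collation.append(default)
    # pg_type, pg_attribute and pg_namespace refer (by OID) to the record
    # that objects inherit as their default collation.
    default_refs = {
        catalog: default.oid
        for catalog in ("pg_type", "pg_attribute", "pg_namespace")
    }
    return {"pg_collation": pg_collation, "default_refs": default_refs}


cluster = initdb("cs_CZ.UTF-8")
print(cluster["default_refs"])
```

Calling `initdb()` without a locale leaves only the predefined records in the model, and the references then point at one of those instead.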

CREATE DATABASE ... COLLATE ...

  • after the new database has been copied, its collation will either stay at the default (the same as the cluster collation) or be changed by the COLLATE clause; the pg_type, pg_attribute and pg_namespace catalogs are then updated accordingly
  • reindex database
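
The reindex step matters because an ordered index stores its keys in the order defined by the collation that was active when it was built. The sketch below is Python, not PostgreSQL code, and uses two stand-in orderings (byte-wise and case-insensitive, both chosen purely for illustration) to show how a binary search can miss an existing key once the ordering changes underneath a sorted structure.

```python
from bisect import bisect_left


def bytewise(s):
    """Stand-in for a byte-order collation."""
    return s.encode("utf-8")


def case_insensitive(s):
    """Stand-in for a language-aware collation that ignores case first."""
    return (s.lower(), s)


def build_index(rows, key):
    """A sorted list plays the role of an ordered index."""
    return sorted(rows, key=key)


def index_lookup(index, value, key):
    """Binary search that assumes the index is sorted according to `key`."""
    keys = [key(v) for v in index]
    pos = bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value


rows = ["apple", "Banana", "cherry", "Date"]
old_index = build_index(rows, bytewise)          # built under the old collation
new_index = build_index(rows, case_insensitive)  # rebuilt, i.e. "reindexed"

print(index_lookup(old_index, "Banana", bytewise))          # consistent ordering
print(index_lookup(old_index, "Banana", case_insensitive))  # stale index
print(index_lookup(new_index, "Banana", case_insensitive))  # after rebuild
```

The second lookup searches an index whose physical order no longer matches the comparison function, which is the situation a database would be in if its collation changed without a reindex.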

When switching to a different database, the database collation will be retrieved from the pg_type entry for type text.
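
The CREATE DATABASE steps and the lookup through type text can be modelled roughly as follows. This is a toy sketch under stated assumptions: the dictionary layout, the `collation` field, the `reindexed` flag and the function names are all hypothetical, and a real implementation would operate on catalog tuples inside the server. The sketch also assumes that only rows still pointing at the inherited default are rewritten.

```python
import copy

# Hypothetical template database whose catalogs inherit the cluster default.
TEMPLATE = {
    "pg_type": {
        "text": {"collation": "cluster_default"},
        "varchar": {"collation": "cluster_default"},
    },
    "pg_attribute": {"mytable.name": {"collation": "cluster_default"}},
    "pg_namespace": {"public": {"collation": "cluster_default"}},
    "reindexed": False,
}


def database_collation(db):
    """Read the database collation from the pg_type entry for type text."""
    return db["pg_type"]["text"]["collation"]


def create_database(template, collate=None):
    """Copy the template, then apply an optional COLLATE override."""
    db = copy.deepcopy(template)  # step 1: copy the database
    if collate is not None:
        old_default = database_collation(db)
        # step 2: update the catalogs that reference the default collation
        for catalog in ("pg_type", "pg_attribute", "pg_namespace"):
            for row in db[catalog].values():
                if row["collation"] == old_default:
                    row["collation"] = collate
        # step 3: indexes copied from the template were ordered under the
        # old collation, so the model records that a reindex took place
        db["reindexed"] = True
    return db


plain = create_database(TEMPLATE)
czech = create_database(TEMPLATE, collate="cs_CZ")
print(database_collation(plain), database_collation(czech))
```

Keeping the database collation on the `text` row means no separate per-database field is needed in this model; whether that is the best place for it is a design question the proposal leaves open.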

Mail archive

Downloads