DATALINK

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(New page: ==Introduction== SQL Datalinks are a part of SQL/MED ISO/IEC 9075-9:2003 specification. It is a special SQL type intended to store URLs in database columns, and a number of functions, wh...)

Revision as of 05:04, 6 April 2010

Introduction

SQL Datalinks are a part of SQL/MED ISO/IEC 9075-9:2003 specification.

It is a special SQL type intended to store URLs in database columns, and a number of functions, which can be used in SQL queries.

There aren't many RDBMSs which implement SQL/MED datalinks. IBM DB2 implements it. A lot of it is defined to be "implementation specific" by the SQL standard.

Values of DATALINK type are opaque, constructed by using function DLVALUE. There are several functions available for converting DATALINK back to text, for example DLURLCOMPLETE.

Only references to external files are stored in the database, not the content of the files themselves.

The DATALINK type can have some special semantics:

  1. referential integrity - file pointed to by a datalink column cannot be renamed or deleted
  2. access control - either SQL or file system mediated
  3. point in time recovery of file changes

There is not much software, which demands SQL/MED datalinks. There is functionality concerning files, URLs and web, which is often needed in modern applications. Often, it is awkward to use these together with RDBMS data. SQL/MED standardizes some of this.

Common use cases include:

  • website content management - ensuring that linked-to URLs exist
  • workflow management - providing for locking and versioning of files
  • configuration - managing configuration files, ie have a change in database recreate text configuration file. This file should be locked and writeable only from database.
  • web server log storage (many referer URLs)
  • RDF databases (many URLs)

Datalinks as defined by SQL/MED should provide:

  • transactional semantics
  • checking if file exists
  • protection of linked file against renaming or deletion
  • read access through database
  • write access through database
  • point-in-time recovery
  • deletion of files no longer referenced
  • access to files on different servers

In addition, we would also like

  • compact URL storage for lots and lots of URLs
  • URL : filesystem mapping
  • remote server access
  • extended file system attributes
  • file meta data


Implementation

Implementation of SQL/MED requires a lot of hocus-pocus and is probably event impossible to do properly on plain unix. Many parts however seem to be implementable.

When storing many URLs, it should be possible to greatly reduce space requirements by storing URLs only once in a master URL table and using an int4 or int8 as an actual DATALINK type. This greatly reduces storage requirements needed for many URLs. It should be possible to partition the master URL table according to several criteria.

Datalink software is divided in several parts:

  • datalink database schema. URL dictionary
  • external process datalinker. It uses datalink schema. It manages files on behalf of postgresql server. It uses postgresql for implementing transactional semantics on file modifications.

QUESTION: what to do for multiple databases on same server?

Parts:

  • DL - DATALINK SQL/MED + support functions
  • FILEIO - local file access functions
  • URL - parsing and manipulation - one often needs to get and set various parts of URLs and filenames

+ some sort of inter-server communication

Personal tools