DATALINK

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(Features)
Line 32: Line 32:
 
* checking if file exists
 
* checking if file exists
 
* protection of linked file against renaming or deletion
 
* protection of linked file against renaming or deletion
* read access through database
+
* read access control through database
* write access through database
+
* write access control through database
 
* point-in-time recovery
 
* point-in-time recovery
 
* deletion of files no longer referenced
 
* deletion of files no longer referenced

Revision as of 05:09, 6 April 2010

Contents

Introduction

SQL Datalinks are a part of SQL/MED ISO/IEC 9075-9:2003 specification.

It is a special SQL type intended to store URLs in database columns, and a number of functions, which can be used in SQL queries.

There aren't many RDBMSs which implement SQL/MED datalinks. IBM DB2 implements it. A lot of it is defined to be "implementation specific" by the SQL standard.

Values of DATALINK type are opaque, constructed by using function DLVALUE. There are several functions available for converting DATALINK back to text, for example DLURLCOMPLETE.

There is not much software, which demands SQL/MED datalinks. There is functionality concerning files, URLs and web, which is often needed in modern applications. Often, it is awkward to use these together with RDBMS data. SQL/MED standardizes some of this.

Features

The DATALINK type can have some special semantics:

  1. referential integrity - file pointed to by a datalink column cannot be renamed or deleted
  2. access control - either SQL or file system mediated
  3. point in time recovery of file changes

Only references to external files are stored in the database, not the content of the files themselves.

Datalinks as defined by SQL/MED should provide:

  • transactional semantics
  • checking if file exists
  • protection of linked file against renaming or deletion
  • read access control through database
  • write access control through database
  • point-in-time recovery
  • deletion of files no longer referenced
  • access to files on different servers

The following is not specified in SQL/MED, but is often needed in practice

  • compact URL storage for lots and lots of URLs
  • URL to filesystem mapping, access control to file system hierarchies
  • local server file access (read, write)
  • remote server file access (get, put)
  • extended file system attributes (get,set)
  • file meta data (list,get,set)

Use cases

Common use cases include:

  • website content management - ensuring that linked-to URLs exist
  • workflow management - providing for locking and versioning of files
  • configuration - managing configuration files, ie have a change in database recreate text configuration file. This file should be locked and writeable only from database.
  • web server log storage (many referer URLs)
  • RDF databases (many URLs)

Implementation

Implementation of SQL/MED requires a lot of hocus-pocus and is probably event impossible to do properly on plain unix. Many parts however seem to be implementable.

When storing many URLs, it should be possible to greatly reduce space requirements by storing URLs only once in a master URL table and using an int4 or int8 as an actual DATALINK type. This greatly reduces storage requirements needed for many URLs. It should be possible to partition the master URL table according to several criteria.

Datalink software is divided in several parts:

  • datalink database schema. URL dictionary
  • external process datalinker. It uses datalink schema. It manages files on behalf of postgresql server. It uses postgresql for implementing transactional semantics on file modifications.

QUESTION: what to do for multiple databases on same server?

Parts:

  • DL - DATALINK SQL/MED + support functions
  • FILEIO - local file access functions
  • URL - parsing and manipulation - one often needs to get and set various parts of URLs and filenames

+ some sort of inter-server communication

Personal tools