DATALINK

This is a design document describing planning for a feature that may not yet exist in PostgreSQL. See the official documentation for your version to get information on available features.

Introduction

SQL Datalinks are a part of SQL/MED ISO/IEC 9075-9:2003 specification. Note that this is distinct from and complements efforts described here.

Datalink is a special SQL type intended to store URLs in database columns, and a number of functions, which can be used in SQL queries.

There aren't many RDBMSs which implement SQL/MED datalinks. IBM DB2 does. A lot of it is defined by the SQL standard to be "implementation specific".

Values of DATALINK type are opaque, constructed by using function DLVALUE. There are several functions available for converting DATALINK back to text, for example DLURLCOMPLETE.

There is not much software, which demands SQL/MED datalinks. However, there is functionality concerning files, URLs and web, which is often needed in modern applications. Often, it is awkward to use these together with RDBMS data. SQL/MED standardizes some of this.

Features

A column containing a value of type DATALINK can optionally have some special semantics:

referential integrity - file pointed to by a datalink cannot be renamed or deleted
access control - either SQL or file system mediated
point in time recovery of file changes

Only references to external files are stored in the database, not the content of the files themselves.

Datalinks as defined by SQL/MED should provide:

transactional semantics
checking if file exists
protection of linked file against renaming or deletion
read access control through database
write access control through database
point-in-time recovery
deletion of files no longer referenced
access to files on different servers

The following is not specified by SQL/MED, but is often needed in practice (and probably needed to implement datalink):

compact URL storage for lots and lots of URLs (say int instead of string; sort-of like enum, but for URLs)
URL to filesystem mapping, access control to file system hierarchies
local server file access (read, write)
remote server file access (get, put)
extended file system attributes (get, set)
file meta data (list, get, set)

Use cases

Common use cases include:

website content management - ensuring that linked-to URLs exist
workflow management - providing for locking and versioning of files
configuration - managing configuration files, ie have a change in database recreate text configuration file. This file should be locked and writeable only from database.
web server log storage (many referer URLs)
RDF databases (many URLs)

Datalinks should enable you to:

efficiently store URLs (datalinks are stored as integers, not strings)
conveniently map between URLs and file system paths
safely read and write files from SQL (SQL/MED does not specify functions for this)
keep backups of changed files
manage files on different servers
manage datalink metadata (unified interface to extended filesystem attributes)

Specification

Synopsys

create table my_files (
  ...,
  link datalink,
  ....
)

insert into my_files (link) select dlvalue('file:///etc/passwd');
 
select into url dlurlcomplete(link) from my_files;

update my_files set link=dlnewcopy(url,0);

update my_files set link=dlpreviouscopy(url,0);

Datalink support functions

Constructors for values of type datalink:

DLVALUE(url) → datalink (INSERT only)
DLNEWCOPY(url,tokenp) → datalink (UPDATE only)
DLPREVIOUSCOPY(url,tokenp) → datalink (UPDATE only)

Functions for extracting information from datalink type:

DLURLCOMPLETE(datalink) → url
DLURLCOMPLETEONLY(datalink) → url
DLURLCOMPLETEWRITE(datalink) → url
DLURLPATH(datalink) → file path
DLURLPATHONLY(datalink) → file path
DLURLPATHWRITE(datalink) → file path

DLURLSCHEME(datalink) → URL scheme ('HTTP' or 'FILE')
DLURLSERVER(datalink) → server address

DLREPLACECONTENT(url, url, comment) → datalink (new in SQL 2008)

Additional functions in another well known RDBMS

DLVALUE(address,linktype [,comment]) → datalink
DLCOMMENT(datalink) → text
DLLINKTYPE(datalink) → link type ('FILE' or 'URL')

Datalink attributes per SQL spec

These are refered to as link control options by SQL spec. They are insanely verbose as text, but can be stored in a typmod. They are specified per column and thus apply to all URLs stored.

LINK CONTROL

NO
- just store the datalink
- file is not "linked", no further control
FILE
- file is "linked"
- file has to exist
- level of control can be specified with further options

INTEGRITY

ALL - linked files cannot be deleted or renamed
SELECTIVE - linked files can only be deleted or modified using file manager operations, if no datalinker is installed
NONE - referenced files can be deleted or modified using file manager operations, not compatible with FILE LINK CONTROL

READ ACCESS

DB
- read access is controlled by SQL server, based on access privileges to the datalink value
- involves read access tokens
  - encoded into the URL by the SQL server
  - verified by external file manager/data linker
FS
- read access is determined by file manager

WRITE ACCESS

ADMIN REQUIRING TOKEN FOR UPDATE
- write access governed by SQL server (and datalinker)
- involves write access token for modifying file content
ADMIN NOT REQUIRING TOKEN FOR UPDATE
- write access governed by SQL server (and datalinker)
BLOCKED
- linked files cannot be modified
FS
- write access controlled by file manager

RECOVERY

YES
NO

ON UNLINK

DELETE
- file is deleted when unlinked
RESTORE
- original properties (ownership, permissions) restored as well
NONE
- ownership and permissions are not restored

Implementation

Implementation of SQL/MED requires a lot of hocus-pocus and is probably even impossible to do properly on plain unix. Many parts, however seem to be implementable.

System should support at least http: and file: URL schemes. These two require quite different implementations. Perhaps a support for pluggable URL schemes could be useful - consider for example RDF namespaces.

When storing many URLs, it should be possible to greatly reduce space requirements by storing URLs only once in a master URL table and using an int4 or int8 as an actual DATALINK type. This greatly reduces storage requirements needed for many URLs, provided for the use case mentioned. It should be possible to partition the master URL table according to several criteria.

Datalink software is divided in several parts:

datalink database schema. Master URL table is stored here.
external process datalinker. It uses datalink schema. It manages files on behalf of postgresql server. It uses postgresql for implementing transactional semantics.

QUESTION: what to do for multiple databases on same server?

Parts

DL - DATALINK SQL/MED + support functions
FILEIO - local file access functions
URL - parsing and manipulation - one often needs to get and set various parts of URLs and filenames

Some sort of inter-server communication is needed (to support remote files). Accessing datalinker api in a remote postgresql instance over libpq (say dblink) could maybe work.

Misc notes

On ext3 filesystem, files can be locked with 'chattr +i' command, but only by root. This prevents file from beeing changed, no matter what unix file permissions say, even for root user.

By changing file owner/group to postgres and setting unix file permissions, file can be made effectively owned by postgresql server (which can than use normal unix functions to use them).

PL/PerlU seems the right choice for this, with Perl for external datalinker. Proper security and safety will be hard to get right!

SQL Conformance

Datalink features in the SQL standard
Identifier	Description	Status
M001	Datalinks
M002	Datalinks via SQL/CLI
M003	Datalinks via Embedded SQL
M011	Datalinks via Ada
M012	Datalinks via C
M013	Datalinks via COBOL
M014	Datalinks via Fortran
M015	Datalinks via M
M016	Datalinks via Pascal
M017	Datalinks via PL/I

Error codes for datalinks
Code	Meaning
22000	DATA EXCEPTION
22017	INVALID DATA SPECIFIED FOR DATALINK
2201A	NULL ARGUMENT PASSED TO DATALINK CONSTRUCTOR
2201D	DATALINK VALUE EXCEEDS MAXIMUM LENGTH
HW000	DATALINK EXCEPTION
HW001	EXTERNAL FILE NOT LINKED
HW002	EXTERNAL FILE ALREADY LINKED
HW003	REFERENCED FILE DOES NOT EXIST
HW004	INVALID WRITE TOKEN
HW005	INVALID DATALINK CONSTRUCTION
HW006	INVALID WRITE PERMISSION FOR UPDATE
HW007	REFERENCED FILE NOT VALID
HY093	INVALID DATALINK VALUE

DATALINK

Contents

Introduction

Features

Use cases

Specification

Synopsys

Datalink support functions

Datalink attributes per SQL spec

Implementation

Parts

Misc notes

SQL Conformance

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Tools

Search