Variable Design

From PostgreSQL wiki
Jump to navigationJump to search

Introduction

This page aims at discussing possible variable designs within PosgreSQL. Such variables could be:

  • simple (typed?) session variables, like My/MS/Oracle SQL
  • advanced session variables, i.e. with more capabilities (permissions? MVCC? ...)
  • persistent variables, like relations but with one value
  • ...

Use Cases

A few use cases for session or permanent variables.

#1. Session variable: Basic SQL-Level Server-Side Scripting

Example scenario: in interactive mode, store the result of a query and reuse it later.

  • get the uid of a user known by name
  • get the timestamp of its last login
  • then consult application logs around this time for that user
   SELECT uid FROM AppUser WHERE login='calvin'; -- into $UID
   SELECT MAX(ts) FROM AppUserLogin WHERE uid=$UID; -- into $LAST
   SELECT * FROM AppLog WHERE uid=$UID AND ABS(ts-$LAST) < '1 min';

#2. Session Variables: Persistence of Security Properties

  • a protected function is called with credentials
  • it expensively checks the credentials and
  • stores whether it is ok in some protected session variable
  • other protected functions called can check cheaply for this status before performing operations
  • a key requirement is that if the setup transaction fails, the session variable must be reverted somehow

#3. Persistent Variables: Storing Application Properties

Application properties such as version installed, date of last upgrade are stored and managed by the database...

This is often done with a one row table, although there is no such thing readily enforceable with pg.

Proposed Syntaxes & Semantics

List of possible semantical properties, pros/cons:

  • namespace, is it distinct from relations?
  • persistence of object declaration
  • persistence of object value, scope of existence
  • permissions
  • transactional (MVCC)
  • constraints
  • triggers?
  • implementation cost?
  • check through static analysis (eg plpgsql_check)

Simple Session Variables

Simple session variables are an extension of existing session variables (aka user-defined GUCs) to add typing, privacy, improved syntax... A private variable is identified with a @ prefix. They are created and assigned in their own namespace in a session with SET. They are typed. They can be used with SQL in place of any scalar value. They end with the session/process.

Proposed syntax:

 SET @var [type] [(=/TO) sql-expression] [DEFAULT constant?];
  • if no type supplied and first declaration: inferred from assigned value or default? error undeclared? TEXT?
  • if type supplied but already declared different: error inconsistent? change type?
  • if type supplied but already declared: ignore
  • =/TO is for assigning a value
  • DEFAULT is for setting the value on declaration only, ignored otherwise
  • DEFAULT default is "DEFAULT NULL"
  • special SET TO DEFAULT to reset to the value to the default
 SET @secure_status BOOLEAN = <SQL-boolean-expression>;
 -- usage
 SELECT @secure_status;
 SELECT * FROM SomeTable WHERE someColumn = @secure_status;

Possible options:

  • compatibility with existing SET?
    • compatible syntax... SET [SESSION | LOCAL] var = text-constant;
    • extend SET with typing?
    • @-prefix means private? some other sign for public?
    • otherwise possible keywords: DEFINE DEF VARIABLE VAR ...
    • type syntax with AS? looks less like a column declaration
  • DEFAULT <constant> or DEFAULT <expression>?
  • composite types, @var.field: ok?
  • allow multiple comma-separated declarations: yes? SET does not.
  • allow PUBLIC (RW to all) variables: maybe/no? SET does this already
    • security? none, it is public...
    • how would namespaces interact one with the other?
    • explicit namespace: @public/private.var? other?
  • allow PROTECTED (R to all) variables: no?
    • security implications? implementation efficiency?
    • issue: what if already exists as PUBLIC? or PROTECTED by someone else?
  • have PUBLIC variables by default: no?!
  • if a non existent variable is referenced, should it be an error or NULL?
    • NULL, fails gracefully on non declared variables: no...
    • error, better to detect var name typos: yes
    • NULL + warning: a possible compromise: no
    • both, decided with a guc: no
  • if a variable is redeclared with same type/default value: ignore
  • if a variable is redeclared with a different type/default value: error
  • allow to remove a variable? UNSET @var?
  • allow shared (between sessions) variables? how??
  • on the fly assignment (à la MySQL :=): no?
  • allow persistent declarations of session variables, through some system table?

Properties: very light-weight, live only in session, not in catalog, transactional or not???, no permissions (or private/public?)... secured by default (private to role).

Implementation: very efficient, only one session key-value store ('owner-id@name' -> value & type) in memory in the server process. When is the value substituted?

Pros:

  • Common access syntax both in SQL and PL/pgSQL

Cons:

  • Transactional may be more useful, but more costly?
    • see GUCs performance?

Full-fledge Variable Objects

Declared in catalog, MVCC, permissions...

Secure Session Variables

Declared in catalog, permissions, optionally transactional, no constraints, ... They allow to share values securely between function calls.

The Variable can be declared as TEMP - then the catalog changes and the accessibility of variable is limited by session or by transaction.

Optionally the content can be be initialized to default or to previous value (when the content was changed in transactions that fails). This behavior should be explicitly required by using clauses `ON ROLLBACK RESET` and `ON ROLLBACK UNDO`.

 CREATE VARIABLE myapp.x INT DEFAULT 0;
 GRANT SELECT ON VARIABLE x TO public;
 SELECT setvar('myapp.x', 3);
 SELECT getvar('myapp.x');
 -- possible direct access in PL/pgSQL (SQL??)
    IF myapp.x = 2 THEN ...

Questions:

  • is there a use-case where ON ROLLBACK RESET (vs UNDO) is useful?
    • DEFAULT value can be setted to some safe state value (like "disconnected"), then RESET enforce change to safe state. UNDO does change to previous values, what can be anything
  • is there a use-case where not-transactional is useful?
    • sure, when you would to store info about any raised error, performance counters, info about last actions,
    • it adds another implementation path
    • this option as a default weakens security, esp. in use-case #2
    • if it remains, it should have a name `ON ROLLBACK IGNORE`?
      • can be
  • is a direct variable reference syntax ok with SQL?
    • why not - if there is not possible conflict between variables and tables
    • how would it interact with column names, table names...? (cf PL/pgSQL variable pain...)
      • When you use qualified names, there are no problems
  • maybe it would be nice to have SESSION in the syntax, for a session variable?
    • Isn't it TEMP variables

Comments:

  • security:
    • declared directly on the variable (i.e. not through security definer accessor functions)
    • default access permissions are restricted to the owner
    • the owner can GRANT SELECT|UPDATE permissions to other roles
      • based on standard builtin PostgreSQL permissions
    • two level security: schema and variable
  • the variable is declared (CREATE) once and for all
    • reside permanently in system catalog, like a table: global, list, ...
    • however content is per session, scope analogue to TEMP tables (although TEMP table contents is transactional - this is not relevant here - TEMP from metadata perspective)
    • allows a simple check of any reference of variable (everywhere where you can detect getter/setter functions and name is constant)
      • If *all* source code is available? What about dynamic accesses (getvar(random_name()))? What about other languages (SQL, compiled...)?
  • As variable identifiers are unique in a database, getter/setter functions are only necessary in environments without variables support (i.e. SQL)

Pros:

  • Variable declaration (type, permissions) is persistent in the database (although not its value) so it is declared only once.
  • Direct full-database permissions (GRANT/REVOKE).
  • The default value should be defined - implicitly evaluated when variable was first touched in session

Cons:

  • New kind of database session-live object with only partial database support: type, permissions, declaration persistence, optional transactional behaviour but not value-persistence, constraints...
    • some limits are similar to sequences - no MVCC, no constraints - but the constraints can be assigned (if we would it), domain types (with check) can be used too.
  • Not well fitted for simple session variable use case because of catalog costs and declaration persistence (a simple session variable must disappear at the end of the session).
    • we can introduce temporary variables - the impact on catalog for interactive work is +/- zero
  • Setter/getter access functions:
    • syntax not well integrated to SQL (required setter/getter functions), for an object actually in the database catalog...
      • we use same access methods for sequences now
      • setter/getter function can be replaced by direct identifiers - when there will not be possible conflict between variables and other SQL identifiers
    • somehow allows dynamic names, which may or may not be a good idea, esp. in a security-related usage context
      • security is declared - the security is not based on knowledge of name, if somebody know or doesn't know name has not any impact on security.

Application to Use Cases

Depending on the proposals, what you would have to write in order to implement the various use cases. Pros & cons...

Use Case #1 - Session Variables for Basic Scripting

Basic session variable usage scenario.

With User-Defined GUCS

Quite verbose, especially for interactive use. Casting is a pain.

   SELECT set_config('x.uid',
       (SELECT uid FROM AppUser WHERE login = 'calvin')::TEXT, FALSE);
   SELECT set_config('x.last',
       (SELECT MAX(ts) FROM AppUserLogin
        WHERE uid = current_setting('x.uid')::INT)::TEXT, FALSE);
   SELECT * FROM AppLog 
     WHERE uid = current_setting('x.uid')::INT
       AND ABS(ts - current_setting('x.last')::TIMESTAMP) < '1 min';

With psql Client-Side Variables

Must work around lack of value typing. Escaping is a little strange.

   SELECT uid FROM AppUser WHERE login='calvin' \gset
   SELECT MAX(ts) AS last FROM AppUser WHERE uid=:uid \gset
   SELECT * FROM AppLog WHERE uid=:uid AND ABS(ts - :'last'::TIMESTAMP) < '1 min';

i=== With Simple Session Variables ===

   SET @uid = (SELECT uid FROM AppUser WHERE login='calvin');
   SET @last = (SELECT MAX(ts) FROM AppUser WHERE uid=@uid);
   SELECT * FROM AppLog WHERE uid=@uid AND ABS(ts - @last) < '1 min';

Pros:

  • syntax is quite light weight (no casting...)

With Secure Session Variables

If some TEMPORARY option is available on the variables, then:

Scenario 1:

  CREATE TEMP VARIABLE uid INT, last TIMESTAMP;
  -- CREATE VARIABLE uid INT
  --     DEFAULT (SELECT uid FROM AppUser WHERE login='calvin');
  SELECT setvar('uid', (SELECT uid FROM AppUser WHERE login='calvin'));
  SELECT setvar('last', (SELECT MAX(ts) FROM AppUserLogin WHERE uid=getvar('uid'));
  SELECT * FROM AppLog
    WHERE uid=getvar('uid') AND ABS(ts - getvar('last')) < '1 min';

Notes:

  • looks better than GUCs, thanks to typing.

Cons:

  • the variable creation implies catalog changes
    • the cost for interactive work on catalog is minimal
    • it cannot work read-only (low permissions, replica)

Use Case #2

Session persistent security status for access control.

With Simple Session Variables

Initial variable setup with application admin role:

  CREATE FUNCTION checkUserCredentials(...)
    RETURNS BOOLEAN SECURITY DEFINER AS $$
      .. expensive check whether use is ok... then
      SET SESSION @user_is_ok BOOLEAN = <...>;
      SELECT @user_is_ok;
    $$ LANGUAGE SQL;

Use of the private session variable from other functions:

  CREATE FUNCTION safeDoSomething(...)
    RETURNS BOOLEAN SECURITY DEFINER AS $$
      -- not strictly necessary, would fail anyway:
      SET SESSION @user_is_ok BOOLEAN DEFAULT FALSE;
      IF NOT @user_is_ok THEN RAISE ...;
      ... else do it ...
    $$ LANGUAGE plpgsql;

User perspective:

  SELECT setUserCredentials('top secret credentials');
  SELECT safeDoSomething(...);

If read-only access to the private variable is needed, use a getter:

  CREATE FUNCTION isUserOk()
    RETURNS BOOLEAN SECURITY DEFINER AS $$
    SET SESSION @user_is_ok BOOLEAN DEFAULT FALSE;
    SELECT @user_is_ok;
  $$ LANGUAGE SQL;

Similarly a setter could be proposed with permissions restricted to some roles.

With User-Defined GUCs

User-defined GUCs are public to all roles within the session, they cannot be used for this use case. They could be if they were private.

With TEMP Table

TODO: try to create a private table with session vars...

   CREATE TEMPORARY TABLE session_vars(user_is_ok BOOLEAN);
   REVOKE ...

With Secure Session Variables

There is risk of possible data inconsistency when some statements in initial transactions can enforce deferred rollback - it is safe with `RESET` clause. Can be safe too (without transactional support) if checkUserCredentials is used in read only transaction.


Initial variable setup with application admin role:

   CREATE VARIABLE user_is_ok BOOLEAN DEFAULT false ON ROLLBACK RESET;
   GRANT/REVOKE ... ON VARIABLE user_is_ok;
   CREATE FUNCTION checkUserCredentials(...)
     RETURNS BOOLEAN SECURITY DEFINER AS $$
       .. expensive check whether use is ok... then
       SELECT setvar('user_is_ok', <...>);
       SELECT getvar('user_is_ok');
     $$ LANGUAGE SQL;

Use of the session variable from other functions:

   CREATE FUNCTION safeDoSomething(...)
     RETURNS BOOLEAN SECURITY DEFINER AS $$
       IF NOT user_is_ok THEN RAISE ...;
       ... else do it ...
     $$ LANGUAGE plpgsql;

User perspective:

   SELECT setUserCredentials('top secret credentials');
   SELECT safeDoSomething(...);

Use Case 3

Persistent variables for eg application status.

With Session Variables

Cannot be used, session variable values are not persistent.

With GUCs

With Manual Singleton Table

Create the singleton table:

  CREATE TABLE AppConfig(
    is_singleton BOOLEAN DEFAULT TRUE CHECK(is_singleton),
    version INTEGER NOT NULL DEFAULT 1
  );            
  INSERT INTO AppConfig(version) VALUES(1);

Ensure singleton non empty property with a trigger, do not forget TRUNCATE.

  CREATE FUNCTION pg_do_not_edit() RETURNS TRIGGER AS $$
    BEGIN
      RAISE 'cannot edit table ...';
    END;
  $$ LANGUAGE PLpgSQL;
  CREATE TRIGGER AppConfigIsSingleton
    BEFORE INSERT OR DELETE OR TRUNCATE ON AppConfig
    FOR EACH STATEMENT EXECUTE PROCEDURE pg_do_not_edit();

Then use table, only SELECT & UPDATE are allowed.

  SELECT version FROM AppConfig;
  
  -- application schema updates
  BEGIN
    ALTER stuff...;
    UPDATE AppConfig SET version = 2;
  COMMIT;

With SINGLETON Table

Create the version for app in a singleton table:

   CREATE TABLE AppConfig(version INT DEFAULT 1) SINGLETON;

Consult or update the version on upgrade:

   SELECT version FROM AppConfig;
   
   BEGIN;
     -- alter tables whatever
     UPDATE AppConfig SET version = 2;
   COMMIT;

With Persistent Variable Object

Object declaration:

   CREATE VARIABLE version INT DEFAULT 1;

Version update:

   BEGIN;
     ... do updates
     SET version = 2;
   COMMIT;

Variables, Here and There

PostgreSQL User-Defined GUCS

User GUCs can store and load TEXT temporarily (i.e. in the session or within a transaction) through SET/SHOW and two functions. Variables are TEXT only (explicit casting is required in and out), the value setting is transactional, there are no explicit permissions.

   SET x.foo = <TEXT-constant>;
   SHOW x.foo;
   SELECT set_config('x.foo', <TEXT-expression>, FALSE);
   SELECT current_setting('x.foo')::<EXPECTED-TYPE>;

Notes:

  • set_config/current_setting do not look symmetrical...
  • costs about 3µs/set_config
  • it would look a little better with simpler setter/getter functions, e.g. setVar('foo', <expression>) and getVar('foo')
  • GUCS are public to all roles in the session

PostgreSQL "psql" Variables

psql has client side variables managed as raw strings. They can be set directly or from a query result.

 \set pg 5432
 SELECT 'hello world' AS message \gset
 SELECT * FROM SomeTable WHERE id=:pg OR msg=:'message';

Properties: untyped, client-side...

  • you can ensure proper escaping with syntax :'...', :"..."

MySQL Session Variables

MySQL user session variables are prefixed with @. They are not declared, but created and implicitly typed as assigned between 5 possible types (integer, decimal, floating-point, binary or non-binary string, or maybe NULL). The type is kept later on. Variables value are constants, they cannot be used directly as identifiers.

 SET @my = 0 + 0, @pg = 5431 + 1;

The special assignment := allows to assign a variable on the fly within any expression...

 SELECT (@var := id) FROM SomeTable;

Properties: pure light-weight session variables, no catalog nor any heavy properties. Typing is very limited.

MS-SQL Session Variables

SQL Server has some session variables in Transact-SQL. Session @-prefix variables are declared, typed and possibly assigned with DECLARE, the assigned value can be modified with SET or SELECT:

 DECLARE @ms INT = 0
 SET @ms += 1 * 0

Properties: ?

Oracle Session Variables

Oracle SQL seems to have bind (VAR) and substitution (DEF/ACCEPT) variables.

These variables are similar to psql variables, although they seem to be typed.

 DEF ora = 0
 SELECT * FROM SomeTable WHERE someColumn = &ora
 VAR ora INT
 EXEC :ora = 0
 SELECT * FROM SomeTable WHERE someColumn = :ora

Properties: ?? Client vs server side?

Oracle Package Variables

 CREATE PACKAGE xx AS 
   y int := 100;
   PROCEDURE foo IS
   DECLARE x int;
   BEGIN
    x := y;
   END;  
 END xx;
 -- outside if y is PUBLIC
 x := xx.y;

Properties:

  • PL/SQL specific variables.
  • initialized on demand in session
  • the content is temporal, unshared between session
  • packages are persistent (i.e. metadata are persistent)
  • access control on variables: PUBLIC, PRIVATE
    • "PRIVATE" allows a access only from within the package - rights are related to scope.
  • These variables are ADA variables - no any relation to SQL engine

DB2 (session) global variables

Global variable is named memory variable defined within schema. Global variable is declared by statement

  CREATE [OR REPLACE] VARIABLE [schema.]name datatype [ (DEFAULT|CONSTANT) (NULL | constant | special register| expression)

Properties:

  • session scope - values are private for each session (shared values are possible only for buildin variables)
  • global variables are not under transaction control