PostgreSQL vs SQL Standard

From PostgreSQL wiki

Jump to: navigation, search

This page attempts to document those instances where PostgreSQL deviates from standard SQL. It is not necessarily complete, and doesn't attempt to include all cases where an optional feature is not implemented or only partially implemented.

Contents

Issues not yet classified

NOT NULL constraints on composite-type columns

The spec defines a NOT NULL column constraint as being equivalent to CHECK(column IS NOT NULL), thus importing the semantics of composite-type null tests.

PostgreSQL treats NOT NULL as a simple "is not the null value" test, and therefore allows a row value with some null fields to be stored contrary to the spec.

COALESCE on row types

The spec defines COALESCE(X,Y) as a syntactic transformation to CASE WHEN X IS NOT NULL THEN X ELSE Y END (it leaves open the question of whether X is really evaluated twice by disallowing non-deterministic expressions or expressions with side effects in this context). A consequence of this is that the rather odd rules for null tests of row types are applied.

PostgreSQL applies only the "is not the null value" test to X. Accordingly, if X is a row value containing null columns, PostgreSQL will return X, while the spec would require returning Y.


Fixable issues

These are cases where PG's behavior differs from the spec, but there appears to be no major obstacle to changing it to conform.

Wrong return type for extract()

The spec requires that extract() return an exact numeric type.

PostgreSQL returns an approximate numeric type (double precision).

There are obviously some backward compatibility concerns that would have to be considered before changing this one, but they don't look serious.


Issues with some block to implementation

These are cases where PG's behavior differs from the spec, or where an otherwise desirable feature is missing, but there is some technical issue that would make it harder to implement.

LISTAGG()

LISTAGG() is the standard SQL equivalent of our string_agg(), but it uses the ordered-set aggregate syntax.

Implementing this in the most straightforward way would require erasing many of the existing distinctions between ordered-set aggregate syntax and conventional aggregate syntax. Currently we require that ordered-set aggs be called using WITHIN GROUP, while ordinary aggs must not use WITHIN GROUP but may have an embedded ORDER BY clause in their arguments (as per the standard ARRAY_AGG()).

LISTAGG() does not have the semantics of an ordered-set aggregate, so implementing it other than as a special case would mean allowing ordinary aggs to be called using WITHIN GROUP as an alternative to their normal syntax.

Data change delta tables

These are the standard's closest equivalent to our RETURNING and writable CTEs, but the visibility and trigger firing rules appear to be different to those of wCTEs, so one can't be implemented in terms of the other.

Identifier lengths

The spec requires that identifiers be allowed up to 18 characters even if the "long identifiers" feature is not declared; if it is, then the limit is 128 characters. Note that this limit is in characters and not bytes.

PostgreSQL limits the significant length of an identifier to 63 bytes unless specified otherwise at compile time; this isn't enough for 18 unicode characters.

Currently we use the fixed-length name type for names; making it larger would significantly inflate the sizes of many system catalogs, while making it variable length would be a compatibility issue (e.g. changing the order of columns in many system catalogs to put the name field after the fixed-length fields).

DISTINCT aggregates as window functions

PostgreSQL does not allow DISTINCT in aggregates when they are invoked as window functions.

The existing implementation of DISTINCT would be quite hard to adapt to window functions.

Deferrable CHECK and NOT NULL

The spec defines NOT NULL in terms of CHECK, and allows CHECK constraints to be deferrable.

PostgreSQL currently does all these checks before actually inserting the new or modified row.

FULL OUTER JOIN conditions

The spec allows any join condition for a FULL OUTER JOIN.

PostgreSQL currently limits such conditions to ones that are implementable without constructing explicit UNION operations; that is, the condition must be hashable, mergeable, or constant.

Adding support for arbitrary full joins (which would likely require constructing a UNION of the inner join and the two anti-joins) seems like a lot of work for minimal gain.


Issues with some block at design level

These are cases where PG's behavior differs from the spec, or where an otherwise desirable feature is missing, but there is some technical issue that makes it hard to even decide how to implement it.

ANY() and SOME() aggregates

PostgreSQL supports every() from the standard, and provides bool_or as a nonstandard spelling of the ANY/SOME aggregate function, but two syntax conflicts make it impossible to parse the standard's ANY syntax:

  1. PostgreSQL allows = ANY (array_expression) as a form of quantified comparison predicate (this is an extension to the spec)
  2. PostgreSQL allows subselects in all contexts to have additional parentheses, which the spec does not allow

The spec's syntax for ANY(expression) as an aggregate function can be parsed unambiguously only because the spec's version of ANY as a quantified comparison predicate must have the keyword SELECT immediately following the ANY(.

Unreserved keywords as select-list aliases without AS

The spec requires that SELECT 1 ZONE be equivalent to SELECT 1 AS ZONE. PostgreSQL accepts only the latter form or alternatively requires the identifier to be quoted.

The main syntax ambiguity here is caused by the presence of postfix operators in PostgreSQL syntax; SELECT 1 ! ZONE would be ambiguous between treating the operator as postfix or infix (we take it as being infix in this case by using precedence rules). Additional ambiguities exist with compound type names using NATIONAL, VARYING, or WITHOUT (which would likely have to be made reserved words, as they are in the spec). WITHIN, FILTER, and OVER, and the datetime component names, which are not currently reserved in PostgreSQL, would also need to become reserved words. Since adding so many more reserved words is likely to break existing queries, this isn't considered a particularly viable solution.

DROP DOMAIN … CASCADE

The spec says that DROP DOMAIN … CASCADE does not drop any columns, but simply changes any columns of the domain type to instead use the base type with the addition of the domain's defaults and constraints.

PostgreSQL drops columns of the domain type.

The issue here is that we can reach a DROP DOMAIN by cascading from a drop of another object, such as a schema containing the domain or a function referenced in the domain's default or constraints. Supporting the standard behavior would therefore mean altering objects in ways that add new dependencies from within the code that traverses the existing dependencies, which seems problematic.

Temporary tables

In the spec, temporary tables are persistent schema objects whose data (and only the data) is temporary and per-session. Implementing these in PostgreSQL would actually have substantial advantages (catalog bloat when using temp tables heavily is a common problem); but PostgreSQL's existing behavior is very well-entrenched and changing it would cause compatibility issues.

(One approach sometimes suggested would be to use the spec's method if GLOBAL was specified when creating the table.)

Mandatory parens in VALUES

The spec allows VALUES 1,2,3 (as an explicit table with 1 column, 3 rows).

PostgreSQL requires parens on each row: VALUES (1),(2),(3).

It looks like the only obstacle here is that VALUES would need to become a reserved word.


Issues where we explicitly choose to disregard the spec (WONTFIX)

Identifier case

PostgreSQL folds unquoted identifiers to lowercase, not the uppercase that the spec requires.

Constraint name scope

PostgreSQL treats the scope of a table or column constraint name as being the table it belongs to. The spec requires that such names be unique over the whole schema.

PostgreSQL's behavior here severely limits the usefulness of several information_schema views, which are defined on the assumption of schema-wide constraint name uniqueness.

TIMESTAMP WITH TIME ZONE

PostgreSQL's timestamp with time zone type is very different from the standard one.

  1. timestamp '2018-06-01 00:00:00+1200' is a timestamp without timezone, spec says it should be with timezone.
    PostgreSQL does not allow the type of a literal to depend on its string value.
  2. timestamp with time zone does not store a timezone offset, but is interpreted with the session timezone.
    The spec's timezone handling is less than useless, PostgreSQL's is actually useful.

AT TIME ZONE

Our AT TIME ZONE operator is somewhat different to the spec's.

The spec says:

  1. t AT TIME ZONE z where t is of type time[stamp] without time zone returns a result of the corresponding with time zone type.
  2. t AT TIME ZONE z where t is of type time[stamp] with time zone, the result is of the same type.

What PostgreSQL does is:

  1. t AT TIME ZONE z where t is of type timestamp without time zone returns a result of type timestamp with time zone.
  2. t AT TIME ZONE z where t is of type timestamp with time zone returns a result of type timestamp without time zone.
  3. t AT TIME ZONE z where t is of type time with time zone returns a result of type time with time zone.
  4. t AT TIME ZONE z where t is of type time without time zone casts the value of t to type time with time zone and then proceeds as the previous option.

This permits various transforms which are simply not possible in the spec's crippled timezone support, such as converting a wallclock time from one location to another via two applications of AT TIME ZONE.

INTERVAL

The standard interval type assumes that all days are the same length, and therefore intervals can be divided into "year-month" and "day-second" values; only one of which can be present in any single interval value or column.

PostgreSQL's interval type has three fields: months,days,[micro]seconds and allows all three to be present simultaneously.

Rationale: the spec's lack of usable timezone support makes its interval type useless too.

Transaction management

The spec's model of the relationship between the database and its clients differs from PostgreSQL's (and most other databases for that matter). This leaves the spec's requirements concerning how transactions begin and end as somewhat inconsistent, with some statements being defined as implicitly starting transactions and others not (and some may or may not implicitly start transactions depending on variable content).

PostgreSQL treats all client requests (which may contain multiple statements) as being wrapped in a transaction (auto-committing at the end of the request) except where BEGIN or START TRANSACTION is used explicitly.

Default transaction isolation level

The spec declares SERIALIZABLE to be the default isolation level, but PostgreSQL defaults to READ COMMITTED.

Object ownership scope

The standard controls ownership of objects only at the schema level; objects inside a schema belong to the schema's owner.

PostgreSQL tracks ownership at individual object level.

Rationale: this PG behavior dates back to before it had schemas, and is much more flexible than the standard's approach.

Trigger firing order

The standard says that trigger firing order on a table depends on creation order of the triggers.

PostgreSQL fires triggers in order of name.

Rationale: documentation says "this was judged to be more convenient".

Trigger firing relative to referential constraint actions

The spec requires that when a referential action (e.g. CASCADE or SET NULL) causes a BEFORE trigger to be fired, the referential action is actually performed before firing the trigger.

One possible motivation for this is to ensure that the trigger function doesn't see the inconsistent intermediate state of the database in which the FK is violated.

Nevertheless, PostgreSQL always runs the BEFORE trigger before doing the operation.

LIKE with no escape clause

The standard says that LIKE has no escape character if the ESCAPE clause was not specified.

PostgreSQL treats a missing ESCAPE clause as if it were ESCAPE '\'.

Rationale: historical.

Declared type of string literals

The standard says that the type of a string literal is fixed-length character string, i.e. character(n) where n is the length of the literal.

PostgreSQL treats untyped literals as being of unknown type and deduces the type from context (even in contexts where the spec does not provide for contextual typing). If no type is deduced, the value is treated either as text or unknown type depending on context and PostgreSQL version.

Trailing spaces in character(n)

Trailing spaces in char(n) values are removed in contexts where the spec requires they be kept.

Rationale: PG actually converts char(n) to text in most contexts and all the relevant functions and operators take text args, rather than having separate versions for char(n).

FETCH FIRST 0 ROWS

The spec requires an error to be generated for specifying a limit of 0. We generate errors only for values strictly less than 0.

Early enforcement of non-deferrable UNIQUE

The spec says that UNIQUE (and PRIMARY KEY) constraints are enforced at statement end and not row-by-row when in immediate mode, with no distinction between non-deferrable constraints and deferrable constraints currently set to immediate mode.

PostgreSQL enforces non-deferrable constraints row-by-row, and deferrable constraints in immediate mode at statement end. This is a compromise between wanting to preserve the hard semantic guarantees of uniqueness in row-by-row mode (relied on by the planner in order to prove result uniqueness), while allowing conflicting updates (e.g. UPDATE … SET uniq = uniq + 1 where really necessary.

Lexing of string literals and comments

PostgreSQL rejects all of these:

select 'foo'
 /**/ 'bar' from ...

select 'foo' /**/ 'bar' from ...

select U&'foo' UESCAPE /**/ 'x' from ...

select 'foo' as U&"foo" UESCAPE /**/ 'x' from ...

select U&'foo' /**/ UESCAPE 'x' from ...

select 'foo' as U&"foo" /**/ UESCAPE 'x' from ...

The first of those is certainly legal in the spec; the remainder are more debatable since the spec is rather unclear.

Regarding the second case:

5.3 <literal>
Syntax Rules
7) In a <character string literal>, <national character string literal>, <Unicode character string literal>, or <binary string literal>, a <separator> shall contain a <newline>.

But SQL2016 also says:

5.2 <token> and <separator>
Syntax Rules
11) SQL text containing one or more instances of <comment> is equivalent to the same SQL text with the <comment> replaced with <newline>.

which implies that the input should have been accepted.

Regarding the remaining cases:

5.2 <token> and <separator>
<separator> ::=
{ <comment> | <white space> }...

Syntax Rules
7) Any <token> may be followed by a <separator>.
5.3 <literal>
Syntax Rules
11) In a <Unicode character string literal>, there shall be no <separator> between the "U" and the <ampersand> nor between the <ampersand> and the <quote>.

which seems to imply that <separator> around the UESCAPE would be permitted; indeed, no other rule permits any kind of whitespace there.

There's no present desire to fix this because of the added complexity in the lexer. It is possible that this could be revisited if the lexer is redesigned in future.


Issues where PostgreSQL actually conforms to the spec

These issues have been claimed or discussed as violations, but where PostgreSQL currently conforms to the spec as written. These are for reference in case they come up again; also, cases where PostgreSQL formerly violated the spec can be noted here as fixed.

null return from every(x)

In the query SELECT every(x) FROM t; the result will be NULL if t has no rows or all values of t.x are NULL. This is correct as per the spec, even though it's inconsistent with conventional mathematical usage (in which an AND of no terms should return true).


Major features simply not implemented yet

  • CREATE TABLE … (LIKE … INCLUDING GENERATED)
  • MATCH PARTIAL
  • references in typed tables
  • fetch first … with ties
  • fetch first … percent
  • more functional dependencies
  • row pattern recognition
  • partition join
  • distinct types
  • temporal tables
  • generated columns
  • respect/ignore nulls for window functions
  • "from last" for nth_value
  • COMMIT/ROLLBACK AND CHAIN
  • many more
  • CAST(val AS type FORMAT 'template')
Personal tools