9.1第三十五章

From PostgreSQL wiki
Jump to navigationJump to search

Chapter 35. Extending SQL (扩展SQL)

在本章的各个小节里,我们将讨论如何通过添加下述操作或功能来对PostgreSQL中的SQL查询语言进行扩展: (约定:PG是PostgreSQL的简写)

  1. 函数(在35.3小节里开始介绍)
  2. 聚合函数(在35.10小节里开始介绍)
  3. 数据类型(在35.11小节里开始介绍)
  4. 操作符(在35.12小节里开始介绍)
  5. 索引的操作符类(在35.14小节里开始介绍)
  6. 相关对象的包(在35.15小节里开始介绍)

35.1. 扩展如何工作

PG可扩展,是因为它的操作是catalog驱动的(catalog在此处不译)。如果熟悉标准的关系数据库系统,你应该知道它们会保存有关数据库、表、列等有关信息,这些都被称为系统catalog。(有些数据库系统称其为数据字典。)catalogs呈现给用户的是一些表,就跟其它普通表一样,但是DBMS会在内部对它们进行登记。PG和标准的关系数据库系统一个重要的区别在于,PG在它的catalog表里保存了更多的信息:不仅饮食表和列的相关信息,还有有关数据类型、函数、访问方法等其它各类的对象。用户可以修改这些表,由于PG的操作要以这些表为基础,这意味着用户可以对PG进行扩展。相比而言,常规的数据库系统只能通过直接修改源码或者通过加载DBMS供应商提供的特殊模块来进行扩展。


PG服务器更能吸纳用户编写的代码,通过动态加载纳入自己的体系。也就是说,用户可以指定一个实现新类型或函数的对象代码文件(如,一个共享库(注:也即动态库)),PG会按实际要求进行加载。更常见的是直接使用SQL代码往服务器里添加。这种在“运行时”修改操作的特性,使得PG尤其适合对一些新应用和存储结构快速建立原型。


35.2. PG的类型系统

PG的数据类型,可以分为基本类型、复合类型、域以及伪类型。

35.2.1. 基本类型

基本类型,如int4,主要是指SQL语言中比较偏底层的实现类型(典型的,在底层实现语言C语言当中)。通常与大家熟悉的抽象数据类型相对应。PG只能通过用户提供的函数来操纵这些类型,对这些类型的理解程度取决于用户对这些类型的描述。基本类型进一步可以分为标量和数组类型。对每一个标量类型,对应的数组类型会自动创建,该数组类型是该标量类型的容器,其长度可变。


35.2.2. 复合类型

复合类型,或行类型,主要用于用户建表的时候。它也可以单独用于创建一个复合类型,不与表关联。复合类型,是在前的字段类型的一个列表。它的值就是一个记录行,用于记录各个字段值。用户可以通过SQL查询来访问各个字段值。您可以参考8.15节获取更多有关复合类型方面的信息。

35.2.3. 域

域的概念建立在特定的基本类型基础上,并且大多用于与它的基础类型进行交换。但是,域可以有约束,用于限制它的有效值的范围,使其有效值位于底层基本类型值的某一个范围内。

域可以使用SQL命令CREATE DOMAIN来创建,它的创建和使用将不在本章讨论。

35.2.4. Pseudo-Types

There are a few "pseudo-types" for special purposes. Pseudo-types cannot appear as columns of tables or attributes of composite types, but they can be used to declare the argument and result types of functions. This provides a mechanism within the type system to identify special classes of functions. Table 8-24 lists the existing pseudo-types.

35.2.5. Polymorphic Types

Four pseudo-types of special interest are anyelement, anyarray, anynonarray, and anyenum, which are collectively called polymorphic types. Any function declared using these types is said to be a polymorphic function. A polymorphic function can operate on many different data types, with the specific data type(s) being determined by the data types actually passed to it in a particular call.

Polymorphic arguments and results are tied to each other and are resolved to a specific data type when a query calling a polymorphic function is parsed. Each position (either argument or return value) declared as anyelement is allowed to have any specific actual data type, but in any given call they must all be the same actual type. Each position declared as anyarray can have any array data type, but similarly they must all be the same type. If there are positions declared anyarray and others declared anyelement, the actual array type in the anyarray positions must be an array whose elements are the same type appearing in the anyelement positions. anynonarray is treated exactly the same as anyelement, but adds the additional constraint that the actual type must not be an array type. anyenum is treated exactly the same as anyelement, but adds the additional constraint that the actual type must be an enum type.

Thus, when more than one argument position is declared with a polymorphic type, the net effect is that only certain combinations of actual argument types are allowed. For example, a function declared as equal(anyelement, anyelement) will take any two input values, so long as they are of the same data type.

When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also polymorphic, and the actual data type supplied as the argument determines the actual result type for that call. For example, if there were not already an array subscripting mechanism, one could define a function that implements subscripting as subscript(anyarray, integer) returns anyelement. This declaration constrains the actual first argument to be an array type, and allows the parser to infer the correct result type from the actual first argument's type. Another example is that a function declared as f(anyarray) returns anyenum will only accept arrays of enum types.

Note that anynonarray and anyenum do not represent separate type variables; they are the same type as anyelement, just with an additional constraint. For example, declaring a function as f(anyelement, anyenum) is equivalent to declaring it as f(anyenum, anyenum): both actual arguments have to be the same enum type.

A variadic function (one taking a variable number of arguments, as in Section 35.4.5) can be polymorphic: this is accomplished by declaring its last parameter as VARIADIC anyarray. For purposes of argument matching and determining the actual result type, such a function behaves the same as if you had written the appropriate number of anynonarray parameters.

35.3. User-defined Functions

PostgreSQL provides four kinds of functions:

query language functions (functions written in SQL) (Section 35.4)

procedural language functions (functions written in, for example, PL/pgSQL or PL/Tcl) (Section 35.7)

internal functions (Section 35.8)

C-language functions (Section 35.9)

Every kind of function can take base types, composite types, or combinations of these as arguments (parameters). In addition, every kind of function can return a base type or a composite type. Functions can also be defined to return sets of base or composite values.

Many kinds of functions can take or return certain pseudo-types (such as polymorphic types), but the available facilities vary. Consult the description of each kind of function for more details.

It's easiest to define SQL functions, so we'll start by discussing those. Most of the concepts presented for SQL functions will carry over to the other types of functions.

Throughout this chapter, it can be useful to look at the reference page of the CREATE FUNCTION command to understand the examples better. Some examples from this chapter can be found in funcs.sql and funcs.c in the src/tutorial directory in the PostgreSQL source distribution.

35.4. Query Language (SQL) Functions

SQL functions execute an arbitrary list of SQL statements, returning the result of the last query in the list. In the simple (non-set) case, the first row of the last query's result will be returned. (Bear in mind that "the first row" of a multirow result is not well-defined unless you use ORDER BY.) If the last query happens to return no rows at all, the null value will be returned.

Alternatively, an SQL function can be declared to return a set, by specifying the function's return type as SETOF sometype, or equivalently by declaring it as RETURNS TABLE(columns). In this case all rows of the last query's result are returned. Further details appear below.

The body of an SQL function must be a list of SQL statements separated by semicolons. A semicolon after the last statement is optional. Unless the function is declared to return void, the last statement must be a SELECT, or an INSERT, UPDATE, or DELETE that has a RETURNING clause.

Any collection of commands in the SQL language can be packaged together and defined as a function. Besides SELECT queries, the commands can include data modification queries (INSERT, UPDATE, and DELETE), as well as other SQL commands. (You cannot use transaction control commands, e.g. COMMIT, SAVEPOINT, and some utility commands, e.g. VACUUM, in SQL functions.) However, the final command must be a SELECT or have a RETURNING clause that returns whatever is specified as the function's return type. Alternatively, if you want to define a SQL function that performs actions but has no useful value to return, you can define it as returning void. For example, this function removes rows with negative salaries from the emp table:

CREATE FUNCTION clean_emp() RETURNS void AS '

   DELETE FROM emp
       WHERE salary < 0;

' LANGUAGE SQL;

SELECT clean_emp();

clean_emp

(1 row) The syntax of the CREATE FUNCTION command requires the function body to be written as a string constant. It is usually most convenient to use dollar quoting (see Section 4.1.2.4) for the string constant. If you choose to use regular single-quoted string constant syntax, you must double single quote marks (') and backslashes (\) (assuming escape string syntax) in the body of the function (see Section 4.1.2.1).

Arguments to the SQL function are referenced in the function body using the syntax $n: $1 refers to the first argument, $2 to the second, and so on. If an argument is of a composite type, then the dot notation, e.g., $1.name, can be used to access attributes of the argument. The arguments can only be used as data values, not as identifiers. Thus for example this is reasonable:

INSERT INTO mytable VALUES ($1); but this will not work:

INSERT INTO $1 VALUES (42); 35.4.1. SQL Functions on Base Types

The simplest possible SQL function has no arguments and simply returns a base type, such as integer:

CREATE FUNCTION one() RETURNS integer AS $$

   SELECT 1 AS result;

$$ LANGUAGE SQL;

-- Alternative syntax for string literal: CREATE FUNCTION one() RETURNS integer AS '

   SELECT 1 AS result;

' LANGUAGE SQL;

SELECT one();

one

  1

Notice that we defined a column alias within the function body for the result of the function (with the name result), but this column alias is not visible outside the function. Hence, the result is labeled one instead of result.

It is almost as easy to define SQL functions that take base types as arguments. In the example below, notice how we refer to the arguments within the function as $1 and $2.

CREATE FUNCTION add_em(integer, integer) RETURNS integer AS $$

   SELECT $1 + $2;

$$ LANGUAGE SQL;

SELECT add_em(1, 2) AS answer;

answer

     3

Here is a more useful function, which might be used to debit a bank account:

CREATE FUNCTION tf1 (integer, numeric) RETURNS integer AS $$

   UPDATE bank
       SET balance = balance - $2
       WHERE accountno = $1;
   SELECT 1;

$$ LANGUAGE SQL; A user could execute this function to debit account 17 by $100.00 as follows:

SELECT tf1(17, 100.0); In practice one would probably like a more useful result from the function than a constant 1, so a more likely definition is:

CREATE FUNCTION tf1 (integer, numeric) RETURNS numeric AS $$

   UPDATE bank
       SET balance = balance - $2
       WHERE accountno = $1;
   SELECT balance FROM bank WHERE accountno = $1;

$$ LANGUAGE SQL; which adjusts the balance and returns the new balance. The same thing could be done in one command using RETURNING:

CREATE FUNCTION tf1 (integer, numeric) RETURNS numeric AS $$

   UPDATE bank
       SET balance = balance - $2
       WHERE accountno = $1
   RETURNING balance;

$$ LANGUAGE SQL; 35.4.2. SQL Functions on Composite Types

When writing functions with arguments of composite types, we must not only specify which argument we want (as we did above with $1 and $2) but also the desired attribute (field) of that argument. For example, suppose that emp is a table containing employee data, and therefore also the name of the composite type of each row of the table. Here is a function double_salary that computes what someone's salary would be if it were doubled:

CREATE TABLE emp (

   name        text,
   salary      numeric,
   age         integer,
   cubicle     point

);

INSERT INTO emp VALUES ('Bill', 4200, 45, '(2,1)');

CREATE FUNCTION double_salary(emp) RETURNS numeric AS $$

   SELECT $1.salary * 2 AS salary;

$$ LANGUAGE SQL;

SELECT name, double_salary(emp.*) AS dream

   FROM emp
   WHERE emp.cubicle ~= point '(2,1)';
name | dream

+-------

Bill |  8400

Notice the use of the syntax $1.salary to select one field of the argument row value. Also notice how the calling SELECT command uses * to select the entire current row of a table as a composite value. The table row can alternatively be referenced using just the table name, like this:

SELECT name, double_salary(emp) AS dream

   FROM emp
   WHERE emp.cubicle ~= point '(2,1)';

but this usage is deprecated since it's easy to get confused.

Sometimes it is handy to construct a composite argument value on-the-fly. This can be done with the ROW construct. For example, we could adjust the data being passed to the function:

SELECT name, double_salary(ROW(name, salary*1.1, age, cubicle)) AS dream

   FROM emp;

It is also possible to build a function that returns a composite type. This is an example of a function that returns a single emp row:

CREATE FUNCTION new_emp() RETURNS emp AS $$

   SELECT text 'None' AS name,
       1000.0 AS salary,
       25 AS age,
       point '(2,2)' AS cubicle;

$$ LANGUAGE SQL; In this example we have specified each of the attributes with a constant value, but any computation could have been substituted for these constants.

Note two important things about defining the function:

The select list order in the query must be exactly the same as that in which the columns appear in the table associated with the composite type. (Naming the columns, as we did above, is irrelevant to the system.)

You must typecast the expressions to match the definition of the composite type, or you will get errors like this:

ERROR: function declared to return emp returns varchar instead of text at column 1 A different way to define the same function is:

CREATE FUNCTION new_emp() RETURNS emp AS $$

   SELECT ROW('None', 1000.0, 25, '(2,2)')::emp;

$$ LANGUAGE SQL; Here we wrote a SELECT that returns just a single column of the correct composite type. This isn't really better in this situation, but it is a handy alternative in some cases — for example, if we need to compute the result by calling another function that returns the desired composite value.

We could call this function directly in either of two ways:

SELECT new_emp();

        new_emp

(None,1000.0,25,"(2,2)")

SELECT * FROM new_emp();

name | salary | age | cubicle

+--------+-----+---------

None | 1000.0 |  25 | (2,2)

The second way is described more fully in Section 35.4.7.

When you use a function that returns a composite type, you might want only one field (attribute) from its result. You can do that with syntax like this:

SELECT (new_emp()).name;

name

None

The extra parentheses are needed to keep the parser from getting confused. If you try to do it without them, you get something like this:

SELECT new_emp().name; ERROR: syntax error at or near "." LINE 1: SELECT new_emp().name;

                       ^

Another option is to use functional notation for extracting an attribute. The simple way to explain this is that we can use the notations attribute(table) and table.attribute interchangeably.

SELECT name(new_emp());

name

None

-- This is the same as: -- SELECT emp.name AS youngster FROM emp WHERE emp.age < 30;

SELECT name(emp) AS youngster FROM emp WHERE age(emp) < 30;

youngster

Sam
Andy

Tip: The equivalence between functional notation and attribute notation makes it possible to use functions on composite types to emulate "computed fields". For example, using the previous definition for double_salary(emp), we can write SELECT emp.name, emp.double_salary FROM emp; An application using this wouldn't need to be directly aware that double_salary isn't a real column of the table. (You can also emulate computed fields with views.) Because of this behavior, it's unwise to give a function that takes a single composite-type argument the same name as any of the fields of that composite type. Another way to use a function returning a composite type is to pass the result to another function that accepts the correct row type as input:

CREATE FUNCTION getname(emp) RETURNS text AS $$

   SELECT $1.name;

$$ LANGUAGE SQL;

SELECT getname(new_emp());

getname

None

(1 row) Still another way to use a function that returns a composite type is to call it as a table function, as described in Section 35.4.7.

35.4.3. SQL Functions with Parameter Names

It is possible to attach names to a function's parameters, for example

CREATE FUNCTION tf1 (acct_no integer, debit numeric) RETURNS numeric AS $$

   UPDATE bank
       SET balance = balance - $2
       WHERE accountno = $1
   RETURNING balance;

$$ LANGUAGE SQL; Here the first parameter has been given the name acct_no, and the second parameter the name debit. So far as the SQL function itself is concerned, these names are just decoration; you must still refer to the parameters as $1, $2, etc within the function body. (Some procedural languages let you use the parameter names instead.) However, attaching names to the parameters is useful for documentation purposes. When a function has many parameters, it is also useful to use the names while calling the function, as described in Section 4.3.

35.4.4. SQL Functions with Output Parameters

An alternative way of describing a function's results is to define it with output parameters, as in this example:

CREATE FUNCTION add_em (IN x int, IN y int, OUT sum int) AS 'SELECT $1 + $2' LANGUAGE SQL;

SELECT add_em(3,7);

add_em

    10

(1 row) This is not essentially different from the version of add_em shown in Section 35.4.1. The real value of output parameters is that they provide a convenient way of defining functions that return several columns. For example,

CREATE FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int) AS 'SELECT $1 + $2, $1 * $2' LANGUAGE SQL;

SELECT * FROM sum_n_product(11,42);
sum | product

+---------

 53 |     462

(1 row) What has essentially happened here is that we have created an anonymous composite type for the result of the function. The above example has the same end result as

CREATE TYPE sum_prod AS (sum int, product int);

CREATE FUNCTION sum_n_product (int, int) RETURNS sum_prod AS 'SELECT $1 + $2, $1 * $2' LANGUAGE SQL; but not having to bother with the separate composite type definition is often handy. Notice that the names attached to the output parameters are not just decoration, but determine the column names of the anonymous composite type. (If you omit a name for an output parameter, the system will choose a name on its own.)

Notice that output parameters are not included in the calling argument list when invoking such a function from SQL. This is because PostgreSQL considers only the input parameters to define the function's calling signature. That means also that only the input parameters matter when referencing the function for purposes such as dropping it. We could drop the above function with either of

DROP FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int); DROP FUNCTION sum_n_product (int, int); Parameters can be marked as IN (the default), OUT, INOUT, or VARIADIC. An INOUT parameter serves as both an input parameter (part of the calling argument list) and an output parameter (part of the result record type). VARIADIC parameters are input parameters, but are treated specially as described next.

35.4.5. SQL Functions with Variable Numbers of Arguments

SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type. The optional arguments will be passed to the function as an array. The function is declared by marking the last parameter as VARIADIC; this parameter must be declared as being of an array type. For example:

CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$

   SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);

$$ LANGUAGE SQL;

SELECT mleast(10, -1, 5, 4.4);

mleast 

    -1

(1 row) Effectively, all the actual arguments at or beyond the VARIADIC position are gathered up into a one-dimensional array, as if you had written

SELECT mleast(ARRAY[10, -1, 5, 4.4]); -- doesn't work You can't actually write that, though — or at least, it will not match this function definition. A parameter marked VARIADIC matches one or more occurrences of its element type, not of its own type.

Sometimes it is useful to be able to pass an already-constructed array to a variadic function; this is particularly handy when one variadic function wants to pass on its array parameter to another one. You can do that by specifying VARIADIC in the call:

SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]); This prevents expansion of the function's variadic parameter into its element type, thereby allowing the array argument value to match normally. VARIADIC can only be attached to the last actual argument of a function call.

The array element parameters generated from a variadic parameter are treated as not having any names of their own. This means it is not possible to call a variadic function using named arguments (Section 4.3), except when you specify VARIADIC. For example, this will work:

SELECT mleast(VARIADIC arr := ARRAY[10, -1, 5, 4.4]); but not these:

SELECT mleast(arr := 10); SELECT mleast(arr := ARRAY[10, -1, 5, 4.4]); 35.4.6. SQL Functions with Default Values for Arguments

Functions can be declared with default values for some or all input arguments. The default values are inserted whenever the function is called with insufficiently many actual arguments. Since arguments can only be omitted from the end of the actual argument list, all parameters after a parameter with a default value have to have default values as well. (Although the use of named argument notation could allow this restriction to be relaxed, it's still enforced so that positional argument notation works sensibly.)

For example:

CREATE FUNCTION foo(a int, b int DEFAULT 2, c int DEFAULT 3) RETURNS int LANGUAGE SQL AS $$

   SELECT $1 + $2 + $3;

$$;

SELECT foo(10, 20, 30);

foo 

 60

(1 row)

SELECT foo(10, 20);

foo 

 33

(1 row)

SELECT foo(10);

foo 

 15

(1 row)

SELECT foo(); -- fails since there is no default for the first argument ERROR: function foo() does not exist The = sign can also be used in place of the key word DEFAULT.

35.4.7. SQL Functions as Table Sources

All SQL functions can be used in the FROM clause of a query, but it is particularly useful for functions returning composite types. If the function is defined to return a base type, the table function produces a one-column table. If the function is defined to return a composite type, the table function produces a column for each attribute of the composite type.

Here is an example:

CREATE TABLE foo (fooid int, foosubid int, fooname text); INSERT INTO foo VALUES (1, 1, 'Joe'); INSERT INTO foo VALUES (1, 2, 'Ed'); INSERT INTO foo VALUES (2, 1, 'Mary');

CREATE FUNCTION getfoo(int) RETURNS foo AS $$

   SELECT * FROM foo WHERE fooid = $1;

$$ LANGUAGE SQL;

SELECT *, upper(fooname) FROM getfoo(1) AS t1;

fooid | foosubid | fooname | upper

+----------+---------+-------

    1 |        1 | Joe     | JOE

(1 row) As the example shows, we can work with the columns of the function's result just the same as if they were columns of a regular table.

Note that we only got one row out of the function. This is because we did not use SETOF. That is described in the next section.

35.4.8. SQL Functions Returning Sets

When an SQL function is declared as returning SETOF sometype, the function's final query is executed to completion, and each row it outputs is returned as an element of the result set.

This feature is normally used when calling the function in the FROM clause. In this case each row returned by the function becomes a row of the table seen by the query. For example, assume that table foo has the same contents as above, and we say:

CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$

   SELECT * FROM foo WHERE fooid = $1;

$$ LANGUAGE SQL;

SELECT * FROM getfoo(1) AS t1; Then we would get:

fooid | foosubid | fooname

+----------+---------

    1 |        1 | Joe
    1 |        2 | Ed

(2 rows) It is also possible to return multiple rows with the columns defined by output parameters, like this:

CREATE TABLE tab (y int, z int); INSERT INTO tab VALUES (1, 2), (3, 4), (5, 6), (7, 8);

CREATE FUNCTION sum_n_product_with_tab (x int, OUT sum int, OUT product int) RETURNS SETOF record AS $$

   SELECT $1 + tab.y, $1 * tab.y FROM tab;

$$ LANGUAGE SQL;

SELECT * FROM sum_n_product_with_tab(10);

sum | product

+---------

 11 |      10
 13 |      30
 15 |      50
 17 |      70

(4 rows) The key point here is that you must write RETURNS SETOF record to indicate that the function returns multiple rows instead of just one. If there is only one output parameter, write that parameter's type instead of record.

Currently, functions returning sets can also be called in the select list of a query. For each row that the query generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and might be removed in future releases. The following is an example function returning a set from the select list:

CREATE FUNCTION listchildren(text) RETURNS SETOF text AS $$

   SELECT name FROM nodes WHERE parent = $1

$$ LANGUAGE SQL;

SELECT * FROM nodes;

  name    | parent

+--------

Top       |
Child1    | Top
Child2    | Top
Child3    | Top
SubChild1 | Child1
SubChild2 | Child1

(6 rows)

SELECT listchildren('Top');

listchildren

Child1
Child2
Child3

(3 rows)

SELECT name, listchildren(name) FROM nodes;

 name  | listchildren

+--------------

Top    | Child1
Top    | Child2
Top    | Child3
Child1 | SubChild1
Child1 | SubChild2

(5 rows) In the last SELECT, notice that no output row appears for Child2, Child3, etc. This happens because listchildren returns an empty set for those arguments, so no result rows are generated.

Note: If a function's last command is INSERT, UPDATE, or DELETE with RETURNING, that command will always be executed to completion, even if the function is not declared with SETOF or the calling query does not fetch all the result rows. Any extra rows produced by the RETURNING clause are silently dropped, but the commanded table modifications still happen (and are all completed before returning from the function). 35.4.9. SQL Functions Returning TABLE

There is another way to declare a function as returning a set, which is to use the syntax RETURNS TABLE(columns). This is equivalent to using one or more OUT parameters plus marking the function as returning SETOF record (or SETOF a single output parameter's type, as appropriate). This notation is specified in recent versions of the SQL standard, and thus may be more portable than using SETOF.

For example, the preceding sum-and-product example could also be done this way:

CREATE FUNCTION sum_n_product_with_tab (x int) RETURNS TABLE(sum int, product int) AS $$

   SELECT $1 + tab.y, $1 * tab.y FROM tab;

$$ LANGUAGE SQL; It is not allowed to use explicit OUT or INOUT parameters with the RETURNS TABLE notation — you must put all the output columns in the TABLE list.

35.4.10. Polymorphic SQL Functions

SQL functions can be declared to accept and return the polymorphic types anyelement, anyarray, anynonarray, and anyenum. See Section 35.2.5 for a more detailed explanation of polymorphic functions. Here is a polymorphic function make_array that builds up an array from two arbitrary data type elements:

CREATE FUNCTION make_array(anyelement, anyelement) RETURNS anyarray AS $$

   SELECT ARRAY[$1, $2];

$$ LANGUAGE SQL;

SELECT make_array(1, 2) AS intarray, make_array('a'::text, 'b') AS textarray;

intarray | textarray

+-----------

{1,2}    | {a,b}

(1 row) Notice the use of the typecast 'a'::text to specify that the argument is of type text. This is required if the argument is just a string literal, since otherwise it would be treated as type unknown, and array of unknown is not a valid type. Without the typecast, you will get errors like this:

ERROR: could not determine polymorphic type because input has type "unknown" It is permitted to have polymorphic arguments with a fixed return type, but the converse is not. For example:

CREATE FUNCTION is_greater(anyelement, anyelement) RETURNS boolean AS $$

   SELECT $1 > $2;

$$ LANGUAGE SQL;

SELECT is_greater(1, 2);

is_greater

f

(1 row)

CREATE FUNCTION invalid_func() RETURNS anyelement AS $$

   SELECT 1;

$$ LANGUAGE SQL; ERROR: cannot determine result data type DETAIL: A function returning a polymorphic type must have at least one polymorphic argument. Polymorphism can be used with functions that have output arguments. For example:

CREATE FUNCTION dup (f1 anyelement, OUT f2 anyelement, OUT f3 anyarray) AS 'select $1, array[$1,$1]' LANGUAGE SQL;

SELECT * FROM dup(22);

f2 |   f3

+---------

22 | {22,22}

(1 row) Polymorphism can also be used with variadic functions. For example:

CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$

   SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);

$$ LANGUAGE SQL;

SELECT anyleast(10, -1, 5, 4);

anyleast 

      -1

(1 row)

SELECT anyleast('abc'::text, 'def');

anyleast 

abc

(1 row)

CREATE FUNCTION concat_values(text, VARIADIC anyarray) RETURNS text AS $$

   SELECT array_to_string($2, $1);

$$ LANGUAGE SQL;

SELECT concat_values('|', 1, 4, 2);

concat_values 

1|4|2

(1 row) 35.4.11. SQL Functions with Collations

When a SQL function has one or more parameters of collatable data types, a collation is identified for each function call depending on the collations assigned to the actual arguments, as described in Section 22.2. If a collation is successfully identified (i.e., there are no conflicts of implicit collations among the arguments) then all the collatable parameters are treated as having that collation implicitly. This will affect the behavior of collation-sensitive operations within the function. For example, using the anyleast function described above, the result of

SELECT anyleast('abc'::text, 'ABC'); will depend on the database's default collation. In C locale the result will be ABC, but in many other locales it will be abc. The collation to use can be forced by adding a COLLATE clause to any of the arguments, for example

SELECT anyleast('abc'::text, 'ABC' COLLATE "C"); Alternatively, if you wish a function to operate with a particular collation regardless of what it is called with, insert COLLATE clauses as needed in the function definition. This version of anyleast would always use en_US locale to compare strings:

CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$

   SELECT min($1[i] COLLATE "en_US") FROM generate_subscripts($1, 1) g(i);

$$ LANGUAGE SQL; But note that this will throw an error if applied to a non-collatable data type.

If no common collation can be identified among the actual arguments, then a SQL function treats its parameters as having their data types' default collation (which is usually the database's default collation, but could be different for parameters of domain types).

The behavior of collatable parameters can be thought of as a limited form of polymorphism, applicable only to textual data types.


35.5. Function Overloading

More than one function can be defined with the same SQL name, so long as the arguments they take are different. In other words, function names can be overloaded. When a query is executed, the server will determine which function to call from the data types and the number of the provided arguments. Overloading can also be used to simulate functions with a variable number of arguments, up to a finite maximum number.

When creating a family of overloaded functions, one should be careful not to create ambiguities. For instance, given the functions:

CREATE FUNCTION test(int, real) RETURNS ... CREATE FUNCTION test(smallint, double precision) RETURNS ... it is not immediately clear which function would be called with some trivial input like test(1, 1.5). The currently implemented resolution rules are described in Chapter 10, but it is unwise to design a system that subtly relies on this behavior.

A function that takes a single argument of a composite type should generally not have the same name as any attribute (field) of that type. Recall that attribute(table) is considered equivalent to table.attribute. In the case that there is an ambiguity between a function on a composite type and an attribute of the composite type, the attribute will always be used. It is possible to override that choice by schema-qualifying the function name (that is, schema.func(table)) but it's better to avoid the problem by not choosing conflicting names.

Another possible conflict is between variadic and non-variadic functions. For instance, it is possible to create both foo(numeric) and foo(VARIADIC numeric[]). In this case it is unclear which one should be matched to a call providing a single numeric argument, such as foo(10.1). The rule is that the function appearing earlier in the search path is used, or if the two functions are in the same schema, the non-variadic one is preferred.

When overloading C-language functions, there is an additional constraint: The C name of each function in the family of overloaded functions must be different from the C names of all other functions, either internal or dynamically loaded. If this rule is violated, the behavior is not portable. You might get a run-time linker error, or one of the functions will get called (usually the internal one). The alternative form of the AS clause for the SQL CREATE FUNCTION command decouples the SQL function name from the function name in the C source code. For instance:

CREATE FUNCTION test(int) RETURNS int

   AS 'filename', 'test_1arg'
   LANGUAGE C;

CREATE FUNCTION test(int, int) RETURNS int

   AS 'filename', 'test_2arg'
   LANGUAGE C;

The names of the C functions here reflect one of many possible conventions.


35.6. Function Volatility Categories

Every function has a volatility classification, with the possibilities being VOLATILE, STABLE, or IMMUTABLE. VOLATILE is the default if the CREATE FUNCTION command does not specify a category. The volatility category is a promise to the optimizer about the behavior of the function:

A VOLATILE function can do anything, including modifying the database. It can return different results on successive calls with the same arguments. The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed.

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call. In particular, it is safe to use an expression containing such a function in an index scan condition. (Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition.)

An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever. This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments. For example, a query like SELECT ... WHERE x = 2 + 2 can be simplified on sight to SELECT ... WHERE x = 4, because the function underlying the integer addition operator is marked IMMUTABLE.

For best optimization results, you should label your functions with the strictest volatility category that is valid for them.

Any function with side-effects must be labeled VOLATILE, so that calls to it cannot be optimized away. Even a function with no side-effects needs to be labeled VOLATILE if its value can change within a single query; some examples are random(), currval(), timeofday().

Another important example is that the current_timestamp family of functions qualify as STABLE, since their values do not change within a transaction.

There is relatively little difference between STABLE and IMMUTABLE categories when considering simple interactive queries that are planned and immediately executed: it doesn't matter a lot whether a function is executed once during planning or once during query execution startup. But there is a big difference if the plan is saved and reused later. Labeling a function IMMUTABLE when it really isn't might allow it to be prematurely folded to a constant during planning, resulting in a stale value being re-used during subsequent uses of the plan. This is a hazard when using prepared statements or when using function languages that cache plans (such as PL/pgSQL).

For functions written in SQL or in any of the standard procedural languages, there is a second important property determined by the volatility category, namely the visibility of any data changes that have been made by the SQL command that is calling the function. A VOLATILE function will see such changes, a STABLE or IMMUTABLE function will not. This behavior is implemented using the snapshotting behavior of MVCC (see Chapter 13): STABLE and IMMUTABLE functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.

Note: Functions written in C can manage snapshots however they want, but it's usually a good idea to make C functions work this way too. Because of this snapshotting behavior, a function containing only SELECT commands can safely be marked STABLE, even if it selects from tables that might be undergoing modifications by concurrent queries. PostgreSQL will execute all commands of a STABLE function using the snapshot established for the calling query, and so it will see a fixed view of the database throughout that query.

The same snapshotting behavior is used for SELECT commands within IMMUTABLE functions. It is generally unwise to select from database tables within an IMMUTABLE function at all, since the immutability will be broken if the table contents ever change. However, PostgreSQL does not enforce that you do not do that.

A common error is to label a function IMMUTABLE when its results depend on a configuration parameter. For example, a function that manipulates timestamps might well have results that depend on the timezone setting. For safety, such functions should be labeled STABLE instead.

Note: Before PostgreSQL release 8.0, the requirement that STABLE and IMMUTABLE functions cannot modify the database was not enforced by the system. Releases 8.0 and later enforce it by requiring SQL functions and procedural language functions of these categories to contain no SQL commands other than SELECT. (This is not a completely bulletproof test, since such functions could still call VOLATILE functions that modify the database. If you do that, you will find that the STABLE or IMMUTABLE function does not notice the database changes applied by the called function, since they are hidden from its snapshot.)


35.7. Procedural Language Functions

PostgreSQL allows user-defined functions to be written in other languages besides SQL and C. These other languages are generically called procedural languages (PLs). Procedural languages aren't built into the PostgreSQL server; they are offered by loadable modules. See Chapter 38 and following chapters for more information.