9.1第三章
3.1. Introduction
In the previous chapter we have covered the basics of using SQL to store and access your data in PostgreSQL. We will now discuss some more advanced features of SQL that simplify management and prevent loss or corruption of your data. Finally, we will look at some PostgreSQL extensions.
This chapter will on occasion refer to examples found in Chapter 2 to change or improve them, so it will be useful to have read that chapter. Some examples from this chapter can also be found in advanced.sql in the tutorial directory. This file also contains some sample data to load, which is not repeated here. (Refer to Section 2.1 for how to use the file.)
3.1. 介绍
在前面的章节中,我们介绍了使用 SQL 保存和访问在 PostgreSQL 里的数据的基本方法。 我们现在将讨论一些 SQL 更高级的特性,这些特性可以简化管理和避免你的数据的丢失或损坏。 最后,我们将看看一些PostgreSQL 的扩展。
本章将不时引用在 Chapter 2 里建立的例子, 并且对它们进行修改和提高,因此如果你已经看过那章会更好。 本章的一些例子也可以在教程目录里的 advanced.sql 文件里找到。 这个文件还包括一些要装载的例子数据,这些数据在这里没有介绍。 (请参考 Section 2.1 获取如何使用该文件的方法。)
3.2. Views
Refer back to the queries in Section 2.6. Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table:
CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name; SELECT * FROM myview;
Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.
Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon.
3.2.视图
回头看看在 Section 2.6 里的查询例子。 假设你的应用对天气记录和城市位置的组合列表特别感兴趣, 而你又不想每次键入这些查询。那么你可以在这个查询上创建一个视图, 它给这个查询一个名字,你可以像普通表那样引用它。
CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name; SELECT * FROM myview;
自由地运用视图是好的 SQL 数据库设计的一个关键要素。 视图允许我们把表结构的细节封装起来,尽管这些表可能因你的应用的进化而变化,而这些变化却可以躲在一致的接口后面。
视图几乎可以在一个真正的表可以使用的任何地方使用。 在其它视图上面再建造视图也并非罕见。
3.3. Foreign Keys
Recall the weather and cities tables from Chapter 2. Consider the following problem: You want to make sure that no one can insert rows in the weather table that do not have a matching entry in the cities table. This is called maintaining the referential integrity of your data. In simplistic database systems this would be implemented (if at all) by first looking at the cities table to check if a matching record exists, and then inserting or rejecting the new weather records. This approach has a number of problems and is very inconvenient, so PostgreSQL can do this for you.
The new declaration of the tables would look like this:
CREATE TABLE cities (
city varchar(80) primary key, location point
);
CREATE TABLE weather (
city varchar(80) references cities(city), temp_lo int, temp_hi int, prcp real, date date
);Now try inserting an invalid record:
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28'); ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey" DETAIL: Key (city)=(Berkeley) is not present in table "cities". The behavior of foreign keys can be finely tuned to your application. We will not go beyond this simple example in this tutorial, but just refer you to Chapter 5 for more information. Making correct use of foreign keys will definitely improve the quality of your database applications, so you are strongly encouraged to learn about them.
3.3. 外键
回忆一下 Chapter 2 里的 weather 和 cities 表。思考下面的问题:你想确保没有人可以在 weather 表里插入一条在 cities 表里没有匹配记录的数据行。 这就叫维护你的表的参考完整性。 在简单的数据库系统里,实现(如果也叫实现)这个特性的方法 通常是先看看 cities 表里是否有匹配的记录, 然后插入或者拒绝新的 weather 记录。 这个方法有许多问题,而且非常不便,因此 PostgreSQL 可以为你做这些。
新的表声明看起来会象下面这样:
CREATE TABLE cities ( city varchar(80) primary key, location point );
CREATE TABLE weather ( city varchar(80) references cities(city), temp_lo int, temp_hi int, prcp real, date date );
然后我们试图插入一条非法的记录:
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28'); ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey" DETAIL: Key (city)=(Berkeley) is not present in table "cities".
外键的使用可以改进你的应用。在这份教程里我们就不再多说这个简单的例子了,而是请你参考Chapter 5获取更多的信息。 正确使用外键无疑将改进你的数据库应用,所以我们强烈建议你学习它们。
3.4. Transactions
Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all.
For example, consider a bank database that contains balances for various customer accounts, as well as total deposit balances for branches. Suppose that we want to record a payment of $100.00 from Alice's account to Bob's account. Simplifying outrageously, the SQL commands for this might look like:
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
The details of these commands are not important here; the important point is that there are several separate updates involved to accomplish this rather simple operation. Our bank's officers will want to be assured that either all these updates happen, or none of them happen. It would certainly not do for a system failure to result in Bob receiving $100.00 that was not debited from Alice. Nor would Alice long remain a happy customer if she was debited without Bob being credited. We need a guarantee that if something goes wrong partway through the operation, none of the steps executed so far will take effect. Grouping the updates into a transaction gives us this guarantee. A transaction is said to be atomic: from the point of view of other transactions, it either happens completely or not at all.
We also want a guarantee that once a transaction is completed and acknowledged by the database system, it has indeed been permanently recorded and won't be lost even if a crash ensues shortly thereafter. For example, if we are recording a cash withdrawal by Bob, we do not want any chance that the debit to his account will disappear in a crash just after he walks out the bank door. A transactional database guarantees that all the updates made by a transaction are logged in permanent storage (i.e., on disk) before the transaction is reported complete.
Another important property of transactional databases is closely related to the notion of atomic updates: when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others. For example, if one transaction is busy totalling all the branch balances, it would not do for it to include the debit from Alice's branch but not the credit to Bob's branch, nor vice versa. So transactions must be all-or-nothing not only in terms of their permanent effect on the database, but also in terms of their visibility as they happen. The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously.
In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands. So our banking transaction would actually look like:
BEGIN; UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc COMMIT; If, partway through the transaction, we decide we do not want to commit (perhaps we just noticed that Alice's balance went negative), we can issue the command ROLLBACK instead of COMMIT, and all our updates so far will be canceled.
PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block.
Note: Some client libraries issue BEGIN and COMMIT commands automatically, so that you might get the effect of transaction blocks without asking. Check the documentation for the interface you are using.
It's possible to control the statements in a transaction in a more granular fashion through the use of savepoints. Savepoints allow you to selectively discard parts of the transaction, while committing the rest. After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK TO. All the transaction's database changes between defining the savepoint and rolling back to it are discarded, but changes earlier than the savepoint are kept.
After rolling back to a savepoint, it continues to be defined, so you can roll back to it several times. Conversely, if you are sure you won't need to roll back to a particular savepoint again, it can be released, so the system can free some resources. Keep in mind that either releasing or rolling back to a savepoint will automatically release all savepoints that were defined after it.
All this is happening within the transaction block, so none of it is visible to other database sessions. When and if you commit the transaction block, the committed actions become visible as a unit to other sessions, while the rolled-back actions never become visible at all.
Remembering the bank database, suppose we debit $100.00 from Alice's account, and credit Bob's account, only to find later that we should have credited Wally's account. We could do it using savepoints like this:
BEGIN; UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
SAVEPOINT my_savepoint; UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
-- oops ... forget that and use Wally's account ROLLBACK TO my_savepoint; UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Wally';
COMMIT; This example is, of course, oversimplified, but there's a lot of control possible in a transaction block through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction block that was put in aborted state by the system due to an error, short of rolling it back completely and starting again.
3.4.事务
事务是所有数据库系统的一个基本概念。 一次事务的要点就是它把多个步骤捆绑成了一个单一的,原子的操作。 其它并发的事务是看不到在这些步骤之间的中间状态的,并且如果其中一些操作失败, 导致该事务无法完成,那么所有这些步骤都完全不会影响数据库。
例如,假设一个银行的数据库包含各种客户帐户的余额,以及每个分行的总余额。 假设我们要记录一次从 Alice 的帐户到 Bob 的帐户的金额为 $100.00 的支付动作。那么,完成这个任务的简单到极点的 SQL 命令象下面这样
UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; UPDATE branches SET balance = balance - 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice'); UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; UPDATE branches SET balance = balance + 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
这些命令的细节在这儿并不重要;重要的是这里牵涉到了好几个独立的更新来完成这个相当简单的操作。 我们的银行官员会希望要么所有这些更新都生效,要么全部不起作用。 我们当然不希望一次系统崩溃就导致 Bob 收到 100 块不是 Alice 支付的钱, 也不希望 Alice 老是不花钱从 Bob 那里拿到物品。我们需要保证:如果在操作的过程中出了差错, 那么所有这些步骤都不会发生效果。把这些更新组合成一个事务就给予我们这样的保证。 事务被认为是原子的:从其它事务的角度来看,它要么是全部发生,要么完全不发生。
我们还需要保证:一旦一个事务完成并且得到数据库系统的认可, 那么它必须被真正永久地存储,并且不会在随后的崩溃中消失。 比如,如果我们记录到了一个 Bob 撤单的动作, 那么我们不希望仅仅在他走出银行大门之后的一次崩溃就会导致对他的帐户的扣减动作消失。 一个事务型数据库保证一个事务所做的所有更新在事务发出完成响应之前都记录到永久的存储中(也就是磁盘)。
事务型数据库的另外一个重要的性质和原子更新的概念关系密切: 当多个事务并发地运行的时候,那么每个事务都不应看到其它事务所做的未完成的变化。 比如,如果一个事务正忙着计算所有分行的余额总和, 那么它不应该包括来自 Alice 的分行的扣帐和来自 Bob 分行的入帐,反之亦然。 所以事务必须是黑白分明的,不仅仅体现在它们在数据库上产生的永久影响出发,而且体现在它们运转时的自身的可视性上。 一个打开的事务做的更新在它完成之前是其它事务无法看到的,而到提交的时候所有更新同时可见。
在 PostgreSQL 里,一个事务是通过把 SQL 命令用 BEGIN 和 COMMIT 命令包围实现的。 因此我们的银行事务实际上看起来象下面这样
BEGIN; UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; -- 等等 COMMIT;
如果在该事务的过程中,我们决定不做提交(可能是我们刚发现 Alice 的余额是负数), 那么我们可以发出 ROLLBACK 命令而不是 COMMIT 命令,那么到目前为止我们的所有更新都会被取消。
PostgreSQL 实际上把每个 SQL 语句当做在一个被执行的事务来看待。 如果你没有发出 BEGIN 命令,那么每个独立的语句都有一个隐含的 BEGIN 和(如果成功的话) COMMIT 语句包围在周围。 一组包围在 BEGIN 和 COMMIT 语句中间的语句有时候被称做事务块。
注意: 一些客户库自动发出 BEGIN 和 COMMIT, 因此你可能不需要特意请求就可以获取事务块的效果。查看你使用的接口的文档。
我们可以使用保存点的方法,在一个事务里更加精细地控制其中的语句。 保存点允许你有选择性地抛弃事务中的某些部分,而提交其它剩下的。 在用 SAVEPOINT 定义了一个保存点后,如果需要,你可以使用 ROLLBACK TO 回滚到该保存点。 则该事务在定义保存点到回滚到它之间的所有数据库更改都被抛弃,但是在保存点之前的修改将被保留。
在回滚到一个保存点之后,这个保存点仍然保存着其定义,所以你可以回滚到这个位置好几次。 当然,如果你确信你不需要再次回滚到一个保存点,那么你可以释放它,这样系统可以释放一些资源。 要记住:释放或者回滚到一个保存点都会自动释放在其后定义的所有保存点。
所有这些都发生在一个事务块内部,所以所有这些都不可能被其它事务会话看到。 当且仅当你提交了这个事务块,这些提交了的动作才能以一个单元的方式被其它会话看到, 而回滚的动作完全不会再被看到。
还记得我们的银行数据库吗?假设我们从 Alice 的帐户上消费 $100.00, 然后给 Bob 的帐户进行贷记加款,稍后我们发现我们应该给 Wally 的账号贷记加款。 那么我们可以像下面这样的保存点来做:
BEGIN; UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; SAVEPOINT my_savepoint; UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; -- 呀!加错钱了,应该用 Wally 的账号 ROLLBACK TO my_savepoint; UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Wally'; COMMIT;
这个例子当然是实在太简单了,但是通过使用保存点,我们可以对事务块有大量的控制。 并且,ROLLBACK TO 是除了事务全部回滚,重新来过之外的唯一可用的, 用于重新控制一个因错误而被系统置于退出状态下的事务的方法。
3.5. Window Functions
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
Here is an example that shows how to compare each employee's salary with the average salary in his or her department:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
depname | empno | salary | avg
+-------+--------+-----------------------
develop | 11 | 5200 | 5020.0000000000000000 develop | 7 | 4200 | 5020.0000000000000000 develop | 9 | 4500 | 5020.0000000000000000 develop | 8 | 6000 | 5020.0000000000000000 develop | 10 | 5200 | 5020.0000000000000000 personnel | 5 | 3500 | 3700.0000000000000000 personnel | 2 | 3900 | 3700.0000000000000000 sales | 3 | 4800 | 4866.6666666666666667 sales | 1 | 5000 | 4866.6666666666666667 sales | 4 | 4800 | 4866.6666666666666667
(10 rows)The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the regular avg aggregate function, but the OVER clause causes it to be treated as a window function and computed across an appropriate set of rows.)
A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a regular function or aggregate function. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.
You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
depname | empno | salary | rank
+-------+--------+------
develop | 8 | 6000 | 1 develop | 10 | 5200 | 2 develop | 11 | 5200 | 2 develop | 9 | 4500 | 4 develop | 7 | 4200 | 5 personnel | 2 | 3900 | 1 personnel | 5 | 3500 | 2 sales | 1 | 5000 | 1 sales | 4 | 4800 | 2 sales | 3 | 4800 | 2
(10 rows)As shown here, the rank function produces a numerical rank within the current row's partition for each distinct ORDER BY value, in the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways by means of different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is just one partition containing all the rows.
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. [1] Here is an example using sum:
SELECT salary, sum(salary) OVER () FROM empsalary; salary | sum
+-------
5200 | 47100 5000 | 47100 3500 | 47100 4800 | 47100 3900 | 47100 4200 | 47100 4500 | 47100 4800 | 47100 6000 | 47100 5200 | 47100
(10 rows)Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table; in other words each sum is taken over the whole table and so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results:
SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; salary | sum
+-------
3500 | 3500 3900 | 7400 4200 | 11600 4500 | 16100 4800 | 25700 4800 | 25700 5000 | 30700 5200 | 41100 5200 | 41100 6000 | 47100
(10 rows)Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one (notice the results for the duplicated salaries).
Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after regular aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.
If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example:
SELECT depname, empno, salary, enroll_date FROM
(SELECT depname, empno, salary, enroll_date, rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos FROM empsalary ) AS ss
WHERE pos < 3;The above query only shows the rows from the inner query having rank less than 3.
When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
More details about window functions can be found in Section 4.2.8, Section 9.19, Section 7.2.4, and the SELECT reference page.
Notes [1] There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details.
3.5.Window 函数
一个Window 函数执行多个与表主题相关的当前行的行集计算。是可与聚合函数相媲美的计算类型。但又不同于普通的聚合函数,一个Window 函数的使用不会导致行分组为单一的输出行-行会保留各自的特性。在幕后,Window 函数的功能是不仅仅是获得查询结果的当前行了。
下面是来说明如何比较用他或她的部门的平均工资来比较每个员工的工资:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary; depname | empno | salary | avg -----------+-------+--------+----------------------- develop | 11 | 5200 | 5020.0000000000000000 develop | 7 | 4200 | 5020.0000000000000000 develop | 9 | 4500 | 5020.0000000000000000 develop | 8 | 6000 | 5020.0000000000000000 develop | 10 | 5200 | 5020.0000000000000000 personnel | 5 | 3500 | 3700.0000000000000000 personnel | 2 | 3900 | 3700.0000000000000000 sales | 3 | 4800 | 4866.6666666666666667 sales | 1 | 5000 | 4866.6666666666666667 sales | 4 | 4800 | 4866.6666666666666667 (10 rows)
前三个输出列直接来自表empsalary,并且对于表中的每一行都有一个输出行。第四列表示在所有的表行中已为当前具有相同depname的行值采取了平均水平。(这实际上是与正常平均聚合函数相同的功能,但它的OVER子句被视为一个window 函数,计算一组适当的行值。)
一个Window 函数调用总是在函数名以及参数之后还包含一个over字句,这就在语法上区别了常规的函数或聚合函数。over字句决定Window 函数如何对查询行进行拆分处理。OVER中的PARTITION BY 的值指定如何将行分成组或者是区,组或区中的数据在PARTITION BY 的表达式中具有相同的值。对于每一行,Window 函数将那些分在同一区内的行作为当前行来计算。
你还可以控制是那些被Window 函数用OVER里面的ORDER BY命令进行处理过的行的顺序,(Window 函数的ORDER BY甚至无需与输出行的顺序一致。)这里有个例子:
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary; depname | empno | salary | rank -----------+-------+--------+------ develop | 8 | 6000 | 1 develop | 10 | 5200 | 2 develop | 11 | 5200 | 2 develop | 9 | 4500 | 4 develop | 7 | 4200 | 5 personnel | 2 | 3900 | 1 personnel | 5 | 3500 | 2 sales | 1 | 5000 | 1 sales | 4 | 4800 | 2 sales | 3 | 4800 | 2 (10 rows)
如上所示,RANK函数在当前行的分区内为每个不同的ORDER BY的值都生成了一个数值rank,由ORDER BY子句定义顺序。rank函数没有明确的参数,因为它的行为是完全由OVER子句决定。
那些由Window 函数生成的行都是根据查询语句的FROM子句里面的WHERE,GROUP BY,HAVING或是别的子句来过滤的。例如,一个行被删除了是因为它没有符合WHERE条件,所以没有在Window 函数中显示出来。一个查询可以包含多个Window 函数,这样就可以通过不同的OVER子句将数据进行分割,但这些数据都是在虚拟表中定义的相同的行集合中。
我们已经看到ORDER BY可以省略,如果行的顺序不是很重要的话,它也可以忽略PARTITION BY,在这种情况下只有一个分区包含所有的行。
这里还有一个重要概念与Window 函数有关联:对于每一行,在它的分区内有一个称之为Window frame的行集,许多(不是全部)Window 函数都只是对window frame的行起作用,而不是对分区内的所有行都起作用,默认情况下,如果有ORDER BY那么window frame包括所有的行,从分区开始到当前行,还有那些等同于当前行的行,根据OREDR BY字句就可进行区分。如果ORDER BY都省略了那么默认frame就包括了分区内的所有行[1]这里有一个使用sum的例子。
SELECT salary, sum(salary) OVER () FROM empsalary; salary | sum --------+------- 5200 | 47100 5000 | 47100 3500 | 47100 4800 | 47100 3900 | 47100 4200 | 47100 4500 | 47100 4800 | 47100 6000 | 47100 5200 | 47100 (10 rows)
看上面,因为在OVER 子句中没有ORDER BY,window frame就与分区一样,缺少PARTITION BY的是整个表,换句话说,每个SUM都计算了整个表,所以我们每个输出行都得到了相同的接结果。但是如果我们加个ORDER BY子句,将会得到不同的结果:
SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; salary | sum --------+------- 3500 | 3500 3900 | 7400 4200 | 11600 4500 | 16100 4800 | 25700 4800 | 25700 5000 | 30700 5200 | 41100 5200 | 41100 6000 | 47100 (10 rows)
这里的sum是计算从第一个(最低的)工资到当前工资,包括任何重复数据(注意重复工资的结果)
Window 函数只在SELECT列表和ORDER BY子句的查询中被允许,在别的地方都是被禁止的,例如在GROUP BY, HAVING 和 WHERE 子句中。这是因为他们在逻辑上都是在这些子句处理后执行,而且,Window 函数在常规聚合函数之后执行。这就意味着在一个Window 函数的参数中包含一个聚合函数是有效的,而不是刚好相反。
如果需要在Window计算执行后过滤或者对行进行分组,可以使用子选择,例如;
SELECT depname, empno, salary, enroll_date FROM (SELECT depname, empno, salary, enroll_date, rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos FROM empsalary ) AS ss WHERE pos < 3;
上面的查询只显示内查询中rank值小于3的行。
当一个查询包括多个Window 函数时,它可以为每个Window 函数写单独的OVER子句,但这是重复的,并且如果相同的窗口行为希望有不同的功能就会容易出错,相反,每个Window 的行为都可以在一个WINDOWS子句中命名,然后再OVER中引用,例如:
SELECT sum(salary) OVER w, avg(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
更多的关于Window 函数的细节可以在章节4.2.8,章节9.19,章节7.2.4.还有SELECT相关的页面部分找到
Notes
[1] 有其他的选择定义window frame的方式,但本教程中不包括它们,有关详细信息,请看章节4.2.8
3.6. Inheritance
Inheritance is a concept from object-oriented databases. It opens up interesting new possibilities of database design.
Let's create two tables: A table cities and a table capitals. Naturally, capitals are also cities, so you want some way to show the capitals implicitly when you list all cities. If you're really clever you might invent some scheme like this:
CREATE TABLE capitals (
name text, population real, altitude int, -- (in ft) state char(2)
);
CREATE TABLE non_capitals (
name text, population real, altitude int -- (in ft)
);
CREATE VIEW cities AS
SELECT name, population, altitude FROM capitals UNION SELECT name, population, altitude FROM non_capitals;This works OK as far as querying goes, but it gets ugly when you need to update several rows, for one thing.
A better solution is this:
CREATE TABLE cities (
name text, population real, altitude int -- (in ft)
);
CREATE TABLE capitals (
state char(2)
) INHERITS (cities); In this case, a row of capitals inherits all columns (name, population, and altitude) from its parent, cities. The type of the column name is text, a native PostgreSQL type for variable length character strings. State capitals have an extra column, state, that shows their state. In PostgreSQL, a table can inherit from zero or more other tables.
For example, the following query finds the names of all cities, including state capitals, that are located at an altitude over 500 feet:
SELECT name, altitude
FROM cities WHERE altitude > 500;which returns:
name | altitude
+----------
Las Vegas | 2174 Mariposa | 1953 Madison | 845
(3 rows) On the other hand, the following query finds all the cities that are not state capitals and are situated at an altitude of 500 feet or higher:
SELECT name, altitude
FROM ONLY cities WHERE altitude > 500; name | altitude
+----------
Las Vegas | 2174 Mariposa | 1953
(2 rows) Here the ONLY before cities indicates that the query should be run over only the cities table, and not tables below cities in the inheritance hierarchy. Many of the commands that we have already discussed — SELECT, UPDATE, and DELETE — support this ONLY notation.
Note: Although inheritance is frequently useful, it has not been integrated with unique constraints or foreign keys, which limits its usefulness. See Section 5.8 for more detail.
3.6.继承
继承是面向对象数据库中的一个概念。它开启了数据库设计新的有趣的可能性大门。
让我们创建两个表:一个表 cities 和一个表 capitals。自然,首府(capital)也是城市(cities), 因此在列出所有城市时你想要某种方法隐含地显示首府。 如果你已经很高明了,那么你可能会创造类似下面这样的模式:
CREATE TABLE capitals ( name text, population real, altitude int, -- (单位是英尺) state char(2) );
CREATE TABLE non_capitals ( name text, population real, altitude int -- (单位是英尺) );
CREATE VIEW cities AS SELECT name, population, altitude FROM capitals UNION SELECT name, population, altitude FROM non_capitals;
一个更好的方法是;
CREATE TABLE cities ( name text, population real, altitude int -- (in ft) );
CREATE TABLE capitals ( state char(2) ) INHERITS (cities);
在这个例子里,capitals 的一行继承所有来自它的父表cities 的所有字段(name, population,和 altitude)。 字段 name 的类型是 text, 是 PostgreSQL 用于变长字符串的固有类型。 州首府有一个额外的字段,州,显示所处的州。在 PostgreSQL 里,一个表可以从零个或者更多其它表中继承过来。
比如,下面的查询找出所有海拔超过 500 英尺的城市的名字, 包括州首府:
SELECT name, altitude FROM cities WHERE altitude > 500;
它返回:
name | altitude -----------+---------- Las Vegas | 2174 Mariposa | 1953 Madison | 845 (3 rows)
另外一方面,下面的查询找出所有不是州首府并且位于海拔大于或等于 500 英尺的城市:
SELECT name, altitude FROM ONLY cities WHERE altitude > 500; name | altitude -----------+---------- Las Vegas | 2174 Mariposa | 1953 (2 rows)
这里的 cities 前面的 ONLY 指示系统只对 cities 表运行查询,而不包括继承级别中低于 cities 的表。 许多我们已经讨论过的命令 — SELECT, UPDATE 和 DELETE — 支持这个 ONLY 表示法。
注意: 尽管继承经常是有用的,但是它还没有集成唯一约束或者外键,因此制约了其实用性。 参阅 Section 5.8 获取更多细节。
3.7. Conclusion
PostgreSQL has many features not touched upon in this tutorial introduction, which has been oriented toward newer users of SQL. These features are discussed in more detail in the remainder of this book.
If you feel you need more introductory material, please visit the PostgreSQL web site for links to more resources.
3.7.结论
PostgreSQL 有许多这份教程里没有谈到的特性, 因为这份教程主要是面向新 SQL 用户的。这些特性在本书剩余部分将有更详细的介绍。
如果你觉得自己需要更多介绍性材料,请访问 PostgreSQL网站 获取更多资源的联接。