{{Languages}}

Documentation of Serializable Snapshot Isolation (SSI) in PostgreSQL, compared to Snapshot Isolation (SI). These correspond to the SERIALIZABLE and REPEATABLE READ transaction isolation levels, respectively, in PostgreSQL beginning with version 9.1.

== Overview ==

With true serializable transactions, if you can show that your transaction will do the right thing when there are no concurrent transactions, it will do the right thing in any mix of serializable transactions running at the same time, or it will be rolled back with a serialization failure.

This document shows problems which can occur with certain combinations of transactions at the REPEATABLE READ transaction isolation level, and how they are avoided at the SERIALIZABLE transaction isolation level, starting with PostgreSQL 9.1.

This document is aimed at the application programmer or the database administrator. For details on the implementation of SSI, see the [[Serializable]] wiki page. For more information on how to use this isolation level, see [http://docs.postgresql.fr/current/transaction-iso.html#XACT-SERIALIZABLE the current PostgreSQL documentation].

== Examples ==

In environments which avoid protecting their integrity with blocking locks, it will be common for the database to be configured (in postgresql.conf) with:
 default_transaction_isolation = 'serializable'
For that reason, all of the examples were run with that setting, which keeps them uncluttered: a simple begin is enough, rather than explicitly declaring the isolation level for every transaction.
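
Where changing the server-wide default is not an option, the same effect can be obtained per transaction or per session. A minimal sketch (standard PostgreSQL syntax; these statements are not part of the examples below):

 -- declare the isolation level explicitly for a single transaction:
 begin isolation level serializable;
 -- ... work ...
 commit;

 -- or change the default for the rest of the current session:
 set default_transaction_isolation to 'serializable';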

=== Simple Write Skew ===

When two concurrent transactions each determine what they will write based on reading a data set which overlaps what the other is writing, you can get a state which could not have occurred if either had run before the other. This is known as ''write skew'', and is the simplest form of serialization anomaly against which SSI protects you.

Where write skew occurs under SSI, both transactions proceed until one of them commits. The first committer wins, and the other transaction is rolled back. The "first committer wins" rule guarantees both that progress is made against the database and that the rolled-back transaction can be retried immediately.

----
==== Black and White ====

In this case there are rows with a couleur column containing 'blanc' (white) or 'noir' (black). Two users concurrently try to convert all rows to a single color, but each in the opposite direction. One wants to update all 'blanc' rows to 'noir', the other all 'noir' rows to 'blanc'.

The example can be set up with these statements:
 create table points
 (
   id int not null primary key,
   couleur text not null
 );
 insert into points
   with x(id) as (select generate_series(1,10))
   select id, case when id % 2 = 1 then 'noir'
     else 'blanc' end from x;
{|
|+ Black and White Example
! session 1
! session 2
|-
|
 begin;
 update points set couleur = 'noir'
   where couleur = 'blanc';
|-
| ||
 begin;
 update points set couleur = 'blanc'
   where couleur = 'noir';
At this point one transaction or the other is doomed to fail.
 commit;
First committer wins.
 select * from points order by id;
 
  id | couleur
 ----+---------
   1 | blanc
   2 | blanc
   3 | blanc
   4 | blanc
   5 | blanc
   6 | blanc
   7 | blanc
   8 | blanc
   9 | blanc
  10 | blanc
 (10 rows)
This one ran as if it were by itself.
|-
|
 commit;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Cancelled on identification
 as a pivot, during commit attempt.
 HINT:  The transaction might succeed if retried.
A serialization failure. We roll back and try again.
 rollback;
 begin;
 update points set couleur = 'noir'
   where couleur = 'blanc';
 commit;
There is no concurrent transaction to interfere.
 select * from points order by id;
 
  id | couleur
 ----+---------
   1 | noir
   2 | noir
   3 | noir
   4 | noir
   5 | noir
   6 | noir
   7 | noir
   8 | noir
   9 | noir
  10 | noir
 (10 rows)
The transaction ran by itself, after the other one.
|}

----
==== Intersecting Data ====

This example is taken from the PostgreSQL documentation. Two concurrent transactions read data, and each uses what it read to update the set read by the other. A simple, if somewhat contrived, example of write skew.

The example can be set up with these statements:
 CREATE TABLE mytab
 (
   class int NOT NULL,
   value int NOT NULL
 );
 INSERT INTO mytab VALUES
   (1, 10), (1, 20), (2, 100), (2, 200);
{|
|+ Intersecting Data Example
! session 1
! session 2
|-
|
 BEGIN;
 SELECT SUM(value) FROM mytab WHERE class = 1;
 
  sum
 -----
   30
 (1 row)
 
 INSERT INTO mytab VALUES (2, 30);
|-
| ||
 BEGIN;
 SELECT SUM(value) FROM mytab WHERE class = 2;
 
  sum
 -----
  300
 (1 row)
 
 INSERT INTO mytab VALUES (1, 300);
Each transaction has modified what the other transaction would have read. If both were allowed to commit, serializable behavior would be broken, because if they had run one at a time, one of the transactions would have seen the INSERT the other one committed. We wait for one of the transactions to commit successfully before rolling anything back, though, to guarantee that progress is made and prevent thrashing.
 COMMIT;
|-
|
 COMMIT;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Cancelled on identification
 as a pivot, during commit attempt.
 HINT:  The transaction might succeed if retried.
So now we roll back the failed transaction and retry it from the beginning.
 ROLLBACK;
 BEGIN;
 SELECT SUM(value) FROM mytab WHERE class = 1;
 
  sum
 -----
  330
 (1 row)
 
 INSERT INTO mytab VALUES (2, 330);
 COMMIT;
This succeeds, and the result is consistent with some serial order of execution of the transactions.
 SELECT * FROM mytab;
 
  class | value
 -------+-------
      1 |    10
      1 |    20
      2 |   100
      2 |   200
      1 |   300
      2 |   330
 (6 rows)
|}

----
==== Overdraft Protection ====

The hypothetical case is a bank which allows its customers to withdraw money up to the total of everything they hold across all of their accounts. The bank will later automatically transfer funds as needed to end the day with a positive balance in each account. Within a single transaction, it checks that the sum of all accounts exceeds the amount requested.

Someone tries to be clever and cheat the bank by simultaneously submitting two $900 withdrawals against two accounts each holding a $500 balance. At the REPEATABLE READ transaction isolation level this could work; but if the SERIALIZABLE transaction isolation level is used, SSI will detect a "dangerous structure" in the read/write pattern and reject one of the two transactions.

The example can be set up with these statements:

 create table compte
 (
   nom text not null,
   type text not null,
   solde money not null default '0.00'::money,
   primary key (nom, type)
 );
 insert into compte values
   ('kevin','epargne', 500),
   ('kevin','courant', 500);

{|
|+ Overdraft Protection Example
! session 1
! session 2
|-
|
 begin;
 select type, solde from compte
   where nom = 'kevin';
 
    type   |  solde
 ----------+---------
  epargne  | $500.00
  courant  | $500.00
 (2 rows)
The total is $1000, so a $900 withdrawal is allowed.
|-
| ||
 begin;
 select type, solde from compte
   where nom = 'kevin';
 
    type   |  solde
 ----------+---------
  epargne  | $500.00
  courant  | $500.00
 (2 rows)
The total is $1000, so a $900 withdrawal is allowed.
|-
|
 update compte
   set solde = solde - 900::money
   where nom = 'kevin' and type = 'epargne';
So far, so good.
|-
| ||
 update compte
   set solde = solde - 900::money
   where nom = 'kevin' and type = 'courant';
Now we have a problem. This cannot coexist with the other transaction's activity. We don't cancel anything yet, because the transaction would fail with the same conflicts if it were retried right away. The first committer will win, and the other transaction will fail when it tries to proceed after that.
|-
|
 commit;
This one committed first. Its work is safely persisted.
|-
| ||
 commit;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Cancelled on identification
 as a pivot, during commit attempt.
 HINT:  The transaction might succeed if retried.
This transaction did not manage to withdraw the money. Now we roll back and retry the transaction.
 
 rollback;
 begin;
 select type, solde from compte
   where nom = 'kevin';
 
    type   |  solde
 ----------+----------
  epargne  | -$400.00
  courant  | $500.00
 (2 rows)
We can see there is a net balance of $100. This request for $900 will be rejected by the application.
|}

=== Three or More Transactions ===

Serialization anomalies can result from more complex patterns of access, involving three or more transactions.

----
==== Primary Colors ====

This is similar to the preceding "Black and White" example, except that we use the three primary colors. One transaction tries to update 'rouge' to 'jaune', the next 'jaune' to 'blue', and the third 'blue' to 'rouge'. If these transactions were run one at a time, the run would end with two of the three colors remaining, depending on the order of execution. If any two of them are run concurrently, the one trying to read rows updated by the other will appear to have run first, since it won't see the other's work, so there is no problem in that case. Whether the remaining transaction runs before or after those two, the results are consistent with some serial order of execution.

If all three run at the same time, there is a cycle in the apparent order of execution. A Repeatable Read transaction would not detect this, and the table would end up still containing all three colors. A Serializable transaction will detect the problem and roll one of the transactions back with a serialization failure.

The example can be set up with these statements:
 create table points
 (
   id int not null primary key,
   couleur text not null
 );
 insert into points
   with x(id) as (select generate_series(1,9000))
   select id, case when id % 3 = 1 then 'rouge'
     when id % 3 = 2 then 'jaune'
     else 'blue' end from x;
 create index points_couleur on points (couleur);
 analyze points;
{|
|+ Primary Colors Example
! session 1
! session 2
! session 3
|-
|
 begin;
 update points set couleur = 'jaune'
   where couleur = 'rouge';
|-
| ||
 begin;
 update points set couleur = 'blue'
   where couleur = 'jaune';
|-
| || ||
 begin;
 update points set couleur = 'rouge'
   where couleur = 'blue';
At this point at least one of the three transactions is doomed to fail. To guarantee progress, we wait for a successful commit. The commit will succeed, which not only guarantees progress, but also that an immediate retry of a failed transaction won't fail ''against the same combination of transactions''.
|-
|
 commit;
The first commit wins. Session 2 must fail at this point, because during the commit it was determined that cancelling it gives the best chance of success on an immediate retry.
 select couleur, count(*) from points
   group by couleur
   order by couleur;
 
  couleur | count
 ---------+-------
  blue    |  3000
  jaune   |  6000
 (2 rows)
This appears to have run before the other updates.
|-
| || ||
 commit;
This works if attempted at this point. If session 2 had done more work first, this transaction might also have needed to be rolled back and retried.
 select couleur, count(*) from points
   group by couleur
   order by couleur;
 
  couleur | count
 ---------+-------
  rouge   |  3000
  jaune   |  6000
 (2 rows)
It appears to have run after session 1's transaction.
|-
| ||
 commit;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Cancelled on identification
 as a pivot, during commit attempt.
 HINT:  The transaction might succeed if retried.
A serialization failure. We roll back and retry.
 rollback;
 begin;
 update points set couleur = 'blue'
   where couleur = 'jaune';
 commit;
The retry succeeds.
 select couleur, count(*) from points
   group by couleur
   order by couleur;
 
  couleur | count
 ---------+-------
  blue    |  6000
  rouge   |  3000
 (2 rows)
It appears to have run last, which in fact it did.
|}
An interesting point is that if session 2 had attempted to commit after session 1 and before session 3, it would still have failed, and an immediate retry would also have succeeded, but the behavior of session 3's transaction would not be deterministic. It might have succeeded, or it might have gotten a serialization failure and needed to be retried.

That is because the predicate locking used by the conflict detection mechanism is based on the pages and tuples actually accessed, and a random factor is used when inserting index entries with equal keys, in order to reduce contention; so even with identical sequences of events it is always possible to see differences in where serialization failures occur. This is why it is important, when relying on serializable transactions to manage concurrency, to have some generalized way of recognizing serialization failures and retrying transactions from the start.
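
As a hedged illustration of what such a generalized mechanism can key on (standard PostgreSQL behavior, though this snippet is not part of the original examples): all serialization failures carry SQLSTATE 40001 (serialization_failure), so retry logic should trap that code, roll back, and rerun the whole transaction from the beginning. In psql, the SQLSTATE can be made visible like this:

 \set VERBOSITY verbose
 -- subsequent errors are now reported with their SQLSTATE, e.g.
 -- ERROR:  40001: could not serialize access due to read/write
 -- dependencies among transactions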

It is also worth noting that if session 2 had committed its retried transaction before session 3 committed its own, any subsequent query which saw rows updated (and committed) from 'jaune' to 'blue' would deterministically have caused session 3's transaction to fail, because those would not be rows that session 3 would see as 'blue' and update to 'rouge'. For session 3's transaction to succeed, it must be possible to view it as having run before session 2's committed transaction. Consequently, exposing a state in which the work of session 2's transaction is visible but the work of session 3's transaction is not means that session 3's transaction must fail. The mere act of ''observing'' a recently modified state of the database can cause serialization failures. This will be explored further in other examples.

=== Enforcing Business Rules in Triggers ===

If all transactions are serializable, business rules can be enforced in triggers without the problems associated with the other transaction isolation levels. Where a declarative constraint works, it will generally be faster, easier to implement and maintain, and less prone to bugs -- so triggers should only be used in the following way when a declarative constraint won't work.

----
==== Unique-like constraints ====

Imagine you want something like a unique constraint, but a little more complicated. For this example, we want uniqueness on the first six characters of the text column.

The example can be set up with these statements:
 create table t (id int not null, val text not null);
 with x (n) as (select generate_series(1,10000))
   insert into t select x.n, md5(x.n::text) from x;
 alter table t add primary key(id);
 create index t_val on t (val);
 vacuum analyze t;
 create function t_func()
   returns trigger
   language plpgsql as $$
 declare
   st text;
 begin
   st := substring(new.val from 1 for 6);
   if tg_op = 'UPDATE' and substring(old.val from 1 for 6) = st then
     return new;
   end if;
   if exists (select * from t where val between st and st || 'z') then
     raise exception 't.val not unique on the first six characters: "%"', st;
   end if;
   return new;
 end;
 $$;
 create trigger t_trig
   before insert or update on t
   for each row execute procedure t_func();

To confirm that the trigger enforces the business rule when there is no concurrency problem, on a single connection:

 insert into t values (-1, 'this old dog');
 insert into t values (-2, 'this old cat');
 
 ERROR:  t.val not unique on the first six characters: "this o"

Now let's try it with two concurrent sessions.

{|
|+ Unique-like Constraint Example
! session 1
! session 2
|-
|
 begin;
 insert into t values (-3, 'the river flows');
|-
| ||
 begin;
 insert into t values (-4, 'the right stuff');
This works for the moment, because the other transaction's work is not visible to this transaction; but both transactions cannot commit without violating the business rule.
 commit;
First committer wins. This transaction's work is safe.
|-
|
A commit here would fail, as would any other statement attempted in this doomed transaction.
 select * from t where id < 0;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Canceled on identification as a pivot,
 during conflict out checking.
 HINT:  The transaction might succeed if retried.
 
Since this is a serialization failure, the transaction should be retried.
 
 rollback;
 begin;
 insert into t values (-3, 'the river flows');
 
On the retry, we get an error which is more useful to the user.
 
 ERROR:  t.val not unique on the first six characters: "the ri"
|}

----
==== Foreign key-like constraints ====

Sometimes two tables must have a link very much like a foreign key relationship, but there are additional criteria which make a foreign key insufficient to handle the required integrity checking. In this example a project table holds a reference to the key of a person table in its project_manager column, but not just ''any'' person will do; the referenced person must be a project manager.

The example can be set up with these statements:
 create table person
 (
   person_id int not null primary key,
   person_name text not null,
   is_project_manager boolean not null
 );
 create table project
 (
   project_id int not null primary key,
   project_name text not null,
   project_manager int not null
 );
 create index project_manager
   on project (project_manager);
 
 create function person_func()
   returns trigger
   language plpgsql as $$
 begin
   if tg_op = 'DELETE' and old.is_project_manager then
     if exists (select * from project
                  where project_manager = old.person_id) then
       raise exception
         'a person cannot be deleted while responsible for a project';
     end if;
   end if;
   if tg_op = 'UPDATE' then
     if new.person_id is distinct from old.person_id then
       raise exception 'changing person_id is not allowed';
     end if;
     if old.is_project_manager and not new.is_project_manager then
       if exists (select * from project
                    where project_manager = old.person_id) then
         raise exception
           'a person must remain a project manager while responsible for a project';
       end if;
     end if;
   end if;
   if tg_op = 'DELETE' then
     return old;
   else
     return new;
   end if;
 end;
 $$;
 create trigger person_trig
   before update or delete on person
   for each row execute procedure person_func();
 
 create function project_func()
   returns trigger
   language plpgsql as $$
 begin
   if tg_op = 'INSERT'
     or (tg_op = 'UPDATE' and new.project_manager <> old.project_manager) then
     if not exists (select * from person
                      where person_id = new.project_manager
                      and is_project_manager) then
       raise exception
         'project_manager must be flagged as a project manager in the person table';
     end if;
   end if;
   return new;
 end;
 $$;
 create trigger project_trig
   before insert or update on project
   for each row execute procedure project_func();
 
 insert into person values (1, 'Kevin Grittner', true);
 insert into person values (2, 'Peter Parker', true);
 insert into project values (101, 'parallel processing', 1);
{|
|+ Foreign Key-like Constraint Example
! session 1
! session 2
|-
|
A person is updated so as to no longer be a project manager.
 begin;
 update person
   set is_project_manager = false
   where person_id = 2;
|-
| ||
At the same time, a project is updated to make that person responsible for it.
 begin;
 update project
   set project_manager = 2
   where project_id = 101;
Both cannot be allowed to commit. The first committer wins.
 commit;
The assignment of the person to the project commits first, so the other transaction must now fail. Had the other transaction run at any other isolation level, both would have committed, resulting in a violation of the business rules.
|-
|
 commit;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Cancelled on identification
 as a pivot, during commit attempt.
 HINT:  The transaction might succeed if retried.
A serialization failure. We roll back and try again.
 rollback;
 begin;
 update person
   set is_project_manager = false
   where person_id = 2;
 
 ERROR:  a person must remain a project manager
 while responsible for a project
On the second attempt, we get an intelligible message.
|}


=== Read Only Transactions ===

While a read-only transaction cannot contribute to an anomaly which persists in the database, under Repeatable Read it can "see" a state which is not consistent with any serial (one-at-a-time) execution of the transactions. A Serializable transaction implemented with SSI will never see such transient anomalies.
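
As an aside (not exercised by the example below), PostgreSQL 9.1 also accepts the DEFERRABLE property for serializable read-only transactions; a sketch:

 begin transaction isolation level serializable read only deferrable;
 -- may block until a snapshot is available which is guaranteed free of
 -- such anomalies, then runs without taking predicate locks and with
 -- no risk of serialization failure
 select * from receipt;
 commit;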

----

==== Deposit Report ====

A general class of problems involving read-only transactions is batch processing, where a control table determines which batch is the current target of inserts. A batch is closed by updating the control table, at which point the batch is considered "locked" against further change, and processing of the batch's contents occurs.

A concrete instance of this kind of problem can be found in receipting. Receipts may be added to a batch identified by a deposit date or, if more than one deposit per day is possible, an abstract receipt batch number. At some point during the day, while the bank is still open, the batch is closed, a report of the money received is printed, and the money is taken to the bank for deposit.

The example can be set up with these statements:
 create table control
 (
   deposit_no int not null
 );
 insert into control values (1);
 create table receipt
 (
   receipt_no serial primary key,
   deposit_no int not null,
   payee text not null,
   amount money not null
 );
 insert into receipt
   (deposit_no, payee, amount)
   values ((select deposit_no from control), 'Crosby', '100');
 insert into receipt
   (deposit_no, payee, amount)
   values ((select deposit_no from control), 'Stills', '200');
 insert into receipt
   (deposit_no, payee, amount)
   values ((select deposit_no from control), 'Nash', '300');
{|
|+ Deposit Report Example
! session 1
! session 2
|-
|
At the receipting window, another receipt is added to the current batch.
 begin;  -- T1
 insert into receipt
   (deposit_no, payee, amount)
   values
   (
     (select deposit_no from control),
     'Young', '100'
   );
This transaction can see its own insert, but it is not visible to other transactions until it commits.
 select * from receipt;
 
  receipt_no | deposit_no | payee  | amount
 ------------+------------+--------+---------
           1 |          1 | Crosby | $100.00
           2 |          1 | Stills | $200.00
           3 |          1 | Nash   | $300.00
           4 |          1 | Young  | $100.00
 (4 rows)
|-
| ||
At about the same time, a supervisor clicks a button to close the receipt batch.
 begin;  -- T2
 select deposit_no from control;
 
  deposit_no
 ------------
           1
 (1 row)
The application notes the receipt batch which is about to be closed, increments the batch number, and stores the new number in the control table.
 update control set deposit_no = 2;
 commit;
T1, the transaction inserting the final receipt of the old batch, has not yet committed, even though the batch has been closed. If T1 commits before anyone looks at the batch contents, all is well. For the moment there is no problem; the receipt "appears" to have been added before the batch was closed. We have behavior consistent with some one-at-a-time execution of the transactions: T1 -> T2.
 
For purposes of the demonstration, we'll kick off the deposit report before the last receipt is committed.
 begin;  -- T3
 select * from receipt where deposit_no = 1;
 
  receipt_no | deposit_no | payee  | amount
 ------------+------------+--------+---------
           1 |          1 | Crosby | $100.00
           2 |          1 | Stills | $200.00
           3 |          1 | Nash   | $300.00
 (3 rows)
Now we have a problem. T3 was started knowing that T2 had committed, so T3 must be viewed as having run after T2. (The same would have been true if T3 had been launched independently and had read the control table, seeing the new deposit_no.) But T3 cannot see T1's work, so T1 appears to have run after T3. That gives us a cycle: T1 -> T2 -> T3 -> T1. And it would matter in practical terms: the batch is supposed to be closed and immutable, yet a change would show up late -- perhaps after the trip to the bank.
 
At the REPEATABLE READ isolation level this would run through without an error message, the anomaly going undetected. At the SERIALIZABLE isolation level one of the transactions will be rolled back to protect the integrity of the system. Since cancelling T3 would lead to the same failure again if T1 were still active, PostgreSQL cancels T1, so that an immediate retry can succeed.
|-
|
 commit;
 
 ERROR:  could not serialize access
 due to read/write dependencies
 among transactions
 DETAIL:  Cancelled on identification
 as a pivot, during commit attempt.
 HINT:  The transaction might succeed if retried.
OK, let's retry.
 rollback;
 begin;  -- T1 retry
 insert into receipt
   (deposit_no, payee, amount)
   values
   (
     (select deposit_no from control),
     'Young', '100'
   );
 
What does the receipt table look like now?
 
 select * from receipt;
 
  receipt_no | deposit_no | payee  | amount
 ------------+------------+--------+---------
           1 |          1 | Crosby | $100.00
           2 |          1 | Stills | $200.00
           3 |          1 | Nash   | $300.00
           5 |          2 | Young  | $100.00
 (4 rows)
 
The receipt is now in the new batch, making T3's deposit report correct!
 
 commit;
 
No problem any more.
|-
| ||
 commit;
This could have completed without error at any time after T3's SELECT.
|}

[[Category:Français]]

----

= PGDay FOSDEM 2013 =

This is the first PgDay we have held in Belgium. FOSDEM PGDay 2013 will be held on Feb 1st in Brussels, Belgium, at the Radisson Blu Royal hotel. As an extension of the regular PostgreSQL devroom at FOSDEM, it will cover topics for PostgreSQL users, developers and contributors, and anybody else interested in PostgreSQL.

== Details ==

* '''Date:''' Feb 1st, 2013, 9am-5pm
* '''Venue:''' Radisson Blu Royal Hotel
* '''Coordinator:''' PostgreSQL Europe [mailto:contact@pgconf.eu contact@pgconf.eu]
* '''Website:''' http://fosdem2013.pgconf.eu/

== Registration ==

Attendance is free, but web registration is required (seats are limited): http://fosdem2013.pgconf.eu/registration/

== Schedule ==

The schedule is published at: http://fosdem2013.pgconf.eu/schedule/

== Location and Venue ==

http://fosdem2013.pgconf.eu/venue/

Address:

http://www.radissonblu.com/royalhotel-brussels/location


== Dinner ==

We are organizing a dinner after the event, on Friday, Feb 1st, 2013, at the Hard Rock Cafe Brussels. We have a limited number of seats (30), so please add your name to this list before going there.

If you are bringing someone to the event, make sure you enter your name *twice* (or more) on the list, so the attendee count matches!

Attendees:

# Devrim Gündüz
# Devrim Gündüz +1
# Magnus Hagander
# Andreas Scherbaum
# Andreas Scherbaum +1
# Jean-Paul Argudo
# Patryk Kordylewski
# Patryk Kordylewski +1
# Dimitri Fontaine
# Julien Rouhaud
# Dave Page
# Marc Cousin

[[Category:PostgreSQL Events]]

----

= What's new in PostgreSQL 9.2 =

{{Languages}}

This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.


=Major new features=

==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==

In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the actual tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.

This can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the table records it points to may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.

There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually 8K) page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, and will be able to build the result directly from the index. Of course, all the columns requested by the query must be in the index.

The visibility map is maintained by VACUUM (which sets the visible bit), and by the backends doing SQL work (which unset the visible bit).

If the data has not been modified since the last VACUUM, it is all visible, and the index-only scan feature can improve performance.

Here is an example.

 CREATE TABLE demo_ios (col1 float, col2 float, col3 text);

In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, to have a big recordset which doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, made for this demo. The gains won't be that big in real life.
<br />
INSERT INTO demo_ios SELECT generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
SELECT pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this query, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
SET enable_indexonlyscan to off;<br />
<br />
EXPLAIN (analyze,buffers) select col1,col2 FROM demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to keep in mind:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations, especially when data is changed between VACUUMs<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />

==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==

Streaming Replication becomes more polished with this release.

One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources. Moreover, in case of a failover, it could be complicated to reconnect all the remaining slaves to the newly promoted master, unless one was using a tool like repmgr.

With 9.2, a standby can also send replication changes, allowing cascading replication.

Let's build this. We start with an already working 9.2 database.

We set it up for replication:

postgresql.conf:
 wal_level=hot_standby #(could be archive too)
 max_wal_senders=5
 hot_standby=on

You'll probably also want to activate archiving in production; it won't be done here.

pg_hba.conf (do not use trust in production):
 host replication replication_user 0.0.0.0/0 md5

Create the user:
 create user replication_user replication password 'secret';

Clone the cluster:

 pg_basebackup -h localhost -U replication_user -D data2
 Password:

We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf), as both clusters are running on the same machine:
 port=5433

We add a recovery.conf to tell it how to stream from the master database:
 standby_mode = on
 primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret'

 pg_ctl -D data2 start
 server starting
 LOG:  database system was interrupted; last known up at 2012-07-03 17:58:09 CEST
 LOG:  creating missing WAL directory "pg_xlog/archive_status"
 LOG:  entering standby mode
 LOG:  streaming replication successfully connected to primary
 LOG:  redo starts at 0/9D000020
 LOG:  consistent recovery state reached at 0/9D0000B8
 LOG:  database system is ready to accept read only connections

Now, let's add a second slave, which will use this slave:

 pg_basebackup -h localhost -U replication_user -D data3 -p 5433
 Password:

We edit data3's postgresql.conf to change the port:
 port=5434

We modify the recovery.conf to stream from the slave:
 standby_mode = on
 primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'

We start the third cluster:
 pg_ctl -D data3 start
 server starting
 LOG:  database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST
 HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
 LOG:  creating missing WAL directory "pg_xlog/archive_status"
 LOG:  entering standby mode
 LOG:  streaming replication successfully connected to primary
 LOG:  redo starts at 0/9D000020
 LOG:  consistent recovery state reached at 0/9E000000
 LOG:  database system is ready to accept read only connections

Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication link has to be monitored from the first slave (the master knows nothing about it).


As you may have noticed from the example, pg_basebackup now works from slaves.

There is another use case that wasn't covered: what if a user doesn't care for having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?

pg_receivexlog is provided for just this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:
 pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user

will connect to the master (or a slave), and start creating files:
 ls /tmp/new_logs/
 00000001000000000000009E.partial

Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.

Remember to rename the last segment to remove the .partial suffix before using it with a PITR restore or any other operation.

The synchronous_commit parameter has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), and means that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged receiving it. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is a rather remote possibility, and the performance improvement is large, some people will be interested in this compromise.
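
For instance, assuming a standby is listed in synchronous_standby_names, the compromise can be chosen per session or even per transaction (a minimal sketch, not tied to the cluster built above):

 -- postgresql.conf on the master: synchronous_standby_names = '*'
 SET synchronous_commit = remote_write;  -- relax the guarantee for this session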

==JSON datatype==

The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:

 =# SELECT '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;
                                json
 -------------------------------------------------------------------
  {"username":"john","posts":121,"emailaddress":"john@nowhere.com"}
 (1 row)
 
 =# SELECT '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;
 ERROR:  invalid input syntax for type json
 LINE 1: SELECT '{"username","posts":121,"emailaddress":"john@nowhere...
                ^
 DETAIL:  Expected ":", but found ",".
 CONTEXT:  JSON data, line 1: {"username",...

You can also convert a row type to JSON:

 =# SELECT * FROM demo ;
  username | posts |    emailaddress
 ----------+-------+---------------------
  john     |   121 | john@nowhere.com
  mickael  |   215 | mickael@nowhere.com
 (2 rows)
 
 =# SELECT row_to_json(demo) FROM demo;
                                row_to_json
 -------------------------------------------------------------------------
  {"username":"john","posts":121,"emailaddress":"john@nowhere.com"}
  {"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}
 (2 rows)

Or an array type:

 =# select array_to_json(array_agg(demo)) from demo;
                                                                  array_to_json
 ---------------------------------------------------------------------------------------------------------------------------------------------
  [{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]
 (1 row)

== Range Types ==

Range types are used to store a range of values of a given type. Several come pre-defined: integer (int4range), bigint (int8range), numeric (numrange), timestamp without time zone (tsrange), timestamp with time zone (tstzrange), and date (daterange).

Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.
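
For illustration, a small sketch using the built-in constructors and accessor functions (these statements are not part of the original examples):

 SELECT int4range(10, 20, '[)');          -- closed lower bound, open upper bound
 SELECT '(,20]'::int4range;               -- infinite lower bound
 SELECT lower_inc('[1,10)'::int4range);   -- true: the lower bound is included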

Without these datatypes, most people solve range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them.

Here is the intersection between the (1000,2000] (open-closed) and [1000,1200] (closed-closed) numeric ranges:

 SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;
   ?column?
 -------------
  (1000,1200]
 (1 row)

So you can run queries like «give me all ranges that intersect this one»:

 =# SELECT * from test_range ;
                        period
 -----------------------------------------------------
  ["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]
  ["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]
  ["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]
 (3 rows)
 
 =# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]';
                        period
 -----------------------------------------------------
  ["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]
  ["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]
 (2 rows)

This query could use an index defined like this:

 =# CREATE INDEX idx_test_range on test_range USING gist (period);

You can also use these range data types to define exclusion constraints:

 CREATE EXTENSION btree_gist ;
 CREATE TABLE reservation (room_id int, period tstzrange);
 ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);

This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The btree_gist extension is required to create a GiST index on room_id (it's an integer, usually indexed with a btree index).

 =# INSERT INTO reservation VALUES (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');
 INSERT 0 1
 =# INSERT INTO reservation VALUES (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');
 INSERT 0 1
 =# INSERT INTO reservation VALUES (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');
 ERROR:  conflicting key value violates exclusion constraint "reservation_room_id_period_excl"
 DETAIL:  Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02"))
 conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).
 STATEMENT:  INSERT INTO reservation VALUES (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');

One can also declare new range types.
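
For example, a range over float8 values can be declared like this (following the general CREATE TYPE ... AS RANGE syntax; the type name is arbitrary):

 CREATE TYPE floatrange AS RANGE (
     subtype = float8,
     subtype_diff = float8mi
 );
 SELECT '[1.234, 5.678]'::floatrange;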

=Performance improvements=

This version brings performance improvements across a very large range of areas (a non-exhaustive list):

* The most visible will probably be Index Only Scans, which have already been introduced in this document.

* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with more than 32 cores. <!-- Robert Haas -->

* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan -->

* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption<!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.

* COPY has been improved: it will generate less WAL volume and take fewer locks on a table's pages. <!-- Heikki Linnakangas -->

* Statistics are collected on array contents<!-- Alexander Korotkov -->, allowing for better selectivity estimates for array operations.

* Text-to-anytype concatenation and the quote_literal/quote_nullable functions are no longer volatile, enabling better optimization in some cases. <!-- Marti Raudsepp -->

* The system can now track I/O durations. <!--Ants Aasma -->

This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from. It could be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. What matters is that the cost of getting the time can vary by a factor of thousands.

If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:

<pre>
$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 28.02 nsec
Histogram of timing durations:
< usec:      count   percent
    32:         41   0.00004%
    16:       1405   0.00131%
     8:        200   0.00019%
     4:        388   0.00036%
     2:    2982558   2.78523%
     1:  104100166  97.21287%
</pre>

Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be acceptable for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].
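
If the overhead is acceptable, collection is controlled by the track_io_timing parameter (off by default); it can be enabled cluster-wide in postgresql.conf, or by a superuser for a single session:

 SET track_io_timing = on;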

Anyway, here is the data you'll be able to collect if your system is ready for this:

First, you'll get per-database statistics, which now give accurate information about which database is doing the most I/O:

<pre>
=# SELECT * FROM pg_stat_database WHERE datname = 'mydb';
-[ RECORD 1 ]--+------------------------------
datid          | 16384
datname        | mydb
numbackends    | 1
xact_commit    | 270
xact_rollback  | 2
blks_read      | 1961
blks_hit       | 17944
tup_returned   | 269035
tup_fetched    | 8850
tup_inserted   | 16
tup_updated    | 4
tup_deleted    | 45
conflicts      | 0
temp_files     | 0
temp_bytes     | 0
deadlocks      | 0
blk_read_time  | 583.774
blk_write_time | 0
stats_reset    | 2012-07-03 17:18:54.796817+02
</pre>
We can see here that mydb has only consumed 583.774 milliseconds of read time.

EXPLAIN benefits from this too:
<pre>
=# EXPLAIN (analyze,buffers) SELECT count(*) FROM mots ;
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)
   Buffers: shared read=493
   I/O Timings: read=2.578
   ->  Seq Scan on mots  (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)
         Buffers: shared read=493
         I/O Timings: read=2.578
 Total runtime: 22.059 ms
</pre>
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).

And last, if you have enabled pg_stat_statements:
<pre>
select * from pg_stat_statements where query ~ 'words';
-[ RECORD 1 ]-------+---------------------------
userid              | 10
dbid                | 16384
query               | select count(*) from words;
calls               | 2
total_time          | 78.332
rows                | 2
shared_blks_hit     | 0
shared_blks_read    | 986
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 58.427
blk_write_time      | 0
</pre>

* As with every version, the optimizer has received its share of improvements. <!-- Tom Lane-->
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query is planned at execution time), unless the query has been executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans.
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it gets from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where they would have been efficient.
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1<br />
FROM<br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT *<br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN<br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
</pre><br />
<br />
This part of the plan depends on a parameter coming from a parent node (c_id = c.c_id), and is executed again for each value the parent node supplies.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first indexes point types using a quadtree, the second indexes point types using a k-d tree, and the third indexes text, using suffixes.<br />
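<br />
As an illustration, here is a minimal sketch of creating SP-GiST indexes on a point column (the table and data are made up):<br />
<pre><br />
CREATE TABLE landmarks (pos point);<br />
INSERT INTO landmarks<br />
  SELECT point(random()*1000, random()*1000) FROM generate_series(1,100000);<br />
CREATE INDEX landmarks_quad ON landmarks USING spgist (pos);             -- quad_point_ops is the default for point<br />
CREATE INDEX landmarks_kd ON landmarks USING spgist (pos kd_point_ops);  -- explicit k-d tree<br />
</pre><br />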
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values are considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#SELECT * FROM words WHERE word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# SELECT * FROM words WHERE word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | SELECT * FROM words WHERE word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the PREPARE statement. That makes it easier to interpret, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# EXPLAIN (analyze on,timing off) SELECT * FROM reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# EXPLAIN ANALYZE SELECT * FROM test WHERE a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. Hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
That doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another session was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for that record.<br />
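<br />
For instance, this kind of monitoring query (a sketch) no longer risks failing if a relation is dropped while it runs; the dropped relation simply shows a NULL size:<br />
<pre><br />
SELECT relname, pg_relation_size(oid) AS size<br />
FROM pg_class<br />
ORDER BY size DESC NULLS LAST<br />
LIMIT 10;<br />
</pre><br />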
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# SELECT *, pg_tablespace_location(oid) AS spclocation FROM pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is calculated from the "local midnight", meaning January 1st, 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for both 2-digit and 3-digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears, and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction…)<br />
** query: the last query run (still running if state is "active")<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but not yet committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files (a sample fragment follows)<br />
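<br />
For instance, a 9.2 postgresql.conf fragment could contain (paths and values are made up):<br />
 # SSL files no longer have to live in the data directory:<br />
 ssl_cert_file = '/etc/ssl/certs/server.crt'<br />
 ssl_key_file = '/etc/ssl/private/server.key'<br />
 # custom parameters no longer require custom_variable_classes:<br />
 myapp.tenant_name = 'acme'<br />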
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT, for instance) will have to wait too: their lock acquisitions are queued after the DROP INDEX's.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just like CREATE INDEX CONCURRENTLY. The limitations are also the same: you can only drop one index at a time with the CONCURRENTLY option, and the CASCADE option is not supported.<br />
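<br />
The syntax is simply (index name made up):<br />
 DROP INDEX CONCURRENTLY idx_big_table_col;<br />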
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint to a table means that existing data won't be validated; only new and updated rows will be checked.<br />
<br />
=# CREATE TABLE test (a int); <br />
CREATE TABLE<br />
=# INSERT INTO test SELECT generate_series(1,100);<br />
INSERT 0 100<br />
=# ALTER TABLE test ADD CHECK (a>100) NOT VALID;<br />
ALTER TABLE<br />
=# INSERT INTO test VALUES (99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# INSERT INTO test VALUES (101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# ALTER TABLE test VALIDATE CONSTRAINT test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
Domains, which are types with added constraints, also benefit: their constraints can be declared NOT VALID and validated later, as sketched below.<br />
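<br />
A minimal sketch of the domain case (names made up):<br />
 =# CREATE DOMAIN posint AS int;<br />
 CREATE DOMAIN<br />
 =# ALTER DOMAIN posint ADD CONSTRAINT posint_positive CHECK (VALUE > 0) NOT VALID;<br />
 ALTER DOMAIN<br />
 =# ALTER DOMAIN posint VALIDATE CONSTRAINT posint_positive;<br />
 ALTER DOMAIN<br />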
<br />
Check constraints can also be renamed now:<br />
<br />
=# ALTER TABLE test RENAME CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
== NO INHERIT constraints ==<br />
<br />
Here is another improvement regarding constraints: they can be declared as not inheritable, which is useful in partitioned environments. Let's take the partitioning example from the PostgreSQL documentation, and see how this improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by its child tables. So adding a constraint on the parent table that forbids inserts, or allows only some of them, was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table no longer gets rewritten during an ALTER TABLE that changes the type of a column, in the following cases (an example follows the list):<br />
<br />
* varchar(x) to varchar(y) when y>=x. It also works when going from varchar(x) to varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y>=x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y>=x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y>=x, or to timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y>=x, or to timestamptz without specifier<br />
* interval(x) to interval(y) when y>=x, or to interval without specifier<br />
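<br />
For instance (hypothetical table), these statements now complete almost instantly, without rewriting the table:<br />
 CREATE TABLE customers (name varchar(50));<br />
 ALTER TABLE customers ALTER COLUMN name TYPE varchar(80);  -- no rewrite: 80 >= 50<br />
 ALTER TABLE customers ALTER COLUMN name TYPE text;         -- no rewrite either<br />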
<br />
== Security barriers and Leakproof ==<br />
<br />
This new feature has to do with view security. First, let's explain the problem, with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is a quite classical way of giving a user access to only a part of a table: we create a user for company 1, grant that user the right to access company1_data, and deny them the right to access all_data.<br />
<br />
The plan for this query is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there were more data, a sequential scan could still be forced: just "SET enable_indexscan TO off" and the like.<br />
<br />
So this query reads all the records from all_data, filters them, and returns only the matching rows to the user. There is a way to display the scanned records before they are filtered: just create a function with a very low cost, and call it in the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to cost less than the = operator, which costs 1, to be executed first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the record from the second company (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
The view is not leaking anymore. The problem, of course, is that there is a performance impact: maybe the "peek" function could have made the query faster, by filtering lots of rows early in the plan.<br />
<br />
This leads to the complementary feature: a function may be declared as LEAKPROOF, meaning it won't leak the data it is passed into error or notice messages.<br />
<br />
Declaring our peek function as LEAKPROOF is a very bad idea, but let's do it just to demonstrate how it's used:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LEAKPROOF LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
A LEAKPROOF function is executed «normally»:<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
Of course, in our case, peek isn't leakproof and shouldn't be declared as such. Only superusers have the permission to declare a function LEAKPROOF.<br />
<br />
== New options for pg_dump ==<br />
<br />
Until now, one could ask pg_dump to dump a table's data, or a table's metadata (the DDL statements creating the table's structure, indexes, constraints). Some metadata is better restored before the data (the table's structure, check constraints), some is better restored after the data (indexes, unique constraints, foreign keys…), mostly for performance reasons.<br />
<br />
So there are now a few more options (a usage example follows the list):<br />
<br />
* --section=pre-data: dump what's needed before restoring the data. Of course, this can be combined with -t, for instance, to dump only one table<br />
* --section=post-data: dump what's needed after restoring the data<br />
* --section=data: dump the data itself<br />
* --exclude-table-data: dump everything except THIS table's data; pg_dump will still dump the other tables' data<br />
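<br />
For instance, a dump of a single table split into the three sections could look like this (database and table names are made up):<br />
 pg_dump --section=pre-data -t mytable mydb > mytable_pre.sql<br />
 pg_dump --section=data -t mytable mydb > mytable_data.sql<br />
 pg_dump --section=post-data -t mytable mydb > mytable_post.sql<br />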
<br />
<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18106What's new in PostgreSQL 9.22012-08-28T14:25:48Z<p>Marco44: /* Reduce ALTER TABLE rewrites */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.<br />
<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record through an index, PostgreSQL has to visit the actual tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the table's records may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually 8K) page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it can build the result directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (which sets the visible bit), and by the backends doing SQL work (which clear it).<br />
<br />
If the data has only been read since the last VACUUM, then it is all-visible, and the index-only scan feature can improve performance.<br />
<br />
Here is an example.<br />
<br />
CREATE TABLE demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, to get a big recordset that doesn't fit in memory (the test machine has 4GB of RAM). This is an ideal case, made for this demo; the gains won't be that big in real life.<br />
<br />
INSERT INTO demo_ios SELECT generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
SELECT pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this query, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below are measured with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
SET enable_indexonlyscan to off;<br />
<br />
EXPLAIN (analyze,buffers) select col1,col2 FROM demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to keep in mind:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before, so there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations, especially when data is changed between VACUUMs<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication becomes more polished with this release. <br />
<br />
One of the main remaining gripes about streaming replication is that all the slaves have to be connected to the same, unique master, consuming its resources. Moreover, in case of a failover, it could be complicated to reconnect all the remaining slaves to the newly promoted master, without using a tool like repmgr.<br />
<br />
* With 9.2, a standby can also send replication changes, allowing cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will replicate from this first slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication link has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't want a full-fledged slave, but only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
Remember to rename the last segment to remove the .partial suffix before using it with PITR or other.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged receiving it. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
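<br />
A sketch of the master-side settings (the standby name is made up; it must match the application_name the standby uses in its primary_conninfo):<br />
 synchronous_standby_names = 'standby1'<br />
 synchronous_commit = remote_write<br />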
<br />
==JSON datatype==<br />
<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# SELECT '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# SELECT '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: SELECT '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: SELECT '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#SELECT * FROM demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# SELECT row_to_json(demo) FROM demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of values of a given type. There are a few pre-defined types: integer (int4range), bigint (int8range), numeric (numrange), timestamp without time zone (tsrange), timestamp with time zone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
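<br />
For instance, with the bracket/parenthesis notation ('[' and ']' for closed bounds, '(' and ')' for open ones, an empty bound for infinity):<br />
 SELECT '[10,20)'::int4range @> 10;             -- true: 10 is included<br />
 SELECT '[10,20)'::int4range @> 20;             -- false: 20 is excluded<br />
 SELECT upper_inf('[2012-01-01,)'::daterange);  -- true: the upper bound is infinite<br />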
<br />
Without these datatypes, most people solve range problems by using two columns in a table. Range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the (1000,2000] (open-closed) and [1000,1200] (closed-closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# SELECT * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range USING gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and the periods overlap. The btree_gist extension is required to create a GiST index on room_id (it's an integer, usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation VALUES (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation VALUES (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation VALUES (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation VALUES (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
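<br />
For instance, following the example in the PostgreSQL documentation, a range over float8 can be declared like this:<br />
 CREATE TYPE floatrange AS RANGE (<br />
     subtype = float8,<br />
     subtype_diff = float8mi<br />
 );<br />
 SELECT '[1.234, 5.678]'::floatrange;<br />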
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with 32 cores or more. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption<!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on table pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents<!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* Text-to-anytype concatenation and the quote_literal/quote_nullable functions are no longer volatile, enabling better optimization in some cases <!-- Marti Raudsepp --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds and has very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which now give accurate information about which database is doing the most I/O:<br />
<br />
<pre><br />
=# SELECT * FROM pg_stat_database WHERE datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# EXPLAIN (analyze,buffers) SELECT count(*) FROM mots ;<br />
QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.<br />
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1<br />
FROM<br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT *<br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN<br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space Partitionned GiST, GiST being Generalized Search Tree. GiST is an index type, and has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
As all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you'll throw at it, using operators you'll provide. It means that if you want to create a new datatype, and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement 3 type of indexes: trie (suffix) indexing, Quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types, using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffix.<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query tree (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one will not differentiate queries.<br />
<br />
<pre><br />
=#SELECT * FROM words WHERE word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# SELECT * FROM words WHERE word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | SELECT * FROM words WHERE word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# EXPLAIN (analyze on,timing off) SELECT * FROM reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# EXPLAIN ANALYZE SELECT * FROM test WHERE a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# SELECT *, pg_tablespace_location(oid) AS spclocation FROM pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the "local midnight", that is, January 1st, 1970 at midnight, local time.<br />
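<br />
If an application relied on the old behaviour, it can be emulated by interpreting the value as local time first, that is, casting it to timestamptz (a sketch, assuming the same +02 session timezone as above):<br />
<br />
<pre><br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp::timestamptz);<br />
 date_part  <br />
------------<br />
 1341180000<br />
(1 row)<br />
</pre><br />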
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2- and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction…)<br />
** query: the last query run (the one still running when state is "active")<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but not yet committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen is «idle in transaction»; here, the query column tells us what the transaction last did.<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, to be more consistent with other system views. The view pg_stat_replication has changed in the same way: its procpid column is also renamed to pid.<br />
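<br />
Putting the new columns to use, here is a sketch of a monitoring query listing sessions stuck in a transaction, longest-idle first:<br />
<br />
<pre><br />
=# SELECT pid, now() - state_change AS idle_for, query<br />
   FROM pg_stat_activity<br />
   WHERE state = 'idle in transaction'<br />
   ORDER BY idle_for DESC;<br />
</pre><br />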
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without prior declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the names and locations of the SSL files (see the sketch below)<br />
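<br />
For instance, a postgresql.conf excerpt using the new SSL parameters could look like this (file names follow the usual conventions; paths are relative to the data directory):<br />
<br />
<pre><br />
ssl = on<br />
ssl_cert_file = 'server.crt'   # server certificate<br />
ssl_key_file = 'server.key'    # server private key<br />
ssl_ca_file = 'root.crt'       # trusted certificate authorities<br />
ssl_crl_file = 'root.crl'      # certificate revocation list<br />
</pre><br />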
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT, for instance) will have to wait too: their lock requests are queued behind the DROP INDEX's request.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just as CREATE INDEX CONCURRENTLY does. The limitations are also the same: only one index can be dropped per CONCURRENTLY statement, and the CASCADE option is not supported.<br />
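<br />
A minimal example (with a hypothetical index name):<br />
<br />
<pre><br />
-- Takes a lock that doesn't conflict with SELECT, INSERT, UPDATE or DELETE<br />
=# DROP INDEX CONCURRENTLY idx_demo;<br />
DROP INDEX<br />
</pre><br />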
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that existing data won't be validated; only new and updated rows will be checked.<br />
<br />
=# CREATE TABLE test (a int); <br />
CREATE TABLE<br />
=# INSERT INTO test SELECT generate_series(1,100);<br />
INSERT 0 100<br />
=# ALTER TABLE test ADD CHECK (a>100) NOT VALID;<br />
ALTER TABLE<br />
=# INSERT INTO test VALUES (99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# INSERT INTO test VALUES (101);<br />
INSERT 0 1<br />
<br />
Then, later, we can try to validate the whole table:<br />
<br />
=# ALTER TABLE test VALIDATE CONSTRAINT test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
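Validation fails here because rows from the initial load still violate the check. Continuing the example, once the offending rows are removed, validation succeeds:<br />
<br />
=# DELETE FROM test WHERE a <= 100;<br />
DELETE 100<br />
=# ALTER TABLE test VALIDATE CONSTRAINT test_a_check;<br />
ALTER TABLE<br />
<br />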
Constraints on domains (which are types with added constraints) can also be declared as not valid and validated later.<br />
<br />
Check constraints can also be renamed now:<br />
<br />
=# ALTER TABLE test RENAME CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
== NO INHERIT constraints ==<br />
<br />
Here is another improvement concerning constraints: they can now be declared as not inheritable, which will be useful in partitioned environments. Let's take the partitioning example from the PostgreSQL documentation and see how the new feature improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by its child tables, so it was impossible to add a constraint on the parent table forbidding inserts into it (or allowing only some of them) without also constraining the children. The NO INHERIT clause makes this possible, as shown above: inserts into the parent are rejected, while inserts into the children are checked only against their own constraints.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE changing the type of a column in the following cases (an example follows the list):<br />
<br />
* varchar(x) to varchar(y) when y>=x. It also works when going from varchar(x) to varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y>=x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y>=x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y>=x, or to timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y>=x, or to timestamptz without specifier<br />
* interval(x) to interval(y) when y>=x, or to interval without specifier<br />
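<br />
For instance, widening a varchar column is now instantaneous, regardless of the table size (a sketch with a hypothetical table):<br />
<br />
<pre><br />
=# CREATE TABLE t (v varchar(10));<br />
CREATE TABLE<br />
=# ALTER TABLE t ALTER COLUMN v TYPE varchar(20);  -- no rewrite, as 20 >= 10<br />
ALTER TABLE<br />
=# ALTER TABLE t ALTER COLUMN v TYPE text;         -- no rewrite either<br />
ALTER TABLE<br />
</pre><br />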
<br />
== Security barriers and Leakproof ==<br />
<br />
This new feature has to do with view security. First, let's explain the problem, with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
=# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is quite a classical way of giving a user access to only part of a table: we'll create a user for company_id 1, grant that user the right to access company1_data, and deny the right to access all_data.<br />
<br />
The plan for a query on this view is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there were more data, a sequential scan could still be forced: just "SET enable_indexscan TO off" and the like.<br />
<br />
So this query reads all the records from all_data, filters them, and returns to the user only the matching rows. There is a way to display the scanned records before they are filtered: just create a function with a very low cost, and call it in the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to be cheaper than the = operator (which costs 1) for the planner to execute it first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the record from the second company (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
The view's not leaking anymore. The problem, of course, is that there is a performance impact: maybe the "peek" function could have made the query faster, by filtering lots of rows early in the plan.<br />
<br />
This leads to the complementary feature: functions may be declared as "LEAKPROOF", meaning that they won't leak the data they are passed into error or notice messages, and may therefore be evaluated before the view's own filters.<br />
<br />
Declaring our peek function as LEAKPROOF is a very bad idea, but let's do it just to demonstrate how it's used:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LEAKPROOF LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
A LEAKPROOF function is executed «normally»:<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
Of course, in our case, peek isn't leakproof at all and shouldn't be declared as such. Only superusers have the permission to declare a LEAKPROOF function.<br />
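<br />
The flag can be audited in the pg_proc catalog (a sketch, showing our ill-advised declaration above):<br />
<br />
<pre><br />
=# SELECT proname, proleakproof FROM pg_proc WHERE proname = 'peek';<br />
 proname | proleakproof <br />
---------+--------------<br />
 peek    | t<br />
(1 row)<br />
</pre><br />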
<br />
== New options for pg_dump ==<br />
<br />
Until now, one could ask pg_dump to dump a table's data, or a table's meta-data (the DDL statements creating the table's structure, indexes, and constraints). Some meta-data is better restored before the data (the table's structure, check constraints), some is better restored after the data (indexes, unique constraints, foreign keys…), mostly for performance reasons.<br />
<br />
So there are now a few more options; a usage example follows the list:<br />
<br />
* --section=pre-data: dump what's needed before restoring the data. This can of course be combined with -t, for instance, to specify one table<br />
* --section=post-data: dump what's needed after restoring the data<br />
* --section=data: dump the data itself<br />
* --exclude-table-data: dump everything except the given table's data; the table's definition and all other tables' data are still dumped<br />
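<br />
A typical use, as a sketch (database and table names are hypothetical):<br />
<br />
<pre><br />
$ pg_dump --section=pre-data -f pre.sql mydb    # structure, check constraints<br />
$ pg_dump --section=data -f data.sql mydb       # the data itself<br />
$ pg_dump --section=post-data -f post.sql mydb  # indexes, foreign keys...<br />
$ pg_dump --exclude-table-data=big_log_table -f all_but_logs.sql mydb<br />
</pre><br />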
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18101What's new in PostgreSQL 9.22012-08-28T07:39:34Z<p>Marco44: spelling correction</p>
<hr />
{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.<br />
<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table's records may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
CREATE TABLE demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, to get a big recordset that doesn't fit in memory (the test machine has 4GB of RAM). This is an ideal case, made for this demo. The gains won't be that big in real life.<br />
<br />
INSERT INTO demo_ios SELECT generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
SELECT pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this query, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
SET enable_indexonlyscan to off;<br />
<br />
EXPLAIN (analyze,buffers) select col1,col2 FROM demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to keep in mind:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
Remember to rename the last segment to remove the .partial suffix before using it with PITR or other recovery methods.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
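<br />
As a minimal sketch (assuming a standby named slave1 is already connected), the compromise can be chosen in postgresql.conf on the master:<br />
<br />
 synchronous_standby_names = 'slave1'<br />
 synchronous_commit = remote_write<br />
<br />
or per session:<br />
<br />
 SET synchronous_commit TO remote_write;<br />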
<br />
<br />
<br />
<br />
==JSON datatype==<br />
<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# SELECT '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# SELECT '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: SELECT '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: SELECT '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#SELECT * FROM demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# SELECT row_to_json(demo) FROM demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
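<br />
For instance, here is a quick sketch of the bound syntax (the constructor's third argument specifies inclusive/exclusive bounds, and an omitted bound is infinite):<br />
<br />
 =# SELECT int4range(10, 20, '[)');<br />
 int4range <br />
 -----------<br />
 [10,20)<br />
 (1 row)<br />
<br />
 =# SELECT upper_inf('[2012-01-01,)'::daterange);<br />
 upper_inf <br />
 -----------<br />
 t<br />
 (1 row)<br />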
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the 1000 (open) - 2000 (closed) and the 1000 (closed) - 1200 (closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# SELECT * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range USING gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and the periods overlap. The extension btree_gist is required to create a GiST index on room_id (it's an integer, usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation VALUES (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation VALUES (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation VALUES (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation VALUES (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
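<br />
For example, here is a sketch following the documented syntax, defining a range type over float8 (the optional subtype_diff function helps GiST indexing):<br />
<br />
 CREATE TYPE floatrange AS RANGE (<br />
     subtype = float8,<br />
     subtype_diff = float8mi<br />
 );<br />
<br />
 SELECT '[1.5,2.5)'::floatrange;<br />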
<br />
=Performance improvements=<br />
<br />
This version has performance improvements over a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, thanks to newly introduced specialized sort functions. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption<!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it will generate less WAL volume and take fewer locks on tables' pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents<!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* Text-to-anytype concatenation and quote_literal/quote_nullable functions are not volatile any more, enabling better optimization in some cases <!-- Marti Raudsepp --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from. It could be retrieved directly from the processor (TSC), from dedicated hardware such as HPET, or through an ACPI call. What's most important is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
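If the overhead is acceptable, the collection itself is enabled through the track_io_timing parameter (off by default), either in postgresql.conf or, as a superuser, per session:<br />
<br />
 SET track_io_timing TO on;<br />
<br />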
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# SELECT * FROM pg_stat_database WHERE datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# EXPLAIN (analyze,buffers) SELECT count(*) FROM mots ;<br />
QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameters sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans. A quick way to observe this is sketched just after this list.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.<br />
<br />
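Here is a quick sketch of the prepared-statements point (the table and statement names are illustrative; EXPLAIN EXECUTE shows the plan actually chosen for a given parameter):<br />
<br />
 PREPARE by_word(text) AS SELECT * FROM words WHERE word = $1;<br />
 EXPLAIN EXECUTE by_word('foo');<br />
<br />
With 9.2, the first few executions are planned with the actual parameter value; a generic plan is only adopted later, if it isn't significantly more expensive than the custom plans.<br />
<br />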
This parameterized-paths example is straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1<br />
FROM<br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT *<br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN<br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is executed each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type, and has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you'll throw at it, using operators you'll provide. It means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffixes.<br />
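<br />
For example, a minimal sketch of SP-GiST indexes on a point column (table and index names are illustrative; quad_point_ops is the default operator class for point, so kd_point_ops has to be named explicitly):<br />
<br />
 CREATE TABLE spgist_demo (p point);<br />
 CREATE INDEX spgist_demo_quad ON spgist_demo USING spgist (p);<br />
 CREATE INDEX spgist_demo_kd ON spgist_demo USING spgist (p kd_point_ops);<br />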
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query (such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one) will not differentiate queries.<br />
<br />
<pre><br />
=#SELECT * FROM words WHERE word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# SELECT * FROM words WHERE word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | SELECT * FROM words WHERE word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the PREPARE statement. That way it is easier to use, and it avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# EXPLAIN (analyze on,timing off) SELECT * FROM reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# EXPLAIN ANALYZE SELECT * FROM test WHERE a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while one was running pg_relation_size on it, leading to an SQL exception. Now, the function merely returns NULL in this case.<br />
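<br />
For instance (a sketch; the OID is a made-up value matching no relation):<br />
<br />
 =# SELECT pg_relation_size(999999::oid);<br />
 pg_relation_size <br />
 ------------------<br />
 <br />
 (1 row)<br />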
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# SELECT *, pg_tablespace_location(oid) AS spclocation FROM pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch FROM '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from "local midnight", that is, the 1st of January 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is doing: running a query (active), idle, idle in transaction…<br />
** query: the last query run (or the one still running, if state is "active")<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen is «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the column procpid was also renamed to pid, to be more consistent with other system views.<br />
The view pg_stat_replication has changed too: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT, for instance) will have to wait too: their lock acquisition is queued behind the DROP INDEX's.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't lock normal DML statements, just like CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, you can only drop one index at a time with CONCURRENTLY, and CASCADE isn't supported either.<br />
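<br />
The syntax is simply the regular one plus the CONCURRENTLY keyword; reusing the index from the first section as an example:<br />
<br />
 =# DROP INDEX CONCURRENTLY idx_demo_ios;<br />
 DROP INDEX<br />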
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that existing data won't be validated; only new and updated rows will be.<br />
<br />
=# CREATE TABLE test (a int); <br />
CREATE TABLE<br />
=# INSERT INTO test SELECT generate_series(1,100);<br />
INSERT 0 100<br />
=# ALTER TABLE test ADD CHECK (a>100) NOT VALID;<br />
ALTER TABLE<br />
=# INSERT INTO test VALUES (99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# INSERT INTO test VALUES (101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# ALTER TABLE test VALIDATE CONSTRAINT test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
Domains, which are types with added constraints, can also be declared as not valid, and validated later.<br />
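<br />
For example, a quick sketch with a hypothetical domain:<br />
<br />
 =# CREATE DOMAIN positive_int AS int;<br />
 CREATE DOMAIN<br />
 =# ALTER DOMAIN positive_int ADD CONSTRAINT must_be_positive CHECK (VALUE > 0) NOT VALID;<br />
 ALTER DOMAIN<br />
 =# ALTER DOMAIN positive_int VALIDATE CONSTRAINT must_be_positive;<br />
 ALTER DOMAIN<br />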
<br />
Check constraints can also be renamed now:<br />
<br />
=# ALTER TABLE test RENAME CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can be declared as not inheritable, which will be useful in partitioned environments. Let's take the PostgreSQL documentation's example, and see how it improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by child tables. So adding a constraint forbidding inserts on the parent table, or allowing only some of them, was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE when changing the type of a column in the following cases:<br />
<br />
* varchar(x) to varchar(y) when y > x. It works too if going from varchar(x) to varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y>x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y>x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y>x or timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y>x or timestamptz without specifier<br />
* interval(x) to interval(y) when y>x or interval without specifier<br />
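<br />
A quick way to verify that no rewrite happened is to compare the table's relfilenode before and after the ALTER (a sketch with a hypothetical table; the actual number will differ):<br />
<br />
 =# CREATE TABLE resize_demo (v varchar(10));<br />
 CREATE TABLE<br />
 =# SELECT relfilenode FROM pg_class WHERE relname = 'resize_demo';<br />
 relfilenode <br />
 -------------<br />
 24580<br />
 (1 row)<br />
 =# ALTER TABLE resize_demo ALTER COLUMN v TYPE varchar(20);<br />
 ALTER TABLE<br />
 =# SELECT relfilenode FROM pg_class WHERE relname = 'resize_demo';<br />
 relfilenode <br />
 -------------<br />
 24580<br />
 (1 row)<br />
<br />
The relfilenode staying the same shows the table files weren't rewritten.<br />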
<br />
== Security barriers and Leakproof ==<br />
<br />
This new feature has to do with view security. First, let's explain the problem with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
=# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is a quite classical way of giving a user access to only part of a table: we'll create a user for company_id 1, grant him the right to access company1_data, and deny him the right to access all_data.<br />
<br />
The plan for this query is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there was more data, a sequential scan could still be forced: just "SET enable_indexscan TO off" and the like.<br />
<br />
So this query reads all the records from all_data, filters them, and returns only the matching rows to the user. There is a way to display the scanned records before they are filtered: just create a function with a very low cost, and call it in the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to cost less than the = operator, which costs 1, to be executed first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the record from the second company (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
The view's not leaking anymore. The problem, of course, is that there is a performance impact: maybe the "peek" function could have made the query faster, by filtering lots of rows early in the plan.<br />
<br />
This leads to the complementary feature: functions may be declared as "LEAKPROOF", meaning that they won't leak the data they are passed through error or notice messages.<br />
<br />
Declaring our peek function as LEAKPROOF is a very bad idea, but let's do it just to demonstrate how it's used:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LEAKPROOF LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
A LEAKPROOF function is executed «normally»:<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
Of course, in our case, peek isn't LEAKPROOF and shouldn't be declared as such. Only superusers have the permission to declare a LEAKPROOF function.<br />
<br />
== New options for pg_dump ==<br />
<br />
Until now, one could ask pg_dump to dump a table's data, or a table's meta-data (DDL statements for creating the table's structure, indexes, constraints). Some meta-data is better restored before the data (the table's structure, check constraints), some is better after the data (indexes, unique constraints, foreign keys…), for performance reasons mostly.<br />
<br />
So there are now a few more options:<br />
<br />
* --section=pre-data: dump what's needed before restoring the data. Of course, this can be combined with a -t for instance, to specify one table<br />
* --section=post-data: dump what's needed after restoring the data.<br />
* --section=data: dump the data<br />
* --exclude-table-data: dump everything, except THIS table's data. It means pg_dump will still dump other tables' data.<br />
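<br />
For example, a typical three-step workflow could look like this (database and file names are illustrative):<br />
<br />
 pg_dump --section=pre-data mydb > pre-data.sql<br />
 pg_dump --section=data mydb > data.sql<br />
 pg_dump --section=post-data mydb > post-data.sql<br />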
<br />
<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this first slave as its source:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication link has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care for having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
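<br />
As a minimal sketch (the standby name slave1 is a placeholder, to be adapted to your setup), the master's postgresql.conf could combine both settings like this:<br />
<br />
synchronous_standby_names = 'slave1'<br />
synchronous_commit = remote_write<br />
<br />
The slave identifies itself as slave1 by adding application_name=slave1 to the primary_conninfo string of its recovery.conf.<br />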
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
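<br />
For instance, here is a minimal illustration of the bound syntax (the values are arbitrary):<br />
<br />
SELECT numrange(1.0, 2.0, '(]'); -- open lower bound, closed upper bound<br />
SELECT '[2012-01-01,)'::daterange; -- closed lower bound, infinite upper bound<br />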
<br />
Without these datatypes, most people solve range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them.<br />
<br />
Here is the intersection between the 1000(open)-2000(closed) and the 1000(closed)-1200(closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The CREATE EXTENSION btree_gist is required to create a GiST index on room_id (it's an integer, and integers are usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
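<br />
For instance, here is the float8 range from the PostgreSQL documentation (floatrange is the new type's name, float8mi the subtype difference function):<br />
<br />
CREATE TYPE floatrange AS RANGE (subtype = float8, subtype_diff = float8mi);<br />
SELECT '[1.234, 5.678]'::floatrange;<br />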
<br />
=Performance improvements=<br />
<br />
This version has performance improvements over a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on a table's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. What matters most is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, and has very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner uses plans specific to the parameter values sent (the query is planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans. A sketch of this follows the list.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it gets from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where they would have been efficient.<br />
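<br />
Here is a minimal sketch of the first point, reusing the demo_ios table from the Index Only Scans section; with 9.2, such an execution is planned with the actual parameter value:<br />
<br />
<pre><br />
PREPARE q(float) AS SELECT col1 FROM demo_ios WHERE col2 = $1;<br />
EXPLAIN EXECUTE q(0.5);  -- 9.2 plans this with the actual value 0.5<br />
</pre><br />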
<br />
The following example of a parameterized path is straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is executed each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, or to fully scan and hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but its performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
As with all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you throw at it, using operators you provide. It means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement 3 types of indexes: trie (suffix) indexing, quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffixes.<br />
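<br />
A minimal sketch (the pts table is made up for illustration; quad_point_ops is the default operator class for the point type):<br />
<br />
<pre><br />
CREATE TABLE pts (p point);<br />
CREATE INDEX pts_quad_idx ON pts USING spgist (p);            -- quadtree, the default<br />
CREATE INDEX pts_kd_idx ON pts USING spgist (p kd_point_ops); -- k-d tree<br />
</pre><br />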
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
(1 row)<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. That way it is easier to use, and it avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
 &lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. Hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size on it, leading to an SQL exception. Now the function merely returns NULL for this relation.<br />
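<br />
For instance, a query like this one no longer aborts if a relation is dropped while it runs (a minimal sketch):<br />
<br />
<pre><br />
SELECT relname, pg_relation_size(oid) AS size<br />
FROM pg_class;  -- a concurrently dropped relation now yields a NULL size instead of an error<br />
</pre><br />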
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE commands, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the "local midnight", meaning the 1st of January 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction…)<br />
** query: the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing what query had put it there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have had was «idle in transaction».<br />
<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid, to be consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
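<br />
For instance, the new settings can reproduce the fixed file names that were mandatory before 9.2 (adapt the paths to your setup):<br />
<br />
ssl_cert_file = 'server.crt'<br />
ssl_key_file = 'server.key'<br />
ssl_ca_file = 'root.crt'<br />
ssl_crl_file = 'root.crl'<br />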
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT for instance) will have to wait too: their lock acquisition is queued after the DROP INDEX's one.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just like CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, you can only drop one index at a time with CONCURRENTLY, and CASCADE isn't supported either.<br />
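<br />
For instance, reusing the index created in the Index Only Scans section:<br />
<br />
=# DROP INDEX CONCURRENTLY idx_demo_ios;<br />
DROP INDEX<br />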
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that current data won't be validated, only new and updated rows.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
Domains, which are types with added constraints, can also have their constraints declared as not valid, and validated later.<br />
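<br />
A minimal sketch (the domain and constraint names are made up for illustration):<br />
<br />
=# CREATE DOMAIN positive_int AS int;<br />
=# ALTER DOMAIN positive_int ADD CONSTRAINT is_positive CHECK (VALUE > 0) NOT VALID;<br />
=# ALTER DOMAIN positive_int VALIDATE CONSTRAINT is_positive;<br />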
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can be declared as not inheritable, which will be useful in partitioned environments. Let's take the partitioning example from the PostgreSQL documentation, and see how this improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by child tables. So adding a constraint on the parent table forbidding inserts, or allowing only some of them, was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE changing the type of a column, in the following cases (an example follows the list):<br />
<br />
* varchar(x) to varchar(y) when y > x. It works too if going from varchar(x) to varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y>x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y>x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y>x or timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y>x or timestamptz without specifier<br />
* interval(x) to interval(y) when y>x or interval without specifier<br />
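<br />
For instance, widening a varchar, or converting it to text, is now instantaneous whatever the table's size (a minimal sketch, table t being made up for illustration):<br />
<br />
CREATE TABLE t (v varchar(10));<br />
ALTER TABLE t ALTER COLUMN v TYPE varchar(20); -- no rewrite<br />
ALTER TABLE t ALTER COLUMN v TYPE text; -- no rewrite either<br />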
<br />
== Security barriers and Leakproof ==<br />
This new feature has to do with view security. First, let's explain the problem, with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
=# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is a quite classical way of giving a user access to only a part of a table: we'll create a user for company_id 1, grant him the right to access company1_data, and deny him the right to access all_data.<br />
<br />
The plan for this query is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there was more data, a sequential scan could still be forced: just "SET enable_indexscan TO off" and the like.<br />
<br />
So this query reads all the records from all_data, filters them, and returns to the user only the matching rows. There is a way to display scanned records before they are filtered: just create a function with a very low cost, and call it while doing the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to cost less than the = operator, which costs 1, to be executed first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the record from the second company (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
The view's not leaking anymore. The problem, of course, is that there is a performance impact: maybe the "peek" function could have made the query faster, by filtering lots of rows early in the plan.<br />
<br />
This leads to the complementary feature: some functions may be declared as "LEAKPROOF", meaning that they won't leak the data they are passed into error or notice messages.<br />
<br />
Declaring our peek function as LEAKPROOF is a very bad idea, but let's do it just to demonstrate how it's used:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LEAKPROOF LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
A LEAKPROOF function is executed «normally»:<br />
<br />
=# select * from company1_data where peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
Of course, in our case, peek isn't leakproof and shouldn't be declared as such. Only superusers have the permission to declare a function LEAKPROOF.<br />
<br />
== New options for pg_dump ==<br />
<br />
Until now, one could ask pg_dump to dump a table's data, or a table's metadata (the DDL statements creating the table's structure, indexes, and constraints). Some metadata is better restored before the data (the table's structure, check constraints), and some after the data (indexes, unique constraints, foreign keys…), mostly for performance reasons.<br />
<br />
So there are now a few more options:<br />
<br />
* --section=pre-data: dump what's needed before restoring the data. Of course, this can be combined with a -t for instance, to specify one table<br />
* --section=post-data: dump what's needed after restoring the data.<br />
* --section=data: dump the data<br />
* --exclude-table-data: dump everything, except THIS table's data; pg_dump will still dump the other tables' data (see the example below).<br />
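<br />
For instance (mydb and mytable are placeholders), a three-part dump and a data-filtered dump could look like this:<br />
<br />
pg_dump --section=pre-data mydb > pre-data.sql<br />
pg_dump --section=data mydb > data.sql<br />
pg_dump --section=post-data mydb > post-data.sql<br />
pg_dump --exclude-table-data=mytable mydb > without_mytable_data.sql<br />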
<br />
<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between then 1000(open)-2000(closed) and 1000(closed)-1200(closed) numeric range:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that now it is forbidden to have two records in this table where room_id is equal and period overlaps. The create extension btree_gist is required to create a GiST index on room_id (it's an integer, it usually is indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing better selectivity estimates for array operations.<br />
<br />
* The system can now track I/O durations. <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be read directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The important point is that the cost of getting the time can vary by a factor of thousands between these methods.<br />
<br />
If you are interested in this timing data, you'd better first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be fine for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
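Note that this collection is off by default: it is controlled by the track_io_timing parameter. As a quick sketch, a superuser can enable it for the current session with:<br />
<br />
 =# SET track_io_timing TO on;<br />
<br />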
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time spent retrieving data from the operating system. Obviously, here, the data was in the operating system's cache (only 2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner uses plans specific to the parameter values sent (the query is planned at execution time), unless the query has already been executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans (see the sketch after this list).<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters received from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put a nested loop where it would have been efficient.<br />
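<br />
As a sketch of the first point (the table t and its skewed column col are hypothetical), EXPLAIN EXECUTE now shows the plan chosen for the actual parameter values, so two executions of the same prepared statement can produce different plans:<br />
<br />
 PREPARE q(text) AS SELECT count(*) FROM t WHERE col = $1;<br />
 EXPLAIN EXECUTE q('rare value');     -- may use an index scan<br />
 EXPLAIN EXECUTE q('frequent value'); -- may prefer a sequential scan<br />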
<br />
The following example of parameterized paths comes straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing table b, and b referencing c.<br />
<br />
Here is an example of a query that performs badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is executed once per row coming from the parent node, each time with a different parameter value.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. It is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three kinds of indexes: trie (suffix) indexing, quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text using suffixes.<br />
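<br />
Creating such an index is just a matter of choosing the spgist access method. A small sketch (the table is hypothetical):<br />
<br />
 CREATE TABLE pts (p point);<br />
 CREATE INDEX pts_idx ON pts USING spgist (p); -- quad_point_ops is the default here<br />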
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values are considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=# select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
(1 row)<br />
<br />
=# select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statements) is now charged to the entry of the PREPARE statement. This makes the module easier to use and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
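<br />
As a reminder, pg_stat_statements has to be loaded at server start, then installed in the database; a minimal setup sketch:<br />
<br />
 # in postgresql.conf (a restart is required)<br />
 shared_preload_libraries = 'pg_stat_statements'<br />
<br />
then, in the database:<br />
<br />
 =# CREATE EXTENSION pg_stat_statements;<br />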
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
 &lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean that '=>' can no longer be used in hstore values; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another session was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that relation.<br />
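<br />
For instance (a sketch, assuming no relation has this OID):<br />
<br />
 =# SELECT pg_relation_size(123456789::regclass); -- now returns NULL instead of raising an error<br />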
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field stored the location of the tablespace. It was filled in by the CREATE or ALTER TABLESPACE commands, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't actually used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the «local midnight», that is, the 1st of January 1970 at midnight, local time. The 7200-second difference between the two results above corresponds to the +02 offset of the session's timezone.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between two-digit and three-digit years: two-digit years were always mapped to the year closest to 2020, while three-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the year closest to 2020 for both two-digit and three-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
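<br />
The two-digit case follows the same rule. A couple of examples, sketched from the rule above:<br />
<br />
 SELECT to_date('12-07-02','YY-MM-DD'); -- 2012-07-02<br />
 SELECT to_date('69-07-02','YY-MM-DD'); -- 2069-07-02, as 2069 is closer to 2020 than 1969<br />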
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction, ...)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that such a session was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If the session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen is «idle in transaction».<br />
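<br />
This makes it easy to monitor sessions stuck in a transaction. A sketch of such a monitoring query:<br />
<br />
 SELECT pid, now() - state_change AS idle_for, query<br />
 FROM pg_stat_activity<br />
 WHERE state = 'idle in transaction'<br />
 ORDER BY idle_for DESC;<br />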
<br />
As the view definition had to change anyway, this was the opportunity to rename procpid to pid, making it consistent with the other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed, as it is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, so the locations of the SSL files can now be specified<br />
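<br />
For example, a postgresql.conf sketch using the new SSL parameters (the file paths are hypothetical):<br />
<br />
 ssl = on<br />
 ssl_cert_file = '/etc/postgresql/server.crt'<br />
 ssl_key_file = '/etc/postgresql/server.key'<br />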
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing a shared lock on the table (for a simple SELECT, for instance) have to wait too: their lock requests are queued behind the DROP INDEX's request.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this: like CREATE INDEX CONCURRENTLY, it doesn't block normal DML statements. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, only one index can be dropped at a time with CONCURRENTLY, and CASCADE isn't supported either.<br />
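<br />
For instance, reusing the index created in the range types section (and remembering it can't run inside a transaction block):<br />
<br />
 =# DROP INDEX CONCURRENTLY idx_test_range;<br />
 DROP INDEX<br />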
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint to a table means that the existing data isn't validated; only new and updated rows are checked.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
Domains, which are types with added constraints, can also be declared as not valid, and validated later.<br />
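<br />
A sketch with a hypothetical domain:<br />
<br />
 =# CREATE DOMAIN posint AS int;<br />
 =# ALTER DOMAIN posint ADD CONSTRAINT posint_positive CHECK (VALUE > 0) NOT VALID;<br />
 =# ALTER DOMAIN posint VALIDATE CONSTRAINT posint_positive;<br />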
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can now be declared as not inheritable, which is useful in partitioned environments. Let's take the partitioning example from the PostgreSQL documentation and see how it improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by its child tables, so it was impossible to add a constraint on the parent table alone, for instance one forbidding inserts (or allowing only some of them) directly into the parent.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE when changing the type of a column in the following cases:<br />
<br />
* varchar(x) to varchar(y) when y > x. This also works when going from varchar(x) to unbounded varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y>x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y>x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y>x or timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y>x or timestamptz without specifier<br />
* interval(x) to interval(y) when y>x or interval without specifier<br />
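<br />
So, for instance, widening a varchar column is now instantaneous, whatever the table size (a sketch, the table is hypothetical):<br />
<br />
 =# CREATE TABLE t2 (v varchar(10));<br />
 =# ALTER TABLE t2 ALTER COLUMN v TYPE varchar(100); -- no table rewrite anymore<br />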
<br />
== Security barriers and Leakproof ==<br />
This new feature has to do with view security. First, let's explain the problem with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
=# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is quite a classical way of giving a user access to only part of a table: we create a user for company 1, grant them the right to access company1_data, and deny them the right to access all_data.<br />
<br />
The plan for this query is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there were more data, a sequential scan could still be forced: just "SET enable_indexscan TO off" and the like.<br />
<br />
So this query reads all the records from all_data, filters them, and returns only the matching rows to the user. There is a way to display the scanned records before they are filtered: just create a function with a very low cost, and call it in the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to cost less than the = operator (which costs 1) to be executed first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the second company's record (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
The view's not leaking anymore. The problem, of course, is that there is a performance impact: maybe the "peek" function could have made the query faster, by filtering lots of rows early in the plan.<br />
<br />
This leads to the complementary feature: functions can be declared as LEAKPROOF, meaning that they don't leak the data passed to them through error or notice messages.<br />
<br />
Declaring our peek function as LEAKPROOF is a very bad idea, but let's do it just to demonstrate how it's used:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LEAKPROOF LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
A LEAKPROOF function is executed «normally»:<br />
<br />
=# select * from company1_data where peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
Of course, in our case, peek isn't leakproof and shouldn't be declared as such. Only superusers have the permission to declare a function LEAKPROOF.<br />
<br />
== New options for pg_dump ==<br />
<br />
pg_dump gains a few new options:<br />
<br />
* --section=pre-data/data/post-data: dump only the requested section(s) of the database. pre-data contains the object definitions, data the table contents, and post-data the indexes, constraints and triggers. The option can be given several times<br />
* --exclude-table-data: dump the definition of the given table, but skip its data<br />
<br />
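For instance, to dump a database while skipping the contents of a big log table (the names are hypothetical):<br />
<br />
 pg_dump --exclude-table-data=app_log -f mydb.sql mydb<br />
<br />
Or to dump only the index, constraint and trigger definitions:<br />
<br />
 pg_dump --section=post-data -f post.sql mydb<br />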
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
The view pg_stat_replication has also changed. The column procpid is renamed to pid, to also be consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT for instance) will have to wait too: their lock acquisition is queued after the DROP INDEX's one.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't lock normal DML statements, just as CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run in a transaction. Moreover, you can only DROP one index with CONCURRENTLY, and CASCADE isn't supported either.<br />
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that current data won't be validated, only new and updated rows.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
Domains, which are types with added constraints, can also be declared as not valid, and validated later.<br />
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can be declared as not inheritable, which will be useful in partitioned environments. Let's take PostgreSQL documentation example, and see how it improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by children tables. So adding a constraint forbidding inserts, or allowing only some of them, on the parent table was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE when changing the type of a column in the following cases:<br />
<br />
* varchar(x) to varchar(y) when y > x. It works too if going from varchar(x) to varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y>x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y>x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y>x or timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y>x or timestamptz without specifier<br />
* interval(x) to interval(y) when y>x or interval without specifier<br />
<br />
== Security barriers and Leakproof ==<br />
This new feature has to do with views security. First, let's explain the problem, with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is a quite classical way of giving access to only a part of a table to a user: we'll create a user for company_id 1, grant to him the right to access company1_data, and deny him the right to access all_data.<br />
<br />
The plan to this query is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there was more data, a sequential scan could still be forced: just "SET enable_indexscan to OFF" and the likes.<br />
<br />
So this query reads all the records from all_data, filters them, and returns to the user only the matching rows. There is a way to display scanned records before they are filtered: just create a function with a very low cost, and call it while doing the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to cost less than the = operator, which costs 1, to be executed first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the record from the second company (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
The view's not leaking anymore. The problem, of course, is that there is a performance impact: maybe the "peek" function could have made the query faster, by filtering lots of rows early in the plan.<br />
<br />
This leads to the complementary feature: some function may be declared as "LEAKPROOF", meaning that they won't leak the data they are passed into error or notice messages.<br />
<br />
Declaring our peek function as LEAKPROOF is a very bad idea, but let's do it just to demonstrate how it's used:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LEAKPROOF LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
A LEAKPROOF function is executed «normally»:<br />
<br />
=# select * from company1_data where peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
Of course, in our case, peek isn't LEAKPROOF and shouldn't be declared as such. Only superuser have the permission to declare a LEAKPROOF function.<br />
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
--section=pre-data --section=post-data --section=data<br />
--exclude-table-data<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18071What's new in PostgreSQL 9.22012-08-23T13:18:49Z<p>Marco44: /* Security barriers and Leakproof */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the previous major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the actual tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records themselves may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index itself. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually 8K) page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it will be able to build the result directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, to get a big recordset that doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, made for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was often easier to re-synchronize all the slaves to the new master from scratch, meaning that during the failover only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use the first slave as its source:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care for having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise; see the sketch below.<br />
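<br />
For instance, a minimal sketch of the master's postgresql.conf for this compromise (the standby name «standby1» is an arbitrary example; it must match the slave's application_name):<br />
<pre><br />
synchronous_standby_names = 'standby1'  # declare the synchronous slave<br />
synchronous_commit = remote_write       # wait for the slave to acknowledge<br />
                                        # reception, not its own disk write<br />
</pre><br />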
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types: integer (int4range), bigint (int8range), numeric (numrange), timestamp without time zone (tsrange), timestamp with time zone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the (1000,2000] (open-closed) and [1000,1200] (closed-closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The CREATE EXTENSION btree_gist is required so that room_id can be part of a GiST index (it's an integer, usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
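<br />
For instance, here is a minimal sketch of a range type over float8 (the name floatrange is arbitrary; float8mi, the float8 subtraction function, is used as subtype_diff to help GiST indexing):<br />
<pre><br />
=# CREATE TYPE floatrange AS RANGE (subtype = float8, subtype_diff = float8mi);<br />
CREATE TYPE<br />
=# SELECT '[1.5,3.5)'::floatrange @> 2.0::float8;<br />
 ?column? <br />
----------<br />
 t<br />
(1 row)<br />
</pre><br />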
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it will generate less WAL volume and take fewer locks on tables' pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, and shows very little variation. Anything under 100 nanoseconds should be fine for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, with no knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans. A quick way to observe this is sketched after this list.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.<br />
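<br />
Here is a minimal sketch of the new prepared-statement behaviour (the table t and its index are assumptions, not part of the original example):<br />
<pre><br />
-- assuming a table t(id int) with an index on id<br />
=# PREPARE get_row(int) AS SELECT * FROM t WHERE id = $1;<br />
=# EXPLAIN EXECUTE get_row(42);<br />
-- with 9.2, the plan displayed uses the literal value 42 (a specific plan),<br />
-- until the planner decides that a generic plan (shown with $1) is good enough<br />
</pre><br />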
<br />
The parameterized-path example below is straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parameter provided by its parent node (c_id = c.c_id), and is executed again for each new value of that parameter.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan AND hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type, and has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffixes.<br />
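<br />
For instance, a minimal sketch (the table and column names are arbitrary; quad_point_ops is the default operator class for point):<br />
<pre><br />
=# CREATE TABLE pts (p point);<br />
=# CREATE INDEX pts_p_idx ON pts USING spgist (p);<br />
-- queries such as «points contained in a box» can then use the index:<br />
=# SELECT * FROM pts WHERE p <@ box '(0,0),(10,10)';<br />
</pre><br />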
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query &ndash; such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one &ndash; will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
(1 row)<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. This makes the view easier to use, and avoids the double counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for this record.<br />
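<br />
A minimal sketch (12345 is assumed to be a nonexistent OID):<br />
<pre><br />
=# SELECT pg_relation_size(12345::regclass) IS NULL AS object_is_gone;<br />
 object_is_gone <br />
----------------<br />
 t<br />
(1 row)<br />
</pre><br />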
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.2:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no longer any difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
With 9.1, the epoch of a timestamp without time zone was measured from UTC midnight, hence the differing values above. With 9.2, it is calculated from the «local midnight», meaning January 1st, 1970 at midnight, local time; that's why both expressions now return the same value.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction, ...)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the procpid column was renamed to pid at the same time, for consistency with other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files (see the sketch below)<br />
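<br />
For instance, a minimal sketch of the SSL part of postgresql.conf (the file names are arbitrary; relative paths are looked up in the data directory):<br />
<pre><br />
ssl = on<br />
ssl_cert_file = 'server.crt'   # server certificate<br />
ssl_key_file  = 'server.key'   # server private key<br />
ssl_ca_file   = 'root.crt'     # trusted certificate authorities<br />
ssl_crl_file  = 'root.crl'     # certificate revocation list<br />
</pre><br />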
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT for instance) will have to wait too: their lock acquisition is queued after the DROP INDEX's one.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just like CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, only one index can be dropped per CONCURRENTLY command, and CASCADE isn't supported either.<br />
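<br />
The command itself looks like this (the index name is made up for the example); remember it must be issued outside of any transaction block:<br />
<pre>
=# DROP INDEX CONCURRENTLY some_index;
DROP INDEX
</pre><br />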
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that current data won't be validated, only new and updated rows.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can try to validate the whole table:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
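<br />
The validation fails because rows 1 to 100 violate the constraint. Once the offending rows are removed (or fixed), validation succeeds; a quick sketch continuing the example above:<br />
<pre>
=# delete from test where a <= 100;
DELETE 100
=# alter table test validate constraint test_a_check ;
ALTER TABLE
</pre><br />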
<br />
Domains, which are types with added constraints, can also be declared as not valid, and validated later.<br />
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can be declared as not inheritable, which will be useful in partitioned environments. Let's take the partitioning example from the PostgreSQL documentation, and see how it improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by child tables. So adding a constraint on the parent table that forbids inserts, or allows only some of them, was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE changing the type of a column in the following cases (an example follows the list):<br />
<br />
* varchar(x) to varchar(y) when y > x, or to varchar or text (no size limitation)<br />
* numeric(x,z) to numeric(y,z) when y > x, or to numeric without specifier<br />
* varbit(x) to varbit(y) when y > x, or to varbit without specifier<br />
* timestamp(x) to timestamp(y) when y > x, or to timestamp without specifier<br />
* timestamptz(x) to timestamptz(y) when y > x, or to timestamptz without specifier<br />
* interval(x) to interval(y) when y > x, or to interval without specifier<br />
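<br />
For instance, widening a varchar column no longer rewrites the table, so the command returns almost instantly even on a large table (a sketch; t and c are hypothetical names, c being assumed to be varchar(50)):<br />
<pre>
=# ALTER TABLE t ALTER COLUMN c TYPE varchar(100);
ALTER TABLE
</pre><br />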
<br />
== Security barriers and Leakproof ==<br />
This new feature has to do with view security. First, let's explain the problem, with a very simplified example:<br />
<br />
=# CREATE TABLE all_data (company_id int, company_data varchar);<br />
CREATE TABLE<br />
# INSERT INTO all_data VALUES (1,'secret_data_for_company_1');<br />
INSERT 0 1<br />
=# INSERT INTO all_data VALUES (2,'secret_data_for_company_2');<br />
INSERT 0 1<br />
=# CREATE VIEW company1_data AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
<br />
This is a quite classical way of giving a user access to only a part of a table: we'll create a user for company_id 1, grant them the right to access company1_data, and deny them the right to access all_data.<br />
<br />
The plan for this query is the following:<br />
<br />
=# explain SELECT * FROM company1_data ;<br />
QUERY PLAN <br />
----------------------------------------------------------<br />
Seq Scan on all_data (cost=0.00..25.38 rows=6 width=36)<br />
Filter: (company_id = 1)<br />
<br />
Even if there was more data, a sequential scan could still be forced: just «SET enable_indexscan TO off» and the like.<br />
<br />
So this query reads all the records from all_data, filters them, and returns only the matching rows to the user. There is a way to display the scanned records before they are filtered: just create a function with a very low cost, and call it in the query:<br />
<br />
CREATE OR REPLACE FUNCTION peek(text) RETURNS boolean LANGUAGE plpgsql AS<br />
$$<br />
BEGIN<br />
RAISE NOTICE '%',$1;<br />
RETURN true;<br />
END<br />
$$<br />
COST 0.1;<br />
<br />
This function just has to cost less than the = operator, which costs 1, to be executed first.<br />
<br />
The result is this:<br />
<br />
<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
NOTICE: secret_data_for_company_2<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
We got access to the record from the second company (in the NOTICE messages).<br />
<br />
So this is the first new feature: the view can be declared as implementing "security barriers":<br />
<br />
<br />
=# CREATE VIEW company1_data WITH (security_barrier) AS SELECT * FROM all_data WHERE company_id = 1;<br />
CREATE VIEW<br />
=# SELECT * FROM company1_data WHERE peek(company1_data.company_data);<br />
NOTICE: secret_data_for_company_1<br />
company_id | company_data <br />
------------+---------------------------<br />
1 | secret_data_for_company_1<br />
(1 row)<br />
<br />
It's not leaking anymore.<br />
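<br />
The second half of the feature is the LEAKPROOF attribute for functions: the planner will only push a function down through a security_barrier view if the function is marked LEAKPROOF, i.e. declared to reveal nothing about its arguments beyond its return value (only superusers may apply the attribute). As a sketch, marking our peek() function leakproof (which it obviously is not!) would let it be evaluated before the view's filter again, which is exactly why the attribute must be reserved for genuinely non-leaky functions:<br />
<pre>
=# ALTER FUNCTION peek(text) LEAKPROOF;
ALTER FUNCTION
</pre><br />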
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data / --section=data / --section=post-data: dump only the selected section(s) of the dump; the option can be given more than once<br />
* --exclude-table-data=TABLE: dump the definition of the matching table(s), but not their data<br />
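<br />
For instance (database and table names are illustrative):<br />
<pre>
pg_dump --section=pre-data mydb > schema_before_data.sql
pg_dump --exclude-table-data=application_logs mydb > dump_without_log_data.sql
</pre><br />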
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]
<hr />
{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records themselves may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an «Index Only Scan» when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it can build the result directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to get a big recordset that doesn't fit in memory (the test machine has 4GB of RAM). This is an ideal case, made for this demo. The gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were taken with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real-life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
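<br />
To get a rough idea of whether index only scans can help on a given table, one can compare relallvisible (the number of all-visible pages, as tracked in pg_class) with relpages; this is only a heuristic, of course:<br />
<pre>
=# SELECT relallvisible, relpages FROM pg_class WHERE relname = 'demo_ios';
</pre><br />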
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to one and the same master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during the failover only one server was active, and under heavy load, as it was being used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care for having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise (a sample configuration follows).<br />
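<br />
On the master, that could look like this in postgresql.conf (the standby name is illustrative and has to match the standby's application_name):<br />
<pre>
synchronous_standby_names = 'standby1'
synchronous_commit = remote_write
</pre><br />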
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
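<br />
Both functions also accept an optional second boolean argument; when true, the output is pretty-printed (line feeds are added between top-level elements):<br />
<pre>
=# select array_to_json(array_agg(demo), true) from demo;
</pre><br />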
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solved range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them.<br />
<br />
Here is the intersection between the (1000,2000] (open-closed) and [1000,1200] (closed-closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The btree_gist extension is required so that the GiST index can handle room_id's equality operator (it's an integer, which is usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
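<br />
For instance, here is the float8 range example from the PostgreSQL documentation (the subtype_diff function is optional, but helps the GiST support code):<br />
<pre>
=# CREATE TYPE floatrange AS RANGE (
    subtype = float8,
    subtype_diff = float8mi
);
=# SELECT '[1.234, 5.678]'::floatrange;
</pre><br />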
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with over 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now wakes up less often, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on a table's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track I/O durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The key point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, and has a very small variance. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
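<br />
Note that the collection itself is controlled by the new track_io_timing parameter, which is off by default. It can be turned on globally in postgresql.conf, or more selectively; for instance, per database (a sketch, as it is a superuser-only setting):<br />
<pre>
=# ALTER DATABASE mydb SET track_io_timing = on;
</pre><br />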
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query is planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.<br />
<br />
This example comes straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing table b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffixes.<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (EXECUTE statement) is charged to the PREPARE statement. That way it is easier to use, and it avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another session was calling pg_relation_size on it, leading to an SQL exception. Now, the function merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from "local midnight", that is, January 1st, 1970 at midnight local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for both 2- and 3-digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18069What's new in PostgreSQL 9.22012-08-23T12:49:16Z<p>Marco44: /* Reduce ALTER TABLE rewrites */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
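<br />
For instance, assuming the segments are kept in /tmp/new_logs as above, a standard recovery.conf could consume them with a restore_command such as this (just a sketch; the trailing .partial file would have to be renamed to its final segment name by hand before being usable):<br />
 restore_command = 'cp /tmp/new_logs/%f %p'<br />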
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
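<br />
Here is a minimal, hedged sketch of such a setup; «standby1» is an assumption, and must match the application_name the standby uses in its primary_conninfo:<br />
<br />
postgresql.conf (on the master):<br />
 synchronous_standby_names = 'standby1'<br />
 synchronous_commit = remote_write<br />
<br />
recovery.conf (on the standby):<br />
 primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret application_name=standby1'<br />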
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
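<br />
row_to_json() can also be combined with a subquery to expose only some columns. A small sketch, reusing the demo table above:<br />
<br />
 =# select row_to_json(t) from (select username, posts from demo) t;<br />
             row_to_json             <br />
 ------------------------------------<br />
  {"username":"john","posts":121}<br />
  {"username":"mickael","posts":215}<br />
 (2 rows)<br />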
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the 1000(open)-2000(closed) and the 1000(closed)-1200(closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
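<br />
Other range operators work the same way. For instance, the containment operator @> tests whether a range contains an element (or another range):<br />
<br />
 SELECT '[2012-01-01,2012-03-01)'::daterange @> '2012-02-14'::date;<br />
  ?column? <br />
 ----------<br />
  t<br />
 (1 row)<br />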
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The CREATE EXTENSION btree_gist is required to create a GiST index on room_id (it's an integer, which is usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
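<br />
For instance, here is a sketch similar to the one in the documentation, building a range type over float8 (the optional subtype_diff function helps GiST indexing):<br />
<br />
 CREATE TYPE floatrange AS RANGE (<br />
     subtype = float8,<br />
     subtype_diff = float8mi<br />
 );<br />
 <br />
 SELECT '[1.234, 5.678]'::floatrange;<br />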
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with over 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, thanks to newly introduced specialized sort functions. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on table pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be fine for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameters sent (the query is planned at execution time), except if the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans (a minimal illustration follows this list).<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.<br />
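<br />
Here is a minimal illustration of the new prepared statement behavior, reusing the demo table from the JSON section (names are purely illustrative):<br />
<br />
<pre><br />
=# PREPARE get_posts (text) AS SELECT posts FROM demo WHERE username = $1;<br />
PREPARE<br />
=# EXPLAIN EXECUTE get_posts('john');<br />
</pre><br />
The plan displayed is now built with the actual value 'john', unless the planner has decided that the cached generic plan is about as good.<br />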
<br />
The following example of parameterized paths comes straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffixes.<br />
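<br />
Here is a quick sketch of what using SP-GiST looks like; the table and data are made up for the example, and quad_point_ops is picked by default for a point column:<br />
<br />
<pre><br />
CREATE TABLE spatial_demo (p point);<br />
INSERT INTO spatial_demo<br />
  SELECT point(random(), random()) FROM generate_series(1, 100000);<br />
CREATE INDEX idx_spatial_demo ON spatial_demo USING spgist (p);<br />
-- the index can serve geometric searches such as «strictly left of»<br />
SELECT count(*) FROM spatial_demo WHERE p << point(0.01, 0.5);<br />
</pre><br />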
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values are considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
 word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
 word <br />
------<br />
 bar<br />
(1 row)<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the corresponding prepared statement. This is easier to use, and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
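<br />
As a reminder, pg_stat_statements has to be loaded at server start, then installed in the database. A minimal setup sketch:<br />
<br />
postgresql.conf (needs a restart):<br />
 shared_preload_libraries = 'pg_stat_statements'<br />
<br />
then, in the database:<br />
 =# CREATE EXTENSION pg_stat_statements;<br />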
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for that relation.<br />
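<br />
This makes monitoring queries such as the following sketch safe to run even while tables are being dropped concurrently:<br />
<br />
 SELECT relname, pg_relation_size(oid)<br />
 FROM pg_class<br />
 WHERE relkind = 'r'<br />
 ORDER BY 2 DESC LIMIT 10;<br />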
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the «local midnight», meaning January 1st, 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between two-digit and three-digit dates: two-digit dates always chose the date closest to 2020, while three-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for both two- and three-digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction...)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As the new definition was backward-incompatible anyway, it was also the occasion to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has changed too: its procpid column is likewise renamed to pid.<br />
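<br />
This makes it easy to spot sessions that have been sitting «idle in transaction» for a while, together with the last query they ran. For instance:<br />
<br />
 =# SELECT pid, now() - state_change AS idle_since, query<br />
    FROM pg_stat_activity<br />
    WHERE state = 'idle in transaction'<br />
    ORDER BY state_change;<br />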
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT, for instance) will have to wait too: their lock requests are queued behind the DROP INDEX's.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just like CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, you can only drop one index at a time with CONCURRENTLY, and CASCADE isn't supported either.<br />
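<br />
For instance, dropping the index created in the range types section (remember it cannot be run inside a transaction block):<br />
<br />
 =# DROP INDEX CONCURRENTLY idx_test_range;<br />
 DROP INDEX<br />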
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that current data won't be validated, only new and updated rows.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
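<br />
Once the offending rows have been dealt with, the validation succeeds. Continuing the example:<br />
<br />
 =# delete from test where a <= 100;<br />
 DELETE 100<br />
 =# alter table test validate constraint test_a_check ;<br />
 ALTER TABLE<br />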
<br />
Domains, which are types with added constraints, can also be declared as not valid, and validated later.<br />
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can now be declared as not inheritable, which is useful in partitioned environments. Let's take the PostgreSQL documentation's example, and see how it improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by child tables. So adding a constraint on the parent table that forbids inserts, or allows only some of them, was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
A table won't get rewritten anymore during an ALTER TABLE when changing the type of a column in the following cases:<br />
<br />
* varchar(x) to varchar(y) when y > x. It also works when going from varchar(x) to varchar or text (no size limit)<br />
* numeric(x,z) to numeric(y,z) when y>x, or to numeric without specifier.<br />
* varbit(x) to varbit(y) when y>x, or to varbit without specifier<br />
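<br />
One way to check that no rewrite happened is to compare the table's relfilenode before and after the ALTER; the actual values will vary, so this is only a sketch:<br />
<br />
 =# CREATE TABLE alter_demo (v varchar(10));<br />
 =# SELECT relfilenode FROM pg_class WHERE relname = 'alter_demo';<br />
 =# ALTER TABLE alter_demo ALTER COLUMN v TYPE varchar(20);<br />
 =# SELECT relfilenode FROM pg_class WHERE relname = 'alter_demo';<br />
 -- same relfilenode as before: the table was not rewritten<br />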
<br />
== Security barriers and Leakproof ==<br />
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
--section=pre-data --section=post-data --section=data<br />
--exclude-table-data<br />
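<br />
These options split a dump into its pre-data (schema), data, and post-data (indexes, constraints) parts, or skip the data of specific tables. A usage sketch, with illustrative names:<br />
<br />
 pg_dump --section=pre-data mydb > pre-data.sql<br />
 pg_dump --section=data mydb > data.sql<br />
 pg_dump --section=post-data mydb > post-data.sql<br />
 pg_dump --exclude-table-data=big_log_table mydb > no_log_data.sql<br />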
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query tree (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
The view pg_stat_replication has also changed. The column procpid is renamed to pid, to also be consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT for instance) will have to wait too: their lock acquisition is queued after the DROP INDEX's one.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't lock normal DML statements, just as CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run in a transaction. Moreover, you can only DROP one index with CONCURRENTLY, and CASCADE isn't supported either.<br />
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint on a table means that current data won't be validated, only new and updated rows.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can validate the whole table:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
<br />
Domains, which are types with added constraints, can also be declared as not valid, and validated later.<br />
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
Last, but not least, constraints can be declared as not inheritable, which will be useful in partitioned environments. Let's take PostgreSQL documentation example, and see how it improves the situation:<br />
<br />
CREATE TABLE measurement (<br />
city_id int not null,<br />
logdate date not null,<br />
peaktemp int,<br />
unitsales int,<br />
CHECK (logdate IS NULL) NO INHERIT<br />
);<br />
<br />
CREATE TABLE measurement_y2006m02 (<br />
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )<br />
) INHERITS (measurement);<br />
CREATE TABLE measurement_y2006m03 (<br />
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )<br />
) INHERITS (measurement);<br />
<br />
<br />
INSERT INTO measurement VALUES (1,'2006-02-20',1,1);<br />
ERROR: new row for relation "measurement" violates check constraint "measurement_logdate_check"<br />
DETAIL: Failing row contains (1, 2006-02-20, 1, 1).<br />
INSERT INTO measurement_y2006m02 VALUES (1,'2006-02-20',1,1);<br />
INSERT 0 1<br />
<br />
Until now, every check constraint created on measurement would have been inherited by children tables. So adding a constraint forbidding inserts, or allowing only some of them, on the parent table was impossible.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
--section=pre-data --section=post-data --section=data<br />
--exclude-table-data<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18067What's new in PostgreSQL 9.22012-08-23T11:16:00Z<p>Marco44: /* NOT VALID CHECK constraints */</p>
<hr />
{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record through an index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table records may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, and will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, to get a big recordset that doesn't fit in memory (the test machine has 4GB of RAM). This is an ideal case, built for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was often easier to re-synchronize the slaves from scratch on the new master, meaning that during the failover only one server was active, and under heavy load, as it was being used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will replicate from this first slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the standard WAL segment size, so they can be used for a normal recovery of the database. It's the same as archiving, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), and means that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged receiving it. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise. A minimal setup sketch follows.<br />
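Here is a minimal sketch of such a setup on the master (the standby name «standby1» is purely illustrative, and must match the application_name used in the standby's primary_conninfo):<br />
<pre><br />
# postgresql.conf on the master<br />
synchronous_standby_names = 'standby1'<br />
synchronous_commit = remote_write<br />
</pre><br />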
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
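<br />
The json type can be used for table columns like any other datatype. A minimal sketch (the events table and its content are just for illustration):<br />
<pre><br />
CREATE TABLE events (id serial, payload json);<br />
INSERT INTO events(payload)<br />
VALUES ('{"type":"login","user":"john"}');<br />
</pre><br />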
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solved range problems by using two columns in a table. Range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the (1000,2000] and [1000,1200] numeric ranges (a parenthesis denotes an open bound, a bracket a closed one):<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table with equal room_id and overlapping period. The btree_gist extension is required so that room_id, an integer that would usually be indexed with a btree, can be part of the GiST index.<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
<br />
One can also declare new range types, as in the sketch below.<br />
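Here is a range type over double precision values, close to the example given in the PostgreSQL documentation (the type name floatrange, and float8mi as the subtype difference function, are just the documentation's choices):<br />
<pre><br />
CREATE TYPE floatrange AS RANGE (<br />
    subtype = float8,<br />
    subtype_diff = float8mi<br />
);<br />
<br />
SELECT '[1.234, 5.678]'::floatrange;<br />
</pre><br />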
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with 32 cores or more. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on the tables' pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.6 milliseconds to read 493 blocks).<br />
<br />
And lastly, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters obtained from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where they would have been efficient.<br />
<br />
This example comes straight from the developers' mailing list <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
</pre><br />
<br />
This part of the plan depends on its parent node (c_id = c.c_id): it is executed each time with a different parameter value coming from that node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type, and has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using a suffix tree. A short usage sketch follows.<br />
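For instance, here is how a point column could be indexed with the default quadtree operator class (the table and index names are purely illustrative):<br />
<pre><br />
CREATE TABLE points (p point);<br />
INSERT INTO points<br />
SELECT point(random()*1000, random()*1000)<br />
FROM generate_series(1,100000);<br />
CREATE INDEX points_quad_idx ON points USING spgist (p);<br />
-- queries like this one may now use the SP-GiST index:<br />
EXPLAIN SELECT * FROM points WHERE p <@ box '(0,0),(100,100)';<br />
</pre><br />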
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
(1 row)<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. This makes the statistics easier to use, and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
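As a reminder, here is a minimal setup to enable pg_stat_statements (the standard way to load this contrib module):<br />
<pre><br />
# in postgresql.conf (a restart is needed)<br />
shared_preload_libraries = 'pg_stat_statements'<br />
</pre><br />
Then, in the database where you want to use the view:<br />
<pre><br />
CREATE EXTENSION pg_stat_statements;<br />
</pre><br />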
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
</pre><br />
'<' isn't valid XML.<br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib datatype, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
Previously, a relation could be dropped by a concurrent session while one was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for that record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from "local midnight", meaning January 1st, 1970 at midnight, local time. The 7200-second difference between the two results above corresponds to the server's UTC+2 offset.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction...)<br />
** query: the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen would be «idle in transaction»; here, the query column still shows the last statement run in that transaction.<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has changed in the same way: its procpid column is also renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files (see the sketch below)<br />
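A minimal sketch of the new SSL parameters (the file paths are purely illustrative):<br />
<pre><br />
ssl = on<br />
ssl_cert_file = '/etc/postgresql/server.crt'<br />
ssl_key_file = '/etc/postgresql/server.key'<br />
</pre><br />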
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing to take a shared lock on the table (for a simple SELECT, for instance) will have to wait too: their lock acquisitions are queued behind the DROP INDEX's.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just like CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, you can only drop one index at a time with CONCURRENTLY, and CASCADE isn't supported either.<br />
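The syntax is simply (idx_demo being a hypothetical index name):<br />
<pre><br />
DROP INDEX CONCURRENTLY idx_demo;<br />
</pre><br />
Run inside a transaction block, it fails with an error along the lines of «DROP INDEX CONCURRENTLY cannot run inside a transaction block».<br />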
<br />
== NOT VALID CHECK constraints ==<br />
<br />
PostgreSQL 9.1 introduced «NOT VALID» foreign keys. This has been extended to CHECK constraints. Adding a «NOT VALID» constraint to a table means that existing data isn't validated; only new and updated rows are checked.<br />
<br />
=# create table test (a int); <br />
CREATE TABLE<br />
=# insert into test select generate_series(1,100);<br />
INSERT 0 100<br />
=# alter table test add check (a>100) not valid;<br />
ALTER TABLE<br />
=# insert into test values(99);<br />
ERROR: new row for relation "test" violates check constraint "test_a_check"<br />
DETAIL: Failing row contains (99).<br />
=# insert into test values(101);<br />
INSERT 0 1<br />
<br />
Then, later, we can try to validate the whole table. Here it fails, as the pre-existing rows (1 to 100) violate the constraint:<br />
<br />
=# alter table test validate constraint test_a_check ;<br />
ERROR: check constraint "test_a_check" is violated by some row<br />
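<br />
Once the offending rows are gone, the validation succeeds (continuing the example):<br />
<br />
=# delete from test where a <= 100;<br />
DELETE 100<br />
=# alter table test validate constraint test_a_check ;<br />
ALTER TABLE<br />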
<br />
Domains, which are types with added constraints, can also have their constraints declared NOT VALID, and validated later.<br />
<br />
Check constraints can also be renamed now:<br />
<br />
=# alter table test rename CONSTRAINT test_a_check TO validate_a;<br />
ALTER TABLE<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
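ALTER TABLE ... ALTER COLUMN TYPE now avoids rewriting the table when the new type cannot change the stored data. The typical case is increasing the length limit of a varchar. A minimal sketch (the table t is made up):<br />
<br />
=# create table t (v varchar(10));<br />
CREATE TABLE<br />
=# alter table t alter column v type varchar(20);<br />
ALTER TABLE<br />
<br />
In 9.1 this last statement rewrote the whole table; in 9.2 it is a catalog-only change.<br />
<br />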
== Security barriers and Leakproof ==<br />
<br />
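Views can now be created with the security_barrier option: the planner will no longer push user-supplied functions down into the view, which could otherwise leak rows that the view is supposed to hide. Functions known to be safe can be marked LEAKPROOF, so that the planner may still optimize with them. A minimal sketch (the view name is made up, reusing the test table from above):<br />
<br />
=# create view test_big with (security_barrier) as select * from test where a > 100;<br />
CREATE VIEW<br />
<br />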
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data, --section=data and --section=post-data restrict the dump to the given part: the definitions needed before loading the data, the data itself, or the definitions (indexes, constraints...) created after the data<br />
* --exclude-table-data dumps the definition of the given table, but skips its data<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18066What's new in PostgreSQL 9.22012-08-23T09:24:07Z<p>Marco44: /* Other new features */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the previous major release &ndash; PostgreSQL 9.1. This page covers the more important changes in detail; the full list of changes is itemised in the ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record through an index, PostgreSQL has to visit the actual tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table rows they point to may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the table row when it doesn't need to.<br />
<br />
There is still no visibility information in the index itself. So PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index entry points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, and can build the result directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, so that the recordset is big and doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, made for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios WHERE col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was often easier to re-synchronize the slaves to the new master from scratch, meaning that during the failover only one server was active, and it was under heavy load, as it was also being used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication link has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care for having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
The files are of the standard WAL segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
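<br />
As a sketch, this could be configured on the master like this (standby1 is a made-up name, which the slave must send as the application_name of its primary_conninfo):<br />
<pre><br />
synchronous_standby_names = 'standby1'<br />
synchronous_commit = remote_write<br />
</pre><br />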
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
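<br />
Both row_to_json and array_to_json also accept an optional boolean parameter asking for pretty-printing (line feeds are added between elements):<br />
<br />
=# select array_to_json(array_agg(demo), true) from demo;<br />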
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of values of a given type. A few range types are pre-defined: integer (int4range), bigint (int8range), numeric (numrange), timestamp without time zone (tsrange), timestamp with time zone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the 1000 (open) &ndash; 2000 (closed) and the 1000 (closed) &ndash; 1200 (closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The btree_gist extension is required to include room_id in the GiST index (it's an integer, usually indexed with a btree).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
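<br />
For instance (a minimal sketch), a range over floats:<br />
<br />
=# CREATE TYPE floatrange AS RANGE (subtype = float8);<br />
CREATE TYPE<br />
=# SELECT '[1.5,2.5)'::floatrange;<br />
floatrange <br />
------------<br />
[1.5,2.5)<br />
(1 row)<br />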
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly on machines with more than 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, thanks to newly introduced specialized sort functions. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now causes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on the table's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The key point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be fine for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
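<br />
The collection itself is controlled by the track_io_timing parameter, which is off by default. It can be turned on in postgresql.conf, or by a superuser for a single session:<br />
<br />
=# SET track_io_timing TO on;<br />
SET<br />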
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We can see here that mydb has consumed only 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (about 2.6 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query has been executed several times and the planner decides that the generic plan is not much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it receives from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where they would have been efficient.<br />
<br />
This example comes straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing table b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
</pre><br />
<br />
This part of the plan depends on a value from a parent node (c_id = c.c_id): it is executed again each time the parent node provides a different value for the parameter.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type that has been available in PostgreSQL for quite a while. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text using a suffix tree.<br />
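<br />
Creating such an index works like any other index type. A minimal sketch (the table is made up; quad_point_ops is the default operator class for points):<br />
<br />
=# CREATE TABLE pts (p point);<br />
CREATE TABLE<br />
=# CREATE INDEX pts_idx ON pts USING spgist (p);<br />
CREATE INDEX<br />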
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences which are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
(1 row)<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. This makes the statistics easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another session was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field contained the location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't actually used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now calculated from "local midnight", meaning the 1st of January 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is doing: running a query (active), idle, idle in transaction, etc.<br />
** query: the last query run by the session (or the one still running)<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but not yet committed. If the session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen is «idle in transaction». With 9.2, the query column keeps the last statement run inside the transaction (here a DELETE), which tells us how the session got into this state.<br />
<br />
As the view definition had to change anyway, the opportunity was taken to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl's -l option instead (pg_ctl -l postmaster.log)<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the location of each SSL file instead of relying on fixed file names in the data directory<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
The regular DROP INDEX command takes an exclusive lock on the table. Most of the time, this isn't a problem, because this lock is short-lived. The problem usually occurs when:<br />
<br />
* A long-running transaction is running, and has a (shared) lock on the table<br />
* A DROP INDEX is run on this table in another session, asking for an exclusive lock (and waiting for it, as it won't be granted until the long-running transaction ends)<br />
<br />
At this point, all other transactions needing a shared lock on the table (for a simple SELECT, for instance) have to wait too: their lock requests are queued behind the DROP INDEX's.<br />
<br />
<br />
DROP INDEX CONCURRENTLY works around this and won't block normal DML statements, just like CREATE INDEX CONCURRENTLY. The main limitation is the same: DROP INDEX CONCURRENTLY can't be run inside a transaction. Moreover, you can only drop one index at a time with CONCURRENTLY, and CASCADE isn't supported either.<br />
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Domains, which are types with added constraints, can also have their constraints declared NOT VALID, and validated later.<br />
<br />
Check constraints can also be renamed now.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data, --section=data and --section=post-data restrict the dump to the given part: the definitions needed before loading the data, the data itself, or the definitions (indexes, constraints...) created after the data<br />
* --exclude-table-data dumps the definition of the given table, but skips its data<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18065What's new in PostgreSQL 9.22012-08-23T09:13:45Z<p>Marco44: /* Explain improvements */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the previous major release &ndash; PostgreSQL 9.1. This page covers the more important changes in detail; the full list of changes is itemised in the ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record through an index, PostgreSQL has to visit the actual tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table rows they point to may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the table row when it doesn't need to.<br />
<br />
There is still no visibility information in the index itself. So PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index entry points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, and can build the result directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll insert 100 million records, so that the recordset is big and doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, made for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios WHERE col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was often easier to re-synchronize the slaves to the new master from scratch, meaning that during the failover only one server was active, and it was under heavy load, as it was also being used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication link has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care for having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
The files are of the standard WAL segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of values of a given type. A few range types are pre-defined: integer (int4range), bigint (int8range), numeric (numrange), timestamp without time zone (tsrange), timestamp with time zone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the 1000 (open) &ndash; 2000 (closed) and the 1000 (closed) &ndash; 1200 (closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and period overlaps. The btree_gist extension is required to include room_id in the GiST index (it's an integer, usually indexed with a btree).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly on machines with more than 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, thanks to specialized sort functions. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now causes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on the table's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better selectivity estimates on array operations.<br />
<br />
* The system can now track I/O durations. <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap or extremely costly. The most important factor is where the system gets its time from: it can be read directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The bottom line is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
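<br />
Note that this collection is disabled by default: it is controlled by the track_io_timing parameter, which can be set in postgresql.conf or, by a superuser, per session:<br />
<br />
 =# SET track_io_timing = on;<br />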
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (about 2.6 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As with every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query is planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans (see the sketch below).<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it gets from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where they would have been efficient.<br />
<br />
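A quick illustration of the first point (a sketch, reusing the words table that appears in the pg_stat_statements examples below):<br />
<br />
<pre><br />
PREPARE find_word(text) AS SELECT * FROM words WHERE word = $1;<br />
-- With 9.2, this execution is planned using the actual value 'foo',<br />
-- unless the planner has decided that a generic plan is just as good.<br />
EXECUTE find_word('foo');<br />
</pre><br />
<br />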
The following parameterized-path example comes straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad: 13 milliseconds. Still, we are doing sequential scans on a and b, when common sense tells us that c.value = 1 should be used to filter rows much more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is executed again for each new value of the parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST aims to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to implement the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: trie (suffix) indexing, quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST provides operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first indexes point types using a quadtree, the second indexes point types using a k-d tree, and the third indexes text, using suffix trees.<br />
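<br />
Here is a minimal sketch of what using it looks like (the table and data are made up for this example; for point columns, USING spgist picks quad_point_ops by default):<br />
<br />
<pre><br />
CREATE TABLE pts (p point);<br />
INSERT INTO pts SELECT point(random()*100, random()*100) FROM generate_series(1,10000);<br />
CREATE INDEX pts_spgist_idx ON pts USING spgist (p);<br />
-- the index can serve operators such as "contained in box":<br />
EXPLAIN SELECT * FROM pts WHERE p <@ box '(20,20),(10,10)';<br />
</pre><br />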
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
 bar<br />
(1 row)<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the corresponding PREPARE statement. This makes the view easier to use, and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
=# explain (analyze on,timing off) select * from reservation ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------<br />
Seq Scan on reservation (cost=0.00..22.30 rows=1230 width=36) (actual rows=2 loops=1)<br />
Total runtime: 0.045 ms<br />
<br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
<br />
This new feature makes it much easier to know how many rows are removed by a filter (and spot potential places to put indexes):<br />
<br />
=# explain analyze select * from test where a ~ 'tra';<br />
QUERY PLAN <br />
---------------------------------------------------------------------------------------------------------------<br />
Seq Scan on test (cost=0.00..106876.56 rows=2002 width=11) (actual time=2.914..8538.285 rows=120256 loops=1)<br />
Filter: (a ~ 'tra'::text)<br />
Rows Removed by Filter: 5905600<br />
Total runtime: 8549.539 ms<br />
(4 rows)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
The => operator was removed because => is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that relation.<br />
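<br />
A sketch of the new behaviour (999999999 stands in for the OID of a table dropped by a concurrent session):<br />
<br />
<pre><br />
=# SELECT pg_relation_size(999999999::regclass);<br />
 pg_relation_size <br />
------------------<br />
                  <br />
(1 row)<br />
</pre><br />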
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now calculated from "local midnight", meaning January 1st, 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction...)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that such a session had started a transaction, maybe performed some operations, but not yet committed. If the session stayed in this state for a while, there was no way of knowing what query had put it there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed, as it is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
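<br />
For instance, in postgresql.conf (the file names below are just examples; relative paths are resolved against the data directory):<br />
<br />
 ssl = on<br />
 ssl_cert_file = 'server.crt'<br />
 ssl_key_file = 'server.key'<br />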
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
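<br />
DROP INDEX CONCURRENTLY removes an index without taking locks that would block concurrent reads and writes on the indexed table, at the price of a longer execution. A minimal example, reusing the index created in the range types section:<br />
<br />
 DROP INDEX CONCURRENTLY idx_test_range;<br />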
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Constraints on domains can also be declared as NOT VALID.<br />
<br />
Check constraints can also be renamed now.<br />
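<br />
A quick sketch, reusing the reservation table from the range types section: a NOT VALID constraint is enforced for new rows immediately, while pre-existing rows are only checked when the constraint is later validated.<br />
<br />
 ALTER TABLE reservation ADD CONSTRAINT room_id_positive CHECK (room_id > 0) NOT VALID;<br />
 ALTER TABLE reservation VALIDATE CONSTRAINT room_id_positive;<br />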
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
--section=pre-data --section=post-data --section=data<br />
--exclude-table-data<br />
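<br />
For instance (mydb and the output file names are just examples):<br />
<br />
 pg_dump --section=pre-data mydb > mydb-pre-data.sql<br />
 pg_dump --section=data mydb > mydb-data.sql<br />
 pg_dump --exclude-table-data=reservation mydb > mydb-no-reservation-data.sql<br />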
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18063What's new in PostgreSQL 9.22012-08-23T08:54:37Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record through an index, PostgreSQL has to visit the actual tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table records themselves may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and will not access the table record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to get a big recordset that doesn't fit in memory (the test machine has 4GB of RAM). This is an ideal case, set up for this demo. The gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below are measured with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to one and the same master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during this failover only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use the first slave as its source:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
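<br />
A minimal sketch of the relevant settings on the master, assuming a standby whose application_name is «standby1» (the name is made up):<br />
<br />
postgresql.conf:<br />
synchronous_standby_names = 'standby1'<br />
synchronous_commit = remote_write<br />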
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the 1000 (open)-2000 (closed) and the 1000 (closed)-1200 (closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and the periods overlap. The CREATE EXTENSION btree_gist is required to create a GiST index on room_id (it's an integer, so it is usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
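<br />
For instance, here is a sketch of a range type over float8, close to the example in the PostgreSQL documentation (the subtype_diff function helps GiST indexes on the range type perform better):<br />
<br />
<pre><br />
CREATE TYPE floatrange AS RANGE (<br />
    subtype = float8,<br />
    subtype_diff = float8mi<br />
);<br />
<br />
SELECT '[1.234,5.678]'::floatrange;<br />
</pre><br />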
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with over 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on a table's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be read directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query has been executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans (see the sketch after this list).<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters obtained from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
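<br />
Here is a minimal sketch of the new prepared statement behaviour, reusing the demo_ios table from the index only scans section (the actual plan choices remain up to the planner):<br />
<br />
<pre><br />
PREPARE count_smaller(float) AS<br />
    SELECT count(*) FROM demo_ios WHERE col2 < $1;<br />
-- 9.2 can plan each execution with the actual parameter value,<br />
-- so these two executions may get different plans:<br />
EXECUTE count_smaller(0.0001); -- very selective: an index scan makes sense<br />
EXECUTE count_smaller(0.9);    -- barely selective: a sequential scan makes sense<br />
</pre><br />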
<br />
The following parameterized path example comes straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is executed again for each different parameter value coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: tries (suffix indexing), quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffixes.<br />
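<br />
Creating an SP-GiST index uses the usual CREATE INDEX syntax. A minimal sketch, on a made-up table holding points (quad_point_ops being the default operator class for the point type):<br />
<br />
<pre><br />
CREATE TABLE points_demo (p point);<br />
CREATE INDEX points_demo_spgist ON points_demo USING spgist (p);<br />
</pre><br />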
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the prepared statement. This makes the module easier to use, and avoids the double counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
* Have EXPLAIN ANALYZE report the number of rows rejected by filter steps <!--(Marko Tiikkaja)--><br />
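<br />
For instance, a sketch reusing the mots table from the I/O timing example (the mot column name is an assumption):<br />
<br />
<pre><br />
-- With TIMING off, per-node timings are skipped, but row counts<br />
-- (including "Rows Removed by Filter") are still reported.<br />
EXPLAIN (ANALYZE on, TIMING off)<br />
    SELECT count(*) FROM mots WHERE mot LIKE 'a%';<br />
</pre><br />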
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
</pre><br />
<br />
'<' isn't valid XML.<br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" was removed as an operator because it is a reserved keyword in the SQL standard.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is calculated from "local midnight", meaning January 1st, 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for both 2- and 3-digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (running a query, idle, idle in transaction, ...)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, to be more consistent with other system views.<br />
The view pg_stat_replication has changed likewise: its procpid column is also renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
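Like CREATE INDEX CONCURRENTLY, this drops an index without taking a lock that would block reads and writes on the table during the operation. A minimal sketch, reusing the index created in the index only scans section:<br />
<br />
<pre><br />
DROP INDEX CONCURRENTLY idx_demo_ios;<br />
</pre><br />
<br />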
== NOT VALID CHECK constraints ==<br />
<br />
Domain constraints can also be declared NOT VALID.<br />
<br />
Check constraints can also be renamed now.<br />
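<br />
A sketch of these features, on the reservation table from the range types section (the constraint names are made up):<br />
<br />
<pre><br />
-- existing rows are not checked immediately...<br />
ALTER TABLE reservation ADD CONSTRAINT room_id_positive CHECK (room_id > 0) NOT VALID;<br />
-- ...they can be validated later:<br />
ALTER TABLE reservation VALIDATE CONSTRAINT room_id_positive;<br />
-- and check constraints can now be renamed:<br />
ALTER TABLE reservation RENAME CONSTRAINT room_id_positive TO room_id_is_positive;<br />
</pre><br />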
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
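The typical case: in 9.2, increasing the length limit of a varchar column is a metadata-only change and no longer rewrites the whole table. A sketch (the note column is made up):<br />
<br />
<pre><br />
ALTER TABLE reservation ADD COLUMN note varchar(20);<br />
-- no table rewrite in 9.2:<br />
ALTER TABLE reservation ALTER COLUMN note TYPE varchar(100);<br />
</pre><br />
<br />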
== Security barriers and Leakproof ==<br />
<br />
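In short: a view created WITH (security_barrier) guarantees that its own WHERE clause is enforced before any user-supplied function can see the rows, and only functions marked LEAKPROOF may be pushed down below such a view by the planner. A minimal sketch, reusing the reservation table (the view is made up):<br />
<br />
<pre><br />
CREATE VIEW my_reservations WITH (security_barrier) AS<br />
    SELECT * FROM reservation WHERE room_id = 1;<br />
</pre><br />
<br />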
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data, --section=post-data, --section=data<br />
* --exclude-table-data<br />
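<br />
A sketch of how these could be used (the database name is made up):<br />
<br />
<pre><br />
# dump only the schema definitions that come before the data:<br />
pg_dump --section=pre-data mydb > pre-data.sql<br />
# dump everything except the contents of one (big) table:<br />
pg_dump --exclude-table-data=demo_ios mydb > no_demo_ios_data.sql<br />
</pre><br />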
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between then 1000(open)-2000(closed) and 1000(closed)-1200(closed) numeric range:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that now it is forbidden to have two records in this table where room_id is equal and period overlaps. The create extension btree_gist is required to create a GiST index on room_id (it's an integer, it usually is indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data (it is collected when the new track_io_timing parameter is enabled), it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this purpose:
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds and shows very little variation. Anything under 100 nanoseconds should be fine for production. If you get higher values, you may still be able to tune your system; see the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation] for hints.
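If the overhead looks acceptable, I/O timing collection is switched on with the new track_io_timing parameter; a minimal sketch:

<pre>
-- in postgresql.conf, or per session as a superuser:
SET track_io_timing = on;
</pre>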
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time spent retrieving data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner uses plans that are specific to the parameter values sent (the query is planned at execution time), unless the query has been executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans (see the sketch just after this list).
** A new feature has been added: parameterized paths. Simply put, a sub-part of a query plan can use parameters it gets from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.
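To illustrate the first point, here is a minimal sketch of the new prepared statement behaviour (the table, statement name and value are hypothetical, not from the original document):

<pre>
-- assuming a table t(id int PRIMARY KEY, val text)
PREPARE get_row(int) AS SELECT val FROM t WHERE id = $1;

-- 9.2 plans this execution using the actual value 42; a generic plan
-- is only adopted once the planner sees that it would not be
-- significantly more expensive than the value-specific plans
EXECUTE get_row(42);
</pre>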
<br />
The following example of a parameterized path comes straight from the developers' mailing lists <!-- Andres Freund -->:
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad: 13 milliseconds. Still, we are doing sequential scans on a and b, when common sense tells us that c.value = 1 should be used to filter rows much more aggressively.
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parameter coming from a parent node (c_id = c.c_id), and it is re-executed for each value supplied by that parent node.

This plan is of course much faster: there is no need to fully scan a, nor to fully scan AND hash b.
<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. It is already very efficient at indexing complex data types, but its performance tends to suffer when the source data isn't uniformly distributed. SP-GiST aims to fix that.
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you have to follow the documented API.
<br />
SP-GiST can be used to implement three kinds of indexes: tries (suffix indexing), quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text using suffix trees.
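Creating an SP-GiST index looks like creating any other index; a minimal sketch (the table and column names are hypothetical):

<pre>
CREATE TABLE places (pos point);
-- quad_point_ops is the default operator class for point columns
CREATE INDEX idx_places_pos ON places USING spgist (pos);
</pre>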
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values are considered the same, as long as their post-parse-analysis query tree (that is, the internal representation of the query before rule expansion) is the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one syntax over another equivalent one, do not differentiate queries.
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. This makes the module easier to use and avoids the double-counting that occurred with PostgreSQL 9.1.
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (ANALYZE on, TIMING off), leading to lower overhead on platforms where getting the current time is expensive (see the sketch after this list) <!--Tomas Vondra-->
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps <!--(Marko Tiikkaja)-->
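A minimal sketch of the first item (the words table is the one used in earlier examples):

<pre>
-- row counts are still reported, but per-node timings are skipped
EXPLAIN (ANALYZE on, TIMING off) SELECT count(*) FROM words;
</pre>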
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a single column.
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean one cannot use '=>' inside hstore values; it just isn't an operator anymore:
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.

The "=>" operator was removed because it is a reserved keyword in SQL.
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while one was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for that record.
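A minimal sketch of a query that benefits from this (the catalog is real; the situation is hypothetical):

<pre>
-- on 9.2, a table dropped while this runs yields a NULL size
-- instead of aborting the whole query with an error
SELECT relname, pg_relation_size(oid)
FROM pg_class
ORDER BY 2 DESC NULLS LAST;
</pre>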
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field contained the location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't actually used.
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.
<br />
With 9.2:
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the "local midnight", meaning January 1st, 1970 at midnight, local time.
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2- and 3-digit years.
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:
** state: what the session is doing (running a query, idle, idle in transaction...)
** query: the query currently running, or the last one run
* The column procpid is renamed to pid, to be consistent with other system views
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid, for consistency with other system views.
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead.
* wal_sender_delay has been removed. It is no longer needed.
* custom_variable_classes has been removed. All «classes» are now accepted without declaration.
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files (see the sketch below).
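A minimal sketch of the new SSL parameters (the certificate and key file names shown are the defaults; paths are relative to the data directory):

<pre>
# postgresql.conf
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = 'root.crt'
</pre>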
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
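An index can now be dropped without taking locks that block concurrent reads and writes on its table; a minimal sketch, reusing the idx_demo_ios index created in the Index Only Scans section:

<pre>
DROP INDEX CONCURRENTLY idx_demo_ios;
</pre>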
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Constraints on domains can also be declared as not valid.
<br />
Check constraints can also be renamed now.<br />
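A minimal sketch (the table and constraint names are hypothetical):

<pre>
-- the constraint is enforced for new rows, but existing rows
-- are not checked until the constraint is explicitly validated
ALTER TABLE orders ADD CONSTRAINT amount_positive CHECK (amount > 0) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT amount_positive;

-- check constraints can now be renamed
ALTER TABLE orders RENAME CONSTRAINT amount_positive TO chk_amount_positive;
</pre>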
<br />
== Reduce ALTER TABLE rewrites ==<br />
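One notable case: increasing the length limit of a varchar column no longer rewrites the table. A minimal sketch (the table is hypothetical, assuming name was varchar(50)):

<pre>
-- in 9.2 this is a catalog-only change, with no table rewrite
ALTER TABLE people ALTER COLUMN name TYPE varchar(100);
</pre>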
<br />
== Security barriers and Leakproof ==<br />
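A minimal sketch of the new syntax (the table, view and function are hypothetical):

<pre>
-- a security_barrier view applies its own WHERE clause before any
-- user-supplied conditions, so a malicious function cannot be used
-- to leak rows the view is supposed to hide
CREATE VIEW visible_people WITH (security_barrier) AS
    SELECT * FROM people WHERE visible = true;

-- functions marked LEAKPROOF are trusted not to leak their arguments,
-- and may therefore still be pushed down into such views
CREATE FUNCTION is_positive(int) RETURNS boolean
    AS 'SELECT $1 > 0' LANGUAGE sql LEAKPROOF;
</pre>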
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data, --section=data, --section=post-data
* --exclude-table-data
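A sketch of how these options can be used (the database and table names are hypothetical):

<pre>
# the three restore-ordering sections can now be dumped separately
pg_dump --section=pre-data -f pre.sql mydb
pg_dump --section=data -f data.sql mydb
pg_dump --section=post-data -f post.sql mydb

# dump the definition of audit_log, but not its contents
pg_dump --exclude-table-data=audit_log -f dump.sql mydb
</pre>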
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
Range types are used to store a range of data of a given type. There are a few pre-defined types. They are integer (int4range), bigint (int8range), numeric (numrange), timestamp without timezone (tsrange), timestamp with timezone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between then 1000(open)-2000(closed) and 1000(closed)-1200(closed) numeric range:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that now it is forbidden to have two records in this table where room_id is equal and period overlaps. The create extension btree_gist is required to create a GiST index on room_id (it's an integer, it usually is indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) <br />
conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
<br />
One can also declare new range types.<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space Partitionned GiST, GiST being Generalized Search Tree. GiST is an index type, and has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
As all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you'll throw at it, using operators you'll provide. It means that if you want to create a new datatype, and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement 3 type of indexes: trie (suffix) indexing, Quadtree (data is divided into quadrants), and k-d tree (k-dimensional tree).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types, using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text, using suffix.<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query tree (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18060What's new in PostgreSQL 9.22012-08-23T08:43:48Z<p>Marco44: /* Range Types */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the previous major release, PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers the more important changes in detail. The full list of changes is itemised in the ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table records may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to get a big recordset that doesn't fit in memory (the test machine has 4GB of RAM). This is an ideal case, made for this demo. The gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum so that the visibility map is up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, single master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was often easier to re-synchronize all the slaves to the new master from scratch, meaning that during the failover only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
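<br />
For instance, a session that can tolerate this small risk may opt in on its own. Here is a minimal sketch, assuming synchronous_standby_names is already configured on the master:<br />
<pre><br />
-- wait only for the synchronous slave to acknowledge reception of the commit<br />
SET synchronous_commit TO remote_write;<br />
-- revert to waiting for the slave to flush the data to disk<br />
SET synchronous_commit TO on;<br />
</pre><br />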
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
Range types are used to store a range of data of a given type. There are a few pre-defined range types: integer (int4range), bigint (int8range), numeric (numrange), timestamp without time zone (tsrange), timestamp with time zone (tstzrange), and date (daterange).<br />
<br />
Ranges can be made of continuous (numeric, timestamp...) or discrete (integer, date...) data types. They can be open (the bound isn't part of the range) or closed (the bound is part of the range). A bound can also be infinite.<br />
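<br />
A few minimal examples of the bound syntax: square brackets denote inclusive (closed) bounds, parentheses denote exclusive (open) bounds, and an omitted bound is infinite:<br />
<pre><br />
=# SELECT int4range(10, 20, '[)');     -- 10 included, 20 excluded<br />
 int4range <br />
-----------<br />
 [10,20)<br />
<br />
=# SELECT '[2012-01-01,)'::daterange;  -- no upper bound<br />
   daterange   <br />
---------------<br />
 [2012-01-01,)<br />
</pre><br />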
<br />
Without these datatypes, most people solve the range problems by using two columns in a table. These range types are much more powerful, as you can use many operators on them:<br />
<br />
Here is the intersection between the 1000 (open) to 2000 (closed) and the 1000 (closed) to 1200 (closed) numeric ranges:<br />
<br />
SELECT '(1000,2000]'::numrange * '[1000,1200]'::numrange;<br />
?column? <br />
-------------<br />
(1000,1200]<br />
(1 row)<br />
<br />
So you can query on things like: «give me all ranges that intersect this»:<br />
<br />
=# select * from test_range ;<br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-01-02 12:00:00+01"]<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(3 rows)<br />
<br />
<br />
=# SELECT * FROM test_range WHERE period && '[2012-01-03 00:00:00,2012-01-03 12:00:00]'; <br />
period <br />
-----------------------------------------------------<br />
["2012-01-01 00:00:00+01","2012-03-01 00:00:00+01"]<br />
["2008-01-01 00:00:00+01","2015-01-01 00:00:00+01"]<br />
(2 rows)<br />
<br />
This query could use an index defined like this:<br />
<br />
=# CREATE INDEX idx_test_range on test_range using gist (period);<br />
<br />
<br />
You can also use these range data types to define exclusion constraints:<br />
<br />
CREATE EXTENSION btree_gist ;<br />
CREATE TABLE reservation (room_id int, period tstzrange);<br />
ALTER TABLE reservation ADD EXCLUDE USING GIST (room_id WITH =, period WITH &&);<br />
<br />
This means that it is now forbidden to have two records in this table where room_id is equal and the periods overlap. The CREATE EXTENSION btree_gist is required to build a GiST index on room_id (it's an integer, usually indexed with a btree index).<br />
<br />
<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (2,'(2012-08-23 14:00:00,2012-08-23 15:00:00)');<br />
INSERT 0 1<br />
=# INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
STATEMENT: INSERT INTO reservation values (1,'(2012-08-23 14:45:00,2012-08-23 15:15:00)');<br />
ERROR: conflicting key value violates exclusion constraint "reservation_room_id_period_excl"<br />
DETAIL: Key (room_id, period)=(1, ("2012-08-23 14:45:00+02","2012-08-23 15:15:00+02")) conflicts with existing key (room_id, period)=(1, ("2012-08-23 14:00:00+02","2012-08-23 15:00:00+02")).<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements over a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* Contention on several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with over 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on tables' pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it can be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The key point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system: check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
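<br />
If the overhead is acceptable, collection is enabled with the new track_io_timing parameter, either in postgresql.conf or, as a superuser, for the current session:<br />
<pre><br />
SET track_io_timing TO on;<br />
</pre><br />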
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans based on the parameters sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans (see the sketch below).<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
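<br />
Here is a minimal sketch of the first point, reusing the demo table from the JSON section above: the EXECUTE below can now be planned knowing the actual value of the parameter, instead of relying on a one-size-fits-all generic plan:<br />
<pre><br />
PREPARE busy_posters(int) AS<br />
  SELECT username FROM demo WHERE posts > $1;<br />
EXECUTE busy_posters(100);  -- planned with $1 = 100<br />
</pre><br />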
<br />
The following example, illustrating parameterized paths, is straight from the developers' mailing list <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing table b, and table b referencing table c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
</pre><br />
<br />
This part of the plan depends on a value coming from a parent node (c_id = c.c_id), and is executed again each time the parent node provides a new value for the parameter.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method, meaning its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: tries (suffix indexing), quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST is provided with operator families called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text using a suffix trie.<br />
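<br />
Here is a minimal sketch of what this looks like in SQL (the table is made up for the example; quad_point_ops is the default operator class for point):<br />
<pre><br />
CREATE TABLE spgist_demo (p point);<br />
CREATE INDEX spgist_demo_idx ON spgist_demo USING spgist (p);<br />
-- the index can then speed up queries such as:<br />
SELECT * FROM spgist_demo WHERE p <@ box '(0,0),(100,100)';<br />
</pre><br />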
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query (such as variations in whitespace or alias names, or the use of one syntax over another equivalent one) will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. This is easier to use, and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive; see the example below <!--Tomas Vondra--><br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps <!-- Marko Tiikkaja --><br />
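<br />
A minimal sketch of both, reusing the words table from the pg_stat_statements example above: with timing off, the per-node times are omitted, but the new «Rows Removed by Filter» counters still show how many rows each filter discarded:<br />
<pre><br />
EXPLAIN (ANALYZE on, TIMING off) SELECT * FROM words WHERE word LIKE 'a%';<br />
</pre><br />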
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for this record.<br />
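<br />
A minimal sketch (assuming no relation has OID 12345):<br />
<pre><br />
=# SELECT pg_relation_size(12345);<br />
 pg_relation_size <br />
------------------<br />
                  <br />
(1 row)<br />
</pre><br />
This makes sweeping queries such as SELECT relname, pg_relation_size(oid) FROM pg_class robust against concurrent drops.<br />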
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in by the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now calculated from «local midnight», meaning January 1st, 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (running a query, idle, idle in transaction...)<br />
** query: the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but not yet committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have had is «idle in transaction» in the current_query column.<br />
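<br />
For instance, here is a minimal monitoring query using the new columns to list the sessions stuck in «idle in transaction», oldest transaction first:<br />
<pre><br />
SELECT pid, usename, xact_start, query<br />
  FROM pg_stat_activity<br />
 WHERE state = 'idle in transaction'<br />
 ORDER BY xact_start;<br />
</pre><br />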
<br />
Since the new definition was backward-incompatible anyway, the column procpid was also renamed to pid, to be more consistent with other system views.<br />
The view pg_stat_replication has changed in the same way: its procpid column is also renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now (see the example below)<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
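<br />
A quick illustration of the custom_variable_classes change (myapp is an arbitrary, made-up prefix):<br />
<pre><br />
-- with 9.1 this required custom_variable_classes = 'myapp' in postgresql.conf<br />
SET myapp.user_id = '42';<br />
SHOW myapp.user_id;<br />
</pre><br />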
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
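<br />
Just as CREATE INDEX CONCURRENTLY builds an index without blocking concurrent reads and writes on the table, an index can now be dropped the same way. A minimal sketch, reusing the index from the index-only scans example:<br />
<pre><br />
DROP INDEX CONCURRENTLY idx_demo_ios;<br />
</pre><br />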
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Constraints on domains can also be declared NOT VALID.<br />
<br />
Check constraints can also be renamed now.<br />
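<br />
A NOT VALID constraint is enforced immediately for new rows, while pre-existing rows are only checked when VALIDATE CONSTRAINT is run later. A minimal sketch, reusing the reservation table from the range types section (the constraint names are made up):<br />
<pre><br />
ALTER TABLE reservation<br />
  ADD CONSTRAINT room_id_positive CHECK (room_id > 0) NOT VALID;<br />
-- later, once existing rows are known to comply:<br />
ALTER TABLE reservation VALIDATE CONSTRAINT room_id_positive;<br />
-- and CHECK constraints can now be renamed:<br />
ALTER TABLE reservation<br />
  RENAME CONSTRAINT room_id_positive TO check_room_id;<br />
</pre><br />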
<br />
== Reduce ALTER TABLE rewrites ==<br />
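<br />
The best-known case: increasing the length limit of a varchar column no longer rewrites the whole table; it is now a catalog-only change. A minimal sketch (the table is made up):<br />
<pre><br />
CREATE TABLE customers (name varchar(30));<br />
-- 9.1 rewrote the whole table here, 9.2 doesn't<br />
ALTER TABLE customers ALTER COLUMN name TYPE varchar(100);<br />
</pre><br />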
<br />
== Security barriers and Leakproof ==<br />
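<br />
In short: a view can now be created WITH (security_barrier), which keeps the planner from pushing possibly-leaky user functions down into the view, and a function can be declared LEAKPROOF (superusers only) to tell the planner it is safe to push down anyway. A minimal sketch, reusing the reservation table (the view and function are made up):<br />
<pre><br />
CREATE VIEW room1_reservations WITH (security_barrier) AS<br />
  SELECT * FROM reservation WHERE room_id = 1;<br />
<br />
CREATE FUNCTION is_positive(int) RETURNS boolean<br />
  AS 'SELECT $1 > 0' LANGUAGE sql LEAKPROOF;<br />
</pre><br />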
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data, --section=data and --section=post-data restrict the dump to the given section: the definitions needed before loading the data, the data itself, or what is built after the data (indexes, constraints, triggers...). The option can be given several times<br />
* --exclude-table-data dumps the matching tables' definitions, but not their data<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON:<br />
<br />
=# select '{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
json <br />
-------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
(1 row)<br />
<br />
=# select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json at character 8<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
STATEMENT: select '{"username","posts":121,"emailaddress":"john@nowhere.com"}'::json;<br />
ERROR: invalid input syntax for type json<br />
LINE 1: select '{"username","posts":121,"emailaddress":"john@nowhere...<br />
^<br />
DETAIL: Expected ":", but found ",".<br />
CONTEXT: JSON data, line 1: {"username",...<br />
<br />
You can also convert a row type to JSON:<br />
<br />
=#select * from demo ;<br />
username | posts | emailaddress <br />
----------+-------+---------------------<br />
john | 121 | john@nowhere.com<br />
mickael | 215 | mickael@nowhere.com<br />
(2 rows)<br />
<br />
=# select row_to_json(demo) from demo;<br />
row_to_json <br />
-------------------------------------------------------------------------<br />
{"username":"john","posts":121,"emailaddress":"john@nowhere.com"}<br />
{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}<br />
(2 rows)<br />
<br />
Or an array type:<br />
<br />
<br />
=# select array_to_json(array_agg(demo)) from demo;<br />
array_to_json <br />
---------------------------------------------------------------------------------------------------------------------------------------------<br />
[{"username":"john","posts":121,"emailaddress":"john@nowhere.com"},{"username":"mickael","posts":215,"emailaddress":"mickael@nowhere.com"}]<br />
(1 row)<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is re-executed for each value of c.c_id coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, or to fully scan and hash b.<br />
<br />
<br />
=SP-GiST=<br />
<br />
SP-GiST stands for Space-Partitioned GiST, GiST being the Generalized Search Tree. GiST is an index type that has been available for quite a while in PostgreSQL. GiST is already very efficient at indexing complex data types, but performance tends to suffer when the source data isn't uniformly distributed. SP-GiST tries to fix that.<br />
<br />
Like all indexing methods available in PostgreSQL, SP-GiST is a generic indexing method: its purpose is to index whatever you throw at it, using operators you provide. This means that if you want to create a new datatype and make it indexable through SP-GiST, you'll have to follow the documented API.<br />
<br />
SP-GiST can be used to implement three types of indexes: tries (suffix indexing), quadtrees (data is divided into quadrants), and k-d trees (k-dimensional trees).<br />
<br />
For now, SP-GiST is provided with operator classes called "quad_point_ops", "kd_point_ops" and "text_ops".<br />
<br />
As their names indicate, the first one indexes point types using a quadtree, the second one indexes point types using a k-d tree, and the third one indexes text using suffixes.<br />
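<br />
As a quick sketch (the table and index names are made up for the illustration), creating SP-GiST indexes on a point column looks like any other index creation; quad_point_ops is the default operator class for point, so it only needs to be spelled out when choosing the k-d tree instead:<br />
<pre><br />
CREATE TABLE places (pos point);<br />
CREATE INDEX places_pos_quad ON places USING spgist (pos);              -- quadtree (default)<br />
CREATE INDEX places_pos_kd ON places USING spgist (pos kd_point_ops);   -- k-d tree<br />
</pre><br />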
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values are considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the PREPARE statement. This is easier to use, and avoids the double counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
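<br />
For example (plan output omitted; with TIMING OFF, row counts and loop counts are still reported, only the per-node times are skipped):<br />
<pre><br />
=# EXPLAIN (ANALYZE ON, TIMING OFF) SELECT count(*) FROM mots;<br />
</pre><br />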
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps <!-- Marko Tiikkaja --><br />
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
Previously, a relation could be dropped by a concurrent session while another session was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for that object.<br />
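<br />
A minimal sketch (the OID is hypothetical):<br />
<pre><br />
=# SELECT pg_relation_size(424242);  -- OID of a relation dropped by another session: returns NULL instead of erroring out<br />
</pre><br />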
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field stored the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't actually used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now calculated from "local midnight", meaning January 1st, 1970 at midnight, local time. (In this example, run in a UTC+2 timezone, the two results differ by exactly the 7200 seconds of that offset.)<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the year closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the year closest to 2020, for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is doing: running a query (active), idle, idle in transaction, etc.<br />
** query: the currently running query, or the last query run<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen would be «idle in transaction».<br />
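<br />
This makes monitoring queries such as the following possible (a sketch; all the columns used exist in the new view definition):<br />
<pre><br />
=# SELECT pid, usename, now() - state_change AS idle_for, query<br />
   FROM pg_stat_activity<br />
   WHERE state = 'idle in transaction'<br />
   ORDER BY idle_for DESC;<br />
</pre><br />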
<br />
As this change was backward-incompatible anyway, procpid was renamed to pid at the same time.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid, to be consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
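<br />
For example, in postgresql.conf (the two values shown are the built-in defaults, kept here for illustration):<br />
 ssl_cert_file = 'server.crt'<br />
 ssl_key_file = 'server.key'<br />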
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
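<br />
Like CREATE INDEX CONCURRENTLY, this drops an index without holding a lock that would block concurrent reads and writes on the table (it cannot run inside a transaction block). A minimal sketch, reusing the idx_demo_ios index from the index-only-scan example:<br />
<pre><br />
=# DROP INDEX CONCURRENTLY idx_demo_ios;<br />
</pre><br />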
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Constraints on domains can also be declared NOT VALID, and validated later.<br />
<br />
Check constraints can also be renamed now.<br />
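<br />
A sketch, assuming a table test with an integer column val: the NOT VALID constraint is enforced for new rows immediately, existing rows are only checked when the constraint is validated, and the constraint can then be renamed:<br />
<pre><br />
=# ALTER TABLE test ADD CONSTRAINT val_positive CHECK (val > 0) NOT VALID;<br />
=# ALTER TABLE test VALIDATE CONSTRAINT val_positive;<br />
=# ALTER TABLE test RENAME CONSTRAINT val_positive TO val_gt_zero;<br />
</pre><br />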
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
* --section=pre-data, --section=data, --section=post-data: restrict the dump to the given section(s)<br />
* --exclude-table-data: dump the definition of the matching tables, but not their data<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18023What's new in PostgreSQL 9.22012-08-16T13:23:45Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records themselves may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset that doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, made for this demo. The gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios WHERE col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication is that all the slaves have to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will replicate from this first slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
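<br />
For instance, run on the first slave, pg_stat_replication shows the cascaded slave (a sketch; the column list is abbreviated):<br />
 =# SELECT pid, application_name, state, replay_location FROM pg_stat_replication;<br />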
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care about having a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files have the standard segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
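<br />
These segments can later be consumed like archived WAL; a minimal sketch of the matching recovery.conf entry (using the directory from above; note that the segment currently being written keeps its .partial suffix until it is complete):<br />
 restore_command = 'cp /tmp/new_logs/%f %p'<br />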
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), and means that the master doesn't have to wait for the slave to have written the data to disk, only for it to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
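<br />
A minimal sketch of the master-side configuration for this compromise ('standby1' is a hypothetical application_name, which the slave would also have to set in its primary_conninfo):<br />
 synchronous_standby_names = 'standby1'<br />
 synchronous_commit = remote_write<br />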
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version brings performance improvements in a very large range of domains (a non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The contention on several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with more than 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on table pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap or extremely costly. The most important factor is where the system gets its time from: it can be read directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. The point is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system supports it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds and shows very little variation. Anything under 100 nanoseconds should be fine for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
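<br />
Note that the collection itself is disabled by default; once pg_test_timing has reassured you, enable it with the track_io_timing parameter, either in postgresql.conf or per session (it requires superuser rights):<br />
<pre><br />
=# set track_io_timing to on;<br />
</pre><br />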
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now get separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (around 2.6 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As with every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, with no knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values that are sent (the query is planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans (see the sketch just below).<br />
** A new feature has been added: parameterized paths. Simply put, a sub-part of a query plan can use parameters it gets from a parent node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
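To see the first improvement at work, you can run EXPLAIN on an EXECUTE statement; a minimal sketch, assuming some table t with a column col:<br />
<br />
<pre><br />
PREPARE stmt(text) AS SELECT * FROM t WHERE col = $1;<br />
EXPLAIN EXECUTE stmt('foo');  -- the plan shown is built using the actual parameter value<br />
</pre><br />
<br />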
The following example, illustrating parameterized paths, comes straight from the developers' mailing list <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad: 13 milliseconds. Still, we are doing sequential scans on a and b, when common sense tells us that c.value = 1 should be used to filter rows much more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is called each time with a different parameter value coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values are considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also means that differences which are not semantically essential, such as variations in whitespace or alias names, or the use of one syntax over an equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the PREPARE statement. This makes the module easier to use, and avoids the double-counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps <!-- Marko Tiikkaja --><br />
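<br />
For instance, using the words table from the pg_stat_statements examples above (the exact plans will of course depend on your data):<br />
<pre><br />
explain (analyze on, timing off) select count(*) from words;<br />
explain analyze select * from words where word = 'foo';  -- a filter step now shows a "Rows Removed by Filter" line<br />
</pre><br />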
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another session was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that record.<br />
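<br />
A quick way to see the new behaviour (999999 being an OID assumed to belong to no relation):<br />
<pre><br />
select pg_relation_size(999999);  -- returns NULL in 9.2, raised an error before<br />
</pre><br />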
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field stored the location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used anywhere.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now calculated from "local midnight", that is, the 1st of January 1970 at midnight, local time.<br />
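<br />
As a sanity check, the difference between the two 9.2 values is exactly this machine's offset from UTC (UTC+2 here):<br />
<pre><br />
select 1341187200 - 1341180000;  -- 7200 seconds, i.e. 2 hours<br />
</pre><br />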
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (running a query, idle, idle in transaction, etc.)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that such a session was idle in transaction, meaning it had started a transaction, maybe done some operations, but not yet committed. If the session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has changed too: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files (see the sketch below)<br />
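<br />
A minimal sketch of the new SSL parameters (the paths are hypothetical):<br />
<pre><br />
ssl_cert_file = '/etc/ssl/certs/server.crt'<br />
ssl_key_file = '/etc/ssl/private/server.key'<br />
ssl_ca_file = '/etc/ssl/certs/root.crt'<br />
</pre><br />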
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
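<br />
In short, an index can now be dropped without holding the exclusive lock that would block concurrent queries on the table, the counterpart of CREATE INDEX CONCURRENTLY. A minimal sketch (idx_demo_ios being any existing index, here the one from the Index Only Scans example):<br />
<pre><br />
drop index concurrently idx_demo_ios;<br />
</pre><br />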
<br />
== NOT VALID CHECK constraints ==<br />
<br />
A NOT VALID CHECK constraint is accepted without checking the existing rows: it is enforced for new rows immediately, and the old rows can be checked later with ALTER TABLE ... VALIDATE CONSTRAINT. Domains can also be declared as not valid.<br />
<br />
Check constraints can also be renamed now.<br />
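<br />
A minimal sketch of these operations (table and constraint names are hypothetical):<br />
<pre><br />
alter table orders add constraint positive_amount check (amount > 0) not valid;<br />
alter table orders validate constraint positive_amount;  -- checks the existing rows, can be run later<br />
alter table orders rename constraint positive_amount to amount_positive;<br />
</pre><br />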
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
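<br />
In short: a view can be created WITH (security_barrier) so that the planner will not push potentially leaky functions down into it, and a function can be marked LEAKPROOF to declare it safe to push down. A minimal sketch (names are hypothetical):<br />
<pre><br />
create view my_accounts with (security_barrier) as<br />
  select * from accounts where owner = current_user;<br />
</pre><br />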
<br />
== Back-References in Regular Expressions ==<br />
<br />
== New options for pg_dump ==<br />
<br />
--section=pre-data --section=post-data --section=data<br />
--exclude-table-data<br />
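<br />
For instance (mydb being a hypothetical database name):<br />
<pre><br />
pg_dump --section=pre-data mydb > pre-data.sql        # pre-data DDL only (e.g. table definitions)<br />
pg_dump --exclude-table-data=big_log_table mydb > dump.sql   # everything, except the contents of big_log_table<br />
</pre><br />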
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query tree (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
The view pg_stat_replication has also changed. The column procpid is renamed to pid, to also be consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Domains can also be declared as not valid<br />
<br />
Check constraints can also be renamed now.<br />
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18021What's new in PostgreSQL 9.22012-08-16T13:12:01Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start alongside the master (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will replicate from the first one:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
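<br />
One way to check this is to query pg_stat_replication on each node: it lists the walsenders, so on the first slave it shows the second slave's connection. A minimal sketch (the pid and address shown are hypothetical):<br />
<pre><br />
-- on the first slave (port 5433): its walsender serves the second slave<br />
=# SELECT pid, state, client_addr, sync_state FROM pg_stat_replication;<br />
  pid  |   state   | client_addr | sync_state <br />
-------+-----------+-------------+------------<br />
 21042 | streaming | 127.0.0.1   | async<br />
(1 row)<br />
</pre><br />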
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
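<br />
For example, a master with one synchronous standby could combine both settings like this in postgresql.conf (a minimal sketch; 'standby1' is a hypothetical application_name for the slave):<br />
 synchronous_standby_names = 'standby1'<br />
 synchronous_commit = remote_write<br />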
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
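<br />
A minimal sketch of what the type brings (the table and values are made up): input is validated as JSON, and the new row_to_json function converts relational data to JSON:<br />
<pre><br />
=# CREATE TABLE events (id serial PRIMARY KEY, payload json);<br />
=# INSERT INTO events (payload) VALUES ('{"user": "alice", "action": "login"}');<br />
=# INSERT INTO events (payload) VALUES ('{oops');<br />
ERROR:  invalid input syntax for type json<br />
=# SELECT row_to_json(e) FROM events e;<br />
                       row_to_json                        <br />
----------------------------------------------------------<br />
 {"id":1,"payload":{"user": "alice", "action": "login"}}<br />
(1 row)<br />
</pre><br />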
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
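<br />
A small sketch of what range types allow (int4range, numrange and tsrange are built-in types; the table is made up):<br />
<pre><br />
=# SELECT int4range(10, 20) @> 15 AS contains, numrange(1.0, 2.0) && numrange(1.5, 3.0) AS overlaps;<br />
 contains | overlaps <br />
----------+----------<br />
 t        | t<br />
(1 row)<br />
<br />
=# CREATE TABLE reservation (room int, during tsrange);<br />
=# CREATE INDEX reservation_during ON reservation USING gist (during);<br />
</pre><br />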
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with over 32 cores. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it will generate less WAL volume and take fewer locks on tables' pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from. It could be retrieved directly from the processor (TSC), from dedicated hardware such as HPET, or through an ACPI call. What's most important is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameters sent (the query will be planned at execution time), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans (see the sketch just after this list).<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
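<br />
A minimal sketch of the prepared statements change (words is the table used in the pg_stat_statements examples below):<br />
<pre><br />
=# PREPARE get_words (text) AS SELECT * FROM words WHERE word = $1;<br />
=# EXPLAIN EXECUTE get_words('foo');  -- 9.2 can plan this using the actual value 'foo'<br />
</pre><br />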
<br />
The following parameterized-paths example is straight from the developers' mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse-analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the PREPARE statement. That way it is easier to use, and it avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps (Marko Tiikkaja).<br />
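<br />
A small sketch showing both improvements at once (the words table again; costs and timings are illustrative):<br />
<pre><br />
=# EXPLAIN (ANALYZE on, TIMING off) SELECT * FROM words WHERE word ~ 'foo';<br />
                                 QUERY PLAN                                  <br />
-----------------------------------------------------------------------------<br />
 Seq Scan on words  (cost=0.00..1611.95 rows=9 width=9) (actual rows=1 loops=1)<br />
   Filter: (word ~ 'foo')<br />
   Rows Removed by Filter: 94155<br />
 Total runtime: 21.847 ms<br />
</pre><br />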
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. Hstore is a contrib type, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
 ?column? <br />
----------<br />
 "a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
 pg_typeof <br />
-----------<br />
 hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
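<br />
A sketch of the new behaviour (the OID below is assumed not to belong to any relation; 9.1 raised an error here):<br />
<pre><br />
=# SELECT pg_relation_size(12345);<br />
 pg_relation_size <br />
------------------<br />
                  <br />
(1 row)<br />
</pre><br />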
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the "local midnight", meaning the 1st of January 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for both 2-digit and 3-digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is doing (running a query, idle, idle in transaction...)<br />
** query: the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing what query had put it there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the procpid column was renamed to pid at the same time, for consistency with other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
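<br />
For example (the paths are hypothetical):<br />
 ssl_cert_file = '/etc/ssl/certs/server.crt'<br />
 ssl_key_file = '/etc/ssl/private/server.key'<br />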
<br />
= Other new features =<br />
<br />
== DROP INDEX CONCURRENTLY ==<br />
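<br />
A minimal sketch: the new CONCURRENTLY option drops an index without taking a lock that would block concurrent reads and writes on the table (idx_demo_ios is the index created in the Index-only scans example above):<br />
 DROP INDEX CONCURRENTLY idx_demo_ios;<br />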
<br />
== NOT VALID CHECK constraints ==<br />
<br />
Check constraints can also be renamed now.<br />
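<br />
A minimal sketch of both features, reusing the demo_ios table from earlier (the constraint names are made up):<br />
<pre><br />
=# ALTER TABLE demo_ios ADD CONSTRAINT col1_positive CHECK (col1 > 0) NOT VALID;<br />
-- existing rows are not checked yet, only new ones are<br />
=# ALTER TABLE demo_ios VALIDATE CONSTRAINT col1_positive;<br />
-- the whole table has now been verified<br />
=# ALTER TABLE demo_ios RENAME CONSTRAINT col1_positive TO check_col1_positive;<br />
</pre><br />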
<br />
== Reduce ALTER TABLE rewrites ==<br />
<br />
== Security barriers and Leakproof ==<br />
<br />
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=18020What's new in PostgreSQL 9.22012-08-16T13:00:12Z<p>Marco44: /* pg_stat_activity and pg_stat_replication's definitions have changed */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query tree (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&amp;lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
<br />
==pg_stat_activity and pg_stat_replication's definitions have changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen was «idle in transaction».<br />
<br />
As the view definition was changing in a backward-incompatible way anyway, it was also the opportunity to rename procpid to pid, for consistency with other system views.<br />
The view pg_stat_replication has also changed: its procpid column is likewise renamed to pid, again for consistency with other system views.<br />
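<br />
For instance, a quick look at the renamed column on a server with one connected standby (a minimal sketch, query only; the output depends on your setup):<br />
<pre><br />
=# select pid, application_name, state, sync_state from pg_stat_replication;<br />
</pre><br />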
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
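<br />
For instance, a minimal sketch of the new SSL settings (the paths are hypothetical; before 9.2, the files had to live in the data directory under fixed names such as server.crt):<br />
<pre><br />
# postgresql.conf<br />
ssl = on<br />
ssl_cert_file = '/etc/ssl/certs/server.crt'<br />
ssl_key_file  = '/etc/ssl/private/server.key'<br />
ssl_ca_file   = '/etc/ssl/certs/root_ca.crt'<br />
</pre><br />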
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17972What's new in PostgreSQL 9.22012-08-01T15:14:47Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the table's records may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the table's record at all if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8kB page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple: it will be able to build the result directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset that doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, made for this demo. The gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios WHERE col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up-to-date:<br />
<br />
VACUUM demo_ios;<br />
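<br />
To get an idea of how much of the table is «all visible» (and thus how effective an index only scan can be), one can look at pg_class.relallvisible, a counter new in 9.2 that VACUUM maintains. A quick sanity check (query only):<br />
<pre><br />
=# select relpages, relallvisible from pg_class where relname = 'demo_ios';<br />
</pre><br />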
<br />
All the timings you'll see below are done with a cold OS and PostgreSQL cache (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
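<br />
For example, the first slave's pg_stat_replication now shows its own downstream standby, so the health of the cascaded link can be checked there (a minimal sketch; the locations will differ on your system):<br />
<pre><br />
-- on the first slave (port 5433)<br />
=# select pid, state, sent_location, replay_location from pg_stat_replication;<br />
</pre><br />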
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for it to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
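<br />
A minimal sketch of such a setup, assuming a standby that sets application_name to 'standby1' in its primary_conninfo (the name is hypothetical):<br />
<pre><br />
# on the master, postgresql.conf<br />
synchronous_standby_names = 'standby1'<br />
synchronous_commit = remote_write    # can also be set per session or per transaction<br />
</pre><br />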
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
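<br />
A short sketch of what the type provides: input is validated, and the new row_to_json() / array_to_json() functions convert SQL data to JSON (the table and values here are made up for the example):<br />
<pre><br />
=# create table events (id serial primary key, payload json);<br />
=# insert into events(payload) values ('{"type": "click", "x": 10}');  -- valid JSON: accepted<br />
=# insert into events(payload) values ('{"type": ');                   -- invalid JSON: rejected with an error<br />
=# select row_to_json(e) from events e;<br />
</pre><br />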
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
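<br />
A tiny illustration of what range types allow, with containment and overlap tests (see the page above for the full picture):<br />
<pre><br />
=# select '[2012-07-01,2012-08-01)'::daterange @> '2012-07-15'::date as contains,<br />
          '[1,10)'::int4range && '[5,20)'::int4range as overlaps;<br />
 contains | overlaps <br />
----------+----------<br />
 t        | t<br />
</pre><br />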
<br />
=Performance improvements=<br />
<br />
This version has performance improvements in a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it will generate less WAL volume and take fewer locks on the tables' pages. <!-- Heikki Linnakangas --><br />
<br />
* Statistics are collected on array contents <!-- Alexander Korotkov -->, allowing for better estimations of selectivity on array operations.<br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the time. Depending on the operating system and the hardware, this can be quite cheap or extremely costly. The most important factor here is where the system gets its time from: it can be retrieved directly from the processor (TSC), from dedicated hardware such as HPET, or through an ACPI call. The cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check that your system will support it without too much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
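<br />
The collection itself is controlled by the new track_io_timing parameter, which is off by default. It can be set in postgresql.conf, or by a superuser for a single session:<br />
<pre><br />
=# set track_io_timing to on;<br />
</pre><br />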
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not significantly more expensive than the specific plans. See the short sketch after this list.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters obtained from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
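<br />
To illustrate the first point, a minimal sketch (the words table is the one used in other examples of this document):<br />
<pre><br />
=# prepare get_word(text) as select * from words where word = $1;<br />
=# explain execute get_word('foo');   -- with 9.2, planned using the actual value 'foo'<br />
</pre><br />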
<br />
The following example of parameterized paths is straight from the developers mailing list <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have table a referencing b, and b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id = c.c_id): it is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, nor to fully scan and hash b.<br />
<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for their constant values will be considered the same, as long as their post-parse analysis query trees (that is, the internal representation of the query before rule expansion) are the same. This also implies that differences that are not semantically essential to the query, such as variations in whitespace or alias names, or the use of one particular syntax over another equivalent one, will not differentiate queries.<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the execute statement) is charged to the prepared statement. That way it is easier to use, and it avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
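<br />
As a reminder, a minimal sketch of how to activate the module:<br />
<pre><br />
# postgresql.conf (a restart is needed)<br />
shared_preload_libraries = 'pg_stat_statements'<br />
</pre><br />
and then, in the target database:<br />
<pre><br />
=# create extension pg_stat_statements;<br />
</pre><br />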
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps (Marko Tiikkaja)<br />
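<br />
For example (the plan and counts will depend on your data; the words table is again assumed):<br />
<pre><br />
=# explain (analyze on, timing off) select count(*) from words where word like '%a%';<br />
</pre><br />
With 9.2, the plan's filter nodes show an extra «Rows Removed by Filter» line when rows are rejected.<br />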
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
 <<br />
</pre><br />
'<' isn't valid XML.<br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
 ?column? <br />
----------<br />
 "a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
 pg_typeof <br />
-----------<br />
 hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for this record.<br />
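<br />
This makes reporting queries like the following robust against concurrent DDL (a typical sketch):<br />
<pre><br />
=# select relname, pg_relation_size(oid) as size<br />
   from pg_class<br />
   where relkind = 'r';  -- a table dropped meanwhile now yields a NULL size instead of an error<br />
</pre><br />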
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now measured from the "local midnight", that is, January 1st, 1970 at midnight, local time. Here the two results differ by 7200 seconds, which is this server's +02 offset from UTC.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped years 100 to 999 to 1100 to 1999, and years 000 to 099 to 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
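<br />
The rule is easy to check with a 2-digit year too: 69 maps to 2069 (49 years from 2020) rather than 1969 (51 years away):<br />
<pre><br />
=# SELECT to_date('69-07-02','YY-MM-DD');<br />
  to_date   <br />
------------<br />
 2069-07-02<br />
</pre><br />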
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction, ...)<br />
** query: the text of the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen was «idle in transaction».<br />
<br />
As the view definition was changing in a backward-incompatible way anyway, it was also the opportunity to rename procpid to pid, for consistency with other system views.<br />
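<br />
This also makes it easy to hunt long «idle in transaction» sessions, for instance (a minimal sketch):<br />
<pre><br />
=# select pid, usename, now() - state_change as idle_for, query<br />
   from pg_stat_activity<br />
   where state = 'idle in transaction'<br />
   order by idle_for desc;<br />
</pre><br />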
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements <!-- Tom Lane--><br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
This example is straight from the developpers mailing lists <!-- Andres Freund -->:<br />
<br />
<pre><br />
CREATE TABLE a (<br />
a_id serial PRIMARY KEY NOT NULL,<br />
b_id integer<br />
);<br />
CREATE INDEX a__b_id ON a USING btree (b_id);<br />
<br />
<br />
CREATE TABLE b (<br />
b_id serial NOT NULL,<br />
c_id integer<br />
);<br />
CREATE INDEX b__c_id ON b USING btree (c_id);<br />
<br />
<br />
CREATE TABLE c (<br />
c_id serial PRIMARY KEY NOT NULL,<br />
value integer UNIQUE<br />
);<br />
<br />
INSERT INTO b (b_id, c_id)<br />
SELECT g.i, g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO a(b_id)<br />
SELECT g.i FROM generate_series(1, 50000) g(i);<br />
<br />
INSERT INTO c(c_id,value)<br />
VALUES (1,1);<br />
</pre><br />
<br />
So we have a referencing b, b referencing c.<br />
<br />
Here is an example of a query working badly with PostgreSQL 9.1:<br />
<br />
<pre><br />
EXPLAIN ANALYZE SELECT 1 <br />
FROM <br />
c<br />
WHERE<br />
EXISTS (<br />
SELECT * <br />
FROM a<br />
JOIN b USING (b_id)<br />
WHERE b.c_id = c.c_id)<br />
AND c.value = 1;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=1347.00..3702.27 rows=1 width=0) (actual time=13.799..13.802 rows=1 loops=1)<br />
Join Filter: (c.c_id = b.c_id)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Hash Join (cost=1347.00..3069.00 rows=50000 width=4) (actual time=13.788..13.788 rows=1 loops=1)<br />
Hash Cond: (a.b_id = b.b_id)<br />
-> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.007..0.007 rows=1 loops=1)<br />
-> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=13.760..13.760 rows=50000 loops=1)<br />
Buckets: 8192 Batches: 1 Memory Usage: 1954kB<br />
-> Seq Scan on b (cost=0.00..722.00 rows=50000 width=8) (actual time=0.008..5.702 rows=50000 loops=1)<br />
Total runtime: 13.842 ms<br />
</pre><br />
<br />
Not that bad, 13 milliseconds. Still, we are doing sequential scans on a and b, when our common sense tells us that c.value=1 should be used to filter rows more aggressively.<br />
<br />
Here's what 9.2 does with this query:<br />
<br />
<pre><br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Semi Join (cost=0.00..16.97 rows=1 width=0) (actual time=0.035..0.037 rows=1 loops=1)<br />
-> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.009 rows=1 loops=1)<br />
Index Cond: (value = 1)<br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
The «parameterized path» is:<br />
<pre><br />
-> Nested Loop (cost=0.00..8.69 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)<br />
-> Index Scan using b__c_id on b (cost=0.00..8.33 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=1)<br />
Index Cond: (c_id = c.c_id)<br />
-> Index Only Scan using a__b_id on a (cost=0.00..0.35 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br />
Index Cond: (b_id = b.b_id)<br />
Total runtime: 0.089 ms<br />
</pre><br />
<br />
This part of the plan depends on a parent node (c_id=c.c_id). This part of the plan is called each time with a different parameter coming from the parent node.<br />
<br />
This plan is of course much faster, as there is no need to fully scan a, and to fully scan AND hash b.<br />
<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that relation.<br />
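<br />
For instance, a size report like the following sketch can now run to completion even if a relation is dropped while it executes; the dropped relation simply shows up with a NULL size instead of aborting the whole query:<br />
<pre><br />
=# SELECT relname, pg_relation_size(oid) AS bytes<br />
   FROM pg_class<br />
   ORDER BY pg_relation_size(oid) DESC NULLS LAST<br />
   LIMIT 5;<br />
</pre><br />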
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now measured from "local midnight", that is, the 1st of January 1970 at midnight, local time.<br />
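<br />
If an application relied on the old behaviour, it can keep it on 9.2 by casting to timestamptz first, which interprets the value in the session time zone (a sketch, assuming the same +02 session time zone as above):<br />
<pre><br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp::timestamptz);<br />
 date_part  <br />
------------<br />
 1341180000<br />
(1 row)<br />
</pre><br />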
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 through 999 to 1100–1999, and 000 through 099 to 2000–2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
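<br />
To avoid the wrapping logic entirely, supply a full 4-digit year (illustrative):<br />
<pre><br />
=# SELECT to_date('0200-07-02','YYYY-MM-DD');<br />
  to_date   <br />
------------<br />
 0200-07-02<br />
</pre><br />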
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction...)<br />
** query: the last query run by the session (which may still be running)<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking "idle in transaction" sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen would be "idle in transaction".<br />
<br />
As the view definition had to change anyway, the procpid column was renamed to pid at the same time, for consistency with the other system views.<br />
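<br />
For example, a monitoring query along these lines (a sketch) lists how long each session has been idle in transaction, together with the last query it ran:<br />
<pre><br />
=# SELECT pid, now() - state_change AS idle_for, query<br />
   FROM pg_stat_activity<br />
   WHERE state = 'idle in transaction'<br />
   ORDER BY idle_for DESC;<br />
</pre><br />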
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All "classes" are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
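<br />
For instance, the SSL files can now live outside the data directory (a sketch; the paths are illustrative, and the old behaviour of server.crt and server.key in the data directory remains the default):<br />
<pre><br />
ssl = on<br />
ssl_cert_file = '/etc/ssl/certs/pg-server.crt'<br />
ssl_key_file  = '/etc/ssl/private/pg-server.key'<br />
ssl_ca_file   = '/etc/ssl/certs/pg-root.crt'<br />
</pre><br />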
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17887What's new in PostgreSQL 9.22012-07-06T17:05:00Z<p>Marco44: /* Performance improvements */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records themselves may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an "all visible" page, PostgreSQL won't have to access the tuple: it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put in 100 million records, to get a big recordset that doesn't fit in memory (this is a machine with 4 GB of RAM). This is an ideal case, made for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table, so updates will be slower.<br />
* You will index columns that weren't indexed before, so there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real-life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
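<br />
You can check how much of a table is currently "all visible", and therefore how effective an index only scan can be, from pg_class (a sketch; relallvisible is maintained by VACUUM):<br />
<pre><br />
=# SELECT relname, relpages, relallvisible,<br />
          round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible<br />
   FROM pg_class<br />
   WHERE relname = 'demo_ios';<br />
</pre><br />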
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during the failover only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the WAL segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much finer granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), and means that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged receiving it. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
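<br />
A minimal sketch of the corresponding master-side settings ('standby1' is an illustrative name, matching the application_name the slave uses to connect):<br />
<pre><br />
# postgresql.conf on the master<br />
synchronous_standby_names = 'standby1'<br />
synchronous_commit = remote_write<br />
</pre><br />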
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
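<br />
In 9.2, the type mostly brings validation on input, plus a few producer functions such as row_to_json(). A quick sketch:<br />
<pre><br />
=# SELECT '{"a": 1}'::json;<br />
   json   <br />
----------<br />
 {"a": 1}<br />
<br />
=# SELECT 'oops'::json;<br />
ERROR:  invalid input syntax for type json<br />
<br />
=# SELECT row_to_json(t) FROM (SELECT 1 AS a, 'foo'::text AS b) t;<br />
    row_to_json    <br />
-------------------<br />
 {"a":1,"b":"foo"}<br />
</pre><br />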
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
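<br />
A small sketch of what they enable (tsrange is one of the built-in range types; @> tests containment):<br />
<pre><br />
=# CREATE TABLE reservations (room int, during tsrange);<br />
=# INSERT INTO reservations<br />
   VALUES (101, '[2012-07-02 14:00, 2012-07-02 15:00)');<br />
=# SELECT * FROM reservations WHERE during @> '2012-07-02 14:30'::timestamp;<br />
 room |                    during                     <br />
------+-----------------------------------------------<br />
  101 | ["2012-07-02 14:00:00","2012-07-02 15:00:00")<br />
</pre><br />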
<br />
=Performance improvements=<br />
<br />
This version has performance improvements over a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with 32 cores or more. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it will generate less WAL volume and take fewer locks on table pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it could be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. What's most important is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check that your system will support it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
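<br />
If the overhead is acceptable on your platform, the collection itself is enabled with the track_io_timing parameter, which is off by default:<br />
<pre><br />
# in postgresql.conf<br />
track_io_timing = on<br />
<br />
-- or, as a superuser, for the current session only:<br />
=# SET track_io_timing = on;<br />
</pre><br />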
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements<br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans. See the sketch below.<br />
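<br />
A way to see this in action (a sketch on the words table used later in this document, assuming an index on word; names and figures are illustrative): EXPLAIN EXECUTE shows the plan actually chosen for the parameter value supplied.<br />
<pre><br />
=# PREPARE get_word(text) AS SELECT * FROM words WHERE word = $1;<br />
=# EXPLAIN EXECUTE get_word('foo');<br />
                                QUERY PLAN                                 <br />
---------------------------------------------------------------------------<br />
 Index Only Scan using words_word_idx on words  (cost=0.00..8.27 rows=1 width=9)<br />
   Index Cond: (word = 'foo'::text)<br />
</pre><br />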
<!-- ** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from an upper-level node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
Here is a (not realistic) example:<br />
<pre><br />
CREATE TABLE t1 (a int primary key, b text);<br />
CREATE TABLE t2 (a int primary key, b int references t1(a), c text);<br />
CREATE TABLE t3 (a int primary key, b int references t2(a), c text);<br />
INSERT INTO t1 select generate_series(1,1001),repeat(generate_series(1,1001)::text,20);<br />
INSERT INTO t2 select generate_series(1,100000),generate_series(1,100000)/100+1,repeat(generate_series(1,100000)::text,20);<br />
INSERT INTO t3 select generate_series(1,100000),generate_series(1,100000)/100+1,repeat(generate_series(1,100000)::text,20);<br />
CREATE INDEX t3_b ON t3(b);<br />
</pre><br />
<br />
So we have three tables, t3 referencing t2 referencing t1. On simple joins, everything works well, PostgreSQL chooses nested loops, hash joins or merge joins as it sees fit. Problems can arise when join order are constrained.<br />
<br />
On PostgreSQL 9.1:<br />
<pre><br />
explain ANALYZE select * from t3 left join (t1 join t2 on t1.a=t2.b) on t2.a=t3.b where t3.c='00000'; QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------<br />
Merge Right Join (cost=20.03..344.74 rows=10 width=196) (actual time=12.400..12.410 rows=9 loops=1)<br />
Merge Cond: (t2.a = t3.b)<br />
-> Nested Loop (cost=0.00..32402.26 rows=100000 width=168) (actual time=0.014..0.019 rows=2 loops=1)<br />
-> Index Scan using t2_pkey on t2 (cost=0.00..4310.26 rows=100000 width=106) (actual time=0.006..0.007 rows=2 loops=1)<br />
-> Index Scan using t1_pkey on t1 (cost=0.00..0.27 rows=1 width=62) (actual time=0.002..0.002 rows=1 loops=2)<br />
Index Cond: (a = t2.b)<br />
-> Sort (cost=20.03..20.05 rows=10 width=28) (actual time=12.381..12.382 rows=9 loops=1)<br />
Sort Key: t3.b<br />
Sort Method: quicksort Memory: 25kB<br />
-> Index Scan using t3_c on t3 (cost=0.00..19.86 rows=10 width=28) (actual time=12.349..12.352 rows=9 loops=1)<br />
Index Cond: (c = '00000'::text)<br />
Total runtime: 12.524 ms<br />
</pre><br />
On PostgreSQL 9.2:<br />
<pre><br />
explain analyze select * from t3 left join (t1 join t2 on t1.a=t2.b) on t2.a=t3.b where t3.c='00000';<br />
QUERY PLAN <br />
--------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Left Join (cost=0.00..107.30 rows=10 width=196) (actual time=0.076..0.125 rows=9 loops=1)<br />
-> Index Scan using t3_c on t3 (cost=0.00..20.59 rows=10 width=28) (actual time=0.061..0.061 rows=9 loops=1)<br />
Index Cond: (c = '00000'::text)<br />
-> Nested Loop (cost=0.00..8.66 rows=1 width=168) (actual time=0.005..0.006 rows=1 loops=9)<br />
-> Index Scan using t2_pkey on t2 (cost=0.00..8.38 rows=1 width=106) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t3.b)<br />
-> Index Scan using t1_pkey on t1 (cost=0.00..0.27 rows=1 width=62) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t2.b)<br />
Total runtime: 0.203 ms<br />
</pre><br />
<br />
Here, the 'parameterized path' is <br />
<pre><br />
-> Nested Loop (cost=0.00..8.66 rows=1 width=168) (actual time=0.005..0.006 rows=1 loops=9)<br />
-> Index Scan using t2_pkey on t2 (cost=0.00..8.38 rows=1 width=106) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t3.b)<br />
-> Index Scan using t1_pkey on t1 (cost=0.00..0.27 rows=1 width=62) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t2.b)<br />
</pre><br />
<br />
as it depends on t3.b, which is different on each loop: the nested loop is executed with a different parameter each time it produces results for the consuming node (the top nested loop left join).<br />
--><br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is now charged to the corresponding PREPARE statement. This makes the view easier to use, and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
=Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps, as a "Rows Removed by Filter" line (Marko Tiikkaja).<br />
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML, so the old output was incorrect.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean you can no longer use '=>' inside hstore values; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
These are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that relation.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now measured from "local midnight", that is, the 1st of January 1970 at midnight, local time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 through 999 to 1100–1999, and 000 through 099 to 2000–2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction...)<br />
** query: the last query run by the session (which may still be running)<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking "idle in transaction" sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing how it got there.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen would be "idle in transaction".<br />
<br />
As the view definition had to change anyway, the procpid column was renamed to pid at the same time, for consistency with the other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All "classes" are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17886What's new in PostgreSQL 9.22012-07-06T14:31:50Z<p>Marco44: /* Performance improvements */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records themselves may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an "all visible" page, PostgreSQL won't have to access the tuple: it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put in 100 million records, to get a big recordset that doesn't fit in memory (this is a machine with 4 GB of RAM). This is an ideal case, made for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table, so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table, so updates will be slower.<br />
* You will index columns that weren't indexed before, so there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real-life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to the same, unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new master from scratch, meaning that during the failover only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full-fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the WAL segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much finer granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), and means that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged receiving it. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements over a very large range of domains (non-exhaustive list):<br />
<br />
* The most visible will probably be Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with 32 cores or more. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it will generate less WAL volume and take fewer locks on table pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means repeatedly asking the operating system for the current time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: it could be retrieved directly from the processor (TSC), from dedicated hardware such as the HPET, or through an ACPI call. What's most important is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check that your system will support it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements<br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use plans specific to the parameter values sent (the query will be planned at execution time), unless the query is executed several times and the planner decides that the generic plan is not much more expensive than the specific plans.<br />
** A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters obtained from an upper-level node. This fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it wanted to.<br />
<br />
Here is a (not realistic) example:<br />
<pre><br />
CREATE TABLE t1 (a int primary key, b text);<br />
CREATE TABLE t2 (a int primary key, b int references t1(a), c text);<br />
CREATE TABLE t3 (a int primary key, b int references t2(a), c text);<br />
INSERT INTO t1 select generate_series(1,1001),repeat(generate_series(1,1001)::text,20);<br />
INSERT INTO t2 select generate_series(1,100000),generate_series(1,100000)/100+1,repeat(generate_series(1,100000)::text,20);<br />
INSERT INTO t3 select generate_series(1,100000),generate_series(1,100000)/100+1,repeat(generate_series(1,100000)::text,20);<br />
CREATE INDEX t3_b ON t3(b);<br />
</pre><br />
<br />
So we have three tables, t3 referencing t2, which references t1. On simple joins, everything works well: PostgreSQL chooses nested loops, hash joins or merge joins as it sees fit. Problems can arise when the join order is constrained.<br />
<br />
On PostgreSQL 9.1:<br />
<pre><br />
explain ANALYZE select * from t3 left join (t1 join t2 on t1.a=t2.b) on t2.a=t3.b where t3.c='00000'; QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------<br />
Merge Right Join (cost=20.03..344.74 rows=10 width=196) (actual time=12.400..12.410 rows=9 loops=1)<br />
Merge Cond: (t2.a = t3.b)<br />
-> Nested Loop (cost=0.00..32402.26 rows=100000 width=168) (actual time=0.014..0.019 rows=2 loops=1)<br />
-> Index Scan using t2_pkey on t2 (cost=0.00..4310.26 rows=100000 width=106) (actual time=0.006..0.007 rows=2 loops=1)<br />
-> Index Scan using t1_pkey on t1 (cost=0.00..0.27 rows=1 width=62) (actual time=0.002..0.002 rows=1 loops=2)<br />
Index Cond: (a = t2.b)<br />
-> Sort (cost=20.03..20.05 rows=10 width=28) (actual time=12.381..12.382 rows=9 loops=1)<br />
Sort Key: t3.b<br />
Sort Method: quicksort Memory: 25kB<br />
-> Index Scan using t3_c on t3 (cost=0.00..19.86 rows=10 width=28) (actual time=12.349..12.352 rows=9 loops=1)<br />
Index Cond: (c = '00000'::text)<br />
Total runtime: 12.524 ms<br />
</pre><br />
On PostgreSQL 9.2:<br />
<pre><br />
explain analyze select * from t3 left join (t1 join t2 on t1.a=t2.b) on t2.a=t3.b where t3.c='00000';<br />
QUERY PLAN <br />
--------------------------------------------------------------------------------------------------------------------------<br />
Nested Loop Left Join (cost=0.00..107.30 rows=10 width=196) (actual time=0.076..0.125 rows=9 loops=1)<br />
-> Index Scan using t3_c on t3 (cost=0.00..20.59 rows=10 width=28) (actual time=0.061..0.061 rows=9 loops=1)<br />
Index Cond: (c = '00000'::text)<br />
-> Nested Loop (cost=0.00..8.66 rows=1 width=168) (actual time=0.005..0.006 rows=1 loops=9)<br />
-> Index Scan using t2_pkey on t2 (cost=0.00..8.38 rows=1 width=106) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t3.b)<br />
-> Index Scan using t1_pkey on t1 (cost=0.00..0.27 rows=1 width=62) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t2.b)<br />
Total runtime: 0.203 ms<br />
</pre><br />
<br />
Here, the 'parameterized path' is <br />
<pre><br />
-> Nested Loop (cost=0.00..8.66 rows=1 width=168) (actual time=0.005..0.006 rows=1 loops=9)<br />
-> Index Scan using t2_pkey on t2 (cost=0.00..8.38 rows=1 width=106) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t3.b)<br />
-> Index Scan using t1_pkey on t1 (cost=0.00..0.27 rows=1 width=62) (actual time=0.002..0.002 rows=1 loops=9)<br />
Index Cond: (a = t2.b)<br />
</pre><br />
<br />
as it depends on t3.b, which is different on each loop: the nested loop is executed with a different parameter each time it produces results for the consuming node (the top nested loop left join).<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 rows)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (the EXECUTE statement) is charged to the corresponding PREPARE statement. That makes the view easier to use, and avoids the double counting that occurred with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
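<br />
As a reminder, the module must be loaded at server start for statistics to be collected; a minimal setup (library and extension names as shipped in contrib) looks like this:<br />
<pre><br />
# postgresql.conf<br />
shared_preload_libraries = 'pg_stat_statements'<br />
<br />
-- then, after a restart, in the target database:<br />
CREATE EXTENSION pg_stat_statements;<br />
</pre><br />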
<br />
=EXPLAIN improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps (Marko Tiikkaja), as shown in the example below.<br />
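<br />
For example (hypothetical table, costs and timings illustrative, output abridged; the new information is the «Rows Removed by Filter» line):<br />
<pre><br />
=# explain analyze select * from words where word = 'foo';<br />
                        QUERY PLAN                         <br />
-----------------------------------------------------------<br />
 Seq Scan on words  (cost=0.00..1669.95 rows=1 width=10) (actual time=12.201..12.201 rows=0 loops=1)<br />
   Filter: (word = 'foo'::text)<br />
   Rows Removed by Filter: 94156<br />
 Total runtime: 12.230 ms<br />
</pre><br />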
<br />
=Backward compatibility=<br />
<br />
These changes may cause regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
</pre><br />
The result, a bare '<', is not escaped, and is therefore not valid XML.<br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&lt;<br />
</pre><br />
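<br />
The practical consequence (a sketch, reusing the value above) is that an extracted value can now be embedded back into an XML document safely:<br />
<pre><br />
=# SELECT ('<x>' || (xpath('/*/text()', '<root>&lt;</root>'))[1]::text || '</x>')::xml;<br />
     xml     <br />
-------------<br />
 <x>&lt;</x><br />
(1 row)<br />
</pre><br />
On 9.1, the same expression failed, as '<x><</x>' is not well-formed XML.<br />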
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module used to store key/value pairs in a single column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
This doesn't mean '=>' can no longer be used inside hstore values; it just isn't an SQL operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
Both are still valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while pg_relation_size() was running on it, leading to an SQL error. Now the function simply returns NULL for that relation, as illustrated below.<br />
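<br />
This makes queries that walk pg_class robust against concurrent DDL; for instance, this illustrative query no longer aborts if a table is dropped while it runs:<br />
<pre><br />
=# select relname, pg_relation_size(oid) as bytes<br />
   from pg_class<br />
   where relkind = 'r'<br />
   order by 2 desc nulls last<br />
   limit 5;<br />
</pre><br />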
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field stored the location of the tablespace. It was filled in by the CREATE or ALTER TABLESPACE command, so it could become wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't actually used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
In 9.1, there is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now measured from "local midnight", i.e. January 1st, 1970 at midnight, local time. In the example above, the two results differ by 7200 seconds, which is exactly the server's +02 (CEST) offset.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the year closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the year closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction, ...)<br />
** query: the currently running query, or the last query executed if the session is idle<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up to 9.1, all we could know was that such a session had started a transaction, maybe done some operations, but not yet committed. If it stayed in this state for a while, there was no way of knowing which query it had last run.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all current_query would have shown is «idle in transaction».<br />
<br />
Since this change breaks backward compatibility anyway, the opportunity was taken to rename procpid to pid, to be more consistent with other system views. A typical monitoring query enabled by the new columns is shown below.<br />
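<br />
For instance, to spot sessions that have been sitting in a transaction for a while (illustrative query; all columns used here are part of the new definition):<br />
<pre><br />
=# select pid, usename, state, query,<br />
          now() - state_change as idle_for<br />
   from pg_stat_activity<br />
   where state = 'idle in transaction'<br />
   order by idle_for desc;<br />
</pre><br />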
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
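<br />
The change is only one of unit: if an application or monitoring script needs another unit, conversion is a simple division (a trivial sketch using one of the affected views):<br />
<pre><br />
=# select funcname, total_time / 1000.0 as total_seconds<br />
   from pg_stat_user_functions<br />
   order by total_time desc<br />
   limit 5;<br />
</pre><br />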
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without prior declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning the SSL file locations can now be configured explicitly (see the example below)<br />
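<br />
A minimal illustration of the new SSL settings (the paths are examples; by default the files are looked up in the data directory):<br />
<pre><br />
# postgresql.conf<br />
ssl = on<br />
ssl_cert_file = '/etc/ssl/certs/server.crt'<br />
ssl_key_file = '/etc/ssl/private/server.key'<br />
ssl_ca_file = '/etc/ssl/certs/root.crt'<br />
</pre><br />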
<br />
[[Category:PostgreSQL 9.2]]</div>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
* As for every version, the optimizer has received its share of improvements<br />
** Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.<br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17883What's new in PostgreSQL 9.22012-07-05T14:49:39Z<p>Marco44: /* Performance improvements */</p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains (non-exaustive):<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
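<br />
The nearest-to-2020 rule is the same one that already governed 2-digit years; for instance:<br />
<pre><br />
=# SELECT to_date('95-07-02','YY-MM-DD');<br />
 to_date <br />
------------<br />
 1995-07-02<br />
</pre><br />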
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing: running a query (active), idle, idle in transaction, etc.<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until 9.1, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If a session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen is «idle in transaction».<br />
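<br />
This makes monitoring much easier. For example, here is a sketch of a query to spot sessions stuck in transaction (the 5-minute threshold is arbitrary):<br />
<pre><br />
SELECT pid, usename, query,<br />
       now() - state_change AS idle_for<br />
FROM pg_stat_activity<br />
WHERE state = 'idle in transaction'<br />
  AND state_change < now() - interval '5 minutes';<br />
</pre><br />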
<br />
As the new definition was backward-incompatible anyway, the opportunity was taken to rename procpid to pid at the same time, for better consistency with the other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
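<br />
If downstream tooling expects seconds, simply divide; for instance (a sketch):<br />
<pre><br />
SELECT funcname, total_time / 1000.0 AS total_seconds<br />
FROM pg_stat_user_functions<br />
ORDER BY total_time DESC;<br />
</pre><br />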
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl start -l postmaster.log instead<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the locations of the SSL files (see the sketch below)<br />
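<br />
A minimal postgresql.conf sketch using the new parameters (the file names are hypothetical):<br />
<pre><br />
ssl = on<br />
ssl_cert_file = 'server.crt'   # server certificate<br />
ssl_key_file = 'server.key'    # server private key<br />
ssl_ca_file = 'root.crt'       # trusted certificate authorities<br />
</pre><br />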
<br />
[[Category:PostgreSQL 9.2]]</div>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains:<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
=SP-GIST=<br />
TODO<br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17881What's new in PostgreSQL 9.22012-07-05T13:15:18Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains:<br />
<br />
* The most visible will probably be the Index Only Scans, which has already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, for machines with over 32 cores mostly. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful on virtualized and embedded environments.<br />
<br />
* COPY has been improved, it will generate less WAL volume and less locks of tables's pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track IO durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking IO durations means asking repeatedly the time to the operating system. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most import factor here is where the system gets its time from. It could be directly retrieved from the processor (TSC), dedicated hardware such as HPET, or an ACPI call. What's most important is that the cost of getting time can vary from a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check if your system will support it without to much of a performance hit. PostgreSQL provides you with the pg_test_timing tool:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting time costs around 28 nanoseconds, and has a very small variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system. You'd better check on the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
Explain will benefit from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have a separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
=pg_stat_statements=<br />
<br />
This contrib module has received a lot of improvements in this version:<br />
<br />
* Queries are normalized: queries that are identical except for the constant values will be considered the same, as long as their execution plan is the same<br />
<br />
<pre><br />
=#select * from words where word= 'foo';<br />
word <br />
------<br />
(0 ligne)<br />
<br />
=# select * from words where word= 'bar';<br />
word <br />
------<br />
bar<br />
<br />
=#select * from pg_stat_statements where query like '%words where%';<br />
-[ RECORD 1 ]-------+-----------------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select * from words where word= ?;<br />
calls | 2<br />
total_time | 142.314<br />
rows | 1<br />
shared_blks_hit | 3<br />
shared_blks_read | 5<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 142.165<br />
blk_write_time | 0<br />
<br />
</pre><br />
<br />
The two queries are shown as one in pg_stat_statements.<br />
<br />
* For prepared statements, the execution part (execute statement) is charged on the prepare statement. That way it is easier to use, and avoids the double-counting there was with PostgreSQL 9.1.<br />
<br />
* pg_stat_statements displays timing in milliseconds, to be consistent with other system views.<br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--><br />
<br />
<br />
Have EXPLAIN ANALYZE report the number of rows rejected by filter steps (Marko Tiikkaja)<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17880What's new in PostgreSQL 9.22012-07-05T12:56:18Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
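<br />
To check that index only scans stay effective over time, keep an eye on the «Heap Fetches» counter of EXPLAIN ANALYZE. A minimal sketch, reusing the demo_ios table from above:<br />
<pre><br />
-- "Heap Fetches: 0" means the visibility map answered every check.<br />
-- A large value means many pages were no longer «all visible»,<br />
-- and a VACUUM will restore the benefit.<br />
=# EXPLAIN (ANALYZE, BUFFERS) SELECT col1,col2 FROM demo_ios WHERE col2 BETWEEN 0.01 AND 0.02;<br />
=# VACUUM demo_ios;<br />
</pre><br />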
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One of the main remaining gripes about streaming replication was that all the slaves had to be connected to one and the same master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was often easier to re-synchronize the slaves to the new master from scratch, meaning that during the failover only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(archive would be enough for replication, but not for read-only queries on the slave)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production; it won't be done here.<br />
<br />
pg_hba.conf (restrict the address range, instead of 0.0.0.0/0, in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start alongside the master (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster gets streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
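<br />
For instance, you can query pg_stat_replication on the first slave, which now runs a walsender of its own for the cascaded slave (a minimal sketch, column list trimmed):<br />
<pre><br />
-- run on the first slave (port 5433)<br />
=# SELECT application_name, client_addr, state, flush_location FROM pg_stat_replication;<br />
</pre><br />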
<br />
<br />
* As you may have noticed from the example, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user doesn't care for a full-fledged slave, and only wants to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the WAL segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much finer granularity.<br />
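<br />
For instance, completed segments (the .partial suffix disappears once a segment is finished) can be replayed with a plain restore_command. A minimal sketch, using the directory from above:<br />
<pre><br />
# recovery.conf on the server to restore<br />
restore_command = 'cp /tmp/new_logs/%f %p'<br />
</pre><br />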
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this setting, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in-flight data to disk). As this is quite a remote possibility, some people will be interested in this compromise.<br />
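<br />
Here is a minimal sketch of the settings involved (the standby name is just an example; it must match the application_name the slave connects with):<br />
<pre><br />
# postgresql.conf on the master<br />
synchronous_standby_names = 'standby1'<br />
synchronous_commit = remote_write<br />
</pre><br />
synchronous_commit can also be changed per session or per transaction, with set synchronous_commit to remote_write;<br />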
<br />
<br />
<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
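<br />
A quick sketch: casting to json validates the text, and the new row_to_json() function converts a row.<br />
<pre><br />
=# SELECT '{"a":1,"b":[true,null]}'::json;<br />
          json           <br />
-------------------------<br />
 {"a":1,"b":[true,null]}<br />
<br />
=# SELECT row_to_json(t) FROM (SELECT 1 AS a, 'foo' AS b) t;<br />
    row_to_json    <br />
-------------------<br />
 {"a":1,"b":"foo"}<br />
</pre><br />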
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
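<br />
A quick sketch with the built-in daterange type (the booking table is just an example; the exclusion constraint needs the btree_gist extension for the = part on an integer):<br />
<pre><br />
=# SELECT '[2012-01-01,2012-07-01)'::daterange @> '2012-03-14'::date;<br />
 ?column? <br />
----------<br />
 t<br />
<br />
-- ranges can back exclusion constraints, e.g. no overlapping bookings<br />
=# CREATE EXTENSION btree_gist;<br />
=# CREATE TABLE booking (room int, during daterange,<br />
     EXCLUDE USING gist (room WITH =, during WITH &&));<br />
</pre><br />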
<br />
=Performance improvements=<br />
<br />
This version has performance improvements on a very large range of domains:<br />
<br />
* The most visible will probably be the Index Only Scans, which have already been introduced in this document.<br />
<br />
* The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability, mostly for machines with 32 cores or more. <!-- Robert Haas --><br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. <!-- Peter Geoghegan --><br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption <!--Peter Geoghegan-->. This is especially useful in virtualized and embedded environments.<br />
<br />
* COPY has been improved: it generates less WAL volume and takes fewer locks on the table's pages. <!-- Heikki Linnakangas --><br />
<br />
* The system can now track I/O durations <!--Ants Aasma --><br />
<br />
This one deserves a little explanation, as it can be a little tricky. Tracking I/O durations means repeatedly asking the operating system for the time. Depending on the operating system and the hardware, this can be quite cheap, or extremely costly. The most important factor here is where the system gets its time from: directly from the processor (TSC), from dedicated hardware such as HPET, or through an ACPI call. What matters is that the cost of getting the time can vary by a factor of thousands.<br />
<br />
If you are interested in this timing data, it's better to first check whether your system will support it without too much of a performance hit. PostgreSQL provides the pg_test_timing tool for this:<br />
<br />
<pre><br />
$ pg_test_timing <br />
Testing timing overhead for 3 seconds.<br />
Per loop time including overhead: 28.02 nsec<br />
Histogram of timing durations:<br />
< usec: count percent<br />
32: 41 0.00004%<br />
16: 1405 0.00131%<br />
8: 200 0.00019%<br />
4: 388 0.00036%<br />
2: 2982558 2.78523%<br />
1: 104100166 97.21287%<br />
</pre><br />
<br />
Here, everything is good: getting the time costs around 28 nanoseconds, with very little variation. Anything under 100 nanoseconds should be good for production. If you get higher values, you may still find a way to tune your system; check the [http://www.postgresql.org/docs/9.2/static/pgtesttiming.html documentation].<br />
<br />
Anyway, here is the data you'll be able to collect if your system is ready for this:<br />
<br />
First, you'll get per-database statistics, which will now give accurate information about which database is doing the most I/O:<br />
<br />
<pre><br />
=# select * from pg_stat_database where datname = 'mydb';<br />
-[ RECORD 1 ]--+------------------------------<br />
datid | 16384<br />
datname | mydb<br />
numbackends | 1<br />
xact_commit | 270<br />
xact_rollback | 2<br />
blks_read | 1961<br />
blks_hit | 17944<br />
tup_returned | 269035<br />
tup_fetched | 8850<br />
tup_inserted | 16<br />
tup_updated | 4<br />
tup_deleted | 45<br />
conflicts | 0<br />
temp_files | 0<br />
temp_bytes | 0<br />
deadlocks | 0<br />
blk_read_time | 583.774<br />
blk_write_time | 0<br />
stats_reset | 2012-07-03 17:18:54.796817+02<br />
</pre><br />
We see here that mydb has only consumed 583.774 milliseconds of read time.<br />
<br />
EXPLAIN benefits from this too:<br />
<pre><br />
=# explain (analyze,buffers) select count(*) from mots ;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------<br />
Aggregate (cost=1669.95..1669.96 rows=1 width=0) (actual time=21.943..21.943 rows=1 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
-> Seq Scan on mots (cost=0.00..1434.56 rows=94156 width=0) (actual time=0.059..12.933 rows=94156 loops=1)<br />
Buffers: shared read=493<br />
I/O Timings: read=2.578<br />
Total runtime: 22.059 ms<br />
</pre><br />
We now have separate information about the time taken to retrieve data from the operating system. Obviously, here, the data was in the operating system's cache (2.578 milliseconds to read 493 blocks).<br />
<br />
And last, if you have enabled pg_stat_statements:<br />
<pre><br />
select * from pg_stat_statements where query ~ 'words';<br />
-[ RECORD 1 ]-------+---------------------------<br />
userid | 10<br />
dbid | 16384<br />
query | select count(*) from words;<br />
calls | 2<br />
total_time | 78.332<br />
rows | 2<br />
shared_blks_hit | 0<br />
shared_blks_read | 986<br />
shared_blks_dirtied | 0<br />
shared_blks_written | 0<br />
local_blks_hit | 0<br />
local_blks_read | 0<br />
local_blks_dirtied | 0<br />
local_blks_written | 0<br />
temp_blks_read | 0<br />
temp_blks_written | 0<br />
blk_read_time | 58.427<br />
blk_write_time | 0<br />
</pre><br />
<br />
= Explain improvements=<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive <!--Tomas Vondra--> (first sketch below)<br />
<br />
<br />
* EXPLAIN ANALYZE now reports the number of rows rejected by filter steps <!-- Marko Tiikkaja --> (second sketch below)<br />
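<br />
A quick sketch of the first item, reusing the mots table from above (row counts and the total runtime are still reported, only the per-node times are skipped):<br />
<pre><br />
=# EXPLAIN (ANALYZE on, TIMING off) SELECT count(*) FROM mots;<br />
</pre><br />
And a sketch of the second item, on a hypothetical table t (output abbreviated, numbers illustrative):<br />
<pre><br />
=# EXPLAIN ANALYZE SELECT * FROM t WHERE val > 0.9;<br />
 Seq Scan on t  (...) (actual ... rows=9863 loops=1)<br />
   Filter: (val > 0.9)<br />
   Rows Removed by Filter: 90137<br />
</pre><br />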
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
 ?column? <br />
----------<br />
 "a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
 pg_typeof <br />
-----------<br />
 hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
ERROR:  operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
                  ^<br />
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL exception. Now the function merely returns NULL for that relation.<br />
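<br />
A typical query that benefits, as a sketch (a table dropped between the pg_class scan and the function call now simply shows NULL instead of aborting the query):<br />
<pre><br />
=# SELECT relname, pg_relation_size(oid) AS bytes<br />
   FROM pg_class<br />
   ORDER BY bytes DESC NULLS LAST<br />
   LIMIT 5;<br />
</pre><br />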
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is now calculated from the "local midnight" of January 1st, 1970, that is, both values are taken in the same local time frame. In the example above, run in a CEST (+02:00) session, the two results therefore differ by exactly 1341187200 - 1341180000 = 7200 seconds: the session's UTC offset on that date.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit dates: 2-digit dates always chose the date closest to 2020, while 3-digit dates mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2- and 3-digit dates alike.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction, ...)<br />
** query: the last executed (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the opportunity was taken to rename procpid to pid, for better consistency with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now expressed in milliseconds, to be consistent with the rest of the timing values.<br />
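<br />
Queries doing arithmetic on these columns must be adapted accordingly. A sketch (function statistics require track_functions to be enabled):<br />
<pre><br />
-- total_time and self_time are now float8 milliseconds<br />
=# SELECT funcname, calls, total_time/1000 AS total_seconds<br />
   FROM pg_stat_user_functions<br />
   ORDER BY total_time DESC;<br />
</pre><br />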
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed, as it is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning the locations of the SSL files can now be specified<br />
<br />
[[Category:PostgreSQL 9.2]]</div>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements <!-- Fujii Masao, Simon Riggs, Magnus Hagander, Jun Ishizuka -->==<br />
<br />
Streaming Replication is getting even more polished with this release. One on the main remaining gripes about streaming replication is that all the slaves have to be connected to the same and unique master, consuming its resources.<br />
<br />
Moreover, in case of a failover, it was very complicated to reconnect all the remaining slaves to the newly promoted master.<br />
<br />
To be on the safe side, it was easier to re-synchronize the slaves to the new masters from scratch, meaning that during this failover, only one server was active, and under heavy load, as it was used to rebuild all the slaves.<br />
<br />
* With 9.2, a slave can also be a replication master, allowing for cascading replication.<br />
<br />
Let's build this. We start with an already working 9.2 database.<br />
<br />
We set it up for replication:<br />
<br />
postgresql.conf:<br />
wal_level=hot_standby #(could be archive too)<br />
max_wal_senders=5<br />
hot_standby=on<br />
<br />
You'll probably also want to activate archiving in production, it won't be done here.<br />
<br />
pg_hba.conf (do not use trust in production):<br />
host replication replication_user 0.0.0.0/0 md5<br />
<br />
Create the user:<br />
create user replication_user replication password 'secret';<br />
<br />
Clone the database:<br />
<br />
pg_basebackup -h localhost -U replication_user -D data2<br />
Password:<br />
<br />
We have a brand new cluster in the data2 directory. We'll change the port so that it can start (postgresql.conf):<br />
port=5433<br />
<br />
We add a recovery.conf to tell it how to stream from the master database:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5432 user=replication_user password=secret' <br />
<br />
pg_ctl -D data2 start<br />
server starting<br />
LOG: database system was interrupted; last known up at 2012-07-03 17:58:09 CEST<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9D0000B8<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, let's add a second slave, which will use this slave:<br />
<br />
<br />
pg_basebackup -h localhost -U replication_user -D data3 -p 5433<br />
Password: <br />
<br />
We edit data3's postgresql.conf to change the port:<br />
port=5434<br />
<br />
We modify the recovery.conf to stream from the slave:<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=5433 user=replication_user password=secret' # e.g. 'host=localhost port=5432'<br />
<br />
We start the cluster:<br />
pg_ctl -D data3 start<br />
server starting<br />
LOG: database system was interrupted while in recovery at log time 2012-07-03 17:58:09 CEST<br />
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.<br />
LOG: creating missing WAL directory "pg_xlog/archive_status"<br />
LOG: entering standby mode<br />
LOG: streaming replication successfully connected to primary<br />
LOG: redo starts at 0/9D000020<br />
LOG: consistent recovery state reached at 0/9E000000<br />
LOG: database system is ready to accept read only connections<br />
<br />
Now, everything modified on the master cluster get streamed to the first slave, and from there to the second slave. This second replication has to be monitored from the first slave (the master knows nothing about it).<br />
<br />
<br />
* As you may have noticed from the examble, pg_basebackup now works from slaves.<br />
<br />
* There is another use case that wasn't covered: what if a user didn't care for having a full fledged slave, and only wanted to stream the WAL files to another location, to benefit from the reduced data loss without the burden of maintaining a slave ?<br />
<br />
pg_receivexlog is provided just for this purpose: it pretends to be a PostgreSQL slave, but only stores the log files as they are streamed, in a directory:<br />
pg_receivexlog -D /tmp/new_logs -h localhost -U replication_user<br />
<br />
will connect to the master (or a slave), and start creating files: <br />
ls /tmp/new_logs/<br />
00000001000000000000009E.partial<br />
<br />
Files are of the segment size, so they can be used for a normal recovery of the database. It's the same as an archive command, but with a much smaller granularity.<br />
<br />
* synchronous_commit has a new value: remote_write. It can be used when there is a synchronous slave (synchronous_standby_names is set), meaning that the master doesn't have to wait for the slave to have written the data to disk, only for the slave to have acknowledged the data. With this set, data is protected from a crash on the master, but could still be lost if the slave crashed at the same time (i.e. before having written the in flight data to disk). As this is a quite remote possibility, some people will be interested in this compromise.<br />
<br />
<br />
<br />
==Multi-processor scalability improvements==<br />
The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability. (More info: [http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html Robert Haas blog])<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. (More info: [http://momjian.us/main/blogs/pgblog/2012.html#February_16_2012 Bruce Momjian's blog])<br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption ([http://pgeoghegan.blogspot.com/2012/01/power-consumption-in-postgres-92.html Peter Geoghegan's blog])<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive ([http://www.depesz.com/2012/02/13/waiting-for-9-2-explain-timing/ depesz blog])<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17873What's new in PostgreSQL 9.22012-07-03T09:49:56Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, in order to have "scattered" data. We'll put 100 million records, to have a big recordset, and have it not fit in memory (that's a 4GB-ram machine). This is an ideal case, made for this demo. The gains wont be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.02 AND 0.03<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the visibility map to be up-to-date:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timing you'll see below are done on a cold OS and PostgreSQL cache (that's where the gains are, as the purpose on Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be less opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Replication improvements==<br />
Streaming replication slaves can now serve as a source for other slaves.<br />
<br />
pg_receivexlog<br />
<br />
pg_basebackup command now also works from slaves<br />
<br />
synchronous_commit remote_write<br />
<br />
==Multi-processor scalability improvements==<br />
The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability. (More info: [http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html Robert Haas blog])<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
<br />
=Performance improvements=<br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. (More info: [http://momjian.us/main/blogs/pgblog/2012.html#February_16_2012 Bruce Momjian's blog])<br />
<br />
* An idle PostgreSQL server now makes less wakeups, leading to lower power consumption ([http://pgeoghegan.blogspot.com/2012/01/power-consumption-in-postgres-92.html Peter Geoghegan's blog])<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive ([http://www.depesz.com/2012/02/13/waiting-for-9-2-explain-timing/ depesz blog])<br />
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
'<' Isn't valid XML.<br />
</pre><br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create a hstore. Hstore is a contrib, used to store key/values pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
<br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown at character 11<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
STATEMENT: SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input a hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session, while one was doing a pg_relation_size on it, leading to a SQL exception. Now, it merely returns NULL for this record.<br />
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command. So it could be wrong: somebody just had to shutdown the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as the spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timstamp with or without timezone.<br />
<br />
With 9.1:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no timezone, the epoch is calculated with the "local midnight", meaning the 1st january of 1970 at midnight, local-time.<br />
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2 digit dates and 3 digit dates: 2 digit dates always chose the date closest to 2020, 3 digit dates mapped dates from 100 to 999 on 1100 to 1999, and 000 to 099 on 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020, for 2 and 3 digit dates.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: is the session running a query, waiting<br />
** query: what is the last run (or still running) query<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it has started a transaction, maybe done some operations, but still not committed. If that session stayed in this state for a while, there was no way of knowing how it got in this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have would be «idle in transaction».<br />
<br />
As this change was backward-incompatible, procpid was also renamed to pid, to be more consistent with other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
<br />
==postgresql.conf parameters changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log<br />
* wal_sender_delay has been removed. It is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are accepted without declaration now<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file, ssl_key_file have been added, meaning you can now specify the ssl files<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17872What's new in PostgreSQL 9.22012-07-03T08:54:42Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans <!-- Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane -->==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. It means that when you access a record by its index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
It can be a very big performance problem: the index is mostly ordered, so accessing its records is quite efficient, while the records may be scattered all over the place (that's a reason why PostgreSQL has a cluster command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and not access the record itself if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the visibility map ([http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map]) , which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, so that the rows are "scattered". We'll put in 100 million records, to get a big recordset that doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, set up for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table so that the visibility map is up to date:<br />
<br />
VACUUM demo_ios;<br />
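<br />
To check that the VACUUM did its job, pg_class.relallvisible (a counter new in 9.2, maintained alongside the visibility map) tracks approximately how many of the table's pages are flagged «all visible»; this quick sanity check is an addition to the original demo:<br />
<br />
select relpages, relallvisible from pg_class where relname = 'demo_ios';<br />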
<br />
All the timings you'll see below were taken with cold OS and PostgreSQL caches (that's where the gains are, as the purpose of Index Only Scans is to reduce I/O).<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before. So there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real-life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
<br />
==Cascading replication==<br />
Streaming replication slaves can now serve as a source for other slaves. This can be used to reduce the impact of replication on the master server. (More info: [http://www.depesz.com/2011/07/26/waiting-for-9-2-cascading-streaming-replication/ depesz blog])<br />
<br />
A related feature, the pg_basebackup command now also works from slaves (More info: [http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ depesz blog])<br />
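<br />
For instance, a second-level standby is set up like any first-level one, except that it clones from and connects to another standby instead of the master (a minimal sketch; the host name, directory and user are made up):<br />
<br />
<pre><br />
# clone the new standby from an existing one<br />
pg_basebackup -h standby1 -D /var/lib/postgresql/standby2 -U replication<br />
<br />
# recovery.conf on the new standby points at the first standby<br />
standby_mode = 'on'<br />
primary_conninfo = 'host=standby1 port=5432 user=replication'<br />
</pre><br />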
<br />
==Multi-processor scalability improvements==<br />
The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability. (More info: [http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html Robert Haas blog])<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
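What the type mostly buys you in 9.2 is validation on input &ndash; the value is stored as text, but must parse as JSON (a small sketch, with made-up values):<br />
<pre><br />
=# SELECT '{"library": "hstore", "tags": ["kv", "contrib"]}'::json;  -- accepted<br />
=# SELECT '{"oops": }'::json;                                        -- rejected with a syntax error<br />
</pre><br />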
<br />
== Range Types ==<br />
[[RangeTypes]] are added.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/])<br />
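A small taste of what they allow; int4range and daterange are among the built-in range types, and both queries below return true:<br />
<pre><br />
=# SELECT int4range(10, 20) @> 15;                                                        -- containment<br />
=# SELECT daterange('2012-01-01','2012-07-01') && daterange('2012-06-01','2012-12-01');   -- overlap<br />
</pre><br />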
<br />
=Performance improvements=<br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, thanks to specialized sort functions introduced for common cases. (More info: [http://momjian.us/main/blogs/pgblog/2012.html#February_16_2012 Bruce Momjian's blog])<br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption ([http://pgeoghegan.blogspot.com/2012/01/power-consumption-in-postgres-92.html Peter Geoghegan's blog])<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive; see the sketch below (more info: [http://www.depesz.com/2012/02/13/waiting-for-9-2-explain-timing/ depesz blog])<br />
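<br />
A sketch of the syntax, reusing the demo_ios table from above; with timing off, per-node actual times are omitted, while row counts and the total runtime are still reported:<br />
<br />
explain (analyze on, timing off) select count(*) from demo_ios;<br />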
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
</pre><br />
'<' isn't valid XML. With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores, it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
are still two valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for that relation.<br />
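<br />
This makes disk-usage monitoring queries more robust; a sketch (the coalesce turns the NULL into 0, so the sum stays meaningful if a table vanishes mid-query):<br />
<br />
<pre><br />
=# SELECT sum(coalesce(pg_relation_size(oid), 0)) AS total_bytes<br />
   FROM pg_class WHERE relkind = 'r';<br />
</pre><br />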
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was only filled in by the CREATE and ALTER TABLESPACE commands, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1:<br />
<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
<br />
There is no difference in behaviour between a timestamp with or without time zone.<br />
<br />
With 9.2:<br />
<pre><br />
=#SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
date_part <br />
------------<br />
1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
date_part <br />
------------<br />
1341180000<br />
(1 row)<br />
</pre><br />
When the timestamp has no time zone, the epoch is now calculated from the "local midnight": January 1st, 1970 at midnight, local time.<br />
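<br />
A bit of arithmetic on the two 9.2 results makes the change visible (these sessions ran at UTC+2, as the +02 timestamps earlier suggest):<br />
<br />
<pre><br />
=# SELECT 1341187200 / 86400;               -- 15523: an exact number of days since 1970-01-01 00:00:00<br />
=# SELECT (1341187200 - 1341180000) / 3600; -- 2: the UTC+2 offset, present only in the timestamptz result<br />
</pre><br />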
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between 2-digit and 3-digit years: 2-digit years always chose the date closest to 2020, while 3-digit years mapped 100 to 999 onto 1100 to 1999, and 000 to 099 onto 2000 to 2099.<br />
<br />
Now PostgreSQL chooses the date closest to 2020 for both 2-digit and 3-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
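<br />
For comparison, 2-digit years already followed (and still follow) the closest-to-2020 rule; 99 maps to 1999, which is closer to 2020 than 2099 is:<br />
<pre><br />
SELECT to_date('99-07-02','YY-MM-DD');<br />
  to_date   <br />
------------<br />
 1999-07-02<br />
</pre><br />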
<br />
==pg_stat_activity's definition has changed <!--Magnus Hagander -->==<br />
<br />
The view pg_stat_activity has changed. It's not backward compatible, but let's see what this new definition brings us:<br />
<br />
* current_query disappears and is replaced by two columns:<br />
** state: what the session is currently doing (active, idle, idle in transaction, etc.)<br />
** query: the last query run (possibly still running)<br />
* The column procpid is renamed to pid, to be consistent with other system views<br />
<br />
The benefit is mostly for tracking «idle in transaction» sessions. Up until now, all we could know was that one of these sessions was idle in transaction, meaning it had started a transaction, maybe done some operations, but still not committed. If such a session stayed in this state for a while, there was no way of knowing how it got into this state.<br />
<br />
Here is an example:<br />
<pre><br />
-[ RECORD 1 ]----+---------------------------------<br />
datid | 16384<br />
datname | postgres<br />
pid | 20804<br />
usesysid | 10<br />
usename | postgres<br />
application_name | psql<br />
client_addr | <br />
client_hostname | <br />
client_port | -1<br />
backend_start | 2012-07-02 15:02:51.146427+02<br />
xact_start | 2012-07-02 15:15:28.386865+02<br />
query_start | 2012-07-02 15:15:30.410834+02<br />
state_change | 2012-07-02 15:15:30.411287+02<br />
waiting | f<br />
state | idle in transaction<br />
query | DELETE FROM test;<br />
</pre><br />
<br />
With PostgreSQL 9.1, all we would have seen here is «idle in transaction».<br />
<br />
As this change was backward-incompatible anyway, the opportunity was also taken to rename procpid to pid, for better consistency with the other system views.<br />
<br />
==Change all SQL-level statistics timing values to float8-stored milliseconds <!-- (Tom Lane) -->==<br />
<br />
pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.<br />
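<br />
If monitoring scripts read these columns, they now get float8 milliseconds everywhere; for instance (a sketch &ndash; the view only has rows when track_functions is enabled):<br />
<br />
<pre><br />
=# SELECT funcname, calls, total_time, self_time<br />
   FROM pg_stat_user_functions ORDER BY self_time DESC LIMIT 5;<br />
</pre><br />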
<br />
==postgresql.conf parameter changes <!-- (Heikki Linnakangas, Tom Lane, Peter Eisentraut) -->==<br />
<br />
* silent_mode has been removed. Use pg_ctl -l postmaster.log instead<br />
* wal_sender_delay has been removed, as it is no longer needed<br />
* custom_variable_classes has been removed. All «classes» are now accepted without declaration<br />
* ssl_ca_file, ssl_cert_file, ssl_crl_file and ssl_key_file have been added, meaning you can now specify the locations of the SSL files (see the sketch below)<br />
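<br />
A minimal postgresql.conf sketch using the new parameters (the file names are made up; relative paths are resolved inside the data directory):<br />
<br />
<pre><br />
ssl = on<br />
ssl_cert_file = 'server.crt'<br />
ssl_key_file  = 'server.key'<br />
ssl_ca_file   = 'root.crt'<br />
</pre><br />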
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.2&diff=17867What's new in PostgreSQL 9.22012-07-02T09:51:44Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.2, compared to the last major release &ndash; PostgreSQL 9.1. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in ''Release Notes''.<br />
<br />
'''This page is incomplete!'''<br />
<br />
=Major new features=<br />
<br />
==Index-only scans==<br />
<br />
In PostgreSQL, indexes have no "visibility" information. This means that when you access a record through an index, PostgreSQL has to visit the real tuple in the table to be sure it is visible to you: the tuple the index points to may simply be an old version of the record you are looking for.<br />
<br />
This can be a very big performance problem: the index is mostly ordered, so accessing its entries is quite efficient, while the table records may be scattered all over the place (that's one reason why PostgreSQL has a CLUSTER command, but that's another story). In 9.2, PostgreSQL will use an "Index Only Scan" when possible, and will not access the table record at all if it doesn't need to.<br />
<br />
There is still no visibility information in the index. So in order to do this, PostgreSQL uses the [http://www.postgresql.org/docs/devel/static/storage-vm.html visibility map], which tells it whether the whole content of a (usually) 8K page is visible to all transactions or not. When the index record points to a tuple contained in an «all visible» page, PostgreSQL won't have to access the tuple, it will be able to build it directly from the index. Of course, all the columns requested by the query must be in the index.<br />
<br />
The visibility map is maintained by VACUUM (it sets the visible bit), and by the backends doing SQL work (they unset the visible bit).<br />
<br />
Here is an example.<br />
<br />
create table demo_ios (col1 float, col2 float, col3 text);<br />
<br />
In this table, we'll put random data, so that the rows are "scattered". We'll put in 100 million records, to get a big recordset that doesn't fit in memory (this is a 4GB-RAM machine). This is an ideal case, set up for this demo; the gains won't be that big in real life.<br />
<br />
insert into demo_ios select generate_series(1,100000000),random(), 'mynotsolongstring';<br />
<br />
select pg_size_pretty(pg_total_relation_size('demo_ios'));<br />
pg_size_pretty <br />
----------------<br />
6512 MB<br />
<br />
Let's pretend that the main query is this:<br />
<br />
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02<br />
<br />
In order to use an index only scan on this, we need an index on col2,col1 (col2 first, as it is used in the WHERE clause).<br />
<br />
CREATE index idx_demo_ios on demo_ios(col2,col1);<br />
<br />
We vacuum the table to set the visibility map:<br />
<br />
VACUUM demo_ios;<br />
<br />
All the timings you'll see below were measured with a cold OS and PostgreSQL cache.<br />
<br />
Let's first try without Index Only Scans:<br />
<br />
set enable_indexonlyscan to off;<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
----------------------------------------------------------------------------------------------------------------------------------------<br />
Bitmap Heap Scan on demo_ios (cost=25643.01..916484.44 rows=993633 width=16) (actual time=763.391..362963.899 rows=1000392 loops=1)<br />
Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Rows Removed by Index Recheck: 68098621<br />
Buffers: shared hit=2 read=587779<br />
-> Bitmap Index Scan on idx_demo_ios (cost=0.00..25394.60 rows=993633 width=0) (actual time=759.011..759.011 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Buffers: shared hit=2 read=3835<br />
Total runtime: 364390.127 ms<br />
<br />
<br />
With Index Only Scans:<br />
<br />
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;<br />
QUERY PLAN <br />
-----------------------------------------------------------------------------------------------------------------------------------------------<br />
Index Only Scan using idx_demo_ios on demo_ios (cost=0.00..35330.93 rows=993633 width=16) (actual time=58.100..3250.589 rows=1000392 loops=1)<br />
Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))<br />
Heap Fetches: 0<br />
Buffers: shared hit=923073 read=3848<br />
Total runtime: 4297.405 ms<br />
<br />
<br />
<br />
As nothing is free, there are a few things to note:<br />
<br />
* Adding indexes for index only scans obviously adds indexes to your table. So updates will be slower.<br />
* You will index columns that weren't indexed before, so there will be fewer opportunities for HOT updates.<br />
* Gains will probably be smaller in real life situations.<br />
<br />
This required making visibility map changes crash-safe, so visibility map bit changes are now WAL-logged.<br />
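<br />
One way to see how much of a table is currently all-visible (and thus how effective index-only scans can be on it) is to look at the page counts that VACUUM maintains in pg_class. This is a sketch, not part of the original walkthrough:<br />
<br />
 select relallvisible, relpages from pg_class where relname = 'demo_ios';<br />
<br />
The closer relallvisible gets to relpages, the fewer heap fetches an index-only scan will need.<br />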
<br />
==Cascading replication==<br />
Streaming replication slaves can now serve as a source for other slaves. This can be used to reduce the impact of replication on the master server. (More info: [http://www.depesz.com/2011/07/26/waiting-for-9-2-cascading-streaming-replication/ depesz blog])<br />
<br />
As a related feature, the pg_basebackup command now also works from slaves. (More info: [http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ depesz blog])<br />
<br />
==Multi-processor scalability improvements==<br />
The lock contention of several big locks has been significantly reduced, leading to better multi-processor scalability. (More info: [http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html Robert Haas blog])<br />
<br />
==JSON datatype==<br />
The JSON datatype is meant for storing JSON-structured data. (More info: [http://www.depesz.com/2012/02/12/waiting-for-9-2-json/ depesz blog])<br />
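<br />
A quick illustration (a sketch, not from the original page): the input is validated as JSON, and 9.2 also ships functions such as row_to_json() to build JSON values:<br />
<br />
 select '{"a":1, "b":[2,3]}'::json;<br />
         json        <br />
 --------------------<br />
  {"a":1, "b":[2,3]}<br />
 (1 row)<br />
<br />
 select row_to_json(t) from (select 1 as a, 'two' as b) t;<br />
     row_to_json    <br />
 -------------------<br />
  {"a":1,"b":"two"}<br />
 (1 row)<br />
<br />
An invalid literal such as '{"a":1'::json raises an error at input time.<br />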
<br />
== Range Types ==<br />
[[RangeTypes]] are added: a range type stores an interval of values (of numbers or timestamps, for instance) in a single column, with dedicated operators and index support.<br />
(More info: [http://www.depesz.com/2011/11/07/waiting-for-9-2-range-data-types/ depesz blog])<br />
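<br />
A small sketch (not from the original page) using the built-in int4range type:<br />
<br />
 select int4range(10, 20) @> 15 as contains, int4range(10, 20) && int4range(18, 25) as overlaps;<br />
  contains | overlaps <br />
 ----------+----------<br />
  t        | t<br />
 (1 row)<br />
<br />
Combined with exclusion constraints, range types can for instance forbid overlapping bookings at the database level.<br />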
<br />
=Performance improvements=<br />
<br />
* The performance of in-memory sorts has been improved by up to 25% in some situations, with certain specialized sort functions introduced. (More info: [http://momjian.us/main/blogs/pgblog/2012.html#February_16_2012 Bruce Momjian's blog])<br />
<br />
* An idle PostgreSQL server now makes fewer wakeups, leading to lower power consumption. ([http://pgeoghegan.blogspot.com/2012/01/power-consumption-in-postgres-92.html Peter Geoghegan's blog])<br />
<br />
* Timing can now be disabled with EXPLAIN (analyze on, timing off), leading to lower overhead on platforms where getting the current time is expensive; see the sketch below. ([http://www.depesz.com/2012/02/13/waiting-for-9-2-explain-timing/ depesz blog])<br />
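<br />
A minimal sketch of the syntax:<br />
<br />
 explain (analyze on, timing off) select count(*) from demo_ios;<br />
<br />
The plan still reports actual row counts and the total runtime, but the per-node timing figures are skipped.<br />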
<br />
=Backward compatibility=<br />
<br />
These changes may incur regressions in your applications.<br />
<br />
==Ensure that xpath() escapes special characters in string values <!-- (Florian Pflug)--> ==<br />
<br />
Before 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
<<br />
<br />
</pre><br />
'<' isn't valid XML.<br />
With 9.2:<br />
<pre><br />
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1];<br />
xpath <br />
-------<br />
&amp;lt;<br />
</pre><br />
<br />
==Remove hstore's => operator <!-- (Robert Haas)-->==<br />
Up to 9.1, one could use the => operator to create an hstore. hstore is a contrib module, used to store key/value pairs in a column.<br />
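<br />
As a reminder, hstore being an extension, it first has to be installed in the database (assuming the contrib packages are available):<br />
<pre><br />
CREATE EXTENSION hstore;<br />
</pre><br />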
<br />
In 9.1:<br />
<pre><br />
=# SELECT 'a'=>'b';<br />
?column? <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# SELECT pg_typeof('a'=>'b');<br />
pg_typeof <br />
-----------<br />
hstore<br />
(1 row)<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
SELECT 'a'=>'b';<br />
ERROR: operator does not exist: unknown => unknown<br />
LINE 1: SELECT 'a'=>'b';<br />
^<br />
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.<br />
</pre><br />
<br />
It doesn't mean one cannot use '=>' in hstores; it just isn't an operator anymore:<br />
<br />
<pre><br />
=# select hstore('a=>b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
<br />
=# select hstore('a','b');<br />
hstore <br />
----------<br />
"a"=>"b"<br />
(1 row)<br />
</pre><br />
Both are still valid ways to input an hstore.<br />
<br />
"=>" is removed as an operator as it is a reserved keyword in SQL.<br />
<br />
<br />
==Have pg_relation_size() and friends return NULL if the object does not exist <!-- (Phil Sorber)-->==<br />
<br />
A relation could be dropped by a concurrent session while another one was running pg_relation_size() on it, leading to an SQL error. Now the function merely returns NULL for a nonexistent object.<br />
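<br />
A quick sketch (the OID below is arbitrary, chosen not to match any object):<br />
<pre><br />
=# SELECT pg_relation_size(999999999);<br />
 pg_relation_size <br />
------------------<br />
 <br />
(1 row)<br />
</pre><br />
In 9.1, the same call raised an error instead of returning NULL.<br />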
<br />
<br />
==Remove the spclocation field from pg_tablespace <!-- (Magnus Hagander)-->==<br />
<br />
The spclocation field provided the real location of the tablespace. It was filled in during the CREATE or ALTER TABLESPACE command, so it could be wrong: somebody just had to shut down the cluster, move the tablespace's directory, re-create the symlink in pg_tblspc, and forget to update the spclocation field. The cluster would still run, as spclocation wasn't used.<br />
<br />
So this field has been removed. To get the tablespace's location, use pg_tablespace_location():<br />
<br />
<pre><br />
=# select *, pg_tablespace_location(oid) as spclocation from pg_tablespace;<br />
spcname | spcowner | spcacl | spcoptions | spclocation <br />
------------+----------+--------+------------+----------------<br />
pg_default | 10 | | | <br />
pg_global | 10 | | | <br />
tmptblspc | 10 | | | /tmp/tmptblspc<br />
</pre><br />
<br />
==Have EXTRACT of a non-timezone-aware value measure the epoch from local midnight, not UTC midnight <!-- (Tom Lane) -->==<br />
<br />
<br />
With PostgreSQL 9.1 (the session below runs at UTC+2, as can be deduced from the two values):<br />
<br />
<pre><br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
 date_part  <br />
------------<br />
 1341187200<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
 date_part  <br />
------------<br />
 1341180000<br />
(1 row)<br />
</pre><br />
<br />
The two results differ: the epoch of a timestamp without timezone was measured from UTC midnight, while the timestamptz was interpreted in the session's timezone.<br />
<br />
With 9.2:<br />
<pre><br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
 date_part  <br />
------------<br />
 1341180000<br />
(1 row)<br />
<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamptz);<br />
 date_part  <br />
------------<br />
 1341180000<br />
(1 row)<br />
</pre><br />
There is no longer any difference in behaviour between a timestamp with and without timezone: when the timestamp has no timezone, the epoch is measured from "local midnight", meaning January 1, 1970 at midnight, local time.<br />
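<br />
To make the timezone dependence visible (a sketch, assuming the session timezone is changed to UTC):<br />
<pre><br />
=# SET timezone = 'UTC';<br />
=# SELECT extract(epoch from '2012-07-02 00:00:00'::timestamp);<br />
 date_part  <br />
------------<br />
 1341187200<br />
(1 row)<br />
</pre><br />
With 9.2 the result follows the session's timezone setting; when TimeZone is UTC, local midnight and UTC midnight coincide.<br />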
<br />
<br />
==Fix to_date() and to_timestamp() to wrap incomplete dates toward 2020 <!-- (Bruce Momjian)-->==<br />
<br />
The wrapping was not consistent between two-digit and three-digit years: two-digit years were always mapped to the year closest to 2020, while three-digit years from 100 to 999 were mapped to 1100 to 1999, and from 000 to 099 to 2000 to 2099. Now PostgreSQL chooses the year closest to 2020 for both two-digit and three-digit years.<br />
<br />
With 9.1:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
1200-07-02<br />
</pre><br />
<br />
With 9.2:<br />
<pre><br />
=# SELECT to_date('200-07-02','YYY-MM-DD');<br />
to_date <br />
------------<br />
2200-07-02<br />
</pre><br />
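<br />
For two-digit years (whose behavior is unchanged in 9.2), the century closest to 2020 is picked. A quick sketch:<br />
<pre><br />
=# SELECT to_date('95-07-02','YY-MM-DD'), to_date('05-07-02','YY-MM-DD');<br />
  to_date   |  to_date   <br />
------------+------------<br />
 1995-07-02 | 2005-07-02<br />
(1 row)<br />
</pre><br />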
<br />
<br />
[[Category:PostgreSQL 9.2]]</div>Marco44https://wiki.postgresql.org/index.php?title=SSI/fr&diff=15884SSI/fr2011-11-26T12:35:29Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
Documentation de la «Serializable Snapshot Isolation» (Isolation par Instantanés Sérialisables, ou SSI) dans PostgreSQL, comparée à la «Snapshot Isolation» (Isolation par Instantanés, ou SI). Celles-ci correspondent respectivement aux niveaux d'isolation de transaction SERIALIZABLE et REPEATABLE READ dans PostgreSQL, à partir de la version 9.1.<br />
<br />
== Aperçu ==<br />
<br />
Avec de vraies transactions sérialisables, si vous pouvez prouver que votre transaction fera ce qui est prévu si il n'y a aucune transaction concurrente, elle fera ce qui est prévu quelles que soient les autres transactions sérialisables qui s'exécuteront en même temps qu'elle, ou sera annulée pour erreur de sérialisation.<br />
<br />
Ce document montre les problèmes qui peuvent se produire avec certaines combinaisons de transactions au niveau d'isolation de transaction REPEATABLE READ, et comment elles sont évitées avec le niveau d'isolation SERIALIZABLE, à partir de PostgreSQL 9.1.<br />
<br />
Ce document est destiné au programmeur d'applications ou à l'administrateur de bases de données. Pour les détails sur l'implémentation de SSI, voyez la page de Wiki [[Serializable]]. Pour plus d'informations sur comment utiliser ce niveau d'isolation, voyez [http://docs.postgresql.fr/current/transaction-iso.html#XACT-SERIALIZABLE la documentation PostgreSQL courante].<br />
<br />
== Exemples ==<br />
<br />
Dans les environnements qui évitent de protéger leur intégrité en mettant en place des verrous bloquants, il sera fréquent que la base soit configurée (dans postgresql.conf) avec:<br />
default_transaction_isolation = 'serializable'<br />
Pour cette raison, tous les exemples ont été effectués avec ce paramétrage, ce qui a évité de polluer les exemples en se contentant d'un simple begin plutôt que de déclarer explicitement le niveau d'isolation pour chaque transaction.<br />
<br />
=== Write Skew Simple (Écriture Faussée Simple?) ===<br />
<br />
Quand deux transactions concurrentes déterminent chacune ce qu'elles écrivent en lisant des données qui se chevauchent avec des données que l'autre modifie, on peut se retrouver dans un état qui ne devrait pas apparaître si une des deux s'était exécutée avant l'autre. C'est un phénomène connu sous le nom de ''write skew'', et c'est la forme la plus simple de défaut de sérialisation contre laquelle SSI vous protège.<br />
<br />
Quand il y a write skew dans SSI, les deux transactions se déroulent jusqu'à ce que l'une valide. La première à valider gagne, et l'autre transaction est annulée. La règle du "le premier à valider gagne" garantit que du travail peut avoir lieu sur la base et que la transaction qui est annulée puisse être tentée à nouveau immédiatement.<br />
<br />
----<br />
==== Noir et Blanc ====<br />
<br />
Dans ce cas, il y a des enregistrement avec une colonne couleur contenant 'blanc' ou 'noir'. Deux utilisateurs essayent simultanément de convertir tous les enregistrements vers une couleur unique, mais chacun dans une direction opposée. Un veut tout passer tous les blancs en noir, et l'autre tous les noirs en blanc.<br />
<br />
L'exemple peut être mis en place avec ces ordres: <br />
create table points<br />
(<br />
id int not null primary key,<br />
couleur text not null<br />
);<br />
insert into points<br />
with x(id) as (select generate_series(1,10))<br />
select id, case when id % 2 = 1 then 'noir'<br />
else 'blanc' end from x;<br />
{|<br />
|+ Exemple Noir et Blanc<br />
! session 1<br />
! session 2<br />
|-<br />
|<br />
begin;<br />
update points set couleur = 'noir'<br />
where couleur = 'blanc';<br />
|-<br />
| ||<br />
begin;<br />
update points set couleur = 'blanc'<br />
where couleur = 'noir';<br />
À ce moment, une des deux transactions est condamnée à mourir.<br />
commit;<br />
Le premier à valider gagne.<br />
select * from points order by id;<br />
<br />
id | couleur<br />
----+-------<br />
1 | blanc<br />
2 | blanc<br />
3 | blanc<br />
4 | blanc<br />
5 | blanc<br />
6 | blanc<br />
7 | blanc<br />
8 | blanc<br />
9 | blanc<br />
10 | blanc<br />
(10 rows)<br />
Celle-ci s'est exécutée comme si elle était seule.<br />
|-<br />
| <br />
commit;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Cancelled on identification<br />
as a pivot, during commit attempt.<br />
HINT: The transaction might succeed if retried.<br />
Une erreur de sérialisation. On annule et on réessaye.<br />
rollback;<br />
begin;<br />
update points set couleur = 'noir'<br />
where couleur = 'blanc';<br />
commit;<br />
Il n'y a pas de transaction concurrente pour gêner.<br />
select * from points order by id;<br />
<br />
id | couleur<br />
----+-------<br />
1 | noir<br />
2 | noir<br />
3 | noir<br />
4 | noir<br />
5 | noir<br />
6 | noir<br />
7 | noir<br />
8 | noir<br />
9 | noir<br />
10 | noir<br />
(10 rows)<br />
La transaction s'est exécutée seule, après l'autre.<br />
|}<br />
<br />
----<br />
==== Données en intersection ====<br />
<br />
Cet exemple est tiré de la documentation PostgreSQL. Deux transactions concurrentes lisent des données, et chacune utilise ces données pour mettre à jour l'ensemble lu par l'autre. Un exemple simple, quoiqu'un peu artificiel, d'écriture faussée.<br />
<br />
L'exemple peut être mis en place avec ces ordres:<br />
CREATE TABLE mytab<br />
(<br />
class int NOT NULL,<br />
value int NOT NULL<br />
);<br />
INSERT INTO mytab VALUES<br />
(1, 10), (1, 20), (2, 100), (2, 200);<br />
{|<br />
|+ Exemple de données en intersection<br />
! session 1<br />
! session 2<br />
|-<br />
|<br />
BEGIN;<br />
SELECT SUM(value) FROM mytab WHERE class = 1;<br />
<br />
sum<br />
-----<br />
30<br />
(1 row)<br />
<br />
INSERT INTO mytab VALUES (2, 30);<br />
|-<br />
| ||<br />
BEGIN;<br />
SELECT SUM(value) FROM mytab WHERE class = 2;<br />
<br />
sum<br />
-----<br />
300<br />
(1 row)<br />
<br />
INSERT INTO mytab VALUES (1, 300);<br />
Chaque transaction a modifié ce que l'autre transaction aurait lu. Si les deux étaient autorisées à valider, le comportement sérialisable ne serait plus respecté, parce que si elles avaient été exécutées une à la fois, une des transactions aurait vu l'INSERT que l'autre a validé. Toutefois, nous attendons qu'une des transactions ait validé avant d'annuler quoi que ce soit, pour garantir que du travail progresse et éviter que le système ne s'effondre.<br />
COMMIT;<br />
|-<br />
|<br />
COMMIT;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Cancelled on identification<br />
as a pivot, during commit attempt.<br />
HINT: The transaction might succeed if retried.<br />
Donc, maintenant nous annulons la transaction en échec et nous la réessayons depuis le début.<br />
ROLLBACK;<br />
BEGIN;<br />
SELECT SUM(value) FROM mytab WHERE class = 1;<br />
<br />
sum<br />
-----<br />
330<br />
(1 row)<br />
<br />
INSERT INTO mytab VALUES (2, 330);<br />
COMMIT;<br />
Cela réussit, et le résultat est cohérent avec une exécution sérialisée des transactions.<br />
SELECT * FROM mytab;<br />
<br />
class | value<br />
-------+-------<br />
1 | 10<br />
1 | 20<br />
2 | 100<br />
2 | 200<br />
1 | 300<br />
2 | 330<br />
(6 rows)<br />
|}<br />
<br />
----<br />
==== Protection contre le Découvert ====<br />
<br />
Le cas hypothétique est celui d'une banque qui autorise ses clients à retirer de l'argent jusqu'au total de tout ce qu'ils ont sur tous leurs comptes. La banque transfèrera ensuite automatiquement les fonds au besoin pour terminer la journée avec un solde positif sur chaque compte. À l'intérieur d'une seule transaction, on vérifie que la somme de tous les comptes dépasse la somme requise.<br />
<br />
Quelqu'un essaye d'être malin et de piéger la banque en soumettant simultanément deux retraits de 900$ sur deux comptes ayant chacun 500$ de solde. Au niveau d'isolation de transaction REPEATABLE READ, cela pourrait marcher; mais si le niveau d'isolation de transaction SERIALIZABLE est utilisé, SSI détectera une «structure dangereuse» dans le schéma de lectures/écritures et rejettera une des deux transactions.<br />
<br />
Cet exemple peut être mis en place avec ces ordres:<br />
<br />
create table compte<br />
(<br />
nom text not null,<br />
type text not null,<br />
solde money not null default '0.00'::money,<br />
primary key (nom, type)<br />
);<br />
insert into compte values<br />
('kevin','epargne', 500),<br />
('kevin','courant', 500);<br />
<br />
{|<br />
|+ Exemple de Protection contre le Découvert<br />
! session 1<br />
! session 2<br />
|-<br />
|<br />
begin;<br />
select type, solde from compte<br />
where nom = 'kevin';<br />
<br />
type | solde<br />
-----------+---------<br />
epargne | $500.00<br />
courant | $500.00<br />
(2 rows)<br />
Le total est de $1000, un retrait de $900 est donc permis.<br />
|-<br />
| ||<br />
begin;<br />
select type, solde from compte<br />
where nom = 'kevin';<br />
<br />
type | solde<br />
-----------+---------<br />
epargne | $500.00<br />
courant | $500.00<br />
(2 rows)<br />
Le total est de $1000, un retrait de $900 est donc permis.<br />
|-<br />
| <br />
update compte<br />
set solde = solde - 900::money<br />
where nom = 'kevin' and type = 'epargne';<br />
Jusqu'ici tout va bien.<br />
|-<br />
| ||<br />
update compte<br />
set solde = solde - 900::money<br />
where nom = 'kevin' and type = 'courant';<br />
Maintenant nous avons un problème: cela ne peut pas coexister avec l'activité de l'autre transaction. Nous n'annulons pas encore, parce qu'une nouvelle tentative échouerait sur les mêmes conflits. La première à valider va gagner, et l'autre échouera quand elle essayera de continuer après cela.<br />
|-<br />
| <br />
commit;<br />
Celle-ci a validé la première. Son travail est enregistré.<br />
|-<br />
| ||<br />
commit;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Cancelled on identification<br />
as a pivot, during commit attempt.<br />
HINT: The transaction might succeed if retried.<br />
Cette transaction n'a pas réussi à retirer l'argent.<br />
Maintenant nous l'annulons et réessayons la transaction.<br />
<br />
rollback;<br />
begin;<br />
select type, solde from compte<br />
where nom = 'kevin';<br />
<br />
type | solde<br />
-----------+----------<br />
epargne | -$400.00<br />
courant | $500.00<br />
(2 rows)<br />
On voit qu'il y a un solde net de $100. Cette demande de $900 sera rejetée par l'application.<br />
|}<br />
<br />
=== Trois Transactions ou Plus ===<br />
<br />
Des anomalies de sérialisation peuvent résulter de motifs plus complexes d'accès, impliquant trois transactions ou plus.<br />
<br />
----<br />
==== Couleurs Primaires ====<br />
<br />
C'est similaire à l'exemple «Noir et Blanc» précédent, à la différence que nous utilisons les trois couleurs primaires. Une transaction essaye de passer le rouge au jaune, la suivante le jaune au bleu, et la troisième le bleu au rouge. Si ces transactions étaient exécutées une à la fois, on aurait à la fin deux des trois couleurs, en fonction de l'ordre d'exécution. Si deux d'entre elles s'exécutent simultanément, celle qui essaye de lire les enregistrements mis à jour par l'autre semblera s'être exécutée la première, puisqu'elle ne voit pas le travail de l'autre transaction; il n'y a donc pas de problème dans ce cas. Que l'autre transaction s'exécute avant ou après, les résultats restent cohérents avec un ordre d'exécution sérialisé.<br />
<br />
Si les trois s'exécutent en même temps, il y a un cycle dans l'ordre apparent d'exécution. Une transaction Repeatable Read ne détecterait pas cela, et la table aurait toujours trois couleurs. Une transaction Sérialisable détectera le problème et annulera une des transactions avec une erreur de sérialisation.<br />
<br />
L'exemple peut être mis en place avec ces ordres:<br />
create table points<br />
(<br />
id int not null primary key,<br />
couleur text not null<br />
);<br />
insert into points<br />
with x(id) as (select generate_series(1,9000))<br />
select id, case when id % 3 = 1 then 'rouge'<br />
when id % 3 = 2 then 'jaune'<br />
else 'blue' end from x;<br />
create index points_couleur on points (couleur);<br />
analyze points;<br />
{|<br />
|+ Exemple Couleurs Primaires<br />
! session 1<br />
! session 2<br />
! session 3<br />
|-<br />
|<br />
begin;<br />
update points set couleur = 'jaune'<br />
where couleur = 'rouge';<br />
|-<br />
| ||<br />
begin;<br />
update points set couleur = 'blue'<br />
where couleur = 'jaune';<br />
|-<br />
| || ||<br />
begin;<br />
update points set couleur = 'rouge'<br />
where couleur = 'blue';<br />
À ce point, au moins une des trois transactions est condamnée. Pour garantir que les traitements progressent, on attend qu'une d'elles valide. Le commit va réussir, ce qui non seulement garantit que les traitements progressent, mais aussi qu'une tentative de reprendre une transaction échouée n'échouera pas ''sur la même combinaison de transactions''.<br />
|-<br />
|<br />
commit;<br />
Le premier commit gagne. La session 2 doit échouer à ce point, parce que, durant le commit, il a été déterminé que c'est elle qui a les plus grandes chances de réussir si elle est réessayée immédiatement.<br />
select couleur, count(*) from points<br />
group by couleur<br />
order by couleur;<br />
<br />
couleur | count<br />
----------+-------<br />
blue | 3000<br />
jaune | 6000<br />
(2 rows)<br />
Cela semble avoir été exécuté avant les autres mises à jour.<br />
|-<br />
| || ||<br />
commit;<br />
Cela fonctionne si on l'essaye à ce moment. Si la session 2 effectue davantage de travail avant, cette transaction pourrait aussi devoir être annulée et réessayée.<br />
select couleur, count(*) from points<br />
group by couleur<br />
order by couleur;<br />
<br />
couleur | count<br />
----------+-------<br />
rouge | 3000<br />
jaune | 6000<br />
(2 rows)<br />
Elle semble s'être exécutée après la transaction de la session 1.<br />
|-<br />
| ||<br />
commit;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Cancelled on identification<br />
as a pivot, during commit attempt.<br />
HINT: The transaction might succeed if retried.<br />
Une erreur de sérialisation. Nous annulons et réessayons.<br />
rollback;<br />
begin;<br />
update points set couleur = 'blue'<br />
where couleur = 'jaune';<br />
commit;<br />
Une nouvelle tentative réussira.<br />
select couleur, count(*) from points<br />
group by couleur<br />
order by couleur;<br />
<br />
couleur | count<br />
---------+-------<br />
blue | 6000<br />
rouge | 3000<br />
(2 rows)<br />
Elle semble s'être exécutée en dernier, ce qu'elle a d'ailleurs fait.<br />
|}<br />
Un point intéressant est que si la session 2 avait tenté de valider après la session 1 et avant la session 3, elle aurait tout de même échoué, et une nouvelle tentative aurait aussi réussi, mais le comportement de la transaction de la session 3 n'est pas déterministe: elle pourrait réussir, ou recevoir une erreur de sérialisation et devoir être rejouée.<br />
<br />
C'est parce que le verrouillage de prédicat utilisé par le mécanisme de détection de conflit s'appuie sur les pages et enregistrements effectivement accédés, et qu'un facteur aléatoire est utilisé lors de l'insertion des entrées d'index ayant des clés égales, afin de réduire la contention; même avec des séquences d'évènements identiques, il est donc possible de voir des différences quant à l'endroit où les erreurs de sérialisation se produisent. C'est pour cela qu'il est important, quand on s'appuie sur les transactions sérialisables pour gérer la concurrence, d'avoir un mécanisme généralisé permettant d'identifier les erreurs de sérialisation et de rejouer les transactions depuis leur début.<br />
<br />
Il convient aussi de noter que si la session 2 avait validé la seconde tentative de transaction avant que la session 3 ait validé sa transaction, toute requête ultérieure qui aurait vu des enregistrements mis à jour de jaune à bleu (et validés) aurait, de façon déterministe, fait échouer la transaction de la session 3, parce que ces enregistrements ne seraient pas des enregistrements que la session 3 verrait comme bleus et mettrait à jour en rouge. Pour que la transaction de la session 3 réussisse, elle doit pouvoir être considérée comme ayant été exécutée avant la transaction validée de la session 2. Par conséquent, exposer un état dans lequel le travail de la transaction de la session 2 est visible, mais pas le travail de la transaction de la session 3, signifie que la transaction de la session 3 doit échouer. L'acte d' ''observer'' un état récemment modifié de la base peut entraîner des erreurs de sérialisation. Cela sera exploré plus avant dans d'autres exemples.<br />
<br />
=== Mettre en place des règles métier dans des triggers ===<br />
<br />
Si toutes les transactions sont sérialisables, des règles métier peuvent être vérifiées par des triggers sans les problèmes associés aux autres niveaux d'isolation de transactions. Quand une contrainte déclarative convient, elle sera en règle générale plus rapide, plus simple à implémenter et à maintenir, et moins sujette aux bugs; les triggers ne devraient donc être utilisés de la sorte que lorsqu'une contrainte déclarative ne convient pas.<br />
<br />
----<br />
==== Contraintes similaires à de l'unicité ====<br />
<br />
Imaginons que vous vouliez quelque chose de similaire à une contrainte unique, mais en un peu plus compliqué. Pour cet exemple, nous voulons l'unicité des six premiers caractères de la colonne texte.<br />
<br />
Cet exemple peut être mis en place avec les ordres suivants:<br />
create table t (id int not null, val text not null);<br />
with x (n) as (select generate_series(1,10000))<br />
insert into t select x.n, md5(x.n::text) from x;<br />
alter table t add primary key(id);<br />
create index t_val on t (val);<br />
vacuum analyze t;<br />
create function t_func()<br />
returns trigger<br />
language plpgsql as $$<br />
declare<br />
st text;<br />
begin<br />
st := substring(new.val from 1 for 6);<br />
if tg_op = 'UPDATE' and substring(old.val from 1 for 6) = st then<br />
return new;<br />
end if;<br />
if exists (select * from t where val between st and st || 'z') then<br />
raise exception 't.val pas unique sur les six premiers caractères: "%"', st;<br />
end if;<br />
return new;<br />
end;<br />
$$;<br />
create trigger t_trig<br />
before insert or update on t<br />
for each row execute procedure t_func();<br />
<br />
Pour vérifier que le trigger fait bien respecter la règle métier quand il n'y a pas de problème de concurrence, sur une connexion unique:<br />
<br />
insert into t values (-1, 'this old dog');<br />
insert into t values (-2, 'this old cat');<br />
<br />
ERROR: t.val pas unique sur les six premiers caractères: "this o"<br />
<br />
Essayons maintenant avec deux sessions concurrentes.<br />
<br />
{|<br />
|+ Exemple de contrainte similaire à de l'unicité<br />
! session 1<br />
! session 2<br />
|-<br />
|<br />
begin;<br />
insert into t values (-3, 'the river flows');<br />
|-<br />
| ||<br />
begin;<br />
insert into t values (-4, 'the right stuff');<br />
Cela fonctionne pour le moment, parce que le travail de l'autre transaction n'est pas visible de cette transaction, mais les deux transactions ne peuvent pas valider sans violer la règle métier.<br />
commit;<br />
Le premier à valider gagne: cette transaction est validée sans erreur.<br />
|-<br />
| <br />
Un commit ici échouerait, ainsi que n'importe quel autre ordre qu'on tenterait d'exécuter dans cette transaction condamnée.<br />
select * from t where id < 0;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Canceled on identification as a pivot,<br />
during conflict out checking.<br />
HINT: The transaction might succeed if retried.<br />
<br />
Comme il s'agit d'une erreur de sérialisation, la transaction devrait être réessayée.<br />
<br />
rollback;<br />
begin;<br />
insert into t values (-3, 'the river flows');<br />
<br />
Lors de la nouvelle tentative, nous recevons une erreur plus parlante pour l'utilisateur.<br />
<br />
ERROR: t.val pas unique sur les six premiers caractères: "the ri"<br />
|}<br />
<br />
----<br />
==== Contraintes similaires à des clés étrangères ====<br />
<br />
Quelquefois deux tables doivent avoir un lien très similaire à une relation de clé étrangère, mais il y a des critères supplémentaires qui rendraient une simple clé étrangère insuffisante pour assurer la vérification d'intégrité nécessaire. Dans cet exemple, une table project contient une référence à la clé d'une table person dans sa propre colonne project_manager, mais une personne ''quelconque'' ne suffira pas; la personne spécifiée doit être un gestionnaire de projet.<br />
<br />
On peut mettre en place cet exemple avec les ordres suivants:<br />
create table person<br />
(<br />
person_id int not null primary key,<br />
person_name text not null,<br />
is_project_manager boolean not null<br />
);<br />
create table project<br />
(<br />
project_id int not null primary key,<br />
project_name text not null,<br />
project_manager int not null<br />
);<br />
create index project_manager<br />
on project (project_manager);<br />
<br />
create function person_func()<br />
returns trigger<br />
language plpgsql as $$<br />
begin<br />
if tg_op = 'DELETE' and old.is_project_manager then<br />
if exists (select * from project<br />
where project_manager = old.person_id) then<br />
raise exception<br />
'une personne ne peut être supprimée tant qu''elle est responsable d''un projet';<br />
end if;<br />
end if;<br />
if tg_op = 'UPDATE' then<br />
if new.person_id is distinct from old.person_id then<br />
raise exception 'il est interdit de modifier person_id';<br />
end if;<br />
if old.is_project_manager and not new.is_project_manager then<br />
if exists (select * from project<br />
where project_manager = old.person_id) then<br />
raise exception<br />
'une personne doit rester gestionnaire de projet tant qu''elle est responsable d''un projet';<br />
end if;<br />
end if;<br />
end if;<br />
if tg_op = 'DELETE' then<br />
return old;<br />
else<br />
return new;<br />
end if;<br />
end;<br />
$$;<br />
create trigger person_trig<br />
before update or delete on person<br />
for each row execute procedure person_func();<br />
<br />
create function project_func()<br />
returns trigger<br />
language plpgsql as $$<br />
begin<br />
if tg_op = 'INSERT'<br />
or (tg_op = 'UPDATE' and new.project_manager <> old.project_manager) then<br />
if not exists (select * from person<br />
where person_id = new.project_manager<br />
and is_project_manager) then<br />
raise exception<br />
'project_manager doit être défini en tant que gestionnaire de projet dans la table person';<br />
end if;<br />
end if;<br />
return new;<br />
end;<br />
$$;<br />
create trigger project_trig<br />
before insert or update on project<br />
for each row execute procedure project_func();<br />
<br />
insert into person values (1, 'Kevin Grittner', true);<br />
insert into person values (2, 'Peter Parker', true);<br />
insert into project values (101, 'parallel processing', 1);<br />
{|<br />
|+ Exemple de contrainte similaire à une contrainte de clé étrangère<br />
! session 1<br />
! session 2<br />
|-<br />
|<br />
Une personne est mise à jour pour ne plus être un gestionnaire de projet.<br />
begin;<br />
update person<br />
set is_project_manager = false<br />
where person_id = 2;<br />
|-<br />
| ||<br />
En même temps, un projet est mis à jour afin de rendre cette personne responsable de ce projet.<br />
begin;<br />
update project<br />
set project_manager = 2<br />
where project_id = 101;<br />
Il n'est pas possible de valider les deux. Le premier à valider gagne.<br />
commit;<br />
L'affectation de la personne au projet est validée en premier, ce qui signifie que l'autre transaction doit maintenant échouer. Si ces transactions s'étaient exécutées à un autre niveau d'isolation, les deux auraient validé, entraînant une violation des règles métier.<br />
|-<br />
| <br />
commit;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Canceled on identification<br />
as a pivot, during commit attempt.<br />
HINT: The transaction might succeed if retried.<br />
Une erreur de sérialisation. Nous annulons et essayons à nouveau.<br />
rollback;<br />
begin;<br />
update person<br />
set is_project_manager = false<br />
where person_id = 2;<br />
<br />
ERROR: une personne doit rester gestionnaire de <br />
projet tant qu'elle est responsable d'un projet<br />
Lors de la seconde tentative, nous récupérons un message intelligible.<br />
|}<br />
<br />
<br />
=== Transactions en Lecture Seule ===<br />
<br />
Bien qu'une transaction en lecture seule ne puisse pas contribuer à une anomalie qui persisterait dans la base, au niveau Repeatable Read (implémenté par la Snapshot Isolation, SI), elle peut «voir» un état qui n'est pas cohérent avec une exécution sérialisée (une à la fois) des transactions. Une transaction Serializable, implémentée avec SSI, ne verra jamais ces anomalies transitoires.<br />
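<br />
À noter qu'à partir de la 9.1, une transaction sérialisable en lecture seule peut aussi être déclarée DEFERRABLE: elle attend alors de disposer d'un instantané dont on sait qu'il ne peut pas produire d'anomalie, puis s'exécute sans risque d'être annulée pour erreur de sérialisation. Par exemple:<br />
<br />
 begin transaction isolation level serializable read only deferrable;<br />
 -- requêtes de rapport, potentiellement longues<br />
 commit;<br />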
<br />
----<br />
<br />
==== Rapport de Dépôt ====<br />
<br />
Une classe générale de problèmes impliquant des transactions en lecture seule est le traitement par lots, où une table de contrôle indique quel lot (batch) est actuellement la cible des insertions. Un lot est fermé en mettant à jour la table de contrôle, point à partir duquel le lot est considéré comme «verrouillé» contre tout changement ultérieur, et le traitement de ce lot se produit.<br />
<br />
On retrouve concrètement ce genre de problématique dans le traitement des reçus. Des reçus peuvent être ajoutés à un lot identifié par une date de dépôt ou (si plus d'un dépôt par jour est possible) par un numéro de lot abstrait. À un moment donné de la journée, alors que la banque est toujours ouverte, le lot est fermé, un rapport de l'argent reçu est imprimé, et l'argent est emmené à la banque pour y être déposé.<br />
<br />
L'exemple peut être mis en place avec ces ordres:<br />
create table control<br />
(<br />
deposit_no int not null<br />
);<br />
insert into control values (1);<br />
create table receipt<br />
(<br />
receipt_no serial primary key,<br />
deposit_no int not null,<br />
payee text not null,<br />
amount money not null<br />
);<br />
insert into receipt<br />
(deposit_no, payee, amount)<br />
values ((select deposit_no from control), 'Crosby', '100');<br />
insert into receipt<br />
(deposit_no, payee, amount)<br />
values ((select deposit_no from control), 'Stills', '200');<br />
insert into receipt<br />
(deposit_no, payee, amount)<br />
values ((select deposit_no from control), 'Nash', '300');<br />
{|<br />
|+ Exemple de Rapport de Dépôt<br />
! session 1<br />
! session 2<br />
|-<br />
|<br />
Au comptoir de réception, un autre reçu est ajouté au lot courant.<br />
begin; -- T1<br />
insert into receipt<br />
(deposit_no, payee, amount)<br />
values<br />
(<br />
(select deposit_no from control),<br />
'Young', '100'<br />
);<br />
Cette transaction peut voir son propre insert, mais il n'est pas visible pour les autres transactions jusqu'à sa validation.<br />
select * from receipt;<br />
<br />
receipt_no | deposit_no | payee | amount <br />
------------+------------+--------+---------<br />
1 | 1 | Crosby | $100.00<br />
2 | 1 | Stills | $200.00<br />
3 | 1 | Nash | $300.00<br />
4 | 1 | Young | $100.00<br />
(4 rows)<br />
|-<br />
| ||<br />
À peu près au même moment, un superviseur clique sur un bouton pour fermer le lot de reçus.<br />
begin; -- T2<br />
select deposit_no from control;<br />
<br />
deposit_no <br />
------------<br />
1<br />
(1 row)<br />
L'application note le lot de reçus qui est sur le point d'être fermé, incrémente le numéro de lot, et l'enregistre dans la table de contrôle.<br />
update control set deposit_no = 2;<br />
commit;<br />
T1, la transaction qui insère le dernier reçu de l'ancien lot, n'a pas encore validé, bien que le lot ait été fermé. Si T1 valide avant que quelqu'un ne regarde le contenu du lot, tout va bien. Pour le moment nous n'avons aucun problème; le reçu «a l'air» d'avoir été ajouté avant que le lot ait été fermé. Nous avons un comportement cohérent avec une exécution «une par une» des transactions: T1 -> T2.<br />
<br />
Pour le besoin de la démonstration, nous allons déclencher le rapport de dépôt avant que le dernier reçu ne soit validé.<br />
begin; -- T3<br />
select * from receipt where deposit_no = 1;<br />
<br />
receipt_no | deposit_no | payee | amount <br />
------------+------------+--------+---------<br />
1 | 1 | Crosby | $100.00<br />
2 | 1 | Stills | $200.00<br />
3 | 1 | Nash | $300.00<br />
(3 rows)<br />
Maintenant nous avons un problème. T3 a été démarrée en sachant que T2 avait validé, donc T3 doit être considérée comme ayant été exécutée après T2. (Cela aurait aussi été vrai si T3 avait été lancée indépendamment et avait lu la table de contrôle, en voyant le nouveau deposit_no.) Mais T3 ne peut pas voir le travail de T1, donc T1 a l'air d'avoir été exécutée après T3. Nous avons donc une boucle T1 -> T2 -> T3 -> T1. Et cela pose un problème très concret; le lot est censé être fermé et immuable, mais une modification apparaîtra sur le tard -- peut-être après le voyage à la banque.<br />
<br />
Au niveau d'isolation REPEATABLE READ, cela se déroulerait sans message d'erreur, sans que l'anomalie ne soit détectée. Au niveau d'isolation SERIALIZABLE, une des transactions sera annulée pour préserver l'intégrité du système. Puisqu'une nouvelle tentative de T3 échouerait à nouveau de la même façon si T1 était encore active, PostgreSQL annule T1, pour qu'une nouvelle tentative immédiate puisse réussir.<br />
|-<br />
| <br />
commit;<br />
<br />
ERROR: could not serialize access<br />
due to read/write dependencies<br />
among transactions<br />
DETAIL: Canceled on identification<br />
as a pivot, during commit attempt.<br />
HINT: The transaction might succeed if retried.<br />
Bon, réessayons.<br />
rollback;<br />
begin; -- T1 retry<br />
insert into receipt<br />
(deposit_no, payee, amount)<br />
values<br />
(<br />
(select deposit_no from control),<br />
'Young', '100'<br />
);<br />
<br />
À quoi ressemble la table receipt maintenant?<br />
<br />
select * from receipt;<br />
<br />
receipt_no | deposit_no | payee | amount <br />
------------+------------+--------+---------<br />
1 | 1 | Crosby | $100.00<br />
2 | 1 | Stills | $200.00<br />
3 | 1 | Nash | $300.00<br />
5 | 2 | Young | $100.00<br />
(4 rows)<br />
<br />
Le reçu est maintenant dans le nouveau lot, rendant le rapport de dépôt de T3 correct!<br />
<br />
commit;<br />
<br />
Plus de problème maintenant.<br />
|-<br />
| ||<br />
commit;<br />
Cela n'aurait posé aucun problème à n'importe quel moment après le SELECT de T3.<br />
|}<br />
<br />
[[Category:Français]]</div>
<hr />
<div>{{Languages}}<br />
<span style="font-size:188%;color:#E65600">Quoi de neuf dans PostgreSQL 9.1</span><br />
<br />
Ce document présente, si possible par l'exemple, un grand nombre des nouveautés de PostgreSQL 9.1, comparé à la version majeure précédente - PostgreSQL 9.0. Il y a de nombreuses nouveautés dans cette version, cette page de wiki ne couvre donc que les changements les plus importants en détail. La liste complète des modifications se trouve dans le chapitre [http://docs.postgresql.fr/9.1/release.html Notes de version] de la documentation officielle.<br />
<br />
=Nouveautés majeures=<br />
<br />
==Réplication synchrone et autres fonctionnalités de réplication==<br />
<br />
Il y a un certain nombre de nouvelles fonctionnalités autour de la réplication en 9.1:<br />
<br />
<br />
* En 9.0, l'utilisateur servant à la réplication devait être superutilisateur. Ce n'est plus le cas, il y a un nouvel attribut appelé 'replication'.<br />
<br />
 CREATE ROLE replication_role REPLICATION LOGIN PASSWORD 'pwd_replication';<br />
<br />
Ce rôle peut alors être ajouté au pg_hba.conf, et être utilisé pour la streaming replication. C'est évidemment préférable, du point de vue de la sécurité, à l'utilisation d'un superutilisateur pour cette tâche.<br />
<br />
Maintenant que nous avons une instance créée, ainsi qu'un utilisateur de réplication, nous pouvons mettre en place la streaming replication. Il ne s'agit que d'ajouter la permission de se connecter à la base virtuelle 'replication' dans "pg_hba.conf", positionner ''wal_level'', l'archivage (''archive_mode'' et ''archive_command'') et ''max_wal_senders'', ce qui est déjà traité dans le billet sur les nouveautés de la 9.0.<br />
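<br />
À titre d'illustration (valeurs à adapter à votre environnement), cela pourrait ressembler à ceci sur le maître:<br />
<br />
 # postgresql.conf<br />
 wal_level = hot_standby<br />
 max_wal_senders = 3<br />
 archive_mode = on<br />
 archive_command = 'cp %p /tmp/%f'<br />
 <br />
 # pg_hba.conf: autoriser replication_role sur la pseudo-base «replication»<br />
 host    replication    replication_role    192.168.0.0/24    md5<br />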
<br />
Quand l'instance est prête pour le streaming, nous pouvons montrer la seconde nouveauté.<br />
<br />
* pg_basebackup.<br />
<br />
Ce nouvel outil permet de cloner une instance, ou d'en faire une sauvegarde, en n'utilisant que le protocole réseau PostgreSQL. Il n'y a plus besoin d'appeler "pg_start_backup()", puis de réaliser une copie manuelle, et enfin d'appeler "pg_stop_backup()". pg_basebackup effectue tout ce travail en une seule commande. Pour la démonstration, nous allons cloner l'instance en cours de fonctionnement vers /tmp/newcluster.<br />
<br />
> pg_basebackup -D /tmp/newcluster -U replication -v<br />
Password: <br />
NOTICE: pg_stop_backup complete, all required WAL segments have been archived<br />
pg_basebackup: base backup completed<br />
<br />
Cette nouvelle instance est prête à démarrer: ajoutez simplement un fichier "recovery.conf" avec une "restore_command" pour récupérer les fichiers archivés, et démarrez la nouvelle instance. pg_basebackup peut aussi fabriquer un tar, ou inclure tous les fichiers xlog requis (pour avoir une sauvegarde totalement autonome).<br />
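<br />
Par exemple, pour obtenir une sauvegarde au format tar incluant les journaux de transactions nécessaires (options indicatives):<br />
<br />
 > pg_basebackup -D /tmp/backup -Ft -x -U replication -v<br />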
<br />
Comme nous allons maintenant montrer la réplication synchrone, préparons un "recovery.conf" pour se connecter à la base maître et récupérer les enregistrements au fil de l'eau.<br />
<br />
Le fichier va ressembler à ceci:<br />
<br />
restore_command = 'cp /tmp/%f %p'<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=59121 user=replication password=replication application_name=newcluster'<br />
trigger_file = '/tmp/trig_f_newcluster'<br />
<br />
Puis nous démarrons la nouvelle instance:<br />
<br />
pg_ctl -D /tmp/newcluster start<br />
<br />
LOG: database system was interrupted; last known up at 2011-05-22 17:15:45 CEST<br />
LOG: entering standby mode<br />
LOG: restored log file "00000001000000010000002F" from archive<br />
LOG: redo starts at 1/2F000020<br />
LOG: consistent recovery state reached at 1/30000000<br />
LOG: database system is ready to accept read only connections<br />
cp: cannot stat « /tmp/000000010000000100000030 »: No such file or directory<br />
LOG: streaming replication successfully connected to primary<br />
<br />
Nous avons notre esclave, et il récupère les données provenant du maître par le mode «streaming», mais nous sommes toujours en asynchrone. Notez que nous avons positionné un paramètre "application_name" dans la chaîne de connexion du "recovery.conf".<br />
<br />
* Réplication synchrone<br />
<br />
Pour que la réplication devienne synchrone, c'est très simple, il suffit de positionner ceci dans le postgresql.conf du maître:<br />
<br />
synchronous_standby_names = 'newcluster'<br />
<br />
C'est bien sûr l'"application_name" provenant du "primary_conninfo" de l'esclave. Un «pg_ctl_reload», et le nouveau paramètre est pris en compte. Maintenant, tout «COMMIT» sur le maître ne sera considéré comme terminé que quand l'esclave l'aura écrit sur son propre journal, et l'aura notifié au maître.<br />
<br />
Un petit avertissement: les transactions sont considérées comme validées quand elles sont écrites dans le journal de l'esclave, pas quand elles sont visibles sur l'esclave. Cela veut dire qu'il y a toujours un délai entre le moment où une transaction est validée sur le maître, et le moment où elle est visible sur l'esclave. La réplication est tout de même synchrone: vous ne perdrez pas de données dans le cas du crash d'un maître.<br />
<br />
La réplication synchrone peut être réglée assez finement: elle est contrôlable par session. Le paramètre "synchronous_commit" peut être désactivé (il est évidemment actif par défaut) par session, si celle-ci n'a pas besoin de cette garantie de réplication synchrone. Si, dans votre transaction, vous n'avez pas besoin de la réplication synchrone, faites simplement<br />
SET synchronous_commit TO off<br />
et vous ne paierez pas la pénalité due à l'attente de l'esclave.<br />
<br />
Il y a quelques autres nouveautés à mentionner pour la réplication:<br />
<br />
* Les esclaves peuvent maintenant demander au maître de ne pas nettoyer par VACUUM les enregistrements dont ils pourraient encore avoir besoin.<br />
<br />
C'était une des principales difficultés du paramétrage de la réplication en 9.0, si on souhaitait utiliser l'esclave: un VACUUM pouvait détruire des enregistrements qui étaient encore nécessaires à l'exécution des requêtes de l'esclave, engendrant des conflits de réplication. L'esclave avait alors un choix à faire: soit tuer la requête en cours d'exécution, soit accepter de retarder l'application des modifications générées par le VACUUM (et de toutes celles qui le suivent, bien sûr), et donc prendre du retard. On pouvait contourner le problème en positionnant "vacuum_defer_cleanup_age" à une valeur non nulle, mais il était difficile de trouver une bonne valeur. La nouvelle fonctionnalité s'active en positionnant "hot_standby_feedback" sur les bases de standby. Bien sûr, cela implique que la base de standby peut empêcher VACUUM de faire son travail de maintenance sur le maître, s'il y a des requêtes très longues en cours d'exécution sur l'esclave.<br />
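<br />
Concrètement, sur l'esclave (exemple indicatif):<br />
<br />
 hot_standby_feedback = on<br />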
<br />
* pg_stat_replication est une nouvelle vue système.<br />
<br />
Elle affiche, sur le maître, l'état de tous les esclaves: combien de WAL ils ont reçu, s'ils sont connectés, synchrones, où ils en sont de l'application des modifications:<br />
<br />
=# SELECT * from pg_stat_replication ;<br />
procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state <br />
---------+----------+-------------+------------------+-------------+-----------------+-------------+------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------<br />
17135 | 16671 | replication | newcluster | 127.0.0.1 | | 43745 | 2011-05-22 18:13:04.19283+02 | streaming | 1/30008750 | 1/30008750 | 1/30008750 | 1/30008750 | 1 | sync<br />
<br />
Il n'est donc plus nécessaire d'exécuter des requêtes sur les esclaves pour connaître leur état par rapport au maître.<br />
<br />
* pg_stat_database_conflicts est une autre vue système.<br />
<br />
Celle-ci se consulte sur la base de standby, et montre combien de requêtes ont été annulées, et pour quelles raisons:<br />
<br />
=# SELECT * from pg_stat_database_conflicts ;<br />
datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock <br />
-------+-----------+------------------+------------+----------------+-----------------+----------------<br />
1 | template1 | 0 | 0 | 0 | 0 | 0<br />
11979 | template0 | 0 | 0 | 0 | 0 | 0<br />
11987 | postgres | 0 | 0 | 0 | 0 | 0<br />
16384 | marc | 0 | 0 | 1 | 0 | 0<br />
<br />
* la réplication peut maintenant être mise en pause sur un esclave.<br />
<br />
Appelez tout simplement ''pg_xlog_replay_pause()'' pour mettre en pause, et ''pg_xlog_replay_resume()'' pour reprendre. Cela gèlera la base, ce qui en fait un excellent outil pour réaliser des sauvegardes cohérentes.<br />
<br />
''pg_is_xlog_replay_paused()'' permet de connaître l'état actuel.<br />
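<br />
Par exemple, sur l'esclave:<br />
<br />
 =# SELECT pg_xlog_replay_pause();<br />
 -- la base est figée: on peut par exemple lancer une sauvegarde cohérente<br />
 =# SELECT pg_is_xlog_replay_paused();<br />
  pg_is_xlog_replay_paused <br />
 --------------------------<br />
  t<br />
 (1 row)<br />
 =# SELECT pg_xlog_replay_resume();<br />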
<br />
On peut aussi demander à PostgreSQL de mettre l'application des journaux en pause à la fin de la récupération d'instance, sans passer la base en production, pour permettre à l'administrateur d'exécuter des requêtes sur la base. L'administrateur peut alors vérifier si le point de récupération atteint est correct, avant de mettre fin à la réplication. Ce nouveau paramètre est "pause_at_recovery_target", et se positionne dans le recovery.conf.<br />
<br />
* On peut créer des points de récupération (Restore Points)<br />
<br />
Ce ne sont rien de plus que des points nommés dans le journal de transactions.<br />
<br />
Ils peuvent être utilisés en spécifiant un "recovery_target_name" à la place d'un "recovery_target_time" ou d'un "recovery_target_xid" dans le fichier recovery.conf.<br />
<br />
Ils sont créés en appelant "pg_create_restore_point()".<br />
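<br />
Par exemple (nom de point purement indicatif):<br />
<br />
 =# SELECT pg_create_restore_point('avant_maj_application');<br />
<br />
puis, dans le recovery.conf utilisé lors de la restauration:<br />
<br />
 recovery_target_name = 'avant_maj_application'<br />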
<br />
==Collations par colonne==<br />
<br />
L'ordre de collation n'est plus unique dans une base de données.<br />
<br />
Imaginons que vous utilisiez une base en 9.0, avec un encodage UTF8, et une collation de_DE.utf8 (tri alphabétique), parce que la plupart de vos utilisateurs parlent allemand. Si vous avez des données françaises à stocker aussi, et que vous avez besoin de les trier, les utilisateurs français les plus pointilleux ne seraient pas satisfaits:<br />
<br />
SELECT * from (values ('élève'),('élevé'),('élever'),('Élève')) as tmp order by column1;<br />
column1 <br />
---------<br />
élevé<br />
élève<br />
Élève<br />
élever<br />
<br />
Pour être honnête, ce n'est pas si mal. Mais ce n'est pas l'ordre alphabétique français: les caractères accentués sont considérés comme non accentués durant une première passe de tri. Ensuite, on effectue une seconde passe, où on considère que les caractères accentués sont après les non accentués (dans un ordre bien précis). Mais, pour que la chose soit plus amusante, le tri est fait du dernier au premier caractère dans cette seconde passe, et non plus du premier au dernier. Évidemment, la règle n'est pas la même en allemand.<br />
<br />
En 9.1, vous disposez de deux nouvelles fonctionnalités:<br />
<br />
* Vous pouvez spécifier la collation dans une requête:<br />
<br />
SELECT * FROM (VALUES ('élève'),('élevé'),('élever'),('Élève')) AS tmp ORDER BY column1 COLLATE "fr_FR.utf8";<br />
column1 <br />
---------<br />
élève<br />
Élève<br />
élevé<br />
élever<br />
<br />
* Vous pouvez définir la collation au moment de la déclaration de la table:<br />
<br />
CREATE TABLE french_messages (message TEXT COLLATE "fr_FR.utf8");<br />
INSERT INTO french_messages VALUES ('élève'),('élevé'),('élever'),('Élève');<br />
SELECT * FROM french_messages ORDER BY message;<br />
message <br />
---------<br />
élève<br />
Élève<br />
élevé<br />
élever<br />
<br />
Et bien sûr, vous pouvez créer un index sur la colonne message, qui pourra être utilisé pour trier rapidement en français. Par exemple, avec une table plus grande et sans collation précisée:<br />
<br />
CREATE TABLE french_messages2 (message TEXT);<br />
INSERT INTO french_messages2 SELECT * FROM french_messages, generate_series(1,100000); -- 400k lignes<br />
CREATE INDEX idx_french_ctype ON french_messages2 (message COLLATE "fr_FR.utf8");<br />
EXPLAIN SELECT * FROM french_messages2 ORDER BY message;<br />
QUERY PLAN <br />
-------------------------------------------------------------------------------<br />
Sort (cost=62134.28..63134.28 rows=400000 width=32)<br />
Sort Key: message<br />
-> Seq Scan on french_messages2 (cost=0.00..5770.00 rows=400000 width=32)<br />
<br />
EXPLAIN SELECT * FROM french_messages2 ORDER BY message COLLATE "fr_FR.utf8";<br />
QUERY PLAN <br />
--------------------------------------------------------------------------------------------------<br />
Index Scan using idx_french_ctype on french_messages2 (cost=0.00..17139.15 rows=400000 width=8)<br />
<br />
==Unlogged Tables/Tables non journalisées==<br />
<br />
Ces tables peuvent être utilisées pour stocker des données éphémères. Une table non journalisée est bien plus rapide à écrire, mais elle ne survivra pas à un crash (elle sera tronquée au redémarrage de l'instance en cas de crash).<br />
<br />
Elles n'ont pas le coût de maintenance associé à la journalisation, elles sont donc bien plus rapides à écrire.<br />
<br />
Voici un exemple (volontairement simpliste):<br />
<br />
# CREATE TABLE test (a int);<br />
CREATE TABLE<br />
# CREATE UNLOGGED table testu (a int);<br />
CREATE TABLE<br />
# CREATE INDEX idx_test on test (a);<br />
CREATE INDEX<br />
# CREATE INDEX idx_testu on testu (a);<br />
CREATE INDEX<br />
=# \timing <br />
Timing is on.<br />
=# INSERT INTO test SELECT generate_series(1,1000000);<br />
INSERT 0 1000000<br />
Time: 17601,201 ms<br />
=# INSERT INTO testu SELECT generate_series(1,1000000);<br />
INSERT 0 1000000<br />
Time: 3439,982 ms<br />
<br />
Elles sont donc très efficaces pour des données de cache, ou pour tout ce qui peut être reconstruit après un crash.<br />
<br />
==Extensions==<br />
<br />
Ce point et le suivant sont l'occasion de présenter plusieurs fonctionnalités d'un coup. Nous allons commencer par installer pg_trgm, qui est maintenant une extension.<br />
<br />
Installons donc pg_trgm. Jusqu'à la 9.0, nous devions lancer un script manuellement. La commande ressemblait à ceci:<br />
<br />
\i /usr/local/pgsql/share/contrib/pg_trgm.sql<br />
<br />
Cela entraînait des problèmes de maintenance: les fonctions créées allaient par défaut dans le schéma public, elles étaient envoyées telles quelles dans les fichiers produits par pg_dump, et se restauraient souvent mal, puisqu'elles dépendaient d'objets binaires externes ou pouvaient changer de définition entre les versions de PostgreSQL.<br />
<br />
Avec la 9.1, vous pouvez utiliser la commande CREATE EXTENSION:<br />
<br />
CREATE EXTENSION [ IF NOT EXISTS ] extension_name<br />
[ WITH ] [ SCHEMA schema ]<br />
[ VERSION version ]<br />
[ FROM old_version ]<br />
<br />
Les options les plus importantes sont "extension_name", bien sûr, et "schema": les extensions peuvent être stockées dans un schéma.<br />
<br />
Installons donc pg_trgm, pour l'exemple qui va suivre:<br />
<br />
=# CREATE schema extensions;<br />
CREATE SCHEMA<br />
<br />
=# CREATE EXTENSION pg_trgm WITH SCHEMA extensions;<br />
CREATE EXTENSION<br />
<br />
Maintenant, pg_trgm est installé dans un schéma "extensions". Il sera correctement inclus dans les exports de base, avec la syntaxe CREATE EXTENSION. Par conséquent, si quelque chose change dans l'extension, elle sera restaurée avec la nouvelle définition.<br />
<br />
La liste des extensions peut être obtenue comme suit dans psql:<br />
\dx<br />
List of installed extensions<br />
Name | Version | Schema | Description <br />
----------+---------+------------+-------------------------------------------------------------------<br />
pg_trgm | 1.0 | extensions | text similarity measurement and index searching based on trigrams<br />
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language<br />
(2 rows)<br />
<br />
<br />
==K-Nearest-Neighbor Indexing/Indexation des k plus proches voisins==<br />
<br />
Les index GIST peuvent maintenant être utilisés pour retourner des enregistrements triés, si la notion de distance a une signification pour ces données et qu'on peut en fournir une définition. Pour le moment, ce travail a été effectué pour le type 'point', l'extension 'pg_trgm' et plusieurs types de données de btree_gist. L'infrastructure est ouverte à tous les types de données, il y en aura donc probablement d'autres qui l'implémenteront dans un futur proche.<br />
<br />
Pour l'heure, voici donc un exemple avec pg_trgm. pg_trgm utilise des trigrammes pour comparer des chaînes. Voici les trigrammes pour la chaîne 'hello':<br />
<br />
SELECT show_trgm('hello');<br />
show_trgm <br />
---------------------------------<br />
{" h"," he",ell,hel,llo,"lo "}<br />
<br />
Les trigrammes sont utilisés pour évaluer la similarité (entre 0 et 1) entre des chaînes. Il y a donc une notion de distance, et on peut la définir par '1-similarité'.<br />
<br />
Voici un exemple. La table contient 5 millions d'enregistrements et pèse 750Mo.<br />
<br />
CREATE TABLE test_trgm ( text_data text);<br />
<br />
CREATE INDEX test_trgm_idx on test_trgm using gist (text_data extensions.gist_trgm_ops);<br />
<br />
Jusqu'à la 9.0, si nous voulions les deux text_data les plus proches de 'hello' dans la table, la requête était celle-ci:<br />
<br />
SELECT text_data, similarity(text_data, 'hello')<br />
FROM test_trgm <br />
WHERE text_data % 'hello'<br />
ORDER BY similarity(text_data, 'hello')<br />
LIMIT 2;<br />
<br />
Sur cette base de test, il faut environ 2 secondes pour obtenir le résultat.<br />
<br />
Avec 9.1 et la nouvelle fonctionnalité KNN, on peut l'écrire comme ceci:<br />
<br />
SELECT text_data, text_data <-> 'hello'<br />
FROM test_trgm <br />
ORDER BY text_data <-> 'hello'<br />
LIMIT 2;<br />
<br />
L'opérateur <-> est l'opérateur de distance. La requête dure 20 millisecondes, et passe par l'index pour récupérer directement les deux meilleurs enregistrements.<br />
<br />
Tant que nous parlons de pg_trgm, une autre fonctionnalité apparaissant en 9.1 est que les opérateurs LIKE et ILIKE peuvent maintenant utiliser automatiquement un index trgm. Toujours sur la même table:<br />
<br />
SELECT text_data<br />
FROM test_trgm<br />
WHERE text_data like '%hello%';<br />
<br />
utilise l'index test_trgm_idx (au lieu de parcourir la table entière).<br />
<br />
Attention tout de même: les index trgm sont très volumineux, et coûteux à maintenir.<br />
<br />
==Serializable Snapshot Isolation/Isolation par Instantanés Sérialisable/SSI==<br />
<br />
Cette fonctionnalité est très utile si vous avez besoin que toutes vos transactions se comportent comme si elles s'exécutaient les unes après les autres, sans trop sacrifier les performances, comme c'est le cas pour la plupart des implémentations d'isolation «sérialisable» actuelles (elles s'appuient habituellement sur le verrouillage de tous les enregistrements accédés).<br />
<br />
Comme cette fonctionnalité est complexe à montrer et à expliquer, voici un lien vers l'explication complète de cette fonctionnalité: http://wiki.postgresql.org/wiki/SSI/fr<br />
<br />
==Writeable Common Table Expressions/Expression de Table Commune en Écriture==<br />
<br />
Cela étend la syntaxe WITH introduite en 8.4. Dorénavant, des requêtes de modification de données peuvent être utilisées dans la partie WITH d'une requête, et les données retournées par ces ordres réutilisées.<br />
<br />
Imaginons que nous voulions archiver tous les enregistrements correspondant à %hello% de la table test_trgm:<br />
<br />
CREATE TABLE old_text_data (text_data text);<br />
<br />
WITH deleted AS (DELETE FROM test_trgm WHERE text_data like '%hello%' RETURNING text_data)<br />
INSERT INTO old_text_data SELECT * FROM deleted;<br />
<br />
Tout en une seule requête (donc en une seule passe sur test_trgm).<br />
<br />
Voici un exemple plus ambitieux. Cette requête met à jour une base de données pgbench, en enlevant un groupe de transactions erronées et en mettant à jour les totaux de teller, branch et account, en un seul ordre:<br />
<br />
WITH deleted_xtns AS (<br />
DELETE FROM pgbench_history<br />
WHERE bid = 4 and tid = 9<br />
RETURNING *<br />
),<br />
deleted_per_account as (<br />
SELECT aid, sum(delta) as baldiff<br />
FROM deleted_xtns<br />
GROUP BY 1<br />
),<br />
accounts_rebalanced as (<br />
UPDATE pgbench_accounts<br />
SET abalance = abalance - baldiff<br />
FROM deleted_per_account<br />
WHERE deleted_per_account.aid = pgbench_accounts.aid<br />
RETURNING deleted_per_account.aid, pgbench_accounts.bid,<br />
baldiff<br />
),<br />
branch_adjustment as (<br />
SELECT bid, SUM(baldiff) as branchdiff<br />
FROM accounts_rebalanced<br />
GROUP BY bid<br />
)<br />
UPDATE pgbench_branches<br />
SET bbalance = bbalance - branchdiff<br />
FROM branch_adjustment<br />
WHERE branch_adjustment.bid = pgbench_branches.bid<br />
RETURNING branch_adjustment.bid,branchdiff,bbalance;<br />
<br />
<br />
==SE-Postgres==<br />
<br />
PostgreSQL est la seule base qui propose une intégration complète avec le framework de sécurisation SELinux. Sécurité de niveau militaire pour votre base de données.<br />
TODO<br />
<br />
==PGXN==<br />
<br />
[http://pgxn.org/ PGXN] est le PostgreSQL Extension Network (le réseau d'extensions PostgreSQL), un système de distribution centralisée pour les bibliothèques d'extension PostgreSQL open-source. Les auteurs d'extensions peuvent [http://manager.pgxn.org/ soumettre leur travail] en même temps que [http://pgxn.org/spec/ les métadonnées le décrivant]: les packages et leur documentation sont [http://pgxn.org/ indexés] et distribués sur plusieurs serveurs. Le système peut être utilisé au travers d'une interface web ou en utilisant des clients en ligne de commande grâce à une [https://github.com/pgxn/pgxn-api/wiki API simple].<br />
<br />
Un [http://pgxnclient.projects.postgresql.org/ client PGXN] complet est en cours de développement. Il peut être installé avec:<br />
<br />
$ easy_install pgxnclient<br />
Searching for pgxnclient<br />
...<br />
Best match: pgxnclient 0.2.1<br />
Processing pgxnclient-0.2.1-py2.6.egg<br />
...<br />
Installed pgxnclient-0.2.1-py2.6.egg<br />
<br />
Il permet entre autres de rechercher des extensions sur le site web:<br />
<br />
$ pgxn search pair<br />
pair 0.1.3<br />
... Usage There are two ways to construct key/value *pairs*: Via the<br />
*pair*() function: % SELECT *pair*('foo', 'bar'); *pair* ------------<br />
(foo,bar) Or by using the ~> operator: % SELECT 'foo' ~> 'bar';<br />
*pair*...<br />
<br />
semver 0.2.2<br />
*pair* │ 0.1.0 │ Key/value *pair* data type Note that "0.35.0b1" is less<br />
than "0.35.0", as required by the specification. Use ORDER BY to get<br />
more of a feel for semantic version ordering rules: SELECT...<br />
<br />
Pour compiler et installer sur le système:<br />
<br />
$ pgxn install pair<br />
INFO: best version: pair 0.1.3<br />
INFO: saving /tmp/tmpezwyEO/pair-0.1.3.zip<br />
INFO: unpacking: /tmp/tmpezwyEO/pair-0.1.3.zip<br />
INFO: building extension<br />
...<br />
INFO: installing extension<br />
[sudo] password for piro: <br />
/bin/mkdir -p '/usr/local/pg91b1/share/postgresql/extension'<br />
...<br />
<br />
Et pour les charger en tant qu'extension de base de données:<br />
<br />
$ pgxn load -d mydb pair<br />
INFO: best version: pair 0.1.3<br />
CREATE EXTENSION<br />
<br />
==SQL/MED==<br />
<br />
Le support de SQL/MED (Management of External Data ou Gestion de Données Externes) a été démarré en 8.4. Maintenant, PostgreSQL peut définir des tables externes, ce qui est le but principal de SQL/MED: accéder à des données externes. Voici un exemple, s'appuyant sur l'extension file_fdw.<br />
<br />
Nous allons accéder à un fichier CSV au travers d'une table.<br />
<br />
CREATE EXTENSION file_fdw WITH SCHEMA extensions;<br />
\dx+ file_fdw<br />
Objects in extension "file_fdw"<br />
Object Description <br />
----------------------------------------------------<br />
foreign-data wrapper file_fdw<br />
function extensions.file_fdw_handler()<br />
function extensions.file_fdw_validator(text[],oid)<br />
<br />
L'étape suivante est optionnelle. Elle est là juste pour montrer la syntaxe de 'CREATE FOREIGN DATA WRAPPER' (le foreign data wrapper étant en quelque sorte le connecteur pour un type de données externes):<br />
<br />
=# CREATE FOREIGN DATA WRAPPER file_data_wrapper HANDLER extensions.file_fdw_handler;<br />
CREATE FOREIGN DATA WRAPPER<br />
<br />
L'extension crée déjà un «foreign data wrapper» appelé file_fdw. Nous allons l'utiliser à partir de maintenant.<br />
<br />
Nous avons besoin de créer un 'server'. Comme les données que nous allons récupérer ne proviennent que d'un fichier, cela semble un peu inutile, mais SQL/MED est aussi capable de gérer des bases de données distantes.<br />
<br />
CREATE SERVER file FOREIGN DATA WRAPPER file_fdw ;<br />
CREATE SERVER<br />
<br />
Maintenant, attachons un fichier statistical_data.csv à une table statistical_data:<br />
<br />
CREATE FOREIGN TABLE statistical_data (field1 numeric, field2 numeric) server file options (filename '/tmp/statistical_data.csv', format 'csv', delimiter ';') ;<br />
CREATE FOREIGN TABLE<br />
marc=# SELECT * from statistical_data ;<br />
field1 | field2 <br />
--------+--------<br />
0.1 | 0.2<br />
0.2 | 0.4<br />
0.3 | 0.9<br />
0.4 | 1.6<br />
<br />
Pour le moment, les foreign tables ne sont accessibles qu'en SELECT.<br />
<br />
Voyez [[Foreign_data_wrappers|la liste des extensions Foreign Data Wrapper existantes]], qui inclut Oracle, MySQL, CouchDB, Redis, Twitter, et autres.<br />
<br />
=Modifications pouvant entraîner des régressions=<br />
<br />
Les points suivants doivent être vérifiés lors d'une migration vers la version 9.1.<br />
<br />
* La valeur par défaut de ''standard_conforming_strings'' est devenue ''on''<br />
<br />
Traditionnellement, PostgreSQL ne traitait pas les littéraux de type chaîne ('..') comme le spécifie le standard SQL: les anti-slashs ('\') étaient considérés comme des caractères d'échappement, ce qui entraînait que le caractère suivant un '\' était interprété. Par exemple, '\n' est un caractère newline, '\\' est le caractère '\' lui-même. Cela s'apparentait davantage à la syntaxe du C.<br />
<br />
En 9.1, ''standard_conforming_strings'' est maintenant à ''on'' par défaut, ce qui signifie que les littéraux de type chaîne sont traités comme le spécifie le standard SQL. Les caractères apostrophe doivent donc être protégés en les doublant plutôt que par un anti-slash, et les anti-slashs ne sont plus des caractères d'échappement.<br />
<br />
Par exemple, quand précédemment on écrivait <nowiki>'l\'heure', on doit maintenant écrire 'l''heure'.</nowiki><br />
<br />
Certaines subtilités sont à connaître, même si elles ne sont pas apparues en 9.1:<br />
<br />
:* L'ancienne syntaxe est toujours disponible. Mettez simplement un E devant le guillemet de départ: E'l\'heure'<br />
:* ''standard_conforming_strings'' peut toujours être remis à ''off''<br />
:* Beaucoup de langages de programmation font déjà ce qu'il faut, si vous leur demandez de faire le travail d'échappement pour vous. Par exemple, la fonction PQescapeLiteral de la libpq détecte automatiquement la valeur de standard_conforming_strings et s'y adapte.<br />
Toutefois, vérifiez bien que votre programme est prêt à supporter ce changement de comportement.<br />
<br />
* Les conversions de type de données de style 'fonction' ou 'attribut' ne sont plus autorisées pour les types composites<br />
<br />
Depuis la version 8.4, il est possible de convertir à peu près n'importe quoi vers son format texte.<br />
Essayons cela avec la foreign table définie précédemment:<br />
<br />
=# SELECT cast(statistical_data as text) from statistical_data ;<br />
statistical_data <br />
------------------<br />
(0.1,0.2)<br />
(0.2,0.4)<br />
(0.3,0.9)<br />
(0.4,1.6)<br />
(4 rows)<br />
<br />
Le problème c'est que les versions 8.4 et 9.0 nous donnent 4 syntaxes différentes pour effectuer cela:<br />
:* SELECT cast(statistical_data as text) from statistical_data ;<br />
:* SELECT statistical_data::text from statistical_data;<br />
:* SELECT statistical_data.text from statistical_data;<br />
:* SELECT text(statistical_data) from statistical_data;<br />
Les deux dernières syntaxes ne sont plus autorisées pour les types composites (comme un enregistrement de table): elles étaient bien trop faciles à utiliser accidentellement.<br />
<br />
* Les vérifications de conversion sur les domaines définis à partir de tableaux ont été renforcées<br />
<br />
Maintenant, PostgreSQL vérifie la contrainte du domaine quand vous mettez à jour un élément d'un tableau dont le type est un domaine.<br />
<br />
Voici ce qui se passait en 9.0:<br />
<br />
=#CREATE DOMAIN test_dom as int[] check (value[1] > 0);<br />
CREATE DOMAIN<br />
=#SELECT '{-1,0,0,0,0}'::test_dom;<br />
ERROR: value for domain test_dom violates check constraint "test_dom_check"<br />
<br />
Jusque là, tout va bien.<br />
<br />
=#CREATE TABLE test_dom_table (test test_dom);<br />
CREATE TABLE<br />
=# INSERT INTO test_dom_table values ('{1,0,0,0,0}');<br />
INSERT 0 1<br />
=# UPDATE test_dom_table SET test[1]=-1;<br />
UPDATE 1<br />
<br />
Par contre, là, c'est anormal… la contrainte check nous interdit de le faire. C'est maintenant impossible en 9.1, la vérification est faite correctement.<br />
<br />
* string_to_array() retourne maintenant un tableau vide pour une chaîne d'entrée de longueur zéro. Précédemment, cela retournait NULL.<br />
<br />
=# SELECT string_to_array('','whatever');<br />
string_to_array <br />
-----------------<br />
{}<br />
<br />
* string_to_array() découpe maintenant une chaîne en ses caractères si le séparateur est NULL. Précédemment, cela retournait NULL:<br />
<br />
=# SELECT string_to_array('foo',NULL);<br />
string_to_array <br />
-----------------<br />
{f,o,o}<br />
<br />
* Le RAISE sans paramètre de PL/pgSQL a changé de comportement.<br />
<br />
C'est un cas assez rare, mais qui piégeait les utilisateurs habitués au comportement d'Oracle sur ce point.<br />
<br />
Voici un exemple:<br />
<br />
CREATE OR REPLACE FUNCTION raise_demo () returns void language plpgsql as $$<br />
BEGIN<br />
RAISE NOTICE 'Main body';<br />
BEGIN<br />
RAISE NOTICE 'Sub-block';<br />
RAISE EXCEPTION serialization_failure; -- Simulate a problem<br />
EXCEPTION WHEN serialization_failure THEN<br />
BEGIN<br />
-- Maybe we had a serialization error<br />
-- Won't happen here of course<br />
RAISE DEBUG 'There was probably a serialization failure. It could be because of...';<br />
-- ..<br />
-- If I get there let's pretend I couldn't find a solution to the error<br />
RAISE; -- Let's forward the error<br />
EXCEPTION WHEN OTHERS THEN<br />
-- This should capture everything<br />
RAISE EXCEPTION 'Couldn t figure what to do with the error';<br />
END;<br />
END;<br />
END;<br />
$$<br />
;<br />
CREATE FUNCTION<br />
<br />
En 9.0, vous aurez ce résultat (avec ''client_min_messages'' à ''debug''):<br />
=# SELECT raise_demo();<br />
NOTICE: Main body<br />
NOTICE: Sub-block<br />
DEBUG: There was probably a serialization failure. It could be because of...<br />
ERROR: serialization_failure<br />
<br />
<br />
En 9.1:<br />
=# SELECT raise_demo();<br />
NOTICE: Main body<br />
NOTICE: Sub-block<br />
DEBUG: There was probably a serialization failure. It could be because of...<br />
ERROR: Couldn t figure what to do with the error<br />
<br />
La différence est qu'en 9.0, RAISE sans paramètre relançait l'erreur depuis le contexte qui avait capturé l'exception d'origine: elle ressortait donc du bloc courant sans être interceptée par son bloc EXCEPTION. En 9.1, le RAISE est traité dans le bloc où il se produit: le bloc BEGIN intérieur n'est pas quitté, et c'est son propre bloc EXCEPTION qui capture l'erreur relancée.<br />
<br />
=Améliorations liées aux performances=<br />
<br />
* Les écritures synchrones ont été optimisées pour moins charger le système de fichiers.<br />
<br />
Ce point est difficile à mettre en évidence dans ce document. Mais la performance et les temps de réponse (la latence) ont été fortement améliorés quand la charge en écriture est élevée.<br />
<br />
* Les tables filles (par héritage) dans les requêtes peuvent maintenant retourner des résultats triés de façon utile, ce qui permet des optimisations de MIN/MAX pour l'héritage (et donc le partitionnement).<br />
<br />
Si vous utilisez beaucoup d'héritage, dans un contexte de partitionnement en particulier, vous allez adorer cette optimisation.<br />
<br />
Le planificateur de requête est devenu bien plus intelligent dans le cas suivant.<br />
<br />
Créons un schéma factice:<br />
<br />
=# CREATE TABLE parent (a int);<br />
CREATE TABLE<br />
=# CREATE TABLE children_1 ( check (a between 1 and 10000000)) inherits (parent);<br />
CREATE TABLE<br />
=# CREATE TABLE children_2 ( check (a between 10000001 and 20000000)) inherits (parent);<br />
CREATE TABLE<br />
=# INSERT INTO children_1 select generate_series(1,10000000);<br />
INSERT 0 10000000<br />
=# INSERT INTO children_2 select generate_series(10000001,20000000);<br />
INSERT 0 10000000<br />
=# CREATE INDEX test_1 ON children_1 (a);<br />
CREATE INDEX<br />
=# CREATE INDEX test_2 ON children_2 (a);<br />
CREATE INDEX<br />
<br />
Et demandons les 50 plus grandes valeurs de a.<br />
<br />
SELECT * from parent order by a desc limit 50;<br />
<br />
Cela prend, sur une petite machine de test, 13 secondes sur une base en 9.0, et 0.8 millisecondes sur une base en 9.1.<br />
<br />
Le plan en 9.0 est:<br />
<br />
Limit (cost=952993.36..952993.48 rows=50 width=4)<br />
-> Sort (cost=952993.36..1002999.24 rows=20002354 width=4)<br />
Sort Key: public.parent.a<br />
-> Result (cost=0.00..288529.54 rows=20002354 width=4)<br />
-> Append (cost=0.00..288529.54 rows=20002354 width=4)<br />
-> Seq Scan on parent (cost=0.00..34.00 rows=2400 width=4)<br />
-> Seq Scan on children_1 parent (cost=0.00..144247.77 rows=9999977 width=4)<br />
-> Seq Scan on children_2 parent (cost=0.00..144247.77 rows=9999977 width=4)<br />
<br />
Le plan en 9.1 est:<br />
<br />
Limit (cost=113.75..116.19 rows=50 width=4)<br />
-> Result (cost=113.75..975036.98 rows=20002400 width=4)<br />
-> Merge Append (cost=113.75..975036.98 rows=20002400 width=4)<br />
Sort Key: public.parent.a<br />
-> Sort (cost=113.73..119.73 rows=2400 width=4)<br />
Sort Key: public.parent.a<br />
-> Seq Scan on parent (cost=0.00..34.00 rows=2400 width=4)<br />
-> Index Scan Backward using test_1 on children_1 parent (cost=0.00..303940.35 rows=10000000 width=4)<br />
-> Index Scan Backward using test_2 on children_2 parent (cost=0.00..303940.35 rows=10000000 width=4)<br />
<br />
Le plan en 9.0 signifie: je vais prendre tous les enregistrements de toutes les tables, les trier, et ensuite retourner les 50 plus grands.<br />
<br />
Le plan en 9.1 signifie: je vais prendre les enregistrements de chaque table dans l'ordre trié, en utilisant leurs index s'il y en a, les fusionner comme ils arrivent, et retourner les 50 premiers.<br />
<br />
C'était un piège très fréquent, ce genre de requête devenait extrêmement lent quand on partitionnait une table. Et il était un peu compliqué de le contourner par réécriture de requête.<br />
<br />
* Les algorithmes de hachage peuvent maintenant être utilisés pour les full outer join, et pour les tableaux.<br />
<br />
Il est très simple de démontrer ce point (pour les full outer join):<br />
<br />
CREATE TABLE test1 (a int);<br />
CREATE TABLE test2 (a int);<br />
INSERT INTO test1 SELECT generate_series(1,100000);<br />
INSERT INTO test2 SELECT generate_series(100,1000);<br />
<br />
Nous avons donc une grosse table test1 et une petite table test2.<br />
<br />
En 9.0, la requête est faite avec ce plan:<br />
<br />
EXPLAIN ANALYZE SELECT * FROM test1 FULL OUTER JOIN test2 USING (a);<br />
QUERY PLAN <br />
--------------------------------------------------------------------------------------------------------------------------<br />
Merge Full Join (cost=11285.07..11821.07 rows=100000 width=8) (actual time=330.092..651.618 rows=100000 loops=1)<br />
Merge Cond: (test1.a = test2.a)<br />
-> Sort (cost=11116.32..11366.32 rows=100000 width=4) (actual time=327.926..446.814 rows=100000 loops=1)<br />
Sort Key: test1.a<br />
Sort Method: external sort Disk: 1368kB<br />
-> Seq Scan on test1 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..119.246 rows=100000 loops=1)<br />
-> Sort (cost=168.75..174.75 rows=2400 width=4) (actual time=2.156..3.208 rows=901 loops=1)<br />
Sort Key: test2.a<br />
Sort Method: quicksort Memory: 67kB<br />
-> Seq Scan on test2 (cost=0.00..34.00 rows=2400 width=4) (actual time=0.009..1.066 rows=901 loops=1)<br />
Total runtime: 733.368 ms<br />
<br />
Voici le nouveau plan, en 9.1 cette fois-ci:<br />
<br />
--------------------------------------------------------------------------------------------------------------------<br />
Hash Full Join (cost=24.27..1851.28 rows=100000 width=8) (actual time=2.536..331.547 rows=100000 loops=1)<br />
Hash Cond: (test1.a = test2.a)<br />
-> Seq Scan on test1 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.014..119.884 rows=100000 loops=1)<br />
-> Hash (cost=13.01..13.01 rows=901 width=4) (actual time=2.505..2.505 rows=901 loops=1)<br />
Buckets: 1024 Batches: 1 Memory Usage: 32kB<br />
-> Seq Scan on test2 (cost=0.00..13.01 rows=901 width=4) (actual time=0.017..1.186 rows=901 loops=1)<br />
Total runtime: 412.735 ms<br />
<br />
Le plan en 9.0 effectue 2 tris. Celui en 9.1 n'a besoin que d'un hachage de la plus petite table.<br />
<br />
Le temps d'exécution est divisé par presque 2. Une autre propriété intéressante est que le nouveau plan a un coût de démarrage bien plus faible: le premier enregistrement est retourné après 2 millisecondes, alors qu'il en faut 330 à l'ancien plan.<br />
<br />
SELECT * from test1 full outer join test2 using (a) LIMIT 10<br />
<br />
prend 330ms en 9.0, et 3ms en 9.1.<br />
<br />
<br />
=Administration=<br />
<br />
* Paramétrage automatique de wal_buffers.<br />
Le paramètre wal_buffers est maintenant réglé automatiquement quand sa valeur est -1, la nouvelle valeur par défaut: il est alors positionné à 1/32ème de shared_buffers, avec un maximum de 16Mo. Un paramètre de moins à gérer…<br />
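<br />
Pour connaître la valeur effectivement retenue par le serveur:<br />
<br />
 =# SHOW wal_buffers;<br />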
<br />
* Enregistrement des dernières remises à zéro dans les vues de statistiques de base de données et de background writer.<br />
Vous pouvez maintenant savoir quand les statistiques ont été réinitialisées. Pour une base de données, par exemple:<br />
<br />
SELECT datname, stats_reset FROM pg_stat_database;<br />
datname | stats_reset <br />
-----------+-------------------------------<br />
template1 | <br />
template0 | <br />
postgres | 2011-05-11 19:22:05.946641+02<br />
marc | 2011-05-11 19:22:09.133483+02<br />
<br />
* Nouvelles colonnes montrant le nombre d'opérations de vacuum et d'analyze dans les vues pg_stat_*_tables.<br />
<br />
C'est maintenant bien plus facile de savoir quelle table attire l'attention d'autovacuum:<br />
<br />
SELECT relname, last_vacuum, vacuum_count, last_autovacuum, autovacuum_count, last_analyze, analyze_count, last_autoanalyze, autoanalyze_count<br />
FROM pg_stat_user_tables <br />
WHERE relname in ('test1','test2');<br />
relname | last_vacuum | vacuum_count | last_autovacuum | autovacuum_count | last_analyze | analyze_count | last_autoanalyze | autoanalyze_count <br />
---------+-------------+--------------+-----------------+------------------+--------------+---------------+-------------------------------+-------------------<br />
test1 | | 0 | | 0 | | 0 | 2011-05-22 15:51:50.48562+02 | 1<br />
test2 | | 0 | | 0 | | 0 | 2011-05-22 15:52:50.325494+02 | 2<br />
<br />
<br />
<br />
=Fonctionnalités SQL et PL/PgSQL=<br />
<br />
* GROUP BY peut déduire les colonnes manquantes (dépendances fonctionnelles)<br />
<br />
CREATE TABLE entities (entity_name text primary key, entity_address text);<br />
CREATE TABLE employees (employee_name text primary key, entity_name text references entities (entity_name));<br />
INSERT INTO entities VALUES ('HR', 'address1');<br />
INSERT INTO entities VALUES ('SALES', 'address2');<br />
INSERT INTO employees VALUES ('Smith', 'HR');<br />
INSERT INTO employees VALUES ('Jones', 'HR');<br />
INSERT INTO employees VALUES ('Taylor', 'SALES');<br />
INSERT INTO employees VALUES ('Brown', 'SALES');<br />
<br />
On peut maintenant écrire:<br />
<br />
 SELECT count(*), entity_name, entity_address<br />
FROM entities JOIN employees using (entity_name)<br />
GROUP BY entity_name;<br />
  count | entity_name | entity_address <br />
 -------+-------------+----------------<br />
      2 | HR          | address1<br />
      2 | SALES       | address2<br />
<br />
En 9.0, il aurait fallu grouper aussi sur entity_address. Comme entity_name est la clé primaire d'entities, entity_address en dépend fonctionnellement: il est donc évident pour PostgreSQL que le regroupement sur entity_name suffit.<br />
<br />
* De nouvelles valeurs peuvent être ajoutées à un type enum par ALTER TYPE.<br />
<br />
 =# CREATE TYPE package_status AS ENUM ('RECEIVED', 'DELIVERED');<br />
CREATE TYPE<br />
=# ALTER TYPE package_status ADD VALUE 'READY FOR DELIVERY' AFTER 'RECEIVED';<br />
ALTER TYPE<br />
<br />
Jusqu'à la 9.0, il était nécessaire de détruire le type et en créer un nouveau. Cela impliquait de détruire toutes les colonnes utilisant ce type. C'était une des principales raisons pour lesquelles les enum étaient peu utilisés.<br />
<br />
* Les types composites peuvent être modifiés par ALTER TYPE ... ADD/DROP/ALTER/RENAME ATTRIBUTE.<br />
<br />
Créons un type composite simple:<br />
<br />
=#CREATE TYPE package AS (destination text);<br />
<br />
Créons une fonction vide utilisant ce type:<br />
<br />
=#CREATE FUNCTION package_exists (pack package) RETURNS boolean LANGUAGE plpgsql AS $$<br />
BEGIN<br />
RETURN true;<br />
END<br />
$$<br />
;<br />
<br />
Testons cette fonction:<br />
<br />
=#SELECT package_exists(row('test'));<br />
package_exists <br />
----------------<br />
t<br />
<br />
Cela fonctionne.<br />
<br />
Il est maintenant possible de modifier le type 'package':<br />
<br />
=#ALTER TYPE package ADD ATTRIBUTE received boolean;<br />
<br />
le type a changé:<br />
<br />
=#SELECT package_exists(row('test'));<br />
ERROR: cannot cast type record to package<br />
LINE 1: SELECT package_exists(row('test'));<br />
^<br />
DETAIL: Input has too few columns.<br />
=# SELECT package_exists(row('test',true));<br />
package_exists <br />
----------------<br />
t<br />
<br />
* ALTER TABLE ... ADD UNIQUE/PRIMARY KEY USING INDEX<br />
<br />
Cela sera certainement utilisé principalement pour créer une clé unique ou primaire sans verrouiller une table pendant trop longtemps:<br />
<br />
=# CREATE UNIQUE INDEX CONCURRENTLY idx_pk ON test_pk (a);<br />
CREATE INDEX<br />
=# ALTER TABLE test_pk ADD primary key using index idx_pk;<br />
ALTER TABLE<br />
<br />
La table test_pk ne sera verrouillée en écriture que pendant la durée de l'ALTER TABLE. Le reste du travail se fera sans bloquer les utilisateurs.<br />
<br />
On peut bien sûr utiliser cela pour reconstruire l'index d'une clé primaire sans verrouiller la table pendant toute l'opération:<br />
<br />
=# CREATE UNIQUE INDEX CONCURRENTLY idx_pk2 ON test_pk (a);<br />
=# BEGIN ;<br />
=# ALTER TABLE test_pk DROP CONSTRAINT idx_pk;<br />
=# ALTER TABLE test_pk ADD primary key using index idx_pk2;<br />
=# COMMIT ;<br />
<br />
* ALTER TABLE ... SET DATA TYPE peut éviter la réécriture de toute la table dans les cas appropriés.<br />
<br />
Par exemple, convertir une colonne varchar en texte ne demande plus de réécrire la table.<br />
<br />
Par contre, augmenter la taille d'une colonne varchar nécessite toujours une réécriture de la table.<br />
<br />
Il reste encore un certain nombre de cas non gérés, qui déclenchent une réécriture. Il y aura probablement des améliorations dans les prochaines versions de PostgreSQL, ce travail se poursuivant.<br />
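<br />
Un exemple minimal (noms indicatifs):<br />
<br />
 =# CREATE TABLE docs (body varchar(1000));<br />
 CREATE TABLE<br />
 =# ALTER TABLE docs ALTER COLUMN body TYPE text; -- pas de réécriture de la table en 9.1<br />
 ALTER TABLE<br />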
<br />
* Nouvelle syntaxe CREATE TABLE IF NOT EXISTS.<br />
<br />
Vous n'aurez pas d'erreur si une table existe déjà, seulement un NOTICE.<br />
<br />
Attention: cette commande ne vérifie pas que la définition donnée dans votre CREATE TABLE correspond à celle de la table existante.<br />
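<br />
Exemple, en supposant que la table test existe déjà:<br />
<br />
 =# CREATE TABLE IF NOT EXISTS test (a int);<br />
 NOTICE:  relation "test" already exists, skipping<br />
 CREATE TABLE<br />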
<br />
* Nouvelle option ENCODING à COPY TO/FROM. Cela permet de spécifier un encodage à COPY qui soit indépendant du client_encoding.<br />
<br />
COPY test1 TO stdout ENCODING 'latin9'<br />
<br />
convertira l'encodage directement. Il n'est donc pas nécessaire de changer le client_encoding avant le COPY.<br />
<br />
* INSTEAD OF triggers on views.<br />
<br />
This feature can be used to implement updatable views. Here is an example:<br />
<br />
Let's continue with the employees/entities example.<br />
<br />
 =#CREATE VIEW emp_entity AS SELECT employee_name, entity_name, address<br />
 FROM entities JOIN employees USING (entity_name);<br />
<br />
To make this view updatable in 9.0, one had to use RULES. This could quickly turn into a nightmare: rules are quite complex to write, and even worse to debug. This is how it was done: [http://www.postgresql.org/docs/9.1/static/rules-update.html updates via rules]<br />
<br />
We can now do all of this with a trigger. Here is an example, in PL/PgSQL (only the INSERT part is shown):<br />
<br />
 =#CREATE OR REPLACE FUNCTION dml_emp_entity () RETURNS trigger LANGUAGE plpgsql AS $$<br />
 DECLARE<br />
 vrecord RECORD;<br />
 BEGIN<br />
 IF TG_OP = 'INSERT' THEN<br />
 -- Does the record exist in entity ?<br />
 SELECT entity_name,address INTO vrecord FROM entities WHERE entity_name=NEW.entity_name;<br />
 IF NOT FOUND THEN<br />
 INSERT INTO entities (entity_name,address) VALUES (NEW.entity_name, NEW.address);<br />
 ELSE<br />
 IF vrecord.address != NEW.address THEN<br />
 RAISE EXCEPTION 'There already is a record for % in entities. Its address is %. It conflicts with your address %',<br />
 NEW.entity_name, vrecord.address, NEW.address USING ERRCODE = 'unique_violation';<br />
 END IF;<br />
 END IF; -- Nothing more to do, the entity already exists and is OK<br />
 -- We now try to insert the employee data. Let's directly try an INSERT<br />
 BEGIN<br />
 INSERT INTO employees (employee_name, entity_name) VALUES (NEW.employee_name, NEW.entity_name);<br />
 EXCEPTION WHEN unique_violation THEN<br />
 RAISE EXCEPTION 'There is already an employee with this name %', NEW.employee_name USING ERRCODE = 'unique_violation';<br />
 END;<br />
 RETURN NEW; -- The trigger succeeded<br />
 END IF;<br />
 END<br />
 $$<br />
 ;<br />
<br />
All that's left is to declare our trigger:<br />
<br />
 =#CREATE TRIGGER trig_dml_emp_entity INSTEAD OF INSERT OR UPDATE OR DELETE ON emp_entity FOR EACH ROW EXECUTE PROCEDURE dml_emp_entity ();<br />
<br />
There are other advantages: a rule merely rewrites the query. With the trigger, we added some logic and can return meaningful error messages, which makes it much easier to understand what went wrong. We can also handle exceptions. We get all the advantages of triggers over rules.<br />
<br />
* PL/PgSQL FOREACH IN ARRAY.<br />
<br />
Looping over an array in PL/PgSQL has become much simpler. Until now, the FOR keyword only worked for looping over recordsets (query results).<br />
<br />
It can now be used to loop over an array.<br />
<br />
Before 9.1, one would have written something like this:<br />
<br />
 =# CREATE OR REPLACE FUNCTION test_array (parray int[]) RETURNS int LANGUAGE plpgsql AS $$<br />
 DECLARE<br />
 vcounter int :=0;<br />
 velement int;<br />
 BEGIN<br />
 FOR velement IN SELECT unnest (parray)<br />
 LOOP<br />
 vcounter:=vcounter+velement;<br />
 END LOOP;<br />
 RETURN vcounter;<br />
 END<br />
 $$<br />
 ;<br />
<br />
Now:<br />
<br />
 =# CREATE OR REPLACE FUNCTION test_array (parray int[]) RETURNS int LANGUAGE plpgsql AS $$<br />
 DECLARE<br />
 vcounter int :=0;<br />
 velement int;<br />
 BEGIN<br />
 FOREACH velement IN ARRAY parray<br />
 LOOP<br />
 vcounter:=vcounter+velement;<br />
 END LOOP;<br />
 RETURN vcounter;<br />
 END<br />
 $$<br />
 ;<br />
<br />
It's much easier to read, and faster to execute.<br />
<br />
There is another benefit: we can slice the array when it is multidimensional. Here is an example, taken directly from the documentation:<br />
<br />
 =#CREATE FUNCTION scan_rows(int[]) RETURNS void AS $$<br />
 DECLARE<br />
 x int[];<br />
 BEGIN<br />
 FOREACH x SLICE 1 IN ARRAY $1<br />
 LOOP<br />
 RAISE NOTICE 'row = %', x;<br />
 END LOOP;<br />
 END;<br />
 $$ LANGUAGE plpgsql;<br />
<br />
 =#SELECT scan_rows(ARRAY[[1,2,3],[4,5,6],[7,8,9],[10,11,12]]);<br />
 NOTICE: row = {1,2,3}<br />
 NOTICE: row = {4,5,6}<br />
 NOTICE: row = {7,8,9}<br />
 NOTICE: row = {10,11,12}<br />
<br />
[[Category:PostgreSQL 9.1]]<br />
[[Category:Français]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.1&diff=15278What's new in PostgreSQL 9.12011-08-26T07:56:27Z<p>Marco44: </p>
<hr />
<div>{{Languages}}<br />
<br />
This document showcases many of the latest developments in PostgreSQL 9.1, compared to the last major release &ndash; PostgreSQL 9.0. There are many improvements in this release, so this wiki page covers many of the more important changes in detail. The full list of changes is itemised in [http://www.postgresql.org/docs/9.1/static/release-9-1 Release Notes].<br />
<br />
=Major new features=<br />
<br />
==Synchronous replication and other replication features==<br />
<br />
There are quite a lot of new features around replication in 9.1:<br />
<br />
* In 9.0, the user used for replication had to be a superuser. That's no longer the case: there is a new 'replication' privilege.<br />
<br />
CREATE ROLE replication_role REPLICATION LOGIN PASSWORD 'pwd_replication'<br />
<br />
This role can then be added to the pg_hba.conf to be used for streaming replication. It's better, from a security point of view,<br />
than having a superuser role doing this job.<br />
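<br />
For instance, granting this role access to the virtual 'replication' database in pg_hba.conf could look like this (a sketch: the client network and authentication method are placeholders to adapt):<br />
<br />
 host    replication    replication_role    192.168.1.0/24    md5<br />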
<br />
Now that we have a cluster and have created our replication user, we can set the database up for streaming replication. It's a matter of<br />
adding the permission to connect to the virtual replication database in ''pg_hba.conf'', setting up ''wal_level'', archiving (''archive_mode'', ''archive_command'') and ''max_wal_senders'', and has<br />
been covered in the 9.0 documentation.<br />
<br />
When our database cluster is ready for streaming, we can demo the second new feature.<br />
<br />
* pg_basebackup.<br />
<br />
This new tool is used to create a clone of a database, or a backup, using only the streaming replication features. There is no need to call ''pg_start_backup()'', then copy the database manually and call ''pg_stop_backup()''. pg_basebackup does it all in one command. We'll clone the running database to /tmp/newcluster:<br />
<br />
> pg_basebackup -D /tmp/newcluster -U replication -v<br />
Password: <br />
NOTICE: pg_stop_backup complete, all required WAL segments have been archived<br />
pg_basebackup: base backup completed<br />
<br />
This new database is ready to start: just add a ''recovery.conf'' file with a ''restore_command'' to retrieve archived WAL files, and start the new cluster.<br />
pg_basebackup can also create tar backups, or include all required xlog files (to get a standalone backup).<br />
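<br />
For instance, a self-contained tar-format backup could be taken like this (a sketch: -F t selects the tar format and -x includes the required xlog files; check pg_basebackup --help for your exact version):<br />
<br />
 > pg_basebackup -D /tmp/standalone_backup -F t -x -U replication -v<br />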
<br />
As we're going to now demo streaming replication with synchronous commit, we'll setup a recovery.conf to connect to the master database and stream changes.<br />
<br />
We'll create a recovery.conf containing something like this:<br />
<br />
restore_command = 'cp /tmp/%f %p' # e.g. 'cp /mnt/server/archivedir/%f %p'<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=59121 user=replication password=replication application_name=newcluster' # e.g. 'host=localhost port=5432'<br />
trigger_file = '/tmp/trig_f_newcluster'<br />
<br />
Then we'll start the new cluster:<br />
<br />
pg_ctl -D /tmp/newcluster start<br />
<br />
LOG: database system was interrupted; last known up at 2011-05-22 17:15:45 CEST<br />
LOG: entering standby mode<br />
LOG: restored log file "00000001000000010000002F" from archive<br />
LOG: redo starts at 1/2F000020<br />
LOG: consistent recovery state reached at 1/30000000<br />
LOG: database system is ready to accept read only connections<br />
cp: cannot stat « /tmp/000000010000000100000030 »: No such file or directory<br />
LOG: streaming replication successfully connected to primary<br />
<br />
Ok, now we have a slave, streaming from the master, but we're still asynchronous. Notice that we set ''application_name'' in the connection string in recovery.conf.<br />
<br />
* Synchronous replication<br />
<br />
To get synchronous, just change, in the master's postgresql.conf:<br />
<br />
synchronous_standby_names = 'newcluster'<br />
<br />
This is the application_name from the slave's primary_conninfo. Just do a pg_ctl reload, and this new parameter will be set. Now any commit on the master will only be reported as committed when the slave has written it to its own journal and acknowledged it to the master.<br />
<br />
A word of warning: transactions are considered committed when they are applied to the slave's journal, not when they are visible on the slave. It means there will still be a delay between the moment a transaction is committed on the master, and the moment it is visible on the slave. This still is synchronous replication because no data will be lost if the master crashes.<br />
<br />
One of the really great features of synchronous replication is that it is controllable per session. The parameter ''synchronous_commit'' can be turned off (it is on by default) in a session, if it does not require this synchronous guarantee. If you don't need it in your transaction, just do a<br />
 SET synchronous_commit TO off<br />
and you won't pay the penalty.<br />
<br />
There are other new replication features for PostgreSQL 9.1:<br />
<br />
* The slaves can now ask the master not to vacuum records they still need.<br />
<br />
It was a major setup problem with 9.0: a vacuum could destroy records that were still necessary to queries running on the slave, triggering replication conflicts. The slave then had to make a choice: kill the running query, or accept deferring the application of the modifications, and lag behind. One could work around this by setting ''vacuum_defer_cleanup_age'' to a non-zero value, but it was quite hard to find a correct value for it. This new feature is enabled with the parameter ''hot_standby_feedback'' on the standby databases. Of course, this means that the standby can prevent VACUUM from performing its maintenance correctly on the master, if there are very long running queries on the slave.<br />
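<br />
Enabling it is just a one-line change on each standby (a minimal sketch of the standby's postgresql.conf):<br />
<br />
 # on the standby, in postgresql.conf<br />
 hot_standby_feedback = on<br />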
<br />
* pg_stat_replication is a new system view.<br />
<br />
It displays, on the master, the status of all slaves: how much WAL they have received, whether they are connected, whether they are synchronous, and what they have replayed:<br />
<br />
=# SELECT * from pg_stat_replication ;<br />
procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state <br />
---------+----------+-------------+------------------+-------------+-----------------+-------------+------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------<br />
17135 | 16671 | replication | newcluster | 127.0.0.1 | | 43745 | 2011-05-22 18:13:04.19283+02 | streaming | 1/30008750 | 1/30008750 | 1/30008750 | 1/30008750 | 1 | sync<br />
<br />
There is no need to query the slaves anymore to know their status relative to the master.<br />
<br />
* pg_stat_database_conflicts is another new system view.<br />
<br />
This one is on the standby database, and shows how many queries have been cancelled, and for what reasons:<br />
<br />
=# SELECT * from pg_stat_database_conflicts ;<br />
datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock <br />
-------+-----------+------------------+------------+----------------+-----------------+----------------<br />
1 | template1 | 0 | 0 | 0 | 0 | 0<br />
11979 | template0 | 0 | 0 | 0 | 0 | 0<br />
11987 | postgres | 0 | 0 | 0 | 0 | 0<br />
16384 | marc | 0 | 0 | 1 | 0 | 0<br />
<br />
* replication can now be easily paused on a slave.<br />
<br />
Just call ''pg_xlog_replay_pause()'' to pause, ''pg_xlog_replay_resume()'' to resume. This will freeze the database, making it a very good tool for taking consistent backups.<br />
<br />
''pg_is_xlog_replay_paused()'' can be used to know the current status. <br />
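<br />
These are ordinary SQL functions, callable from any session on the standby; for instance (output abridged):<br />
<br />
 =# SELECT pg_xlog_replay_pause();<br />
 =# SELECT pg_is_xlog_replay_paused();<br />
 pg_is_xlog_replay_paused<br />
 --------------------------<br />
 t<br />
 =# SELECT pg_xlog_replay_resume();<br />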
<br />
Log replay can also now be paused at the end of a database recovery without putting the database into production, to give the administrator the opportunity to query the database. The administrator can then check if the recovery point reached is correct, before ending recovery. This new parameter is ''pause_at_recovery_target'', in recovery.conf.<br />
<br />
* Restore points can now be created.<br />
<br />
They are just named addresses in the transaction journal.<br />
<br />
They can then be used by specifying a recovery_target_name, instead of a recovery_target_time or a recovery_target_xid in the recovery.conf file.<br />
<br />
They are created by calling pg_create_restore_point().<br />
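<br />
For instance, one could mark the journal position before a risky batch run, then aim recovery at it (the restore point name here is just an example):<br />
<br />
 =# SELECT pg_create_restore_point('before_batch_run');<br />
<br />
and later, in recovery.conf:<br />
<br />
 recovery_target_name = 'before_batch_run'<br />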
<br />
==Per-column collations==<br />
<br />
The collation order is not unique in a database anymore.<br />
<br />
Let's say you were using a 9.0 database, with a UTF8 encoding and a de_DE.utf8 collation (alphabetical sort order), because most of your users speak German. If you also had to store French data, and had to sort it, some French users could have been disappointed:<br />
<br />
SELECT * from (values ('élève'),('élevé'),('élever'),('Élève')) as tmp order by column1;<br />
column1 <br />
---------<br />
élevé<br />
élève<br />
Élève<br />
élever<br />
<br />
It's not that bad, but it's not the French collation order: accented (diacritic) characters are considered equal to the unaccented ones on a first pass. Then, on a second pass, they are considered to come after the unaccented ones. Except that on that second pass, the letters are compared from the end to the beginning of the word. That's a bit strange, but those are the French collation rules…<br />
<br />
With 9.1, two new features are available:<br />
<br />
* You can specify collation at query time:<br />
<br />
SELECT * FROM (VALUES ('élève'),('élevé'),('élever'),('Élève')) AS tmp ORDER BY column1 COLLATE "fr_FR.utf8";<br />
column1 <br />
---------<br />
élève<br />
Élève<br />
élevé<br />
élever<br />
<br />
* You can specify collation at table definition time:<br />
<br />
CREATE TABLE french_messages (message TEXT COLLATE "fr_FR.utf8");<br />
INSERT INTO french_messages VALUES ('élève'),('élevé'),('élever'),('Élève');<br />
SELECT * FROM french_messages ORDER BY message;<br />
message <br />
---------<br />
élève<br />
Élève<br />
élevé<br />
élever<br />
<br />
And of course you can create an index on the message column that can be used for fast French sorting. For instance, using a table with more data and no collation defined:<br />
<br />
CREATE TABLE french_messages2 (message TEXT); -- no collation here<br />
INSERT INTO french_messages2 SELECT * FROM french_messages, generate_series(1,100000); -- 400k rows<br />
CREATE INDEX idx_french_ctype ON french_messages2 (message COLLATE "fr_FR.utf8");<br />
EXPLAIN SELECT * FROM french_messages2 ORDER BY message;<br />
QUERY PLAN <br />
-------------------------------------------------------------------------------<br />
Sort (cost=62134.28..63134.28 rows=400000 width=32)<br />
Sort Key: message<br />
-> Seq Scan on french_messages2 (cost=0.00..5770.00 rows=400000 width=32)<br />
<br />
EXPLAIN SELECT * FROM french_messages2 ORDER BY message COLLATE "fr_FR.utf8";<br />
QUERY PLAN <br />
--------------------------------------------------------------------------------------------------<br />
Index Scan using idx_french_ctype on french_messages2 (cost=0.00..17139.15 rows=400000 width=8)<br />
<br />
==Unlogged Tables==<br />
<br />
These can be used for ephemeral data. An unlogged table is much faster to write,<br />
but won't survive a crash (it will be truncated at database restart in case of a crash).<br />
<br />
They don't have the WAL maintenance overhead, so they are much faster to write to.<br />
<br />
Here is a (non-realistic) example:<br />
<br />
# CREATE TABLE test (a int);<br />
CREATE TABLE<br />
# CREATE UNLOGGED table testu (a int);<br />
CREATE TABLE<br />
# CREATE INDEX idx_test on test (a);<br />
CREATE INDEX<br />
# CREATE INDEX idx_testu on testu (a );<br />
CREATE INDEX<br />
=# \timing <br />
Timing is on.<br />
=# INSERT INTO test SELECT generate_series(1,1000000);<br />
INSERT 0 1000000<br />
Time: 17601,201 ms<br />
=# INSERT INTO testu SELECT generate_series(1,1000000);<br />
INSERT 0 1000000<br />
Time: 3439,982 ms<br />
<br />
These tables are very efficient for caching data, or for anything that can be rebuilt<br />
in case of a crash.<br />
<br />
<br />
==Extensions==<br />
<br />
This item and the following one are another occasion to present several features in one go. We'll need to<br />
install pg_trgm, and it is now an extension.<br />
<br />
Let's first install pg_trgm. Until 9.0, we had to run a script manually; the command looked like this:<br />
<br />
\i /usr/local/pgsql/share/contrib/pg_trgm.sql<br />
<br />
This was a real maintenance problem: the created functions defaulted to the public<br />
schema, were dumped "as is" in pg_dump files, often didn't restore correctly as<br />
they depended on external binary objects, or could change definitions between releases.<br />
<br />
With 9.1, one can use the CREATE EXTENSION command:<br />
<br />
CREATE EXTENSION [ IF NOT EXISTS ] extension_name<br />
[ WITH ] [ SCHEMA schema ]<br />
[ VERSION version ]<br />
[ FROM old_version ]<br />
<br />
The most important options are ''extension_name'', of course, and ''schema'': extensions<br />
can be stored in a schema.<br />
<br />
So let's install the pg_trgm for the next example:<br />
<br />
=# CREATE schema extensions;<br />
CREATE SCHEMA<br />
<br />
=# CREATE EXTENSION pg_trgm WITH SCHEMA extensions;<br />
CREATE EXTENSION<br />
<br />
Now, pg_trgm is installed in an 'extensions' schema. It will be included in database<br />
dumps correctly, with the CREATE EXTENSION syntax. So if anything changes in the extension,<br />
this extension will be restored with the new definition.<br />
<br />
One can get the list of extensions under psql:<br />
\dx<br />
List of installed extensions<br />
Name | Version | Schema | Description <br />
----------+---------+------------+-------------------------------------------------------------------<br />
pg_trgm | 1.0 | extensions | text similarity measurement and index searching based on trigrams<br />
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language<br />
(2 rows)<br />
<br />
<br />
==K-Nearest-Neighbor Indexing==<br />
<br />
GiST indexes can now be used to return sorted rows, if a 'distance' has a meaning and<br />
can be defined for the data type.<br />
For now, this work has been done for the point datatype, the pg_trgm contrib, and <br />
many btree_gist datatypes. This feature is available for all datatypes to use, so<br />
there will probably be more in the near future.<br />
<br />
For now, here is an example with pg_trgm. pg_trgm uses trigrams<br />
to compare strings. Here are the trigrams for the 'hello' string:<br />
<br />
SELECT show_trgm('hello');<br />
show_trgm <br />
---------------------------------<br />
{" h"," he",ell,hel,llo,"lo "}<br />
<br />
Trigrams are used to evaluate similarity (between 0 and 1) between strings. So there is a notion of<br />
distance, with distance defined as '1-similarity'.<br />
<br />
Here is an example using pg_trgm. The table contains 5 million text records, for 750MB.<br />
<br />
CREATE TABLE test_trgm ( text_data text);<br />
<br />
CREATE INDEX test_trgm_idx on test_trgm using gist (text_data extensions.gist_trgm_ops);<br />
<br />
Until 9.0, if we wanted the 2 values of text_data closest to 'hello', here was the query:<br />
<br />
SELECT text_data, similarity(text_data, 'hello')<br />
FROM test_trgm <br />
WHERE text_data % 'hello'<br />
ORDER BY similarity(text_data, 'hello')<br />
LIMIT 2;<br />
<br />
On the test database, it takes around 2 seconds to complete.<br />
<br />
With 9.1 and KNN, one can write:<br />
<br />
SELECT text_data, text_data <-> 'hello'<br />
FROM test_trgm <br />
ORDER BY text_data <-> 'hello'<br />
LIMIT 2;<br />
<br />
The <-> operator is the distance operator.<br />
It runs in 20ms, using the index to directly retrieve the 2 best records.<br />
<br />
While we're talking about pg_trgm, another new feature is that the LIKE and ILIKE<br />
operators can now automatically make use of a trgm index. Still using the same<br />
table:<br />
SELECT text_data<br />
FROM test_trgm<br />
WHERE text_data like '%hello%';<br />
<br />
uses the test_trgm_idx index (instead of scanning the whole table).<br />
<br />
==Serializable Snapshot Isolation==<br />
<br />
This feature is very useful if you need all your transactions to behave as if they are<br />
running serially, without sacrificing too much throughput, as is currently the<br />
case with other 'serializable' isolation implementations (this is usually done by locking every record accessed).<br />
<br />
As it is quite complex to demonstrate correctly, here is a link to a full explanation<br />
of this feature:<br />
http://wiki.postgresql.org/wiki/SSI<br />
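<br />
As a minimal sketch of the kind of anomaly SERIALIZABLE now catches (a classic write skew; the table, values and exact error text are illustrative), picture two concurrent sessions that each read the same data, then write based on what they read:<br />
<br />
 -- Both sessions: BEGIN ISOLATION LEVEL SERIALIZABLE;<br />
 -- Session 1: SELECT sum(amount) FROM mytable; INSERT INTO mytable VALUES (100);<br />
 -- Session 2: SELECT sum(amount) FROM mytable; INSERT INTO mytable VALUES (200);<br />
 -- Session 1: COMMIT;   -- the first committer wins<br />
 -- Session 2: COMMIT;   -- rolled back, with an error such as:<br />
 -- ERROR: could not serialize access due to read/write dependencies among transactions<br />
<br />
Under REPEATABLE READ, both commits would succeed, even though no serial ordering of the two transactions could produce that result.<br />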
<br />
TODO: the SSI documentation always concludes with a commit. It may be confusing to the reader.<br />
<br />
<br />
==Writeable Common Table Expressions==<br />
<br />
This extends the WITH syntax introduced in 8.4. Now, data modification queries can<br />
be put in the WITH part of the query, and the returned data used later.<br />
<br />
Let's say we want to archive all records matching %hello% from our test_trgm table:<br />
<br />
CREATE TABLE old_text_data (text_data text);<br />
<br />
WITH deleted AS (DELETE FROM test_trgm WHERE text_data like '%hello%' RETURNING text_data)<br />
INSERT INTO old_text_data SELECT * FROM deleted;<br />
<br />
All in one query.<br />
<br />
As a more ambitious example, the following query updates a pgbench database, deleting a bunch of erroneous transactions and updating all related teller, branch, and account totals in a single statement:<br />
<br />
WITH deleted_xtns AS (<br />
DELETE FROM pgbench_history<br />
WHERE bid = 4 and tid = 9<br />
RETURNING *<br />
), <br />
deleted_per_account as (<br />
SELECT aid, sum(delta) as baldiff <br />
FROM deleted_xtns<br />
GROUP BY 1<br />
),<br />
accounts_rebalanced as (<br />
UPDATE pgbench_accounts<br />
SET abalance = abalance - baldiff<br />
FROM deleted_per_account<br />
WHERE deleted_per_account.aid = pgbench_accounts.aid<br />
RETURNING deleted_per_account.aid, pgbench_accounts.bid,<br />
baldiff<br />
),<br />
branch_adjustment as (<br />
SELECT bid, SUM(baldiff) as branchdiff<br />
FROM accounts_rebalanced<br />
GROUP BY bid<br />
)<br />
UPDATE pgbench_branches<br />
SET bbalance = bbalance - branchdiff<br />
FROM branch_adjustment<br />
WHERE branch_adjustment.bid = pgbench_branches.bid<br />
RETURNING branch_adjustment.bid,branchdiff,bbalance;<br />
<br />
==SE-Postgres==<br />
PostgreSQL is the only database offering full integration with the SELinux security framework: military-grade security for your database.<br />
TODO<br />
<br />
==PGXN==<br />
<br />
[http://pgxn.org/ PGXN] is the PostgreSQL Extension Network, a central distribution system for open-source PostgreSQL extension libraries. Extension authors can [http://manager.pgxn.org/ submit their work] together with [http://pgxn.org/spec/ metadata describing it]: the packages and their documentation are [http://pgxn.org/ indexed] and distributed across several servers. The system can be used via the web interface or from command-line clients thanks to a [https://github.com/pgxn/pgxn-api/wiki simple API].<br />
<br />
A comprehensive [http://pgxnclient.projects.postgresql.org/ PGXN client] is being developed. It can be installed with:<br />
<br />
$ easy_install pgxnclient<br />
Searching for pgxnclient<br />
...<br />
Best match: pgxnclient 0.2.1<br />
Processing pgxnclient-0.2.1-py2.6.egg<br />
...<br />
Installed pgxnclient-0.2.1-py2.6.egg<br />
<br />
Among other commands, it allows searching for extensions on the website:<br />
<br />
$ pgxn search pair<br />
pair 0.1.3<br />
... Usage There are two ways to construct key/value *pairs*: Via the<br />
*pair*() function: % SELECT *pair*('foo', 'bar'); *pair* ------------<br />
(foo,bar) Or by using the ~> operator: % SELECT 'foo' ~> 'bar';<br />
*pair*...<br />
<br />
semver 0.2.2<br />
*pair* │ 0.1.0 │ Key/value *pair* data type Note that "0.35.0b1" is less<br />
than "0.35.0", as required by the specification. Use ORDER BY to get<br />
more of a feel for semantic version ordering rules: SELECT...<br />
<br />
To build and install them on the system:<br />
<br />
$ pgxn install pair<br />
INFO: best version: pair 0.1.3<br />
INFO: saving /tmp/tmpezwyEO/pair-0.1.3.zip<br />
INFO: unpacking: /tmp/tmpezwyEO/pair-0.1.3.zip<br />
INFO: building extension<br />
...<br />
INFO: installing extension<br />
[sudo] password for piro: <br />
/bin/mkdir -p '/usr/local/pg91b1/share/postgresql/extension'<br />
...<br />
<br />
And to load them as database extensions:<br />
<br />
$ pgxn load -d mydb pair<br />
INFO: best version: pair 0.1.3<br />
CREATE EXTENSION<br />
<br />
==SQL/MED==<br />
Support for SQL/MED (Management of External Data) was started with 8.4. But now PostgreSQL can define foreign tables, which is the main purpose of SQL/MED: accessing external data. <br />
<br />
See the [[Foreign_data_wrappers|list of existing Foreign Data Wrapper extensions]], which includes Oracle, MySQL, CouchDB, Redis, Twitter, and more.<br />
<br />
Here is an example, using the file_fdw extension.<br />
<br />
We'll map a CSV file to a table.<br />
<br />
CREATE EXTENSION file_fdw WITH SCHEMA extensions;<br />
\dx+ file_fdw<br />
Objects in extension "file_fdw"<br />
Object Description <br />
----------------------------------------------------<br />
foreign-data wrapper file_fdw<br />
function extensions.file_fdw_handler()<br />
function extensions.file_fdw_validator(text[],oid)<br />
<br />
This next step is optional. It's just to show the 'CREATE FOREIGN DATA WRAPPER' syntax:<br />
<br />
=# CREATE FOREIGN DATA WRAPPER file_data_wrapper HANDLER extensions.file_fdw_handler;<br />
CREATE FOREIGN DATA WRAPPER<br />
<br />
The extension already creates a foreign data wrapper called file_fdw. We'll use it from now on.<br />
<br />
We need to create a 'server'. As we're only retrieving data from a file, it seems to be overkill,<br />
but SQL/MED is also capable of coping with remote databases.<br />
<br />
CREATE SERVER file FOREIGN DATA WRAPPER file_fdw ;<br />
CREATE SERVER<br />
<br />
Now, let's link a statistical_data.csv file to a statistical_data table:<br />
<br />
CREATE FOREIGN TABLE statistical_data (field1 numeric, field2 numeric) server file options (filename '/tmp/statistical_data.csv', format 'csv', delimiter ';') ;<br />
CREATE FOREIGN TABLE<br />
marc=# SELECT * from statistical_data ;<br />
field1 | field2 <br />
--------+--------<br />
0.1 | 0.2<br />
0.2 | 0.4<br />
0.3 | 0.9<br />
0.4 | 1.6<br />
<br />
For now, foreign tables are SELECT-only.<br />
<br />
TODO: does this also work with dblink?<br />
<br />
=Backward compatibility issues=<br />
<br />
The next items are to be checked when migrating to 9.1.<br />
<br />
* The default value of ''standard_conforming_strings'' changed to ''on''<br />
<br />
Traditionally, PostgreSQL didn't treat ordinary string literals ('..') as the SQL standard specifies: backslashes ('\') were considered an escape character, and what followed them was interpreted. For instance, \n is a newline character, \\ is a backslash character. It is more C-like.<br />
<br />
With 9.1, ''standard_conforming_strings'' now defaults to ''on'', meaning that ordinary string literals are now treated as the SQL standard specifies. It means that single quotes are to be protected with a second single quote instead of a backslash, and that backslashes aren't an escape character anymore.<br />
<br />
So, where previously it would have been <nowiki>'I can\'t', it now should be 'I can''t'.</nowiki><br />
<br />
There are several things to know:<br />
<br />
:* The old syntax is still available. Just put an E in front of the starting quote: E'I can\'t'<br />
:* ''standard_conforming_strings'' can still be set to ''off''<br />
:* Many programming languages already do what's correct, as long as you ask them to escape the strings for you. For instance, libpq's PQescapeLiteral automatically detects the value of standard_conforming_strings.<br />
Still, double-check that your programs are ready for this.<br />
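<br />
A quick way to see the change from psql (illustrative output):<br />
<br />
 =# SET standard_conforming_strings TO on;<br />
 =# SELECT 'I can''t';<br />
 ?column? <br />
 ----------<br />
 I can't<br />
 =# SELECT E'I can\'t';<br />
 ?column? <br />
 ----------<br />
 I can't<br />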
<br />
* function-style and attribute-style data type casts for composite types are disallowed<br />
<br />
Since 8.4, it has been possible to cast almost anything to a text format.<br />
Let's try this with the previous foreign table:<br />
<br />
=# SELECT cast(statistical_data as text) from statistical_data ;<br />
statistical_data <br />
------------------<br />
(0.1,0.2)<br />
(0.2,0.4)<br />
(0.3,0.9)<br />
(0.4,1.6)<br />
(4 rows)<br />
<br />
The problem is that 8.4 and 9.0 gave us 4 syntaxes to do this:<br />
:* SELECT cast(statistical_data as text) from statistical_data ;<br />
:* SELECT statistical_data::text from statistical_data;<br />
:* SELECT statistical_data.text from statistical_data;<br />
:* SELECT text(statistical_data) from statistical_data;<br />
The latter two syntaxes aren't allowed anymore for composite types (such as a table record): they were too easy to use accidentally.<br />
<br />
* Casting checks for domains based on arrays have been tightened<br />
<br />
Now, PostgreSQL re-checks the constraint when you update an element of an array that is governed by a domain check constraint.<br />
<br />
Here is how it behaved in 9.0:<br />
<br />
=#CREATE DOMAIN test_dom as int[] check (value[1] > 0);<br />
CREATE DOMAIN<br />
=#SELECT '{-1,0,0,0,0}'::test_dom;<br />
ERROR: value for domain test_dom violates check constraint "test_dom_check"<br />
<br />
Okay, that's normal<br />
<br />
=#CREATE TABLE test_dom_table (test test_dom);<br />
CREATE TABLE<br />
=# INSERT INTO test_dom_table values ('{1,0,0,0,0}');<br />
INSERT 0 1<br />
=# UPDATE test_dom_table SET test[1]=-1;<br />
UPDATE 1<br />
<br />
This isn't normal: it's not allowed by the check constraint. This is no longer possible in 9.1; the check is performed correctly.<br />
<br />
* string_to_array() now returns an empty array for a zero-length string. Previously this returned NULL.<br />
<br />
=# SELECT string_to_array('','whatever');<br />
string_to_array <br />
-----------------<br />
{}<br />
<br />
* string_to_array() now splits the string into characters if the separator is NULL. Previously this returned NULL.<br />
<br />
=# SELECT string_to_array('foo',NULL);<br />
string_to_array <br />
-----------------<br />
{f,o,o}<br />
<br />
* PL/pgSQL's RAISE without parameters changed<br />
<br />
This is a rare case, but one that caught people used to the Oracle way of doing it.<br />
<br />
Here is an example:<br />
<br />
CREATE OR REPLACE FUNCTION raise_demo () returns void language plpgsql as $$<br />
BEGIN<br />
RAISE NOTICE 'Main body';<br />
BEGIN<br />
RAISE NOTICE 'Sub-block';<br />
RAISE EXCEPTION serialization_failure; -- Simulate a problem<br />
EXCEPTION WHEN serialization_failure THEN<br />
BEGIN<br />
-- Maybe we had a serialization error<br />
-- Won't happen here of course<br />
RAISE DEBUG 'There was probably a serialization failure. It could be because of...';<br />
-- ..<br />
-- If I get there let's pretend I couldn't find a solution to the error<br />
RAISE; -- Let's forward the error<br />
EXCEPTION WHEN OTHERS THEN<br />
-- This should capture everything<br />
RAISE EXCEPTION 'Couldn t figure what to do with the error';<br />
END;<br />
END;<br />
END;<br />
$$<br />
;<br />
CREATE FUNCTION<br />
<br />
With 9.0, you get this (with ''client_min_messages'' set to ''debug''):<br />
=# SELECT raise_demo();<br />
NOTICE: Main body<br />
NOTICE: Sub-block<br />
DEBUG: There was probably a serialization failure. It could be because of...<br />
ERROR: serialization_failure<br />
<br />
<br />
With 9.1:<br />
=# SELECT raise_demo();<br />
NOTICE: Main body<br />
NOTICE: Sub-block<br />
DEBUG: There was probably a serialization failure. It could be because of...<br />
ERROR: Couldn t figure what to do with the error<br />
<br />
<br />
The difference is that RAISE without parameters, in 9.0, puts the code flow back to where the EXCEPTION occurred.<br />
In 9.1, the RAISE continues in the block where it occurs, so the inner BEGIN block isn't left when the RAISE is<br />
triggered. Its exception block is performed.<br />
<br />
=Performance improvements=<br />
<br />
* Synchronous writes have been optimized to put less stress on the filesystem.<br />
<br />
This one is hard to demonstrate, but performance and responsiveness (latency) have been greatly improved on write-intensive loads.<br />
<br />
* Inherited tables in queries can now return meaningfully sorted results, allowing MIN/MAX optimizations for inheritance trees<br />
<br />
If you're using a lot of inheritance, probably in a partitioning context, you're going to love these optimizations.<br />
<br />
The query planner got much smarter about the following case.<br />
<br />
Let's create a mockup schema:<br />
<br />
=# CREATE TABLE parent (a int);<br />
CREATE TABLE<br />
=# CREATE TABLE children_1 ( check (a between 1 and 10000000)) inherits (parent);<br />
CREATE TABLE<br />
=# CREATE TABLE children_2 ( check (a between 10000001 and 20000000)) inherits (parent);<br />
CREATE TABLE<br />
=# INSERT INTO children_1 select generate_series(1,10000000);<br />
INSERT 0 10000000<br />
=# INSERT INTO children_2 select generate_series(10000001,20000000);<br />
INSERT 0 10000000<br />
 =# CREATE INDEX test_1 ON children_1 (a);<br />
 CREATE INDEX<br />
 =# CREATE INDEX test_2 ON children_2 (a);<br />
 CREATE INDEX<br />
<br />
Let's ask for the 50 biggest values of a.<br />
<br />
SELECT * from parent order by a desc limit 50;<br />
<br />
It takes, on this small test machine, 13 seconds on a 9.0 database, and 0.8 ms on a 9.1.<br />
<br />
The 9.0 plan is:<br />
<br />
Limit (cost=952993.36..952993.48 rows=50 width=4)<br />
-> Sort (cost=952993.36..1002999.24 rows=20002354 width=4)<br />
Sort Key: public.parent.a<br />
-> Result (cost=0.00..288529.54 rows=20002354 width=4)<br />
-> Append (cost=0.00..288529.54 rows=20002354 width=4)<br />
-> Seq Scan on parent (cost=0.00..34.00 rows=2400 width=4)<br />
-> Seq Scan on children_1 parent (cost=0.00..144247.77 rows=9999977 width=4)<br />
-> Seq Scan on children_2 parent (cost=0.00..144247.77 rows=9999977 width=4)<br />
<br />
The 9.1 plan is:<br />
<br />
Limit (cost=113.75..116.19 rows=50 width=4)<br />
-> Result (cost=113.75..975036.98 rows=20002400 width=4)<br />
-> Merge Append (cost=113.75..975036.98 rows=20002400 width=4)<br />
Sort Key: public.parent.a<br />
-> Sort (cost=113.73..119.73 rows=2400 width=4)<br />
Sort Key: public.parent.a<br />
-> Seq Scan on parent (cost=0.00..34.00 rows=2400 width=4)<br />
-> Index Scan Backward using test_1 on children_1 parent (cost=0.00..303940.35 rows=10000000 width=4)<br />
-> Index Scan Backward using test_2 on children_2 parent (cost=0.00..303940.35 rows=10000000 width=4)<br />
<br />
The 9.0 plan means: I'll take every record from every table, sort them, and then return the 50 biggest.<br />
<br />
The 9.1 plan means: I'll take records from every table sorted, using their indexes if available, <br />
merge them as they come and return the 50 first ones.<br />
<br />
This was a very common trap: this type of query became dramatically slower when one partitioned the data, and it was a bit tricky to work around with a query rewrite.<br />
<br />
* hash algorithms can now be used for full outer joins, and for arrays.<br />
<br />
This one can be demoed with a very simple example (for full outer joins):<br />
<br />
CREATE TABLE test1 (a int);<br />
CREATE TABLE test2 (a int);<br />
INSERT INTO test1 SELECT generate_series(1,100000);<br />
INSERT INTO test2 SELECT generate_series(100,1000);<br />
<br />
So we have a big test1 and a small test2 table.<br />
<br />
With 9.0, this query is done with this plan:<br />
<br />
EXPLAIN ANALYZE SELECT * FROM test1 FULL OUTER JOIN test2 USING (a);<br />
QUERY PLAN <br />
--------------------------------------------------------------------------------------------------------------------------<br />
Merge Full Join (cost=11285.07..11821.07 rows=100000 width=8) (actual time=330.092..651.618 rows=100000 loops=1)<br />
Merge Cond: (test1.a = test2.a)<br />
-> Sort (cost=11116.32..11366.32 rows=100000 width=4) (actual time=327.926..446.814 rows=100000 loops=1)<br />
Sort Key: test1.a<br />
Sort Method: external sort Disk: 1368kB<br />
-> Seq Scan on test1 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..119.246 rows=100000 loops=1)<br />
-> Sort (cost=168.75..174.75 rows=2400 width=4) (actual time=2.156..3.208 rows=901 loops=1)<br />
Sort Key: test2.a<br />
Sort Method: quicksort Memory: 67kB<br />
 -> Seq Scan on test2 (cost=0.00..34.00 rows=2400 width=4) (actual time=0.009..1.066 rows=901 loops=1)<br />
Total runtime: 733.368 ms<br />
<br />
With 9.1, this is the new plan:<br />
<br />
--------------------------------------------------------------------------------------------------------------------<br />
Hash Full Join (cost=24.27..1851.28 rows=100000 width=8) (actual time=2.536..331.547 rows=100000 loops=1)<br />
Hash Cond: (test1.a = test2.a)<br />
-> Seq Scan on test1 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.014..119.884 rows=100000 loops=1)<br />
-> Hash (cost=13.01..13.01 rows=901 width=4) (actual time=2.505..2.505 rows=901 loops=1)<br />
Buckets: 1024 Batches: 1 Memory Usage: 32kB<br />
-> Seq Scan on test2 (cost=0.00..13.01 rows=901 width=4) (actual time=0.017..1.186 rows=901 loops=1)<br />
Total runtime: 412.735 ms<br />
<br />
The 9.0 plan does 2 sorts. The 9.1 only needs to create a hash on the smallest table.<br />
<br />
Runtime is divided by almost 2 here. Another very interesting property is that the new plan has a much smaller startup cost:<br />
the first row is returned after 2 milliseconds, where it takes 330ms to return the first one using the old plan.<br />
<br />
SELECT * from test1 full outer join test2 using (a) LIMIT 10<br />
<br />
takes 330ms with 9.0, and 3ms with 9.1.<br />
<br />
<br />
=Administration=<br />
<br />
* Auto-tuning of wal_buffers.<br />
The wal_buffers setting is now auto-tuned when set to -1, its new default value. It is automatically set to 1/32nd of shared_buffers, with a maximum of 16MB. One less parameter to take care of…<br />
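<br />
In postgresql.conf, the new default is simply (shown here for illustration):<br />
<br />
 wal_buffers = -1        # auto-tuned: shared_buffers/32, capped at 16MB<br />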
<br />
<br />
* The time of the last reset is now recorded in the database-level and background writer statistics views.<br />
You can now know when statistics were last reset. For a database, for instance:<br />
<br />
SELECT datname, stats_reset FROM pg_stat_database;<br />
datname | stats_reset <br />
-----------+-------------------------------<br />
template1 | <br />
template0 | <br />
postgres | 2011-05-11 19:22:05.946641+02<br />
marc | 2011-05-11 19:22:09.133483+02<br />
<br />
* Columns showing the number of vacuum and analyze operations in pg_stat_*_tables views.<br />
<br />
It's now much easier to know which tables get a lot of autovacuum attention:<br />
<br />
SELECT relname, last_vacuum, vacuum_count, last_autovacuum, autovacuum_count, last_analyze, analyze_count, last_autoanalyze, autoanalyze_count<br />
FROM pg_stat_user_tables <br />
WHERE relname in ('test1','test2');<br />
relname | last_vacuum | vacuum_count | last_autovacuum | autovacuum_count | last_analyze | analyze_count | last_autoanalyze | autoanalyze_count <br />
---------+-------------+--------------+-----------------+------------------+--------------+---------------+-------------------------------+-------------------<br />
test1 | | 0 | | 0 | | 0 | 2011-05-22 15:51:50.48562+02 | 1<br />
test2 | | 0 | | 0 | | 0 | 2011-05-22 15:52:50.325494+02 | 2<br />
<br />
<br />
<br />
=SQL and PL/PgSQL features=<br />
<br />
* GROUP BY can infer some missing columns<br />
<br />
 CREATE TABLE entities (entity_name text primary key, address text);<br />
CREATE TABLE employees (employee_name text primary key, entity_name text references entities (entity_name));<br />
INSERT INTO entities VALUES ('HR', 'address1');<br />
INSERT INTO entities VALUES ('SALES', 'address2');<br />
INSERT INTO employees VALUES ('Smith', 'HR');<br />
INSERT INTO employees VALUES ('Jones', 'HR');<br />
INSERT INTO employees VALUES ('Taylor', 'SALES');<br />
INSERT INTO employees VALUES ('Brown', 'SALES');<br />
<br />
One can now write:<br />
<br />
SELECT count(*), entity_name, address<br />
FROM entities JOIN employees using (entity_name)<br />
GROUP BY entity_name;<br />
count | entity_name | address <br />
-------+-------------+----------<br />
2 | HR | address1<br />
2 | SALES | address2<br />
<br />
In 9.0, grouping on address would have been required too. As entity_name is the primary key of entities, address is functionally dependent on entity_name, so PostgreSQL knows that grouping by entity_name determines address as well, and doesn't require it in the GROUP BY.<br />
<br />
* New values can be added to an existing enum type via ALTER TYPE.<br />
<br />
 =# CREATE TYPE package_status AS ENUM ('RECEIVED', 'DELIVERED');<br />
CREATE TYPE<br />
=# ALTER TYPE package_status ADD VALUE 'READY FOR DELIVERY' AFTER 'RECEIVED';<br />
ALTER TYPE<br />
<br />
Until 9.0, one had to drop the type and create a new one. And that meant dropping all columns using that type. That was a major reason blocking adoption of the ENUM type.<br />
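<br />
The new value takes its place in the enum's sort order exactly where it was added; a quick illustration (the values are fed in out of order on purpose):<br />
<br />
 =# SELECT x FROM unnest(ARRAY['DELIVERED','RECEIVED','READY FOR DELIVERY']::package_status[]) AS x ORDER BY x;<br />
 x<br />
 --------------------<br />
 RECEIVED<br />
 READY FOR DELIVERY<br />
 DELIVERED<br />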
<br />
* Composite types can be modified through ALTER TYPE ... ADD/DROP/ALTER/RENAME ATTRIBUTE.<br />
<br />
Let's create a simple composite data type:<br />
<br />
=#CREATE TYPE package AS (destination text);<br />
<br />
Let's create a dummy function using this data type:<br />
<br />
=#CREATE FUNCTION package_exists (pack package) RETURNS boolean LANGUAGE plpgsql AS $$<br />
BEGIN<br />
RETURN true;<br />
END<br />
$$<br />
;<br />
<br />
Test this function:<br />
<br />
=#SELECT package_exists(row('test'));<br />
package_exists <br />
----------------<br />
t<br />
<br />
It works.<br />
<br />
Now we can alter the 'package' type:<br />
<br />
=#ALTER TYPE package ADD ATTRIBUTE received boolean;<br />
<br />
The type has changed:<br />
<br />
=#SELECT package_exists(row('test'));<br />
ERROR: cannot cast type record to package<br />
LINE 1: SELECT package_exists(row('test'));<br />
^<br />
DETAIL: Input has too few columns.<br />
=# SELECT package_exists(row('test',true));<br />
package_exists <br />
----------------<br />
t<br />
<br />
* ALTER TABLE ... ADD UNIQUE/PRIMARY KEY USING INDEX<br />
<br />
This will probably be used mostly to create a primary or unique key without locking a table for too long:<br />
<br />
=# CREATE UNIQUE INDEX CONCURRENTLY idx_pk ON test_pk (a);<br />
CREATE INDEX<br />
=# ALTER TABLE test_pk ADD primary key using index idx_pk;<br />
ALTER TABLE<br />
<br />
We'll get a write lock on test_pk only for the duration of the ALTER TABLE. The rest of the work will be done without disrupting users' work.<br />
<br />
This can also be used to rebuild primary key indices without locking the table during the whole rebuild:<br />
<br />
=# CREATE UNIQUE INDEX CONCURRENTLY idx_pk2 ON test_pk (a);<br />
=# BEGIN ;<br />
=# ALTER TABLE test_pk DROP CONSTRAINT idx_pk;<br />
=# ALTER TABLE test_pk ADD primary key using index idx_pk2;<br />
=# COMMIT ;<br />
<br />
* ALTER TABLE ... SET DATA TYPE can avoid table rewrites in appropriate cases.<br />
<br />
For example, converting a varchar column to text no longer requires a rewrite of the table.<br />
<br />
However, increasing the length constraint on a varchar column still requires a table rewrite (excerpt from the Changelog).<br />
<br />
This is self-explanatory. There are still cases to be covered, but this is a work in progress.<br />
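<br />
A minimal sketch (the table name is hypothetical); in 9.1 the first ALTER no longer rewrites the table, while the second still does:<br />
<br />
 =# CREATE TABLE resize_demo (code varchar(10), label varchar(10));<br />
 =# ALTER TABLE resize_demo ALTER COLUMN code SET DATA TYPE text;           -- no table rewrite<br />
 =# ALTER TABLE resize_demo ALTER COLUMN label SET DATA TYPE varchar(20);   -- still rewrites<br />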
<br />
* New CREATE TABLE IF NOT EXISTS syntax.<br />
<br />
You won't get an error if the table already exists, only a NOTICE.<br />
<br />
Be aware that it won't check that your new definition matches the one already in place.<br />
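<br />
For instance, re-running an earlier definition (the NOTICE wording is approximate):<br />
<br />
 =# CREATE TABLE IF NOT EXISTS test (a int);<br />
 NOTICE: relation "test" already exists, skipping<br />
 CREATE TABLE<br />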
<br />
* New ENCODING option to COPY TO/FROM. This allows the encoding of the COPY file to be specified separately from client encoding.<br />
<br />
COPY test1 TO stdout ENCODING 'latin9'<br />
<br />
will now convert the encoding directly. No need to set client_encoding before the COPY anymore.<br />
<br />
* INSTEAD OF triggers on views.<br />
<br />
This feature can be used to implement fully updatable views. Here is an example.<br />
<br />
Let's continue on the employees/entities example.<br />
<br />
=#CREATE VIEW emp_entity AS SELECT employee_name, entity_name, address<br />
FROM entities JOIN employees USING (entity_name);<br />
<br />
To make this view updatable in 9.0, one had to write rules. This could rapidly turn into a nightmare, as rules are<br />
quite complex to write, and even harder to debug. This is how it was done: [http://www.postgresql.org/docs/9.1/static/rules-update.html rules update]<br />
<br />
Now we can do this with a trigger. Here is an example (there is only the INSERT part here):<br />
<br />
=#CREATE OR REPLACE FUNCTION dml_emp_entity () RETURNS trigger LANGUAGE plpgsql AS $$<br />
DECLARE<br />
vrecord RECORD;<br />
BEGIN<br />
IF TG_OP = 'INSERT' THEN<br />
-- Does the record exist in entity ?<br />
SELECT entity_name,address INTO vrecord FROM entities WHERE entity_name=NEW.entity_name;<br />
IF NOT FOUND THEN<br />
INSERT INTO entities (entity_name,address) VALUES (NEW.entity_name, NEW.address);<br />
ELSE<br />
IF vrecord.address != NEW.address THEN<br />
 RAISE EXCEPTION 'There already is a record for % in entities. Its address is %. It conflicts with your address %',<br />
NEW.entity_name, vrecord.address, NEW.address USING ERRCODE = 'unique_violation';<br />
END IF;<br />
END IF; -- Nothing more to do, the entity already exists and is OK<br />
-- We now try to insert the employee data. Let's directly try an INSERT<br />
BEGIN<br />
INSERT INTO employees (employee_name, entity_name) VALUES (NEW.employee_name, NEW.entity_name);<br />
EXCEPTION WHEN unique_violation THEN<br />
RAISE EXCEPTION 'There is already an employee with this name %', NEW.employee_name USING ERRCODE = 'unique_violation';<br />
END;<br />
RETURN NEW; -- The trigger succeeded<br />
END IF;<br />
END<br />
$$<br />
;<br />
<br />
We just have to declare our trigger now:<br />
<br />
=#CREATE TRIGGER trig_dml_emp_entity INSTEAD OF INSERT OR UPDATE OR DELETE ON emp_entity FOR EACH ROW EXECUTE PROCEDURE dml_emp_entity ();<br />
<br />
There are other advantages: a rule merely rewrites the query. With the trigger, we added some logic and can send more useful error messages, which makes<br />
it much easier to understand what went wrong. We can also trap exceptions. We get all the advantages of triggers over rules.<br />
<br />
* PL/PgSQL FOREACH IN ARRAY.<br />
<br />
It's become much easier to loop over an array in PL/PgSQL. Until now, the FOR construct only worked for looping over recordsets (query results).<br />
<br />
It can now be used to loop over arrays.<br />
<br />
Before 9.1, it could be written like this:<br />
<br />
=# CREATE OR REPLACE FUNCTION test_array (parray int[]) RETURNS int LANGUAGE plpgsql AS $$<br />
DECLARE<br />
vcounter int :=0;<br />
velement int;<br />
BEGIN<br />
FOR velement IN SELECT unnest (parray)<br />
LOOP<br />
vcounter:=vcounter+velement;<br />
END LOOP;<br />
RETURN vcounter;<br />
END<br />
$$<br />
;<br />
<br />
Now:<br />
<br />
=# CREATE OR REPLACE FUNCTION test_array (parray int[]) RETURNS int LANGUAGE plpgsql AS $$<br />
DECLARE<br />
vcounter int :=0;<br />
velement int;<br />
BEGIN<br />
FOREACH velement IN ARRAY parray<br />
LOOP<br />
vcounter:=vcounter+velement;<br />
END LOOP;<br />
RETURN vcounter;<br />
END<br />
$$<br />
;<br />
<br />
It's much easier to read, and it's faster to run.<br />
<br />
There is another benefit: we can slice the array when it is multidimensional. Here is an example, directly from the<br />
documentation:<br />
<br />
=#CREATE FUNCTION scan_rows(int[]) RETURNS void AS $$<br />
DECLARE<br />
x int[];<br />
BEGIN<br />
FOREACH x SLICE 1 IN ARRAY $1<br />
LOOP<br />
RAISE NOTICE 'row = %', x;<br />
END LOOP;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
=#SELECT scan_rows(ARRAY[[1,2,3],[4,5,6],[7,8,9],[10,11,12]]);<br />
NOTICE: row = {1,2,3}<br />
NOTICE: row = {4,5,6}<br />
NOTICE: row = {7,8,9}<br />
NOTICE: row = {10,11,12}<br />
<br />
[[Category:PostgreSQL 9.1]]</div>Marco44https://wiki.postgresql.org/index.php?title=What%27s_new_in_PostgreSQL_9.1/fr&diff=15239What's new in PostgreSQL 9.1/fr2011-08-23T14:18:53Z<p>Marco44: display a french title</p>
<hr />
<div>{{Languages}}<br />
<span style="font-size:188%;color:#E65600">Quoi de neuf dans PostgreSQL 9.1</span><br />
<br />
Ce document présente, si possible par l'exemple, un grand nombre des nouveautés de PostgreSQL 9.1, comparé à la version majeure précédente - PostgreSQL 9.0. Il y a de nombreuses nouveautés dans cette version, cette page de wiki ne couvre donc que les changements les plus importants en détail. La liste complète des modifications se trouve dans le chapitre [http://docs.postgresql.fr/9.1/release.html Notes de version] de la documentation officielle.<br />
<br />
===Major new features===<br />
<br />
==Synchronous replication and other replication features==<br />
<br />
There are quite a few new replication features in 9.1:<br />
<br />
<br />
* In 9.0, the user used for replication had to be a superuser. That's no longer the case: there is a new attribute called 'replication'.<br />
<br />
 CREATE ROLE replication_role REPLICATION LOGIN PASSWORD 'pwd_replication'<br />
<br />
This role can then be added to pg_hba.conf and used for streaming replication. From a security point of view, this is obviously preferable to having a dedicated superuser role for that job.<br />
<br />
Now that we have an instance and a replication user, we can set up streaming replication. It's just a matter of adding the permission to connect to the virtual 'replication' database in "pg_hba.conf", setting ''wal_level'', archiving (''archive_mode'' and ''archive_command'') and ''max_wal_senders'', all of which was already covered in the write-up on the 9.0 features.<br />
<br />
When the instance is ready for streaming, we can demonstrate the second new feature.<br />
<br />
* pg_basebackup.<br />
<br />
This tool clones a database instance, or makes a backup of it, using only the PostgreSQL network protocol. There is no need to call "pg_start_backup()", then copy the data manually and finally call "pg_stop_backup()". pg_basebackup does all that work in a single command. For the demonstration, we'll clone the running instance to /tmp/newcluster.<br />
<br />
> pg_basebackup -D /tmp/newcluster -U replication -v<br />
Password: <br />
NOTICE: pg_stop_backup complete, all required WAL segments have been archived<br />
pg_basebackup: base backup completed<br />
<br />
This new instance is ready to start: simply add a "recovery.conf" file with a "restore_command" to retrieve the archived files, and start the new instance. pg_basebackup can also produce a tar archive, or include all the required xlog files (to get a fully standalone backup).<br />
<br />
As we're now going to demonstrate synchronous replication, let's prepare a "recovery.conf" that connects to the master database and streams the records as they come.<br />
<br />
The file will look like this:<br />
<br />
restore_command = 'cp /tmp/%f %p'<br />
standby_mode = on<br />
primary_conninfo = 'host=localhost port=59121 user=replication password=replication application_name=newcluster'<br />
trigger_file = '/tmp/trig_f_newcluster'<br />
<br />
Then we start the new instance:<br />
<br />
pg_ctl -D /tmp/newcluster start<br />
<br />
LOG: database system was interrupted; last known up at 2011-05-22 17:15:45 CEST<br />
LOG: entering standby mode<br />
LOG: restored log file "00000001000000010000002F" from archive<br />
LOG: redo starts at 1/2F000020<br />
LOG: consistent recovery state reached at 1/30000000<br />
LOG: database system is ready to accept read only connections<br />
cp: cannot stat « /tmp/000000010000000100000030 »: No such file or directory<br />
LOG: streaming replication successfully connected to primary<br />
<br />
We have our slave, and it retrieves data from the master in streaming mode, but we are still asynchronous. Note that we set an "application_name" parameter in the connection string of the "recovery.conf".<br />
<br />
* Synchronous replication<br />
<br />
Making replication synchronous is very simple: just set this in the master's postgresql.conf:<br />
<br />
synchronous_standby_names = 'newcluster'<br />
<br />
This is of course the "application_name" from the slave's "primary_conninfo". After a "pg_ctl reload", the new parameter is taken into account. From now on, any COMMIT on the master is only considered complete once the slave has written it to its own journal and notified the master.<br />
<br />
A word of warning: transactions are considered committed when they are written to the slave's journal, not when they are visible on the slave. This means there is still a delay between the moment a transaction is committed on the master and the moment it becomes visible on the slave. The replication is nonetheless synchronous: you won't lose data if a master crashes.<br />
<br />
Synchronous replication can be tuned quite finely: it is controllable per session. The "synchronous_commit" parameter can be turned off (it is of course on by default) in a session that doesn't need this synchronous guarantee. If your transaction doesn't need synchronous replication, simply run<br />
 SET synchronous_commit TO off<br />
and you won't pay the penalty of waiting for the slave.<br />
<br />
A few other new replication features deserve a mention:<br />
<br />
* Slaves can now ask the master not to VACUUM away the records they may still need.<br />
<br />
This was one of the main difficulties of setting up replication in 9.0 when the slave was to be used: a VACUUM could remove records that were still needed by queries running on the slave, triggering replication conflicts. The slave then had to make a choice: either kill the running query, or accept delaying the application of the changes produced by the VACUUM (and of course all those following it), and lag behind. The problem could be worked around by setting "vacuum_defer_cleanup_age" to a non-zero value, but it was hard to find a good value. The new feature is enabled by setting "hot_standby_feedback" on the standby databases. Of course, this means the standby can then prevent VACUUM from doing its maintenance work on the master, if very long queries are running on the slave.<br />
<br />
* pg_stat_replication is a new system view.<br />
<br />
It displays, on the master, the state of all the slaves: how much WAL they have received, whether they are connected, synchronous, and how far they have applied the changes:<br />
<br />
=# SELECT * from pg_stat_replication ;<br />
procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state <br />
---------+----------+-------------+------------------+-------------+-----------------+-------------+------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------<br />
17135 | 16671 | replication | newcluster | 127.0.0.1 | | 43745 | 2011-05-22 18:13:04.19283+02 | streaming | 1/30008750 | 1/30008750 | 1/30008750 | 1/30008750 | 1 | sync<br />
<br />
Il n'est dont plus nécessaire d'exécuter des requêtes sur les esclaves pour connaître leur état par rapport au maître.<br />
<br />
* pg_stat_database_conflicts est une autre vue système.<br />
<br />
Celle ci est sur la base de standby, et montre combien de requêtes ont été annulées, et pour quelles raisons:<br />
<br />
=# SELECT * from pg_stat_database_conflicts ;<br />
datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock <br />
-------+-----------+------------------+------------+----------------+-----------------+----------------<br />
1 | template1 | 0 | 0 | 0 | 0 | 0<br />
11979 | template0 | 0 | 0 | 0 | 0 | 0<br />
11987 | postgres | 0 | 0 | 0 | 0 | 0<br />
16384 | marc | 0 | 0 | 1 | 0 | 0<br />
<br />
* la réplication peut maintenant être mise en pause sur un esclave.<br />
<br />
Appelez tout simplement ''pg_xlog_replay_pause()'' pour mettre en pause, et ''pg_xlog_replay_resume()'' pour reprendre. Cela gèlera la base, ce qui en fait un excellent outil pour réaliser des sauvegardes cohérentes.<br />
<br />
''pg_is_xlog_replay_paused()'' permet de connaître l'état actuel.<br />
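
Put together, a consistent backup of a standby could look roughly like this (a sketch; the backup command itself is whatever tool you normally use):

 -- on the standby
 SELECT pg_xlog_replay_pause();      -- stop applying incoming WAL
 SELECT pg_is_xlog_replay_paused();  -- returns t
 -- take the backup from the now-frozen standby, then:
 SELECT pg_xlog_replay_resume();     -- resume recovery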

PostgreSQL can also be asked to pause WAL replay at the end of instance recovery, without switching the database to production, so that the administrator may run queries against it. The administrator can then check whether the recovery point reached is correct before ending recovery. This new parameter is "pause_at_recovery_target", and it goes into recovery.conf.

* Restore points can be created

These are nothing more than named points in the transaction log.

They can be used by specifying a "recovery_target_name" instead of a "recovery_target_time" or a "recovery_target_xid" in the recovery.conf file.

They are created by calling "pg_create_restore_point()".
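
A sketch of the full round trip (the restore point name is arbitrary):

 -- on the master, before a risky operation
 SELECT pg_create_restore_point('before_app_upgrade');

 # later, in recovery.conf, to recover up to exactly that point
 recovery_target_name = 'before_app_upgrade'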

==Per-column collations==

The collation order is no longer unique to a database.

Suppose you were running a 9.0 database with UTF8 encoding and the de_DE.utf8 collation (alphabetical ordering), because most of your users speak German. If you also had to store French data and sort it, the most demanding French users would not have been happy:

 SELECT * from (values ('élève'),('élevé'),('élever'),('Élève')) as tmp order by column1;
  column1
 ---------
  élevé
  élève
  Élève
  élever

To be honest, this is not that bad. But it is not the French alphabetical order: accented characters are treated as unaccented in a first sorting pass. A second pass then ranks accented characters after unaccented ones (in a precise order); and, to make things more entertaining, that second pass compares from the last character to the first, not from the first to the last. The German rule is obviously different.

In 9.1, you get two new features:

* You can specify the collation in a query:

 SELECT * FROM (VALUES ('élève'),('élevé'),('élever'),('Élève')) AS tmp ORDER BY column1 COLLATE "fr_FR.utf8";
  column1
 ---------
  élève
  Élève
  élevé
  élever

* You can define the collation when declaring a table:

 CREATE TABLE french_messages (message TEXT COLLATE "fr_FR.utf8");
 INSERT INTO french_messages VALUES ('élève'),('élevé'),('élever'),('Élève');
 SELECT * FROM french_messages ORDER BY message;
  message
 ---------
  élève
  Élève
  élevé
  élever

And of course you can create an index on the message column and use it to sort quickly in French order. For example, with a bigger table and no declared collation:

 CREATE TABLE french_messages2 (message TEXT);
 INSERT INTO french_messages2 SELECT * FROM french_messages, generate_series(1,100000); -- 400k rows
 CREATE INDEX idx_french_ctype ON french_messages2 (message COLLATE "fr_FR.utf8");
 EXPLAIN SELECT * FROM french_messages2 ORDER BY message;
                                    QUERY PLAN
 -------------------------------------------------------------------------------
  Sort  (cost=62134.28..63134.28 rows=400000 width=32)
    Sort Key: message
    ->  Seq Scan on french_messages2  (cost=0.00..5770.00 rows=400000 width=32)

 EXPLAIN SELECT * FROM french_messages2 ORDER BY message COLLATE "fr_FR.utf8";
                                             QUERY PLAN
 --------------------------------------------------------------------------------------------------
  Index Scan using idx_french_ctype on french_messages2  (cost=0.00..17139.15 rows=400000 width=8)

==Unlogged tables==

These tables can be used to store ephemeral data: an unlogged table is much faster to write to, but it will not survive a crash (it is truncated when the instance restarts after one).

Because they skip the maintenance cost of write-ahead logging, writes to them are much cheaper.

Here is a (deliberately silly) example:

 # CREATE TABLE test (a int);
 CREATE TABLE
 # CREATE UNLOGGED table testu (a int);
 CREATE TABLE
 # CREATE INDEX idx_test on test (a);
 CREATE INDEX
 # CREATE INDEX idx_testu on testu (a);
 CREATE INDEX
 =# \timing
 Timing is on.
 =# INSERT INTO test SELECT generate_series(1,1000000);
 INSERT 0 1000000
 Time: 17601,201 ms
 =# INSERT INTO testu SELECT generate_series(1,1000000);
 INSERT 0 1000000
 Time: 3439,982 ms

This makes them very efficient for cache data, or for anything that can be rebuilt after a crash.

==Extensions==

This item and the next are an opportunity to present several features at once. We will start by installing pg_trgm, which is now an extension.

Until 9.0, installing pg_trgm meant running a script manually; the command looked like this:

 \i /usr/local/pgsql/share/contrib/pg_trgm.sql

This caused maintenance problems: the created functions went into the public schema by default, they were dumped as-is by pg_dump, and they often failed to restore properly, since they frequently depended on external binary objects or changed definition between PostgreSQL versions.

With 9.1, you can use the CREATE EXTENSION command:

 CREATE EXTENSION [ IF NOT EXISTS ] extension_name
 [ WITH ] [ SCHEMA schema ]
 [ VERSION version ]
 [ FROM old_version ]

The most important options are "extension_name", of course, and "schema": extensions can be stored in a schema.

So let's install pg_trgm for the example that follows:

 =# CREATE schema extensions;
 CREATE SCHEMA

 =# CREATE EXTENSION pg_trgm WITH SCHEMA extensions;
 CREATE EXTENSION

pg_trgm is now installed in an "extensions" schema. It will be included in database dumps correctly, as its CREATE EXTENSION statement. So if something changes in the extension, it will be restored with the new definition.
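
The "FROM old_version" clause is meant for upgrades from a pre-9.1 setup: if the old pg_trgm.sql script had already been run in the database, the loose objects it created can be adopted into a proper extension. A sketch, assuming such objects are present:

 -- turn objects created by the old contrib script into an extension
 CREATE EXTENSION pg_trgm FROM unpackaged;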

The list of installed extensions can be obtained in psql as follows:

 \dx
 List of installed extensions
   Name   | Version |   Schema   |                            Description
 ----------+---------+------------+-------------------------------------------------------------------
  pg_trgm  | 1.0     | extensions | text similarity measurement and index searching based on trigrams
  plpgsql  | 1.0     | pg_catalog | PL/pgSQL procedural language
 (2 rows)

==K-Nearest-Neighbor indexing==

GiST indexes can now be used to return records in sorted order, whenever the notion of distance is meaningful for the data and a definition of it can be provided. For now, this work has been done for the 'point' type, the 'pg_trgm' extension, and several btree_gist data types. The infrastructure is available to all data types, so more of them will probably implement it in the near future.

For now, here is an example with pg_trgm. pg_trgm uses trigrams to compare strings. Here are the trigrams for the string 'hello':

 SELECT show_trgm('hello');
 show_trgm
 ---------------------------------
 {" h"," he",ell,hel,llo,"lo "}

Trigrams are used to evaluate the similarity (between 0 and 1) of two strings. So there is a notion of distance, which can be defined as '1 - similarity'.
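
The <-> distance operator used below returns exactly that value; a quick sanity check (assuming, as the examples below do, that the extension's objects are visible through the search_path; 'hallo' is an arbitrary comparison word):

 SELECT similarity('hello', 'hallo') AS sim,
        'hello' <-> 'hallo'          AS dist;  -- dist is 1 - sim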

Here is an example. The table contains 5 million records and weighs 750MB.

 CREATE TABLE test_trgm ( text_data text);

 CREATE INDEX test_trgm_idx on test_trgm using gist (text_data extensions.gist_trgm_ops);

Until 9.0, to get the two text_data values closest to 'hello', the query was:

 SELECT text_data, similarity(text_data, 'hello')
 FROM test_trgm
 WHERE text_data % 'hello'
 ORDER BY similarity(text_data, 'hello')
 LIMIT 2;

On this test database, it takes around 2 seconds to get the result.

With 9.1 and the new KNN feature, it can be written like this:

 SELECT text_data, text_data <-> 'hello'
 FROM test_trgm
 ORDER BY text_data <-> 'hello'
 LIMIT 2;

The <-> operator is the distance operator. This query runs in 20 milliseconds, walking the index to fetch the two best records directly.

While we are on the subject of pg_trgm, another feature appearing in 9.1 is that the LIKE and ILIKE operators can now automatically use a trgm index. Still on the same table:

 SELECT text_data
 FROM test_trgm
 WHERE text_data like '%hello%';

uses the test_trgm_idx index (instead of scanning the whole table).

One warning, though: trgm indexes are very large, and costly to maintain.

==Serializable Snapshot Isolation (SSI)==

This feature is very useful if you need all your transactions to behave as if they were executed one after another, without sacrificing too much performance, as most current implementations of serializable isolation do (they usually rely on locking every record accessed).

As this feature is complex to demonstrate and to explain, here is a link to its full documentation: http://wiki.postgresql.org/wiki/SSI/fr

==Writeable Common Table Expressions==

This extends the WITH syntax introduced in 8.4. Data-modifying statements can now be used in the WITH part of a query, and the rows they return can be consumed by the rest of the query.

Say we want to archive every record matching %hello% from the test_trgm table:

 CREATE TABLE old_text_data (text_data text);

 WITH deleted AS (DELETE FROM test_trgm WHERE text_data like '%hello%' RETURNING text_data)
 INSERT INTO old_text_data SELECT * FROM deleted;

All in a single query, and therefore in a single pass over test_trgm.
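
The returned rows do not have to feed an INSERT; they can just as well be aggregated. A small sketch in the same vein:

 -- delete and report how many rows were removed, in one statement
 WITH deleted AS (
     DELETE FROM test_trgm WHERE text_data like '%hello%' RETURNING text_data
 )
 SELECT count(*) AS deleted_rows FROM deleted;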

==SE-Postgres==

PostgreSQL is the only database system offering full integration with the SELinux security framework: military-grade security for your database.
TODO

==PGXN==

[http://pgxn.org/ PGXN] is the PostgreSQL Extension Network, a central distribution system for open-source PostgreSQL extension libraries. Extension authors can [http://manager.pgxn.org/ submit their work] together with [http://pgxn.org/spec/ metadata describing it]: the packages and their documentation are [http://pgxn.org/ indexed] and distributed across several servers. The whole system can be used through a web interface or from command-line clients thanks to a [https://github.com/pgxn/pgxn-api/wiki simple API].

A full-featured [http://pgxnclient.projects.postgresql.org/ PGXN client] is under development. It can be installed with:

 $ easy_install pgxnclient
 Searching for pgxnclient
 ...
 Best match: pgxnclient 0.2.1
 Processing pgxnclient-0.2.1-py2.6.egg
 ...
 Installed pgxnclient-0.2.1-py2.6.egg

Among other things, it can search for extensions on the web site:

 $ pgxn search pair
 pair 0.1.3
 ... Usage There are two ways to construct key/value *pairs*: Via the
 *pair*() function: % SELECT *pair*('foo', 'bar'); *pair* ------------
 (foo,bar) Or by using the ~> operator: % SELECT 'foo' ~> 'bar';
 *pair*...

 semver 0.2.2
 *pair* │ 0.1.0 │ Key/value *pair* data type Note that "0.35.0b1" is less
 than "0.35.0", as required by the specification. Use ORDER BY to get
 more of a feel for semantic version ordering rules: SELECT...

compile and install them on the system:

 $ pgxn install pair
 INFO: best version: pair 0.1.3
 INFO: saving /tmp/tmpezwyEO/pair-0.1.3.zip
 INFO: unpacking: /tmp/tmpezwyEO/pair-0.1.3.zip
 INFO: building extension
 ...
 INFO: installing extension
 [sudo] password for piro:
 /bin/mkdir -p '/usr/local/pg91b1/share/postgresql/extension'
 ...

and load them as database extensions:

 $ pgxn load -d mydb pair
 INFO: best version: pair 0.1.3
 CREATE EXTENSION

==SQL/MED==

Support for SQL/MED (Management of External Data) was begun in 8.4. PostgreSQL can now define foreign tables, which is the whole point of SQL/MED: accessing external data. Here is an example based on the file_fdw extension.

We are going to access a CSV file through a table.

 CREATE EXTENSION file_fdw WITH SCHEMA extensions;
 \dx+ file_fdw
 Objects in extension "file_fdw"
 Object Description
 ----------------------------------------------------
 foreign-data wrapper file_fdw
 function extensions.file_fdw_handler()
 function extensions.file_fdw_validator(text[],oid)

The next step is optional; it is only here to show the 'CREATE FOREIGN DATA WRAPPER' syntax (the foreign data wrapper being, in effect, the connector for one kind of external data):

 =# CREATE FOREIGN DATA WRAPPER file_data_wrapper HANDLER extensions.file_fdw_handler;
 CREATE FOREIGN DATA WRAPPER

The extension already creates a foreign data wrapper called file_fdw; we will use that one from now on.

We need to create a 'server'. Since the data we are fetching only comes from a file, this seems a little pointless, but SQL/MED is also capable of talking to remote databases.

 CREATE SERVER file FOREIGN DATA WRAPPER file_fdw ;
 CREATE SERVER

Now let's attach a statistical_data.csv file to a statistical_data table:

 CREATE FOREIGN TABLE statistical_data (field1 numeric, field2 numeric) server file options (filename '/tmp/statistical_data.csv', format 'csv', delimiter ';') ;
 CREATE FOREIGN TABLE
 marc=# SELECT * from statistical_data ;
  field1 | field2
 --------+--------
     0.1 |    0.2
     0.2 |    0.4
     0.3 |    0.9
     0.4 |    1.6

For now, foreign tables can only be read (SELECT), not written.

TODO: does this also work with dblink ?

=Changes that may cause regressions=

The following points should be checked when migrating to 9.1.

* The default value of ''standard_conforming_strings'' is now ''on''

Historically, PostgreSQL did not treat string literals ('..') the way the SQL standard specifies: backslashes ('\') were escape characters, and the character following a '\' was interpreted. For example, '\n' was a newline character and '\\' was the '\' character itself. This was closer to C syntax.

In 9.1, ''standard_conforming_strings'' now defaults to ''on'', which means string literals are treated as the SQL standard specifies: single quotes must now be escaped by doubling them rather than with a backslash, and backslashes are no longer escape characters.

For example, where one previously wrote <nowiki>'l\'heure', one must now write 'l''heure'.</nowiki>

A few subtleties are worth knowing, even though they are not new in 9.1:

:* The old syntax is still available: simply put an E before the opening quote: E'l\'heure'
:* ''standard_conforming_strings'' can still be set back to ''off''
:* Many programming languages already do the right thing if you let them handle the escaping for you. For example, libpq's PQescapeLiteral function automatically detects the value of standard_conforming_strings and adapts to it.
Still, make sure your application is ready for this behavior change.
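
A quick way to see the two behaviors side by side in 9.1 (all three statements are valid as-is):

 SELECT 'l''heure';   -- standard-conforming literal: l'heure
 SELECT E'l\'heure';  -- escape-string literal: also l'heure
 SELECT 'a\nb';       -- now a literal backslash and an 'n', not a newline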

* Function-style and attribute-style casts are no longer allowed on composite types

Since 8.4, it has been possible to cast just about anything to its text representation. Let's try it with the foreign table defined above:

 =# SELECT cast(statistical_data as text) from statistical_data ;
 statistical_data
 ------------------
 (0.1,0.2)
 (0.2,0.4)
 (0.3,0.9)
 (0.4,1.6)
 (4 rows)

The problem is that 8.4 and 9.0 offered four different syntaxes for this:
:* SELECT cast(statistical_data as text) from statistical_data ;
:* SELECT statistical_data::text from statistical_data;
:* SELECT statistical_data.text from statistical_data;
:* SELECT text(statistical_data) from statistical_data;
The last two syntaxes are no longer allowed on composite types (such as a table row): they were far too easy to trigger by accident.

* Casting checks on domains over arrays have been tightened

PostgreSQL now performs the check when you update an element of an array covered by a domain constraint.

Here is what happened in 9.0:

 =# CREATE DOMAIN test_dom as int[] check (value[1] > 0);
 CREATE DOMAIN
 =# SELECT '{-1,0,0,0,0}'::test_dom;
 ERROR: value for domain test_dom violates check constraint "test_dom_check"

So far, so good.

 =# CREATE TABLE test_dom_table (test test_dom);
 CREATE TABLE
 =# INSERT INTO test_dom_table values ('{1,0,0,0,0}');
 INSERT 0 1
 =# UPDATE test_dom_table SET test[1]=-1;
 UPDATE 1

That last result is wrong: the check constraint forbids it. In 9.1 it is no longer possible; the check is performed correctly.

* string_to_array() now returns an empty array for a zero-length input string. It previously returned NULL.

 =# SELECT string_to_array('','whatever');
 string_to_array
 -----------------
 {}

* string_to_array() now splits a string into its individual characters when the separator is NULL. It previously returned NULL:

 =# SELECT string_to_array('foo',NULL);
 string_to_array
 -----------------
 {f,o,o}

* PL/pgSQL's RAISE without parameters has changed behavior.

This is a fairly rare case, but it used to trap users accustomed to Oracle's behavior on this point.

Here is an example:

 CREATE OR REPLACE FUNCTION raise_demo () returns void language plpgsql as $$
 BEGIN
   RAISE NOTICE 'Main body';
   BEGIN
     RAISE NOTICE 'Sub-block';
     RAISE EXCEPTION serialization_failure; -- Simulate a problem
   EXCEPTION WHEN serialization_failure THEN
     BEGIN
       -- Maybe we had a serialization error
       -- Won't happen here of course
       RAISE DEBUG 'There was probably a serialization failure. It could be because of...';
       -- ..
       -- If I get there let's pretend I couldn't find a solution to the error
       RAISE; -- Let's forward the error
     EXCEPTION WHEN OTHERS THEN
       -- This should capture everything
       RAISE EXCEPTION 'Couldn t figure what to do with the error';
     END;
   END;
 END;
 $$
 ;
 CREATE FUNCTION

In 9.0, you get this result (with ''client_min_messages'' set to ''debug''):
 =# SELECT raise_demo();
 NOTICE: Main body
 NOTICE: Sub-block
 DEBUG: There was probably a serialization failure. It could be because of...
 ERROR: serialization_failure

In 9.1:
 =# SELECT raise_demo();
 NOTICE: Main body
 NOTICE: Sub-block
 DEBUG: There was probably a serialization failure. It could be because of...
 ERROR: Couldn t figure what to do with the error

The difference is that in 9.0, RAISE without parameters rethrows the error at the point where the exception originally occurred. In 9.1, RAISE rethrows inside the block where it appears: the inner BEGIN block is not exited when the RAISE fires, so that block's own EXCEPTION clause catches it.

=Performance improvements=

* Synchronous writes have been optimized to put less load on the file system.

This point is hard to demonstrate in this document, but both throughput and response times (latency) are much improved under heavy write load.

* Inherited child tables can now return usefully-sorted results, which enables MIN/MAX optimizations for inheritance (and therefore for partitioning).

If you use inheritance a lot, particularly for partitioning, you will love this optimization.

The query planner has become much smarter in the following case.

Let's create a dummy schema:

 =# CREATE TABLE parent (a int);
 CREATE TABLE
 =# CREATE TABLE children_1 ( check (a between 1 and 10000000)) inherits (parent);
 CREATE TABLE
 =# CREATE TABLE children_2 ( check (a between 10000001 and 20000000)) inherits (parent);
 CREATE TABLE
 =# INSERT INTO children_1 select generate_series(1,10000000);
 INSERT 0 10000000
 =# INSERT INTO children_2 select generate_series(10000001,20000000);
 INSERT 0 10000000
 =# CREATE INDEX test_1 ON children_1 (a);
 CREATE INDEX
 =# CREATE INDEX test_2 ON children_2 (a);
 CREATE INDEX

And ask for the 50 largest values of a:

 SELECT * from parent order by a desc limit 50;

On a small test machine, this takes 13 seconds on a 9.0 database, and 0.8 milliseconds on 9.1.

The 9.0 plan is:

 Limit  (cost=952993.36..952993.48 rows=50 width=4)
   ->  Sort  (cost=952993.36..1002999.24 rows=20002354 width=4)
         Sort Key: public.parent.a
         ->  Result  (cost=0.00..288529.54 rows=20002354 width=4)
               ->  Append  (cost=0.00..288529.54 rows=20002354 width=4)
                     ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
                     ->  Seq Scan on children_1 parent  (cost=0.00..144247.77 rows=9999977 width=4)
                     ->  Seq Scan on children_2 parent  (cost=0.00..144247.77 rows=9999977 width=4)

The 9.1 plan is:

 Limit  (cost=113.75..116.19 rows=50 width=4)
   ->  Result  (cost=113.75..975036.98 rows=20002400 width=4)
         ->  Merge Append  (cost=113.75..975036.98 rows=20002400 width=4)
               Sort Key: public.parent.a
               ->  Sort  (cost=113.73..119.73 rows=2400 width=4)
                     Sort Key: public.parent.a
                     ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
               ->  Index Scan Backward using test_1 on children_1 parent  (cost=0.00..303940.35 rows=10000000 width=4)
               ->  Index Scan Backward using test_2 on children_2 parent  (cost=0.00..303940.35 rows=10000000 width=4)

The 9.0 plan means: take all the records of all the tables, sort them, then return the 50 largest.

The 9.1 plan means: take the records of each table in sorted order, using their indexes where available, merge them as they come, and return the first 50.

This was a very common trap: this kind of query became extremely slow as soon as a table was partitioned, and it was rather complicated to work around by rewriting the query.

* Hashing can now be used for full outer joins, and for arrays.

The full outer join case is very easy to demonstrate:

 CREATE TABLE test1 (a int);
 CREATE TABLE test2 (a int);
 INSERT INTO test1 SELECT generate_series(1,100000);
 INSERT INTO test2 SELECT generate_series(100,1000);

So we have a big table test1 and a small table test2.

In 9.0, the query runs with this plan:

 EXPLAIN ANALYZE SELECT * FROM test1 FULL OUTER JOIN test2 USING (a);
                                                         QUERY PLAN
 --------------------------------------------------------------------------------------------------------------------------
  Merge Full Join  (cost=11285.07..11821.07 rows=100000 width=8) (actual time=330.092..651.618 rows=100000 loops=1)
    Merge Cond: (test1.a = test2.a)
    ->  Sort  (cost=11116.32..11366.32 rows=100000 width=4) (actual time=327.926..446.814 rows=100000 loops=1)
          Sort Key: test1.a
          Sort Method: external sort Disk: 1368kB
          ->  Seq Scan on test1  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..119.246 rows=100000 loops=1)
    ->  Sort  (cost=168.75..174.75 rows=2400 width=4) (actual time=2.156..3.208 rows=901 loops=1)
          Sort Key: test2.a
          Sort Method: quicksort Memory: 67kB
          ->  Seq Scan on test2  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.009..1.066 rows=901 loops=1)
  Total runtime: 733.368 ms

Here is the new plan, in 9.1:

 --------------------------------------------------------------------------------------------------------------------
  Hash Full Join  (cost=24.27..1851.28 rows=100000 width=8) (actual time=2.536..331.547 rows=100000 loops=1)
    Hash Cond: (test1.a = test2.a)
    ->  Seq Scan on test1  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.014..119.884 rows=100000 loops=1)
    ->  Hash  (cost=13.01..13.01 rows=901 width=4) (actual time=2.505..2.505 rows=901 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 32kB
          ->  Seq Scan on test2  (cost=0.00..13.01 rows=901 width=4) (actual time=0.017..1.186 rows=901 loops=1)
  Total runtime: 412.735 ms

The 9.0 plan performs two sorts. The 9.1 plan only needs to hash the smaller table.

The run time is cut almost in half. Another interesting property of the new plan is its much lower startup cost: the first record comes back after 2 milliseconds, versus 330 with the old plan.

 SELECT * from test1 full outer join test2 using (a) LIMIT 10

takes 330ms in 9.0 and 3ms in 9.1.

=Administration=

* Automatic tuning of wal_buffers.
The wal_buffers parameter is now set automatically when its value is -1, the new default: it is sized to 1/32nd of shared_buffers, capped at 16MB. One less parameter to worry about…
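
For instance, on an instance configured with shared_buffers = 512MB (an arbitrary value), the effective size should be visible like this:

 SHOW wal_buffers;  -- should report 16MB here: 512MB/32 = 16MB, which is exactly the cap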

* The time of the last reset is now recorded in the database and background-writer statistics views.
You can now tell when statistics were last reset. For a database, for instance:

 SELECT datname, stats_reset FROM pg_stat_database;
 datname | stats_reset
 -----------+-------------------------------
 template1 |
 template0 |
 postgres | 2011-05-11 19:22:05.946641+02
 marc | 2011-05-11 19:22:09.133483+02

* New columns count vacuum and analyze operations in the pg_stat_*_tables views.

It is now much easier to know which tables attract autovacuum's attention:

 SELECT relname, last_vacuum, vacuum_count, last_autovacuum, autovacuum_count, last_analyze, analyze_count, last_autoanalyze, autoanalyze_count
 FROM pg_stat_user_tables
 WHERE relname in ('test1','test2');
 relname | last_vacuum | vacuum_count | last_autovacuum | autovacuum_count | last_analyze | analyze_count | last_autoanalyze | autoanalyze_count
 ---------+-------------+--------------+-----------------+------------------+--------------+---------------+-------------------------------+-------------------
 test1 | | 0 | | 0 | | 0 | 2011-05-22 15:51:50.48562+02 | 1
 test2 | | 0 | | 0 | | 0 | 2011-05-22 15:52:50.325494+02 | 2

=SQL and PL/pgSQL features=

* GROUP BY can infer missing columns

 CREATE TABLE entities (entity_name text primary key, address text);
 CREATE TABLE employees (employee_name text primary key, entity_name text references entities (entity_name));
 INSERT INTO entities VALUES ('HR', 'address1');
 INSERT INTO entities VALUES ('SALES', 'address2');
 INSERT INTO employees VALUES ('Smith', 'HR');
 INSERT INTO employees VALUES ('Jones', 'HR');
 INSERT INTO employees VALUES ('Taylor', 'SALES');
 INSERT INTO employees VALUES ('Brown', 'SALES');

One can now write:

 SELECT count(*), entity_name, address
 FROM entities JOIN employees using (entity_name)
 GROUP BY entity_name;
 count | entity_name | address
 -------+-------------+----------
 2 | HR | address1
 2 | SALES | address2

In 9.0, one also had to group by address. Since entity_name is the primary key of entities, address is functionally dependent on entity_name, so it is unambiguous that PostgreSQL should also group by address.

* New values can be added to an enum type with ALTER TYPE.

 =# CREATE TYPE package_status AS ENUM ('RECEIVED', 'DELIVERED');
 CREATE TYPE
 =# ALTER TYPE package_status ADD VALUE 'READY FOR DELIVERY' AFTER 'RECEIVED';
 ALTER TYPE

Until 9.0, this required dropping the type and creating a new one, which in turn meant dropping every column using the type. That was one of the main reasons enums saw so little use.
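
The AFTER clause really does control the sort position; a quick check on the values just created:

 SELECT v
 FROM (VALUES ('RECEIVED'::package_status),
              ('READY FOR DELIVERY'),
              ('DELIVERED')) AS t(v)
 ORDER BY v;
 -- returns RECEIVED, READY FOR DELIVERY, DELIVERED: enums sort in declaration order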

* Composite types can be modified with ALTER TYPE ... ADD/DROP/ALTER/RENAME ATTRIBUTE.

Let's create a simple composite type:

 =# CREATE TYPE package AS (destination text);

And an empty function using this type:

 =# CREATE FUNCTION package_exists (pack package) RETURNS boolean LANGUAGE plpgsql AS $$
 BEGIN
 RETURN true;
 END
 $$
 ;

Let's test the function:

 =# SELECT package_exists(row('test'));
 package_exists
 ----------------
 t

It works.

The 'package' type can now be modified:

 =# ALTER TYPE package ADD ATTRIBUTE received boolean;

The type has changed:

 =# SELECT package_exists(row('test'));
 ERROR: cannot cast type record to package
 LINE 1: SELECT package_exists(row('test'));
                ^
 DETAIL: Input has too few columns.
 =# SELECT package_exists(row('test',true));
 package_exists
 ----------------
 t

* ALTER TABLE ... ADD UNIQUE/PRIMARY KEY USING INDEX

This will mostly be used to create a unique or primary key without locking the table for too long:

 =# CREATE UNIQUE INDEX CONCURRENTLY idx_pk ON test_pk (a);
 CREATE INDEX
 =# ALTER TABLE test_pk ADD primary key using index idx_pk;
 ALTER TABLE

The test_pk table is write-locked only for the duration of the ALTER TABLE; the rest of the work happens without blocking users.

This can of course also be used to rebuild a primary key's index without locking the table for the whole operation:

 =# CREATE UNIQUE INDEX CONCURRENTLY idx_pk2 ON test_pk (a);
 =# BEGIN ;
 =# ALTER TABLE test_pk DROP CONSTRAINT idx_pk;
 =# ALTER TABLE test_pk ADD primary key using index idx_pk2;
 =# COMMIT ;

* ALTER TABLE ... SET DATA TYPE can avoid rewriting the whole table in the most suitable cases.

For example, converting a varchar column to text no longer rewrites the table.

Enlarging the length limit of a varchar column, however, still does.

A number of cases are still not covered and do trigger a rewrite; improvements will probably land in future PostgreSQL versions, as this work is ongoing.

* New CREATE TABLE IF NOT EXISTS syntax.

You will not get an error if the table already exists, only a NOTICE.

Be warned that it does not check that the definition in your CREATE TABLE matches the definition of the existing table.
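
A short illustration of those semantics (any throwaway table will do):

 CREATE TABLE IF NOT EXISTS test (a int);   -- first run: CREATE TABLE
 CREATE TABLE IF NOT EXISTS test (a text);  -- NOTICE: relation "test" already exists, skipping
 -- the differing column type goes undetected: the statement is simply skipped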

* New ENCODING option for COPY TO/FROM. It specifies the encoding COPY should use, independently of client_encoding.

 COPY test1 TO stdout ENCODING 'latin9'

converts the encoding on the fly; there is no need to change client_encoding before the COPY.

* INSTEAD OF triggers on views.

This feature can be used to implement updatable views. Here is an example, continuing with the employees/entities schema.

 =# CREATE VIEW emp_entity AS SELECT employee_name, entity_name, address
 FROM entities JOIN employees USING (entity_name);

To make this view updatable in 9.0, one had to use RULES. That could quickly turn into a nightmare: rules are complex to write, and even worse to debug. Here is how it was done: [http://www.postgresql.org/docs/9.1/static/rules-update.html updates through rules]

All of this can now be done with a trigger. Here is an example in PL/pgSQL (only the INSERT part is handled here):

 =# CREATE OR REPLACE FUNCTION dml_emp_entity () RETURNS trigger LANGUAGE plpgsql AS $$
 DECLARE
 vrecord RECORD;
 BEGIN
 IF TG_OP = 'INSERT' THEN
 -- Does the record exist in entity ?
 SELECT entity_name,address INTO vrecord FROM entities WHERE entity_name=NEW.entity_name;
 IF NOT FOUND THEN
 INSERT INTO entities (entity_name,address) VALUES (NEW.entity_name, NEW.address);
 ELSE
 IF vrecord.address != NEW.address THEN
 RAISE EXCEPTION 'There already is a record for % in entities. Its address is %. It conflicts with your address %',
 NEW.entity_name, vrecord.address, NEW.address USING ERRCODE = 'unique_violation';
 END IF;
 END IF; -- Nothing more to do, the entity already exists and is OK
 -- We now try to insert the employee data. Let's directly try an INSERT
 BEGIN
 INSERT INTO employees (employee_name, entity_name) VALUES (NEW.employee_name, NEW.entity_name);
 EXCEPTION WHEN unique_violation THEN
 RAISE EXCEPTION 'There is already an employee with this name %', NEW.employee_name USING ERRCODE = 'unique_violation';
 END;
 RETURN NEW; -- The trigger succeeded
 END IF;
 END
 $$
 ;

All that remains is to declare the trigger:

 =# CREATE TRIGGER trig_dml_emp_entity INSTEAD OF INSERT OR UPDATE OR DELETE ON emp_entity FOR EACH ROW EXECUTE PROCEDURE dml_emp_entity ();

There are other advantages: a rule merely rewrites the query, whereas the trigger lets us add logic and return meaningful error messages, which makes it much easier to understand what failed, and we can handle exceptions. We get every advantage triggers have over rules.

* PL/pgSQL FOREACH IN ARRAY.

Looping over an array in PL/pgSQL has become much simpler. Until now, the FOR keyword could only loop over recordsets (query results).

It can now be used to loop over an array.

Before 9.1, one would have written something like this:

 =# CREATE OR REPLACE FUNCTION test_array (parray int[]) RETURNS int LANGUAGE plpgsql AS $$
 DECLARE
 vcounter int :=0;
 velement int;
 BEGIN
 FOR velement IN SELECT unnest (parray)
 LOOP
 vcounter:=vcounter+velement;
 END LOOP;
 RETURN vcounter;
 END
 $$
 ;

Now:

 =# CREATE OR REPLACE FUNCTION test_array (parray int[]) RETURNS int LANGUAGE plpgsql AS $$
 DECLARE
 vcounter int :=0;
 velement int;
 BEGIN
 FOREACH velement IN ARRAY parray
 LOOP
 vcounter:=vcounter+velement;
 END LOOP;
 RETURN vcounter;
 END
 $$
 ;

This is much easier to read, and faster to execute.

There is another benefit: the array can be sliced when it is multi-dimensional. Here is an example, straight from the documentation:

 =# CREATE FUNCTION scan_rows(int[]) RETURNS void AS $$
 DECLARE
 x int[];
 BEGIN
 FOREACH x SLICE 1 IN ARRAY $1
 LOOP
 RAISE NOTICE 'row = %', x;
 END LOOP;
 END;
 $$ LANGUAGE plpgsql;

 =# SELECT scan_rows(ARRAY[[1,2,3],[4,5,6],[7,8,9],[10,11,12]]);
 NOTICE: row = {1,2,3}
 NOTICE: row = {4,5,6}
 NOTICE: row = {7,8,9}
 NOTICE: row = {10,11,12}

[[Category:PostgreSQL 9.1]]
[[Category:Français]]