Is distinct from

From PostgreSQL wiki
Jump to navigationJump to search

Pg supports two comparison statements IS DISTINCT FROM and IS NOT DISTINCT FROM, these essentially treat NULL as if it was a known value, rather than a special case for unknown.

Mnemonic: DISTINCT means different which is what '<>' tests for

Syntax

SELECT foo,bar FROM table WHERE foo IS DISTINCT FROM bar;
SELECT foo,bar FROM table WHERE foo IS NOT DISTINCT FROM bar;

Truth Table

\pset null '<<NULL>>'

select a.a, b.b, a.a IS DISTINCT FROM b.b AS "Is Distinct From" FROM (VALUES (1), (2), (NULL)) AS a (a), (VALUES (1), (2), (NULL)) AS b (b);

    a     |    b     | Is Distinct From 
----------+----------+------------------
        1 |        1 | f
        1 |        2 | t
        1 | <<NULL>> | t
        2 |        1 | t
        2 |        2 | f
        2 | <<NULL>> | t
 <<NULL>> |        1 | t
 <<NULL>> |        2 | t
 <<NULL>> | <<NULL>> | f

select a.a, b.b, a.a IS NOT DISTINCT FROM b.b AS "Is Not Distinct From" FROM (VALUES (1), (2), (NULL)) AS a (a), (VALUES (1), (2), (NULL)) AS b (b);

    a     |    b     | Is Not Distinct From 
----------+----------+----------------------
        1 |        1 | t
        1 |        2 | f
        1 | <<NULL>> | f
        2 |        1 | f
        2 |        2 | t
        2 | <<NULL>> | f
 <<NULL>> |        1 | f
 <<NULL>> |        2 | f
 <<NULL>> | <<NULL>> | t

Writing with CASE statements

Both of these sql comparison macros can be written with a CASE statement:

foo IS NOT DISTINCT FROM bar

CASE
  -- Act like a normal '=' on non-null values.
  WHEN foo IS NOT NULL AND bar IS NOT NULL
    THEN foo = bar
  -- foo = 'notnull' , bar = 'null' ; then foo <> bar so false
  WHEN foo IS NOT NULL AND bar IS NULL
    THEN false
  -- foo = 'null' , bar = 'null' ; then foo = bar so true
  WHEN foo IS NULL AND bar IS NULL
    THEN true
  ELSE false
END

foo IS DISTINCT FROM bar

CASE
  -- Act like a normal '<>' on non-null values.
  WHEN foo IS NOT NULL AND bar IS NOT NULL
    THEN foo <> bar
  -- foo = 'notnull' , bar = 'null' ; then foo <> bar so true
  WHEN foo IS NOT NULL AND bar IS NULL
    THEN true
  -- foo = 'null' , bar = 'null' ; then foo = bar so false
  WHEN foo IS NULL AND bar IS NULL
    THEN false
  ELSE true
END

Determining different data in tables

Lets say you have two tables, t1, and t2, and they both have columns "foo","bar","baz" and you want to see all rows that are in t1, that aren't in t2. Normally, if you wanted the effect of '=' you could make this really simple:

SELECT foo,bar,baz
FROM t1
LEFT OUTER JOIN t2 USING (foo,bar,baz)
WHERE t1.foo IS NULL

USING (foo,bar,baz) is just a macro for: t1.foo = t2.foo AND t1.bar = t2.bar AND t1.baz = t2.baz

However, if you wanted the treat-null-as-a-known-value alternative you would need: IS NOT DISTINCT FROM. The task because slightly more verbose and daunting when joining a table, because there are no shorthands for the IS NOT DISTINCT FROM form.

SELECT t1.foo,t1.bar,t1.baz
FROM t1
LEFT OUTER JOIN t2 ON (
 t1.foo IS NOT DISTINCT FROM t2.foo
 AND t1.bar IS NOT DISTINCT FROM t2.bar
 AND t1.baz IS NOT DISTINCT FROM t2.baz
)
WHERE ( t2.foo IS NULL )