Soundex
From PostgreSQL wiki
Jump to navigationJump to searchCompute string soundex
Works with PostgreSQL
Tested on 9.0, 9.1. Probably works in earlier versions
Written in
PL/pgSQL
Depends on
Nothing
Soundex is a simple algorithm for comparing/normalizing English words based on phonetic similarity.
PostgreSQL 8.3 and later include a soundex(text) function in the fuzzystrmatch extension. Note that this function behaves differently from some other implementations, as it does not skip similar consonants that are separated by 'w' and 'h'.
PL/pgSQL version
If you can't install the fuzzystrmatch extension (due to limited privieges or other reasons), here is an alternative version in PL/pgSQL. This function behaves identically to the fuzzystrmatch implementation. (Tested on PostgreSQL 9.0 and 9.1, using dictionaries in many different languages).
CREATE OR REPLACE FUNCTION soundex(input text) RETURNS text
IMMUTABLE STRICT COST 500 LANGUAGE plpgsql
AS $$
DECLARE
soundex text = '';
char text;
symbol text;
last_symbol text = '';
pos int = 1;
BEGIN
WHILE length(soundex) < 4 LOOP
char = upper(substr(input, pos, 1));
pos = pos + 1;
CASE char
WHEN '' THEN
-- End of input string
IF soundex = '' THEN
RETURN '';
ELSE
RETURN rpad(soundex, 4, '0');
END IF;
WHEN 'B', 'F', 'P', 'V' THEN
symbol = '1';
WHEN 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z' THEN
symbol = '2';
WHEN 'D', 'T' THEN
symbol = '3';
WHEN 'L' THEN
symbol = '4';
WHEN 'M', 'N' THEN
symbol = '5';
WHEN 'R' THEN
symbol = '6';
ELSE
-- Not a consonant; no output, but next similar consonant will be re-recorded
symbol = '';
END CASE;
IF soundex = '' THEN
-- First character; only accept strictly English ASCII characters
IF char ~>=~ 'A' AND char ~<=~ 'Z' THEN
soundex = char;
last_symbol = symbol;
END IF;
ELSIF last_symbol != symbol THEN
soundex = soundex || symbol;
last_symbol = symbol;
END IF;
END LOOP;
RETURN soundex;
END;
$$;
Example
db=# select column1, soundex(column1) from (values ('Robert'), ('Rupert'), ('Rubin'), ('Ashcroft')) data; column1 | soundex ----------+--------- Robert | R163 Rupert | R163 Rubin | R150 Ashcroft | A226 db=# create table words(word text); db=# copy words from '/usr/share/dict/words'; db=# create index on words(soundex(word)); db=# select * from words where soundex(word)=soundex('hotel'); word -------- hotel hotly huddle