JSON API Brainstorm
This page is a brainstorming session to develop the API for JSON support in PostgreSQL. Feel free to edit this article with reckless abandon, even if (and especially if) you aren't an "expert", but do so respectfully.
The goals of this page are to:
- Gather ideas
- See if they are feasible to implement
- See if they make sense
- See if they let users do what they need to do
Overview
The data type for storing and manipulating JSON documents is called "JSON". The JSON type can be thought of as a specialization of TEXT in that it stores and retrieves TEXT verbatim. However, a JSON can only contain a valid JSON value (as defined at http://www.json.org/ ). This means that the JSON data type can hold values of the following types:
string | '"hello world"' |
number | '3.14159' |
object | '{"a": 1, "b": 2}' |
array | '[0,1,4,9,16,25]' |
boolean | 'true' |
null | 'null' (not to be confused with SQL's NULL) |
Type checking
- jsontype AS ENUM ('null', 'string', 'number', 'boolean', 'object', 'array')
- json_type(json) RETURNS jsontype
Value conversion
- from_json(json) RETURNS TEXT
Examples:
from_json('3.14159')::FLOAT => '3.14159'::FLOAT
from_json('"string"') => 'string'::TEXT
from_json('null') => NULL
from_json(NULL) => NULL
from_json('[1,2,3,4,5]') => error: not implemented yet
- to_json(ANYELEMENT) RETURNS json
Examples:
to_json(3.14159::FLOAT) => '3.14159'::JSON
to_json('string'::TEXT) => '"string"'::JSON
to_json('"double"'::JSON) => E'"\\"double\\""'::JSON
to_json(NULL::TEXT) => 'null'::JSON
to_json('{1,2,3,4,5}'::INT[]) => error: not implemented yet
Note that casts are needed on the inside for to_json when a literal is given, just as casts are needed on the outside for from_json. This is because the PostgreSQL parser requires arguments to a polymorphic function (a function that takes ANYELEMENT or similar as an argument; see [1]) to have known types. You cannot say:
to_json('123')
because the parser doesn't know what type the literal '123' is. Is it TEXT? Is it INT? If it were TEXT, to_json would yield '"123"'::JSON, while if it were INT, to_json would yield '123'::JSON. Thus, I would consider the cast requirement to be an advantage. Also, don't forget that if to_json's argument is a column of a table with a known type, a cast won't be needed.
Question: Why do we need from/to_json functions? It might be reasonable if they have two or more arguments, but one argument version can be replaced with casting to json type. (itagaki 01:56, 26 May 2010 (UTC))
'Answer: The JSON type is a specialization of TEXT. Consider the case where you want to decode a JSON-encoded string: '"string"'::JSON . If we say '"string"'::JSON::TEXT, we will get '"string"', not 'string'. However, if we say from_json('"string"'), we'll get 'string'. Similarly, from_json unwraps quotes and converts 'null' to NULL. Thanks for asking! Joeyadams 17:05, 27 May 2010 (UTC)
Array/object conversion
Idea 1: This syntax would require altering the grammar, if I'm not mistaken.
- json_object([content [AS name] [, ...]] | *) RETURNS json
Idea 2: Might be easier to implement, but might be harder to use.
- json_object(RECORD) RETURNS json
Examples:
-- CREATE TABLE foo (pi FLOAT, e FLOAT); INSERT INTO foo VALUES (3.14159, 2.71828)
SELECT json_object(foo) FROM foo => '{"pi":3.14159,"e":2.71828}'
SELECT json_object(row(pi,e)) FROM foo => '{"f1":3.14159,"f2":2.71828}'
-- f1, f2 based on how hstore handles this case; I'm not sure they'd be useful
SELECT json_object(???(pi,e)) FROM foo => '{"pi":3.14159,"e":2.71828}'
-- Is there a way to make this work (??? is a placeholder)? Example using hstore:
-- SELECT hstore(???(pi,e)) FROM foo; => '"pi"=>"3.14159", "e"=>"2.71828"'
- json_array([content [, ...]]) RETURNS json
- AGGREGATE json_agg(json) RETURNS json
- json_keys(json) RETURNS TEXT[]
- json_values(json) RETURNS JSON[]
Member access
json_get and json_set use JavaScript-style paths inspired by http://goessner.net/articles/JsonPath/ . However, this implementation doesn't require users to prefix paths with '$'. In my opinion, it's boilerplate and shouldn't be required.
The first release will only support basic subscripting; later versions may implement things like set-returning variants of json_get and json_set .
- json_get(json, json_path text) RETURNS JSON
Examples:
json_get('[0,1,4,9,16,25]', '[2]') => '4'::JSON
json_get('[0,1,4,9,16,25]', '.2') => '4'::JSON
-- subscripting a numbered item with . works
json_get('{"pi":3.14159,"e":2.71828}', '.pi') => '3.14159'::JSON
json_get('{"pi":3.14159,"e":2.71828}', '["pi"]') => '3.14159'::JSON
-- subscripting with a quoted string works
json_get('{"pi":3.14159,"e":2.71828}', '."pi"') => '3.14159'::JSON
-- as does using a dot subscript with a quoted string
json_get('{"pi":3.14159,"e":2.71828}', '[pi]') => ERROR
-- This does not work because the identifier in brackets
-- syntax is reserved for future use.
json_get('[0,1,4,9,16,25]', '[6]') => NULL::JSON
json_get('{"a":[0,1,2,3],"b":[4,5,6,7]}', '.b[2]') => '6'::JSON
json_get('{"5":"five","10":"ten"}', '[5]') => NULL::JSON
json_get('{"5":"five","10":"ten"}', '["5"]') => '"five"'::JSON
json_get($${"back\\slash":"\\"}$$, $$["back\\slash"]$$) => $$"\\"$$::JSON
json_get('{"key":"value"}', '.' || to_json(var)) => '"value"'::JSON if var = 'key'::TEXT
-- Use to_json to safely embed parameters in a
-- JSONPath expression.
- json_set(json, json_path text, json) RETURNS JSON
Examples:
json_set('[0,1,2,9,16,25]', '[2]', '4') => '[0,1,4,9,16,25]'::JSON
json_set('[0,1,2,9,16,25]', '.' || to_json(var), '4') => '[0,1,4,9,16,25]'::JSON if var = '2'::INT
-- Use to_json to safely embed parameters in a
-- JSONPath expression
Idea 2
- json_expand(js json) returns setof(id integer, parent_id integer, type jsontype, key text, value text, path text)
Examples:
select * from json_expand($JSON$
{ "friends" : [ { "id" : 10001,
"name" : "First User"
},
{ "id" : 10002,
"name" : "Second User"
},
{ "id" : 10003,
"name" : "Third User"
},
{ "id" : 10004,
"name" : "Fourth User"
}
]
}
$JSON$::json
)
returns
id | parent_id | type | key | value | path |
---|---|---|---|---|---|
1 | NULL | object | NULL | NULL | |
2 | 1 | array | friends | NULL | friends |
3 | 2 | object | 0 | NULL | friends[0] |
4 | 3 | number | id | 10001 | friends[0].id |
5 | 3 | string | name | First User | friends[0].name |
6 | 2 | object | 1 | NULL | friends[1] |
7 | 6 | number | id | 10002 | friends[1].id |
8 | 6 | string | name | Second User | friends[1].name |
9 | 2 | object | 2 | NULL | friends[2] |
10 | 9 | number | id | 10003 | friends[2].id |
11 | 9 | string | name | Third User | friends[2].name |
12 | 2 | object | 3 | NULL | friends[3] |
13 | 12 | number | id | 10004 | friends[3].id |
14 | 12 | string | name | Fourth User | friends[3].name |
And using it:
with tq as(
select * from json_expand($JSON$
{ "friends" : [ { "id" : 10001,
"name" : "First User"
},
{ "id" : 10002,
"name" : "Second User"
},
{ "id" : 10003,
"name" : "Third User"
},
{ "id" : 10004,
"name" : "Fourth User"
}
]
}
$JSON$::json
)
)
select t1.value::integer as id, t2.value as name
from tq t1, tq t2
where t1.path ~ $RE$^friends.\[\d+\].id$RE$,
and t2.path ~ $RE$^friends.\[\d+\].name$RE$
and t1.parent_id=t2.parent_id
returns
id | name |
---|---|
10001 | First User |
10002 | Second User |
10003 | Third User |
10004 | Fourth User |
- json_compact( jss json_expanded[] ),
where
create type json_expanded as(
value text,
path text
)
Miscellaneous
- json_cleanup(TEXT) RETURNS json
json_cleanup accepts a superset of JSON and, if it can, cleans it up and returns a valid JSON string. This superset of JSON supports the following extra features:
- Comments:
- Single-line comments with // and #
- C-style comments: /* comment */
- Unquoted object keys: {key: "value"}
- Single quote strings: 'single quotes; "double quotes" do not need
to be escaped here'
- Single quote escape allowed: "It\'s allowed, but it's not necessary"
- Lax number format (+ sign allowed; digits may be omitted on one
side of the decimal point).