JSON API Brainstorm

From PostgreSQL wiki

This page is a brainstorming session to develop the API for JSON support in PostgreSQL. Feel free to edit this article with reckless abandon, even if (and especially if) you aren't an "expert", but do so respectfully.

The goals of this page are to:

  • Gather ideas
  • See if they are feasible to implement
  • See if they make sense
  • See if they let users do what they need to do


Overview

The data type for storing and manipulating JSON documents is called "JSON". The JSON type can be thought of as a specialization of TEXT in that it stores and retrieves TEXT verbatim. However, a value of type JSON can only contain a valid JSON value (as defined at http://www.json.org/ ). This means that the JSON data type can hold values of the following types:

string '"hello world"'
number '3.14159'
object '{"a": 1, "b": 2}'
array '[0,1,4,9,16,25]'
boolean 'true'
null 'null' (not to be confused with SQL's NULL)

Type checking

  • jsontype AS ENUM ('null', 'string', 'number', 'boolean', 'object', 'array')
  • json_type(json) RETURNS jsontype
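As a rough illustration of what json_type might report, here is a small Python model (not PostgreSQL code). It assumes the six labels of the jsontype enum above and uses Python's json module as a stand-in for the real parser; the mapping from Python types to labels is an assumption made for this sketch.

```python
import json

def json_type(text):
    """Return the proposed jsontype label for a JSON document given as text."""
    value = json.loads(text)  # raises ValueError if the text is not valid JSON
    if value is None:
        return "null"
    # bool must be tested before numbers: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"
```

For instance, json_type('[0,1,4,9,16,25]') would give 'array' and json_type('null') would give 'null' in this model.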

Value conversion

  • from_json(json) RETURNS TEXT

Examples:

	from_json('3.14159')::FLOAT => '3.14159'::FLOAT
	from_json('"string"')       => 'string'::TEXT
	from_json('null')           => NULL
	from_json(NULL)             => NULL
	from_json('[1,2,3,4,5]')    => error: not implemented yet

  • to_json(ANYELEMENT) RETURNS json

Examples:

	to_json(3.14159::FLOAT)       => '3.14159'::JSON
	to_json('string'::TEXT)       => '"string"'::JSON
	to_json('"double"'::JSON)     => E'"\\"double\\""'::JSON
	to_json(NULL::TEXT)           => 'null'::JSON
	to_json('{1,2,3,4,5}'::INT[]) => error: not implemented yet

Note that casts are needed on the inside for to_json when a literal is given, just as casts are needed on the outside for from_json. This is because the PostgreSQL parser requires arguments to a polymorphic function (a function that takes ANYELEMENT or similar as an argument; see [1]) to have known types. You cannot say:

to_json('123')

because the parser doesn't know what type the literal '123' is. Is it TEXT? Is it INT? If it were TEXT, to_json would yield '"123"'::JSON, while if it were INT, to_json would yield '123'::JSON. Thus, I would consider the cast requirement to be an advantage. Also, don't forget that if to_json's argument is a column of a table with a known type, a cast won't be needed.

Question: Why do we need the from_json/to_json functions? They might be reasonable if they took two or more arguments, but the one-argument versions can be replaced by a cast to the json type. (itagaki 01:56, 26 May 2010 (UTC))

Answer: The JSON type is a specialization of TEXT. Consider the case where you want to decode a JSON-encoded string: '"string"'::JSON . If we say '"string"'::JSON::TEXT, we will get '"string"', not 'string'. However, if we say from_json('"string"'), we'll get 'string'. Similarly, from_json unwraps quotes and converts 'null' to NULL. Thanks for asking! Joeyadams 17:05, 27 May 2010 (UTC)
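The difference between a plain cast and from_json/to_json can be sketched in Python (a model of the proposed behaviour, not an implementation). SQL NULL is represented by None, JSON documents are plain strings, and arrays and objects raise an error to mirror the "not implemented yet" cases above. Keeping numbers and booleans in their JSON spelling is an assumption of this sketch.

```python
import json

def from_json(doc):
    """Model of the proposed from_json: JSON text in, plain text (or None) out."""
    if doc is None:                      # SQL NULL in, SQL NULL out
        return None
    value = json.loads(doc)
    if value is None:                    # JSON null maps to SQL NULL
        return None
    if isinstance(value, str):           # strings lose their quotes and escapes
        return value
    if isinstance(value, (dict, list)):
        raise NotImplementedError("arrays and objects are not handled yet")
    return doc.strip()                   # numbers and booleans keep their spelling

def to_json(value):
    """Model of the proposed to_json for scalar inputs."""
    if isinstance(value, (dict, list, tuple)):
        raise NotImplementedError("arrays and objects are not handled yet")
    return json.dumps(value)             # None becomes the JSON text 'null'
```

In this model a cast corresponds to returning the text unchanged, whereas from_json('"string"') yields the decoded text string without quotes, and to_json applied to already-encoded text encodes it a second time.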

Array/object conversion

Idea 1: This syntax would require altering the grammar, if I'm not mistaken.

  • json_object([content [AS name] [, ...]] | *) RETURNS json

Idea 2: Might be easier to implement, but might be harder to use.

  • json_object(RECORD) RETURNS json

Examples:

	-- CREATE TABLE foo (pi FLOAT, e FLOAT); INSERT INTO foo VALUES (3.14159, 2.71828)
	SELECT json_object(foo)       FROM foo => '{"pi":3.14159,"e":2.71828}'
	SELECT json_object(row(pi,e)) FROM foo => '{"f1":3.14159,"f2":2.71828}'
		-- f1, f2 based on how hstore handles this case; I'm not sure they'd be useful
	SELECT json_object(???(pi,e)) FROM foo => '{"pi":3.14159,"e":2.71828}'
		-- Is there a way to make this work (??? is a placeholder)?  Example using hstore:
		-- SELECT hstore(???(pi,e)) FROM foo; => '"pi"=>"3.14159", "e"=>"2.71828"'

  • json_array([content [, ...]]) RETURNS json
  • AGGREGATE json_agg(json) RETURNS json
  • json_keys(json) RETURNS TEXT[]
  • json_values(json) RETURNS JSON[]
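To make the intended results of these functions concrete, here is a Python model. It is only a sketch under several assumptions: a record is represented as a dict of column name to value, json_keys and json_values follow document order, and json_agg simply joins its inputs into one array. The page gives only signatures for json_agg, json_keys and json_values, so their exact behaviour here is a guess.

```python
import json

def _dump(value):
    # Compact output, matching the spacing used in the examples on this page
    return json.dumps(value, separators=(",", ":"))

def json_object(record):
    """record: dict of column name -> value; returns a JSON object as text."""
    return _dump(record)

def json_array(*contents):
    """Build a JSON array (as text) from the given values."""
    return _dump(list(contents))

def json_keys(doc):
    """Keys of a JSON object, in document order (an assumption)."""
    return list(json.loads(doc).keys())

def json_values(doc):
    """Values of a JSON object, each returned as JSON text."""
    return [_dump(v) for v in json.loads(doc).values()]

def json_agg(docs):
    """Aggregate an iterable of JSON texts into one JSON array text."""
    return "[" + ",".join(docs) + "]"
```

With the foo table from the example, json_object({"pi": 3.14159, "e": 2.71828}) corresponds to SELECT json_object(foo) FROM foo.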

Member access

json_get and json_set use JavaScript-style paths inspired by http://goessner.net/articles/JsonPath/ . However, this implementation doesn't require users to prefix paths with '$'. In my opinion, it's boilerplate and shouldn't be required.

The first release will only support basic subscripting; later versions may implement things like set-returning variants of json_get and json_set .

  • json_get(json, json_path text) RETURNS JSON

Examples:

	json_get('[0,1,4,9,16,25]',               '[2]')         => '4'::JSON
	json_get('[0,1,4,9,16,25]',               '.2')          => '4'::JSON
		-- subscripting a numbered item with . works
	json_get('{"pi":3.14159,"e":2.71828}',    '.pi')         => '3.14159'::JSON
	json_get('{"pi":3.14159,"e":2.71828}',    '["pi"]')      => '3.14159'::JSON
		-- subscripting with a quoted string works
	json_get('{"pi":3.14159,"e":2.71828}',    '."pi"')       => '3.14159'::JSON
		-- as does using a dot subscript with a quoted string
	json_get('{"pi":3.14159,"e":2.71828}',    '[pi]')        => ERROR
		-- This does not work because the identifier in brackets
		-- syntax is reserved for future use.
	json_get('[0,1,4,9,16,25]',               '[6]')         => NULL::JSON
	json_get('{"a":[0,1,2,3],"b":[4,5,6,7]}', '.b[2]')       => '6'::JSON
	json_get('{"5":"five","10":"ten"}',       '[5]')         => NULL::JSON
	json_get('{"5":"five","10":"ten"}',       '["5"]')       => '"five"'::JSON
	json_get($${"back\\slash":"\\"}$$, $$["back\\slash"]$$)  => $$"\\"$$::JSON
	json_get('{"key":"value"}', '.' || to_json(var))         => '"value"'::JSON if var = 'key'::TEXT
		-- Use to_json to safely embed parameters in a
		-- JSONPath expression.

  • json_set(json, json_path text, json) RETURNS JSON

Examples:

	json_set('[0,1,2,9,16,25]', '[2]', '4')                => '[0,1,4,9,16,25]'::JSON
	json_set('[0,1,2,9,16,25]', '.' || to_json(var), '4')  => '[0,1,4,9,16,25]'::JSON if var = '2'::INT
		-- Use to_json to safely embed parameters in a
		-- JSONPath expression
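The basic subscripting described above can be modelled in Python as follows. This is a sketch of the examples, not a specification: the path grammar is inferred from them, None stands for SQL NULL, and the behaviour of json_set on a missing path (raising an error) is an assumption, since the page does not say what should happen.

```python
import json
import re

# One path segment: [2], ["pi"], .2, ."pi" or .pi
_SEGMENT = re.compile(r'''
    \[(?P<bidx>\d+)\]
  | \[(?P<bstr>"(?:[^"\\]|\\.)*")\]
  | \.(?P<didx>\d+)
  | \.(?P<dstr>"(?:[^"\\]|\\.)*")
  | \.(?P<name>[A-Za-z_$][A-Za-z0-9_$]*)
''', re.VERBOSE)

_MISSING = object()  # distinguishes "not found" from a JSON null that was found

def parse_path(path):
    """Split a path into keys: int for array indexes, str for object members."""
    keys, pos = [], 0
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:  # e.g. '[pi]', which is reserved for future use
            raise ValueError("invalid path at offset %d: %r" % (pos, path))
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind in ("bidx", "didx"):
            keys.append(int(text))
        elif kind in ("bstr", "dstr"):
            keys.append(json.loads(text))  # decode the quoted string
        else:
            keys.append(text)
        pos = m.end()
    return keys

def _step(value, key):
    if isinstance(key, int) and isinstance(value, list):
        return value[key] if key < len(value) else _MISSING
    if isinstance(key, str) and isinstance(value, dict):
        return value.get(key, _MISSING)
    return _MISSING

def _dump(value):
    return json.dumps(value, separators=(",", ":"))

def json_get(doc, path):
    value = json.loads(doc)
    for key in parse_path(path):
        value = _step(value, key)
        if value is _MISSING:
            return None  # SQL NULL
    return _dump(value)

def json_set(doc, path, new_doc):
    root, new_value = json.loads(doc), json.loads(new_doc)
    keys = parse_path(path)
    if not keys:
        return _dump(new_value)
    target = root
    for key in keys[:-1]:
        target = _step(target, key)
        if target is _MISSING:
            raise ValueError("path does not exist: %r" % path)
    last = keys[-1]
    if isinstance(target, list) and isinstance(last, int) and last < len(target):
        target[last] = new_value
    elif isinstance(target, dict) and isinstance(last, str):
        target[last] = new_value
    else:
        raise ValueError("cannot set %r" % path)
    return _dump(root)
```

Note how a numeric subscript never matches an object member in this model, which is why '[5]' returns NULL on {"5":"five","10":"ten"} while '["5"]' finds the value.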


Idea 2

  • json_expand(js json) returns setof(id integer, parent_id integer, type jsontype, key text, value text, path text)

Examples:

select * from json_expand($JSON$
     { "friends" : [ { "id" : 10001,
                       "name" : "First User"
                     },
                     { "id" : 10002,
                       "name" : "Second User"
                     },
                     { "id" : 10003,
                       "name" : "Third User"
                     },
                     { "id" : 10004,
                       "name" : "Fourth User"
                     }
                   ]
     }
     $JSON$::json
   )

returns

 id | parent_id | type   | key     | value       | path
----+-----------+--------+---------+-------------+-----------------
  1 | NULL      | object | NULL    | NULL        |
  2 | 1         | array  | friends | NULL        | friends
  3 | 2         | object | 0       | NULL        | friends[0]
  4 | 3         | number | id      | 10001       | friends[0].id
  5 | 3         | string | name    | First User  | friends[0].name
  6 | 2         | object | 1       | NULL        | friends[1]
  7 | 6         | number | id      | 10002       | friends[1].id
  8 | 6         | string | name    | Second User | friends[1].name
  9 | 2         | object | 2       | NULL        | friends[2]
 10 | 9         | number | id      | 10003       | friends[2].id
 11 | 9         | string | name    | Third User  | friends[2].name
 12 | 2         | object | 3       | NULL        | friends[3]
 13 | 12        | number | id      | 10004       | friends[3].id
 14 | 12        | string | name    | Fourth User | friends[3].name
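The row layout above can be reproduced with a short depth-first walk. The following Python model is a sketch only: it assumes rows are numbered in pre-order, that array elements use their index as the key, that containers and JSON null carry no value, and that the root's path is the empty string (the listing above leaves that cell blank).

```python
import json

def _type_name(value):
    if value is None:
        return "null"
    if isinstance(value, bool):      # before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"

def json_expand(doc):
    """Return (id, parent_id, type, key, value, path) tuples in pre-order."""
    rows = []

    def visit(value, parent_id, key, path):
        row_id = len(rows) + 1
        if value is None or isinstance(value, (dict, list)):
            text = None              # containers and null carry no value
        elif isinstance(value, str):
            text = value             # strings are stored without quotes
        else:
            text = json.dumps(value)
        rows.append((row_id, parent_id, _type_name(value), key, text, path))
        if isinstance(value, dict):
            for k, v in value.items():
                visit(v, row_id, k, (path + "." + k) if path else k)
        elif isinstance(value, list):
            for i, v in enumerate(value):
                visit(v, row_id, str(i), "%s[%d]" % (path, i))

    visit(json.loads(doc), None, None, "")
    return rows
```

Because each row records its parent_id, siblings such as a friend's id and name can be paired by joining on parent_id.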

And using it:

with tq as(
   select * from json_expand($JSON$
        { "friends" : [ { "id" : 10001,
                          "name" : "First User"
                        },
                        { "id" : 10002,
                          "name" : "Second User"
                        },
                        { "id" : 10003,
                          "name" : "Third User"
                        },
                        { "id" : 10004,
                          "name" : "Fourth User"
                        }
                      ]
        }
        $JSON$::json
      )
)

select t1.value::integer as id, t2.value as name
  from tq t1, tq t2
  where t1.path ~ $RE$^friends\[\d+\]\.id$RE$
    and t2.path ~ $RE$^friends\[\d+\]\.name$RE$
    and t1.parent_id=t2.parent_id

returns

  id   | name
-------+-------------
 10001 | First User
 10002 | Second User
 10003 | Third User
 10004 | Fourth User

  • json_compact( jss json_expanded[] ),

where

create type json_expanded as(
 value text,
 path text
)

Miscellaneous

  • json_cleanup(TEXT) RETURNS json

json_cleanup accepts a superset of JSON and, if it can, cleans it up and returns a valid JSON string. This superset of JSON supports the following extra features:

  • Comments:
    • Single-line comments with // and #
    • C-style comments: /* comment */
  • Unquoted object keys: {key: "value"}
  • Single quote strings: 'single quotes; "double quotes" do not need to be escaped here'

  • Single quote escape allowed: "It\'s allowed, but it's not necessary"
  • Lax number format (+ sign allowed; digits may be omitted on one side of the decimal point).
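
One possible way to approach json_cleanup is a tokenizer that rewrites each lax token into its strict form and then validates the result. The Python model below is a sketch under stated assumptions: any bare word other than true, false and null is treated as an unquoted key, comments are replaced by a single space, and Python's json module performs the final validity check. It is not meant to define the accepted superset precisely.

```python
import json
import re

_TOKEN = re.compile(r'''
    (?P<space>\s+)
  | (?P<comment>//[^\n]*|\#[^\n]*|/\*.*?\*/)
  | (?P<dq>"(?:[^"\\]|\\.)*")
  | (?P<sq>'(?:[^'\\]|\\.)*')
  | (?P<number>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>[{}\[\],:])
''', re.VERBOSE | re.DOTALL)

def _requote(tok):
    """Turn a single- or double-quoted token into a strict JSON string."""
    inner, out, i = tok[1:-1], [], 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\" and i + 1 < len(inner):
            nxt = inner[i + 1]
            out.append("'" if nxt == "'" else ch + nxt)  # \' becomes '
            i += 2
        else:
            out.append('\\"' if ch == '"' else ch)        # escape bare "
            i += 1
    return '"' + "".join(out) + '"'

def _fix_number(tok):
    """Drop a leading + and supply digits missing around the decimal point."""
    sign, whole, dot, frac, exp = re.fullmatch(
        r"([+-]?)(\d*)(\.?)(\d*)(.*)", tok).groups()
    fixed = ("-" if sign == "-" else "") + (whole or "0")
    if dot:
        fixed += "." + (frac or "0")
    return fixed + exp

def json_cleanup(text):
    out, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        kind, tok = m.lastgroup, m.group()
        if kind == "comment":
            out.append(" ")
        elif kind in ("dq", "sq"):
            out.append(_requote(tok))
        elif kind == "number":
            out.append(_fix_number(tok))
        elif kind == "word" and tok not in ("true", "false", "null"):
            out.append(json.dumps(tok))   # unquoted key becomes "key"
        else:
            out.append(tok)               # whitespace, punctuation, literals
        pos = m.end()
    result = "".join(out)
    json.loads(result)                    # raises ValueError if still invalid
    return result
```

For example, json_cleanup("{key: 'value'}") returns {"key": "value"} in this model, while input that cannot be repaired raises an error.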