Automatically updatable security barrier views

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(How to proceed)
(How to proceed)
Line 294: Line 294:
here the target relation is still a RTE_RELATION.
here the target relation is still a RTE_RELATION.
See [ mailing list post with proposed approach]
See [ mailing list post with proposed approach]
Need to:
Need to:

Revision as of 07:47, 12 November 2013


Automatically updatable security barrier views

As part of the Row-security work, and as a useful feature on its own, it is desirable to support automatically updatable security_barrier views. Ordinary views are updatable in 9.3, but security_barrier views aren't considered "simple" views and are not updatable.

Status in 9.3

"simple" views are made updatable in 9.3 by flattening the view quals into the outer query. So given:

 SELECT n AS id, 'secret'||n AS secret FROM generate_series(1,10) n;
 CREATE VIEW t_even AS SELECT * FROM t WHERE id % 2 = 0;
 CREATE VIEW t_even_sb WITH (security_barrier) AS SELECT * FROM t WHERE id % 2 = 0;

you can update the simple non-security-barrier view, but can also steal values:

   RAISE NOTICE 'Saw secret=%',$1;
   RETURN true;
 $$ LANGUAGE plpgsql COST 1;
 test=>   UPDATE t_even SET id = id WHERE f_leak(secret);
 NOTICE:  Saw secret=secret1
 NOTICE:  Saw secret=secret3
 NOTICE:  Saw secret=secret5
 NOTICE:  Saw secret=secret7
 NOTICE:  Saw secret=secret9
 NOTICE:  Saw secret=secret2
 NOTICE:  Saw secret=secret4
 NOTICE:  Saw secret=secret6
 NOTICE:  Saw secret=secret8
 NOTICE:  Saw secret=secret10

You can't update the security barrier view at all:

 test=> UPDATE t_even_sb SET id = id WHERE f_leak(secret);
 ERROR:  cannot update view "t_even_sb"
 DETAIL:  Security-barrier views are not automatically updatable.
 HINT:  To enable updating the view, provide an INSTEAD OF UPDATE trigger or an unconditional ON UPDATE DO INSTEAD rule.

so there's no secure way to give a user update on only some rows of table. That's something we need for row-security and it's a feature that would be useful for users in general.

Issue with security barrier support

The existing updatable view code relies on flattening a view's quals into the outer query. So with the above example:

 UPDATE t_even SET id = id WHERE f_leak(secret);

gets view-expanded into something logically like:

 UPDATE (SELECT * FROM t WHERE id % 2 = 0) SET id = id WHERE f_leak(secret)

but the executor doesn't know how to deal with that so it can't be passed through as-is. Instead the subquery gets flattened out, producing:

 UPDATE t SET id = id WHERE id % 2 = 0 AND f_leak(secret)

which the executor can deal with, since it's a simple update of a relation. You can see this in the plan:

 test=# explain  UPDATE t_even SET id = id WHERE f_leak(secret);
                        QUERY PLAN                        
  Update on t  (cost=0.00..31.53 rows=2 width=42)
    ->  Seq Scan on t  (cost=0.00..31.53 rows=2 width=42)
          Filter: (f_leak(secret) AND ((id % 2) = 0))
 (3 rows)

See how the f_leak(secret) qual and id % 2 = 0 are at the same level? The executor will tend to pick f_leak(secret) to run first because its COST is artificially low.

This won't work with security-barrier views, since we have to make sure that the predicate id % 2 = 0 filters rows before any are passed to the user-supplied predicate, in this case f_leak(secret). So the current code refuses to operate on security_barrier views.

How automatically updatable views work in 9.3

Automatically updatable views were introduced by Dean Rasheed's code, committed by Tom in a99c42f291421572aef2b0a9360294c7d89b8bc7.

This code extends src/backend/rewrite/rewriteHandler.c, adding functions to:

  • get_view_query: Get the _RETURN rule from a view
  • view_has_instead_trigger: Check whether there's an INSTEAD trigger that should supersede auto-upd view
  • view_is_auto_updatable: Check whether a view is "simple" enough to be auto-upd. Rejects sb views.
  • relation_is_updatable: Used by information_schema. Only interesting for real views.
  • adjust_view_column_set: Map column permissions from view to sub-table. Only interesting if it is a real view.
  • rewriteTargetView: The guts. Rewrite a view into the outer query. It:
    • Makes sure the view is auto-updatable with view_is_auto_updatable
    • Finds the RTE for the view in the outer query
    • Gets the query that defines the view from its _RETURN rule using get_view_query
    • Locks the base relation (might be another view)update resnos in the targetlist to refer to columns of the base relation
    • Pull up the view (should be merged with optimizer pull-up code if possible):
    • Create a new target RTE describing the base rel and add it to the outer query's range table
      • Copy the quals to the outer query's qual list, fixing up varnos to point to the new target
      • Deal with permissions where view owner != query caller
      • Deal with column permissions
    • Fix up vars in outer rel to point to vars in new base rel instead of view, using ReplaceVarsFromTargetList (TODO: can we re-use this approach for RLS?)
    • Fix up everything else that references the view to point to the new base rel using ChangeVarNodes
    • Update resnos from target list to point to cols of base rel (UPDATE / INSERT) only.
    • Add the rewritten update/insert/delete, either to the beginning of the list of rewritten queries + rules (for insert) or after (for update/delete).

Also changes fireRIRrules

Notably, this means that the updatable view code doesn't actually add support for updating a view. Instead, it adds support for rewriting simple views to pull their quals up into the outer query and flatten the subquery.

We can't do that for updatable security barrier views.

The path to updatable s.b. views

To support updatable security barrier views we have to support UPDATE directly on a subquery without flattening the query. That's because we must still enforce the order of qual execution imposed by the security_barrier flag on the subquery. Anything else would require implementing a different approach to security barriers and introduce its own problems.

How does updating a join work?

UPDATE ... FROM already supports acting on a join, as does DELETE ... USING.

We need to work out how this is executed, and determine how the update/delete is executed using the join node. Multiple join types are supported:

regress=> explain delete from t using t t2 where =;
                                QUERY PLAN                                 
 Delete on t  (cost=170.85..290.46 rows=7564 width=12)
   ->  Merge Join  (cost=170.85..290.46 rows=7564 width=12)
         Merge Cond: ( =
         ->  Sort  (cost=85.43..88.50 rows=1230 width=10)
               Sort Key:
               ->  Seq Scan on t  (cost=0.00..22.30 rows=1230 width=10)
         ->  Sort  (cost=85.43..88.50 rows=1230 width=10)
               Sort Key:
               ->  Seq Scan on t t2  (cost=0.00..22.30 rows=1230 width=10)
(9 rows)
regress=> explain delete from t using t t2 where =;
                               QUERY PLAN                                
 Delete on t  (cost=3.25..6.62 rows=100 width=12)
   ->  Hash Join  (cost=3.25..6.62 rows=100 width=12)
         Hash Cond: ( =
         ->  Seq Scan on t  (cost=0.00..2.00 rows=100 width=10)
         ->  Hash  (cost=2.00..2.00 rows=100 width=10)
               ->  Seq Scan on t t2  (cost=0.00..2.00 rows=100 width=10)
(6 rows)
regress=> SET enable_hashjoin = off;
regress=> SET enable_mergejoin = off;
regress=> explain delete from t using t t2 where =;
                               QUERY PLAN                                
 Delete on t  (cost=0.00..154.25 rows=100 width=12)
   ->  Nested Loop  (cost=0.00..154.25 rows=100 width=12)
         Join Filter: ( =
         ->  Seq Scan on t  (cost=0.00..2.00 rows=100 width=10)
         ->  Materialize  (cost=0.00..2.50 rows=100 width=10)
               ->  Seq Scan on t t2  (cost=0.00..2.00 rows=100 width=10)
(6 rows)

so it's clearly at least somewhat generic.

How does this work at the code level, though?

SET debug_print_plan = on;
SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET client_min_messages = debug;

produces [1] - parse, rewrite and plan tree set.

The parse tree shows two range-table entries, one for t and one for t2. They have different :requiredPerms but are otherwise the same. The parse-tree has :commandType 4 (CMD_DELETE)

By contrast, in a simple delete doesn't differ except for :requiredPerms 8.

The diff between the two parse, plan and rewrite trees may be informative.

Parse tree

  • :requiredPerms on the first RTE, for t, is 8 for the simple delete, 10 for the join delete. The extra bit, 0x02, is defined in include/nodes/parsenodes.h as #define ACL_SELECT (1<<1).
  • :selectedCols on the first RTE has an extra entry 9 in the join delete. This is a resjunk column used to hold the join key?
  • :modifiedCols (b) is on the first RTE for both parse trees
  • There's a second RTE added by the join plan, for t aliased to t2. Other than the alias it is the same as the first RTE, for the base table.

... and that's it for the parse tree. It looks like it really is a simple delete on a join. The only indication that the target table is the first RTE appears to be that it's RTE index 1.

Rewritten tree

diff starts here.

The rewritten tree is the same as the parse tree in both cases; no change.

Plan tree

diff starts here.

Here we should see how the planner intends on actually executing this delete.

The costs and estimates are different, but we can ignore that, it's not interesting.

At the top level QUERY node the key entry (thanks RhodiumToad) appears:

   :resultRelation 1 

where 1 is a RTI.

  • In both cases the top node is MODIFYTABLE with identical parameters, the interesting ones being:
      :extParam (b)
      :allParam (b)
      :operation 4 
      :canSetTag true 
      :resultRelations (i 1)
      :resultRelIndex 0

IOW it refers to the result relation(s). It's the same for both the simple and join deletes. To focus on are :resultRelations and resultRelIndex. TODO

There is a HASHJOIN node instead of a SEQSCAN node at the first sublevel of the plan.

The HASHJOIN node has two TARGETENTRY nodes, not just the one for a SEQSCAN. The only differences are :varnoold 2 instead of 1 for the expr of the 2nd node, resno 2 instead of 1, resname is ctid1 instead of ctid. Both are resjunk columns. Under the HASHJOIN are left and right plan trees, each containing a seqscan node. The nested hash op and scans are included under it.

Finally, the join plan adds:

   :rowMarks (
      :rti 2 
      :prti 2 
      :rowmarkId 1 
      :markType 4 
      :noWait false 
      :isParent false
    :relationOids (o 16387 16402)

Here :relationOids are for t and t2 respectively.

The :rowMarks clause appears to refer to a FOR UPDATE clause. markType is defined in include/nodes/plannodes.h as RowMarkType. markType 4 is ROW_MARK_REFERENCE /* just fetch the TID */. The comment there says that a rowMark is added for each non-target relation and that if it isn't FOR UPDATE it's flagged ROW_MARK_REFERENCE. So this is the ref to t2, which fits given rti being the index of range table entry 2, t2.

The rowmarkId is for resjunk cols, referring to the unique resjunk col id :resno? If so, it seems to refer to the left side of the join.

In the code

  • resultRelation (per RhodiumToad)
  • preprocess_targetlist and expand_targetlist in backend/optimizer/prep/preptlist.c.
  • src/backend/executor/README.

IRC chat:

<RhodiumToad> ringerc: update is basically planned like this: given  update x set a = ... from y where ...;
<RhodiumToad> ringerc: it's treated as a query  select, ...,, x.ctid, ... from x,y where ...;
<RhodiumToad> ringerc: where the initial result columsn of the select are matched up to the new row of x
<RhodiumToad> ringerc: so fields that don't change are fetched from the old x.* row, and ones that do change just have the new value expression in the select list
<RhodiumToad> ringerc: in preptlist, the select list is adjusted to add the necessary entries

Current limitations:

  • in preprocess_targetlist an explicit check is made to reject a result_relation that is a subquery.
  • heap_form_tuple for the new tuple requires that the tlist be in the same order as the tuple attributes. (Should not affect join).

How to proceed

* :targetRelation (rti 1)
* ModifyTable
** SubqueryScan (rti 2)
*** SeqScan (ref rti 1)
**** Filter
* RangeTable
** RTE_RELATION underlying target-relation
** RTE_SUBQUERY subquery source

here the target relation is still a RTE_RELATION.

See mailing list post with proposed approach

Need to:

  • Remove security barrier check in view_query_is_auto_updatable
  • Omit/replace/rewrite rewriteTargetView
    • After new_rt_index = list_length(parsetree->rtable);, assign new index to parsetree->resultRelation
    • Add missing columns (marked resjunk?) to the view subquery plan, so we can see all old values re-used in UPDATE. Can omit any values already being set, but won't bother with that. (TODO: Ensure that this doesn't leak the column values via whole-tuple reference to inner subquery passed to function).
      • Delete copying of view targetlist. Should be rewritten in place instead.
      • Adapt the per-column permission bits code so it sets the bits in-place on the view query, not on the copy
      • Delete all the rest of the flattening code

Can it really be that easy? Think. This should crash, surely. Yes, it must fail because we haven't injected the ctid into the view, so when the ModifyRelation node is reached it'll fail to find the ctid in the input.


What's a var? RangeTable? RTE? Attr? See Source Glossary and Tom's Intro to hacking the query planner.

Adding security_barrier support

Doing it cleanly

To add security_barrier support for updatable views we need to teach UPDATE and DELETE to operate on a subquery. This would be very useful for row-security.

This isn't unprecedented; we support UPDATE ... FROM already:

test=# explain UPDATE t SET id = gs.n FROM (SELECT generate_series(1,10)) gs(n) WHERE = gs.n;
                                  QUERY PLAN                                  
 Update on t  (cost=150.26..247.51 rows=6150 width=70)
   ->  Merge Join  (cost=150.26..247.51 rows=6150 width=70)
         Merge Cond: (gs.n =
         ->  Sort  (cost=64.84..67.34 rows=1000 width=32)
               Sort Key: gs.n
               ->  Subquery Scan on gs  (cost=0.00..15.01 rows=1000 width=32)
                     ->  Result  (cost=0.00..5.01 rows=1000 width=0)
         ->  Sort  (cost=85.43..88.50 rows=1230 width=42)
               Sort Key:
               ->  Seq Scan on t  (cost=0.00..22.30 rows=1230 width=42)
(10 rows)

where the root node of the UPDATE is a MergeJoin not a direct table or index scan.

Can that be extended to a SubqueryScan? Robert thinks it probably can.

Prior approaches

The 9.4 RLS patch implements an equivalent feature by internally replacing the RTE_RELATION range-table entry for the RLS-affected table with an RTE_SUBQUERY. It then has to do fixup for resjunk columns (temporary sort keys not output in the final result set, ctid, etc). A bunch of fixups are required to remap Vars and attribute numbers between the base table and the subquery. Additionally, changes are required to teach UPDATE that the relation it is scanning isn't necessarily the same as the relation it is updating.

Dean Rasheed separated the RLS code into a prototype patch to implement updatable s.b. views. This patch has the same issues as RLS.

Personal tools