Automatically updatable security barrier views
From PostgreSQL wiki
Automatically updatable security barrier views
As part of the Row-security work, and as a useful feature on its own, it is desirable to support automatically updatable security_barrier views. Ordinary views are updatable in 9.3, but security_barrier views aren't considered "simple" views and are not updatable.
Status in 9.3
"simple" views are made updatable in 9.3 by flattening the view quals into the outer query. So given:
CREATE TABLE t AS SELECT n AS id, 'secret'||n AS secret FROM generate_series(1,10) n;
CREATE VIEW t_even AS SELECT * FROM t WHERE id % 2 = 0;
CREATE VIEW t_even_sb WITH (security_barrier) AS SELECT * FROM t WHERE id % 2 = 0;
you can update the simple non-security-barrier view, but can also steal values:
CREATE OR REPLACE FUNCTION f_leak(text) RETURNS boolean AS $$ BEGIN RAISE NOTICE 'Saw secret=%',$1; RETURN true; END; $$ LANGUAGE plpgsql COST 1;
test=> UPDATE t_even SET id = id WHERE f_leak(secret); NOTICE: Saw secret=secret1 NOTICE: Saw secret=secret3 NOTICE: Saw secret=secret5 NOTICE: Saw secret=secret7 NOTICE: Saw secret=secret9 NOTICE: Saw secret=secret2 NOTICE: Saw secret=secret4 NOTICE: Saw secret=secret6 NOTICE: Saw secret=secret8 NOTICE: Saw secret=secret10 UPDATE 5
You can't update the security barrier view at all:
test=> UPDATE t_even_sb SET id = id WHERE f_leak(secret); ERROR: cannot update view "t_even_sb" DETAIL: Security-barrier views are not automatically updatable. HINT: To enable updating the view, provide an INSTEAD OF UPDATE trigger or an unconditional ON UPDATE DO INSTEAD rule.
so there's no secure way to give a user update on only some rows of table. That's something we need for row-security and it's a feature that would be useful for users in general.
Issue with security barrier support
The existing updatable view code relies on flattening a view's quals into the outer query. So with the above example:
UPDATE t_even SET id = id WHERE f_leak(secret);
gets view-expanded into something logically like:
UPDATE (SELECT * FROM t WHERE id % 2 = 0) SET id = id WHERE f_leak(secret)
but the executor doesn't know how to deal with that so it can't be passed through as-is. Instead the subquery gets flattened out, producing:
UPDATE t SET id = id WHERE id % 2 = 0 AND f_leak(secret)
which the executor can deal with, since it's a simple update of a relation. You can see this in the plan:
test=# explain UPDATE t_even SET id = id WHERE f_leak(secret); QUERY PLAN --------------------------------------------------------- Update on t (cost=0.00..31.53 rows=2 width=42) -> Seq Scan on t (cost=0.00..31.53 rows=2 width=42) Filter: (f_leak(secret) AND ((id % 2) = 0)) (3 rows)
See how the f_leak(secret) qual and id % 2 = 0 are at the same level? The executor will tend to pick f_leak(secret) to run first because its COST is artificially low.
This won't work with security-barrier views, since we have to make sure that the predicate id % 2 = 0 filters rows before any are passed to the user-supplied predicate, in this case f_leak(secret). So the current code refuses to operate on security_barrier views.
How automatically updatable views work in 9.3
Automatically updatable views were introduced by Dean Rasheed's code, committed by Tom in a99c42f291421572aef2b0a9360294c7d89b8bc7.
This code extends src/backend/rewrite/rewriteHandler.c, adding functions to:
- get_view_query: Get the _RETURN rule from a view
- view_has_instead_trigger: Check whether there's an INSTEAD trigger that should supersede auto-upd view
- view_is_auto_updatable: Check whether a view is "simple" enough to be auto-upd. Rejects sb views.
- relation_is_updatable: Used by information_schema. Only interesting for real views.
- adjust_view_column_set: Map column permissions from view to sub-table. Only interesting if it is a real view.
- rewriteTargetView: The guts. Rewrite a view into the outer query. It:
- Makes sure the view is auto-updatable with view_is_auto_updatable
- Finds the RTE for the view in the outer query
- Gets the query that defines the view from its _RETURN rule using get_view_query
- Locks the base relation (might be another view)update resnos in the targetlist to refer to columns of the base relation
- Pull up the view (should be merged with optimizer pull-up code if possible):
- Create a new target RTE describing the base rel and add it to the outer query's range table
- Copy the quals to the outer query's qual list, fixing up varnos to point to the new target
- Deal with permissions where view owner != query caller
- Deal with column permissions
- Fix up vars in outer rel to point to vars in new base rel instead of view, using ReplaceVarsFromTargetList (TODO: can we re-use this approach for RLS?)
- Fix up everything else that references the view to point to the new base rel using ChangeVarNodes
- Update resnos from target list to point to cols of base rel (UPDATE / INSERT) only.
- Add the rewritten update/insert/delete, either to the beginning of the list of rewritten queries + rules (for insert) or after (for update/delete).
Also changes fireRIRrules
Notably, this means that the updatable view code doesn't actually add support for updating a view. Instead, it adds support for rewriting simple views to pull their quals up into the outer query and flatten the subquery.
We can't do that for updatable security barrier views.
The path to updatable s.b. views
To support updatable security barrier views we have to support UPDATE directly on a subquery without flattening the query. That's because we must still enforce the order of qual execution imposed by the security_barrier flag on the subquery. Anything else would require implementing a different approach to security barriers and introduce its own problems.
Updatable security barrier views via updatable subqueries
One approach is to allow the rewriter to expand views that're subquery targets. The rather extensive history on this wiki page covers an attempt to implement that.
The two fundamental issues that've arisen out of that effort are:
resultRelation vs source relation
Parts of the parser and executor are used to the resultRelation being the same RTE as the relation we get rows to feed into ExecModifyTable from. If updating a subquery this is no longer true. Instead of, after target view flattening:
SeqScan<baseRel> WHERE quals1 AND quals2 AND quals3 | ^ | | | ------------- v | ModifyTable | -> heap_modify_tuple( | ) -> RETURNING (mix of refs to baserel and other expressions)
(SELECT ctid, ... FROM ( (SELECT ctid, ... FROM ( (SELECT ctid, ... FROM SeqScan<baseRel> WHERE quals1) WHERE quals2) WHERE quals3) | v ModifyTable -> resultRelation(baseRel) -> RETURNING (mix of refs to baserel and other expressions)
The RETURNING list may contain expressions referencing the result relation, which must reflect the final rel after the effect of any BEFORE triggers etc. Expressions may also contain subqueries, may refer to expressions in the view queries that aren't based on the result relation, etc. So we must rewrite the tlist to replace references to the result relation while leaving everything else untouched.
The following must point to the resultRelation with their 'relation' argument:
- heap_modify_tuple(...) for storing new tuples
- heap_lock_tuple(...) invoked via RowMarks
- Any Var in the RETURNING list that refers to the relation being updated
The following must point to the source relation:
- "ctid" resjunk Var on top level query
- "oid" resjunk Var on top level query
- All TargetList entries referring to the target view
- "ctid1" etc for RowMark ctid sources
We need to inject a resjunk "ctid" col into each subquery to get the ctid out of the subquery layers and into ExecModifyTable. If oids are used on the table, we need to fetch the "oid" as resjunk too.
Additionally, any tlist entries for anything not specified explicitly in SET must be added to the tlist of each of the nested subqueries, pointing to the inner-most query.
It isn't necessarily right to do this at rewrite time. We might choose to do it at preprocess_targetlist time instead, using 'sourceRelation' as guidance on where to point the next level of Vars. The issue here is that we must preprocess from inner query out, where currently I think we do it the other way around.
Doing this in the rewriter might be easy. The rewriter is recursive, not iterative. So the stack state is preserved and the stack unwinds. After RewriteQuery returns, we can examine what got returned and modify it using the knowledge of the next-outer-most layer.
Earlier explanation of same.
1. preprocess_targetlist in the planner assumes that the resultRelation is the correct RTE to set as the varno in a new Var it adds to fetch the row ctid (with label "ctid1") as a resjunk attr for row-marking. This causes the tlist to have entries pointing to different RTE to the one being scanned by the eventual seqscan / indexscan, though the underlying Relation is the same. tlist validation checks don't like that.
There may be other places that need to add tlist entries pointing to the relation we're reading rows from. They'll also need to be able to deal with the fact that this no longer the resultRelation.
2. Despite bashing my head against it for ages, I haven't figured out how to inject references to the base-rel's ctid, oid (if WITH OIDS), and any tlist entries not specified in the DML statement into the subquery tree.
Today I've been looking at two ideas but haven't figured out how to make either work yet:
- Pass the next-outer-most Query down into the RewriteQuery invocation.