Add MERGE command GSoC 2010

From PostgreSQL wiki

Jump to: navigation, search

Contents

MERGE Command

This was never integrated into PostgreSQL, and requires significant work to be production quality


The SQL MERGE command is under a urge requirement for PosgreSQL. MERGE is used to combine the content of two tables into one table according to user defined matching conditions.


As GSoC 2010 student, I proposed to add the MERGE command into PostgreSQL. Now the job is almost done.


If you are interested in this project, you can find our MergeImplementationDetails and MergeTestExamples here.


The latest patch file can be downloaded at


File:Merge v104.tar;


File:Merge v201.tar;

Syntax of Merge

I refer to the syntax proposed by Mr. Simon Riggs[[1]] as the following:

MERGE INTO table [[AS] alias]

USING [table-ref | query]

ON join-condition

[WHEN MATCHED [AND condition] THEN MergeUpdate | DELETE | DO NOTHING | RAISE ERROR]

[WHEN NOT MATCHED [AND condition] THEN MergeInsert | DO NOTHING | RAISE ERROR]

MergeUpdate is

UPDATE SET { column = { expression | DEFAULT } |

( column [, ...] ) = ( { expression | DEFAULT } [, ...] ) }

[, ...]

(yes, there is no WHERE clause here)

MergeInsert is

INSERT [ ( column [, ...] ) ]

{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] )

[, ...]}

(no subquery allowed)


NOTE:

1. Actions of the same type can appear more than once. Namely, you can have two UPDATE in one MERGE.

2. Each tuple of the SOURCE table will only be matched once and result in only one action. If a tuple fits the qualification of more than one action, the first action will be taken and the remaining one(s) will be ignored.

3. The INSERT actions cannot be applied when MATCHED. And, UPDATE and DELETE actions are for MATCHED cases only.

4. DO NOTHING and RAISE ERROR can work for both MATCHED and NOT MATCHED situation.

Proposal

General idea of the implementation

The command definition is clear: we try to join the table to be modified (target table) and the table as input (source table), do different actions according to the MATCH or NOT MATCH conditions.

My general idea for doing this is as following: We construct a cross join query on the target table and source table in the parser, which is called the Main Query. And packed the list of "WHEN" actions as update, delete or insert queries (side queries), where the MATCHED is replaced by the ON join‐condition.

Use the original planner to generate query plans for the main query and side queries. Put all these plans in a outer plan structure and attach a plan label as a new type of MERGE command.

In the executor, we check each tuple returned by the main query against all the conditions of each side query. If a condition is fulfilled, then do the action accordingly.

For example, if we receive a MERGE query as

MERGE INTO Stock S

USING DailySales DS ON S.Item = DS.Item

WHEN MATCHED AND (QtyOnHand ‐ QtySold = 0) THEN DELETE

WHEN MATCHED THEN UPDATE SET QtyOnHand = QtyOnHand ‐ QtySold

WHEN NOT MATCHED THEN INSERT VALUES (Item, QtySold);

We can do a main query like:

SELECT * FROM Stock S, DailySales DS;

And the side queries are:

1. DELETE FROM Stock S USING DailySales DS WHERE S.item = DS.item AND (QtyOnHand ‐ QtySold = 0);

2. UPDATE Stock S SET QtyOnHand = QtyOnHand – QtySold FROM DailySales DS WHERE S.item = DS.item;

3. INSERT INTO Stock S SELECT Item, QtySold FROM DailySalse DS WHERE not (S.item = DS.item);

Note that we are not run the side queries directly one by one. We just run the main query and execute the side queries within the join loop of the main query.


Job Plan

1. Update the backend/parser/gram.y. Add the SQL style MERGE command in the parser (follow the definition above). One new “MergeStmt” structure should be designed to hold the transformed command information. Besides the target table name and the ON join‐condition, we need at least one “RangeSubselect” or “RangeVar” structure to hold the source table and a List of UpdateStmt, InsertStmt and/or DeleteStmt for the side queries.

2. In the analyze.c file we need to add a function to transform this MergeStmt into a Query node.

It is necessary to add a new command type for MERGE, which is a plannable command. We need to check the semantical correctness of the statement. What I am thinking about is to combine the target table and the source table as a whole SELECT query.

If there is no NOT MATCH option, we can generate a normal query node of something like

“SELECT * FROM target, source WHERE match‐condition;”

or, we have to do a cross join if we want to handle some NOT MATCH actions, which will do a query like “SELECT * FROM target, source;”

The benefit is that we can almost fully reuse the rewriter and planner to transform this generated query as an executor‐accepted structure.

3. A plan is needed for the query. The planner should accept this new plannable command. However, as mentioned above, the real work will be: do a traditional query plan on the formatted main query, which joins the target and source table together. And the side queries are also processed in this way. Finally pack these plans within an outer “PlannedStmt”, which is designed for the MERGE command specifically.

4. In the executor, the basic operation is: for each returned tuple of the main query (the tuple contains all the attributes in both the source and the target table) we can test it against the conditions in different side queries, and do corresponding actions based on the test results.


Time line

Week1: Modify the gram.y, add the syntax define of the command

Week2‐3: add transform program in the parser

Week4‐5: modify planner, let it accept the MERGE command query

Week6‐8: do the executor implementation, One kind of actions for one week.

Week9‐10: flexible time for dogging and improvement.

TODO

1. inheritance for merge


2. Returning ?

Personal tools