Difference between revisions of "Add MERGE command GSoC 2010"

From PostgreSQL wiki
Jump to: navigation, search
(Merge Command)
Line 1: Line 1:
= Merge Command =
+
= MERGE Command =
MERGE command is under a urge requirment for PosgreSQL. Many have been talked about it [http://wiki.postgresql.org/wiki/SQL_MERGE].
+
 
In a simple word, MERGE is used to combine the content of two tables into one table according to
+
The [[SQL MERGE]] command is under a urge requirement for PosgreSQL.
the user defined matching condition.  
+
MERGE is used to combine the content of two tables into one table according to user defined matching conditions.
  
  
Line 11: Line 11:
  
  
The latese patch file can be downloaded at [[Image:Merge_tested.tar]].  
+
The latest patch file can be downloaded at [[Image:Merge_tested.tar]].  
  
 
TODO:  
 
TODO:  
Line 23: Line 23:
 
= '''Syntax of Merge''' =
 
= '''Syntax of Merge''' =
  
I refer to the syntax proposed by Mr. Simon Riggs[[http://archives.postgresql.org/pgsql-hackers/2008-04/msg01157.php ]] as the
+
I refer to the syntax proposed by Mr. Simon Riggs[[http://archives.postgresql.org/pgsql-hackers/2008-04/msg01157.php]] as the
 
following:
 
following:
  
Line 60: Line 60:
 
'''NOTE:'''
 
'''NOTE:'''
  
1. The action of the same type can appeare more than once. Namely, you can have two UPDATE in one MERGE.
+
1. Actions of the same type can appear more than once. Namely, you can have two UPDATE in one MERGE.
  
2. Eact tuple of the SOURCE table will only be matched for once and take only one action. If a tuple fits
+
2. Each tuple of the SOURCE table will only be matched once and result in only one action. If a tuple fits
the qualificaion of more than one action, the first action will be taken and the remain one(s) will be ignored.
+
the qualification of more than one action, the first action will be taken and the remaining one(s) will be ignored.
  
 
3. The INSERT actions cannot be applied when MATCHED. And, UPDATE and DELETE actions are for MATCHED cases only.
 
3. The INSERT actions cannot be applied when MATCHED. And, UPDATE and DELETE actions are for MATCHED cases only.
Line 70: Line 70:
  
 
= '''Proposal''' =
 
= '''Proposal''' =
== '''gernal idea of the implementation''' ==
+
== '''General idea of the implementation''' ==
  
The command definition is clear: we try to join the table to be modified (target table) and the
+
The command definition is clear: we try to join the table to be modified (target table) and the table as input (source table),
table as input (source table), do different actions according to the MATCH or NOT MATCH
+
do different actions according to the MATCH or NOT MATCH
 
conditions.
 
conditions.
  
Line 116: Line 116:
 
== '''Job Plan''' ==
 
== '''Job Plan''' ==
  
1 Update the backend/parser/gram.y. Add the SQL style MERGE command in the parser (follow the definition above). One new “MergeStmt” structure should be designed to hold the transformed command information. Besides the target table name and the ON join‐condition, we need at least one “RangeSubselect” or “RangeVar” structure to hold the source table and a List of UpdateStmt, InsertStmt and/or DeleteStmt for the side queries.
+
1. Update the backend/parser/gram.y. Add the SQL style MERGE command in the parser (follow the definition above). One new “MergeStmt” structure should be designed to hold the transformed command information. Besides the target table name and the ON join‐condition, we need at least one “RangeSubselect” or “RangeVar” structure to hold the source table and a List of UpdateStmt, InsertStmt and/or DeleteStmt for the side queries.
  
2. In the Analyze.c file we need to add a function to transform this MergeStmt into a Query node.
+
2. In the analyze.c file we need to add a function to transform this MergeStmt into a Query node.
  
 
It is necessary to add a new command type for MERGE, which is a plannable command.
 
It is necessary to add a new command type for MERGE, which is a plannable command.
We need to check the semantics correctness of the statement. What I am thinking about is to
+
We need to check the semantical correctness of the statement. What I am thinking about is to
 
combine the target table and the source table as a whole SELECT query.
 
combine the target table and the source table as a whole SELECT query.
  
Line 128: Line 128:
 
“SELECT * FROM target, source WHERE match‐condition;”  
 
“SELECT * FROM target, source WHERE match‐condition;”  
  
or , We have to do a cross join if we
+
or, we have to do a cross join if we
 
want to handle some NOT MATCH actions, which will do a query like  
 
want to handle some NOT MATCH actions, which will do a query like  
 
“SELECT * FROM target, source;”
 
“SELECT * FROM target, source;”
  
 
The benefit is that we can almost fully reuse the rewriter and planner to transform this generated
 
The benefit is that we can almost fully reuse the rewriter and planner to transform this generated
query as an executor‐accepting structure.
+
query as an executor‐accepted structure.
  
3. A plan is need for the query. The planner should accept this new plannable command.
+
3. A plan is needed for the query. The planner should accept this new plannable command.
However, as motioned above, the real work will be: do a traditional query plan on the formatted
+
However, as mentioned above, the real work will be: do a traditional query plan on the formatted
main query which joins the target and source table together. And the side queries are also
+
main query, which joins the target and source table together. And the side queries are also
processed in this way. Finally pack these plans within an outer “PlannedStmt” , which is designed
+
processed in this way. Finally pack these plans within an outer “PlannedStmt”, which is designed
for MERGE command specifically.
+
for the MERGE command specifically.
  
4. In executor, the basic operation is: for each returned tuple of the main query (the tuple
+
4. In the executor, the basic operation is: for each returned tuple of the main query (the tuple
contains all the attributes in both source and target table) we can test it against the conditions in
+
contains all the attributes in both the source and the target table) we can test it against the conditions in
different side queries, and do corresponding actions base the testing result.
+
different side queries, and do corresponding actions based on the test results.
  
  
Line 149: Line 149:
 
== '''Time line''' ==
 
== '''Time line''' ==
  
Week1: Modify the gram.y , add the syntax define of the command
+
Week1: Modify the gram.y, add the syntax define of the command
  
 
Week2‐3: add transform program in the parser
 
Week2‐3: add transform program in the parser

Revision as of 14:53, 19 August 2010

MERGE Command

The SQL MERGE command is under a urge requirement for PosgreSQL. MERGE is used to combine the content of two tables into one table according to user defined matching conditions.


As GSoC 2010 student, I proposed to add the MERGE command into PostgreSQL. Now the job is almost done.


If you are interested in this project, you can find our MergeImplementationDetails and MergeTestExamples here.


The latest patch file can be downloaded at File:Merge tested.tar.

TODO:

1. Add regress auto test

2. Add sgml instruction

3. Returning ?

Syntax of Merge

I refer to the syntax proposed by Mr. Simon Riggs[[1]] as the following:

MERGE INTO table [[AS] alias]

USING [table-ref | query]

ON join-condition

[WHEN MATCHED [AND condition] THEN MergeUpdate | DELETE | DO NOTHING | RAISE ERROR]

[WHEN NOT MATCHED [AND condition] THEN MergeInsert | DO NOTHING | RAISE ERROR]

MergeUpdate is

UPDATE SET { column = { expression | DEFAULT } |

( column [, ...] ) = ( { expression | DEFAULT } [, ...] ) }

[, ...]

(yes, there is no WHERE clause here)

MergeInsert is

INSERT [ ( column [, ...] ) ]

{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] )

[, ...]}

(no subquery allowed)


NOTE:

1. Actions of the same type can appear more than once. Namely, you can have two UPDATE in one MERGE.

2. Each tuple of the SOURCE table will only be matched once and result in only one action. If a tuple fits the qualification of more than one action, the first action will be taken and the remaining one(s) will be ignored.

3. The INSERT actions cannot be applied when MATCHED. And, UPDATE and DELETE actions are for MATCHED cases only.

4. DO NOTHING and RAISE ERROR can work for both MATCHED and NOT MATCHED situation.

Proposal

General idea of the implementation

The command definition is clear: we try to join the table to be modified (target table) and the table as input (source table), do different actions according to the MATCH or NOT MATCH conditions.

My general idea for doing this is as following: We construct a cross join query on the target table and source table in the parser, which is called the Main Query. And packed the list of "WHEN" actions as update, delete or insert queries (side queries), where the MATCHED is replaced by the ON join‐condition.

Use the original planner to generate query plans for the main query and side queries. Put all these plans in a outer plan structure and attach a plan label as a new type of MERGE command.

In the executor, we check each tuple returned by the main query against all the conditions of each side query. If a condition is fulfilled, then do the action accordingly.

For example, if we receive a MERGE query as

MERGE INTO Stock S

USING DailySales DS ON S.Item = DS.Item

WHEN MATCHED AND (QtyOnHand ‐ QtySold = 0) THEN DELETE

WHEN MATCHED THEN UPDATE SET QtyOnHand = QtyOnHand ‐ QtySold

WHEN NOT MATCHED THEN INSERT VALUES (Item, QtySold);

We can do a main query like:

SELECT * FROM Stock S, DailySales DS;

And the side queries are:

1. DELETE FROM Stock S USING DailySales DS WHERE S.item = DS.item AND (QtyOnHand ‐ QtySold = 0);

2. UPDATE Stock S SET QtyOnHand = QtyOnHand – QtySold FROM DailySales DS WHERE S.item = DS.item;

3. INSERT INTO Stock S SELECT Item, QtySold FROM DailySalse DS WHERE not (S.item = DS.item);

Note that we are not run the side queries directly one by one. We just run the main query and execute the side queries within the join loop of the main query.


Job Plan

1. Update the backend/parser/gram.y. Add the SQL style MERGE command in the parser (follow the definition above). One new “MergeStmt” structure should be designed to hold the transformed command information. Besides the target table name and the ON join‐condition, we need at least one “RangeSubselect” or “RangeVar” structure to hold the source table and a List of UpdateStmt, InsertStmt and/or DeleteStmt for the side queries.

2. In the analyze.c file we need to add a function to transform this MergeStmt into a Query node.

It is necessary to add a new command type for MERGE, which is a plannable command. We need to check the semantical correctness of the statement. What I am thinking about is to combine the target table and the source table as a whole SELECT query.

If there is no NOT MATCH option, we can generate a normal query node of something like

“SELECT * FROM target, source WHERE match‐condition;”

or, we have to do a cross join if we want to handle some NOT MATCH actions, which will do a query like “SELECT * FROM target, source;”

The benefit is that we can almost fully reuse the rewriter and planner to transform this generated query as an executor‐accepted structure.

3. A plan is needed for the query. The planner should accept this new plannable command. However, as mentioned above, the real work will be: do a traditional query plan on the formatted main query, which joins the target and source table together. And the side queries are also processed in this way. Finally pack these plans within an outer “PlannedStmt”, which is designed for the MERGE command specifically.

4. In the executor, the basic operation is: for each returned tuple of the main query (the tuple contains all the attributes in both the source and the target table) we can test it against the conditions in different side queries, and do corresponding actions based on the test results.


Time line

Week1: Modify the gram.y, add the syntax define of the command

Week2‐3: add transform program in the parser

Week4‐5: modify planner, let it accept the MERGE command query

Week6‐8: do the executor implementation, One kind of actions for one week.

Week9‐10: flexible time for dogging and improvement.

TODO

  • change the way DO INSTEAD rules are handled [2]
  • Handle after statement triggers
  • write a reference page in the user manual
    • Simon Riggs wrote one, review and include that [3]
  • tab-completion support for psql
  • wrap long comments to multiple lines
  • clean up spurious whitespace