Join Order Search

This is currently under development. See the ToDo list.

There was a join order search hook introduced in PostgreSQL version 8.3 which allows users to gain the control over the algorithm used to search for the most suitable join order during the planning phase for the query. We are working on implementing various heuristic algorithms to address join order problem and would like to perform comparisons of those algorithms. Based on the results and feedback from the community some of these algorithms could become a part of PostgreSQL core distribution.

Join Order Search Hook

There is a function type join_search_hook_type defined in paths.h reflecting the function prototype to be used as an entry to join order search algorithms. The pointer to the function to be called is stored in join_search_hook variable defined in paths.h as extern symbol and allpaths.c. If this variable is not NULL the specified function will be called at make_rel_from_joinlist, otherwise the standard algorithms would be used based on the settings (dynamic programming or genetic query optimizer).

To create a loadable module which will activate the probe and point it to the desired function you should do the following:

Create the function to be called for search for the best join order.
Create the PostgreSQL external module - using PG_MODULE_MAGIC macro and linking the binaries as a shared library.
Assign the desired function to join_search_hook in module initialization function - void _PG_init(void).
Assign NULL to join_search_hook in module finalization function - void _PG_fini(void).

If you wish to call the standard algorithm (dynamic programming only, not geqo!!!) for the join order search problem you can call standard_join_search function to do that. Here is the simple module that just calls the standard algorithm to do the job:

#include "postgres.h"

#include "fmgr.h"
#include "optimizer/geqo.h"
#include "optimizer/paths.h"
#include "optimizer/pathnode.h"


PG_MODULE_MAGIC;

void    _PG_init(void);
void    _PG_fini(void);

static
RelOptInfo *
dummy_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
{
        return standard_join_search(root, levels_needed, initial_rels);
}


void
_PG_init(void)
{
        join_search_hook = dummy_join_search;
}

void
_PG_fini(void)
{
        join_search_hook = 0;
}

Goal

We are aiming to implement various different algorithms to solve the join order problem during the query planning in PostgreSQL. The standard dynamic programming optimizer works well for smaller number of relations, so we would like to focus on the cases where the number of relations is going to be too high for extensive search over the whole space and some heuristics have to be employed for the job.

PostgreSQL currently tries to avoid calling the genetic query optimizer (GeQO) by setting join_collapse_limit and from_collapse_limit parameters far from the GeQO execution threshold.

If the algorithms would be good enough we could increase these parameters and allow the planner to consider "all the available" options for join order.

Project

There is a related project created on pgfoundry.org to cover the implementation. You can download the code from the CVS. The name of the module you are looking for is jos.

Simply grab the source from CVS..

cvs -d :pserver:anonymous@cvs.pgfoundry.org:/cvsroot/optimizer login
cvs -z3 -d :pserver:anonymous@cvs.pgfoundry.org:/cvsroot/optimizer checkout -P jos

and just run...

bash-3.00$ ./configure
bash-3.00$ make --prefix=/path/to/pgsql/prefix
bash-3.00$ make install

Now, all the module should be installed in the same directory as PostgreSQL libraries.

Modules

There are these modules

dummy - dummy module just executing the standard_join_search function.
compare - module executing various algorithms and printing out the plan cost for each.
simgre - greedy algorithm with just depth of 1.
greedy - greedy algorithm with the configurable depth.

To use the specified module load the specified library with

postgres=# LOAD 'josdummy.so';
LOAD

dummy

Just a dummy module that coudl be used as a template. It always executes the dynamic programming optimizer.

simgre

This is the simple greedy algorithm with the depth of one.

greedy

Disabled for now, implementation is not finished.

It is supposed to be a greedy algorithm with a configurable depth and/or lookahead in the future.

compare

Module that executes various available functions with the resulting values. Currently, it exeutes only standard optimizer and genetic query optimizer,

Algorithms

Greedy

The idea of greedy algorithm is very simple. In every step we have n relations that we would like to join. We would check all the possible combinations of relation pairs that we could perform a join on. We calculate the cost of joining every pair and we would pickup the cheapest one and make the corresponding join in this step. This would reduce the number of relations to be joined and now we have just n-1 relations. Finally, if the number of relations is reduced we can execute the standard dynamic programming optimizer.

We could enhance the algorithm and in every step we could try joining more relations (specified by the configuration parameter). We could even look for more relations and perform just the first few of them, e.g. we could try looking for a cheapest join of 5 relations but finally joining just 3 relations in one step.

Issuses/Concerns

In some cases make_join_rel function might return false for the case where creating such a join is not possible. We have to make it clear whether the greedy approach might end in a situation with no further solution. If so, we have to make (find out ;-) an appropriate modifications.

ToDo

Implement the greedy module.
Dispose RelOptInfo structures that are not needed anymore.
Implement other modules.
Test the code.

Feedback

If you have any comments, questions or suggestions, please feel free to write to

Julius Stroffek - julo at stroffek dot net; julius dot stroffek at sun dot com
Tomas Kovarik - tkovarik at gmail dot com

Links

Join Order Search

Contents