Parallel Query Execution

From PostgreSQL wiki

Revision as of 02:54, 15 January 2013

Parallel query execution is currently under development. See the ToDo list.

Purpose of Parallelism

Postgres currently supports full parallelism in client-side code. Applications can open multiple database connections and manage them asynchronously or via threads.
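As a rough illustration of the client-side approach, the sketch below runs each query on its own connection from its own thread. The helper name `run_in_parallel` and its `connect` parameter are hypothetical; `connect` is assumed to be any zero-argument callable that returns a Python DB-API connection. The standard-library `sqlite3` module stands in for a Postgres driver only so that the example runs without a server.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(connect, queries):
    """Run each query on its own connection, one thread per query.

    `connect` is a hypothetical zero-argument factory returning a DB-API
    connection. Results are returned in the same order as `queries`.
    """
    def run_one(sql):
        conn = connect()  # each thread opens and owns its own connection
        try:
            cur = conn.cursor()
            cur.execute(sql)
            return cur.fetchall()
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        return list(pool.map(run_one, queries))


if __name__ == "__main__":
    # sqlite3 is only a stand-in here; it is not a Postgres client.
    results = run_in_parallel(lambda: sqlite3.connect(":memory:"),
                              ["SELECT 1", "SELECT 2"])
    print(results)
```

Each query still executes serially on the server inside its own backend; the parallelism comes only from having several backends busy at once.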

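On the server side, there is already some parallelism:

  • effective_io_concurrency (http://www.postgresql.org/docs/9.2/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR) allows page prefetch requests to the kernel, for bitmap heap scans
  • server-side languages can potentially do parallel operations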

Challenges of Parallelism

For parallelism to be added to a single-threaded task, it must be possible to break the task into sufficiently large parts that can be executed independently. (If the sub-parts are too small, the overhead of parallelism overwhelms its benefits.) Unfortunately, unlike a GUI application, the Postgres backend executes a query by performing many small tasks that must run in sequence, e.g. parsing, planning, and execution.

This means that databases allow parallelism only in limited situations, mostly for large queries that can become CPU-bound or I/O-bound. For example, it is unlikely that selecting a row based on a primary key would benefit from parallelism.
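The trade-off between overhead and benefit can be made concrete with a toy cost model. This is only a sketch under stated assumptions: the function names, the cost terms, and the numbers in the usage lines are all hypothetical and are not measurements of Postgres.

```python
def estimated_parallel_time(work, workers, startup_cost, per_worker_cost):
    """Toy model: fixed startup, per-worker coordination, evenly split work.

    All arguments share one arbitrary time unit and are illustrative
    assumptions, not measured values.
    """
    return startup_cost + per_worker_cost * workers + work / workers


def worth_parallelizing(work, workers, startup_cost, per_worker_cost):
    """True when the modeled parallel time beats the serial time `work`."""
    parallel = estimated_parallel_time(work, workers, startup_cost, per_worker_cost)
    return parallel < work


if __name__ == "__main__":
    # A tiny task (think: primary key lookup) versus a large one (think: big sort).
    print(worth_parallelizing(work=0.1, workers=4, startup_cost=5, per_worker_cost=1))
    print(worth_parallelizing(work=10000, workers=4, startup_cost=5, per_worker_cost=1))
```

Under this model, the fixed costs dominate for the tiny task, so splitting it only makes it slower, while the large task is dominated by the work term that the workers divide among themselves.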

Project Goal

  • Implement parallel query
  • The implementation will use one master process (the current backend) and multiple slave processes, which the postmaster forks in response to a signal from the master.

Issues

  • Shared memory
    • new shared memory context which uses the OSSP mm library
    • limitation – so far we do not bother with attaching to shared memory in the EXEC_BACKEND case, so slaves can only be forked from the postmaster
  • Slave process
    • initialization is almost the same as for a standard backend, except that the user name and database are taken from the master process
    • limitation – additional Postgres modules loaded in the backend are not reloaded in the slave

ToDo

  • parallel sort using multiple processes
    • in nodeSort, distribute incoming tuples to slaves using a hash function
    • implement a producer-consumer structure in shared memory to allow sending data between processes
    • implement final merge phase of slave results
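The three ToDo steps above can be sketched outside the backend in a few lines. This is a minimal illustration, not the planned implementation: the function names are hypothetical, Python's `multiprocessing.Pool` (which moves data over pipes) stands in for the shared-memory producer-consumer structure, and the `fork` start method is assumed, so the sketch is limited to Unix-like systems.

```python
import heapq
import multiprocessing


def partition_by_hash(rows, n_workers):
    """Assign each row to one worker partition using a hash function."""
    partitions = [[] for _ in range(n_workers)]
    for row in rows:
        partitions[hash(row) % n_workers].append(row)
    return partitions


def parallel_sort(rows, n_workers=4):
    """Hash-distribute rows, sort each partition in a worker, then merge."""
    partitions = partition_by_hash(rows, n_workers)
    # Assumption: the "fork" start method is available (Unix-like systems).
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(n_workers) as pool:
        # Each worker process sorts one partition independently.
        sorted_runs = pool.map(sorted, partitions)
    # Final merge phase in the master: combine the sorted runs.
    return list(heapq.merge(*sorted_runs))


if __name__ == "__main__":
    print(parallel_sort([5, 3, 9, 1, 7, 3, 8], n_workers=3))
```

Because hashing scatters neighbouring values across workers, every sorted run spans the whole key range, which is why a real merge phase is needed rather than simple concatenation.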

Process vs Thread

  • Process +
    • Existing code does not need to be rewritten to be thread safe
  • Thread +
    • No special effort to share data between threads
  • Process -
    • Context switching between processes is slower
  • Thread -
    • Existing code is not thread-safe
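The data-sharing difference between the two models can be seen in a small experiment. The sketch below is illustrative only: the function name `demo` is hypothetical, and the `fork` start method is assumed, so it applies to Unix-like systems.

```python
import multiprocessing
import threading


def demo():
    """Return the counter as seen by the parent after a thread, then a process."""
    state = {"value": 0}

    def bump():
        state["value"] += 1

    # A thread shares the parent's memory, so its update is visible here.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = state["value"]

    # A forked process works on its own copy; the parent's copy is unchanged.
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like system
    p = ctx.Process(target=bump)
    p.start()
    p.join()
    after_process = state["value"]

    return after_thread, after_process


if __name__ == "__main__":
    print(demo())
```

The thread's increment reaches the parent with no extra effort, whereas the child process's increment is lost unless the result is sent back explicitly, for example through shared memory or a pipe.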