Parallel Query Execution
From PostgreSQL wiki
(→Parallelism Opportunties: add multi-table) |
(update headings) |
||
| Line 2: | Line 2: | ||
---- | ---- | ||
| - | ==Purpose | + | ==Purpose== |
Postgres currently supports full parellism in client-side code. Applications can open multiple database connections and manage them asyncronously, or via threads. | Postgres currently supports full parellism in client-side code. Applications can open multiple database connections and manage them asyncronously, or via threads. | ||
| Line 13: | Line 13: | ||
* server-side languages can potentially do parallel operations | * server-side languages can potentially do parallel operations | ||
| - | ==Challenges | + | ==Challenges== |
For parallelism to be added to a single-threaded task, the task must be able to be broken into sufficiently-large parts and executed independently. (If the sub-parts are too small, the overhead of doing parallelism overwhelms the benefits of parallelism.) Unfortunately, unlike a GUI application, the Postgres backend executes a query by performing many small tasks that must be executed in sequence, e.g. parser, planner, executor. | For parallelism to be added to a single-threaded task, the task must be able to be broken into sufficiently-large parts and executed independently. (If the sub-parts are too small, the overhead of doing parallelism overwhelms the benefits of parallelism.) Unfortunately, unlike a GUI application, the Postgres backend executes a query by performing many small tasks that must be executed in sequence, e.g. parser, planner, executor. | ||
This means that databases allow parallelism only in limited situations, mostly for large queries that can become CPU or I/O bound. For example, it is unlikely that selecting a row based on a primary key would benefit from parallelism. In contrast, large queries can often benefit from parallelism. | This means that databases allow parallelism only in limited situations, mostly for large queries that can become CPU or I/O bound. For example, it is unlikely that selecting a row based on a primary key would benefit from parallelism. In contrast, large queries can often benefit from parallelism. | ||
| - | ==Benefits | + | ==Benefits== |
There are three possible benefits of parallelism: | There are three possible benefits of parallelism: | ||
| Line 27: | Line 27: | ||
* using multiple CPUs and I/O channels | * using multiple CPUs and I/O channels | ||
| - | == | + | ==Approaches== |
There are several methods to add parallelism: | There are several methods to add parallelism: | ||
| Line 36: | Line 36: | ||
* create full backends that can execute parts of a query in parallel and return results | * create full backends that can execute parts of a query in parallel and return results | ||
| - | == | + | ==Opportunties== |
Parallel opportunities include: | Parallel opportunities include: | ||
Revision as of 04:29, 15 January 2013
Contents |
Purpose
Postgres currently supports full parellism in client-side code. Applications can open multiple database connections and manage them asyncronously, or via threads.
On the server-side, there is already some parallelism:
- effective_io_concurrency allows index page prefetch requests to the kernel, for bitmap joins
- helper processes like background writer and wal writer offload I/O requirements from the main query execution process
- server-side languages can potentially do parallel operations
Challenges
For parallelism to be added to a single-threaded task, the task must be able to be broken into sufficiently-large parts and executed independently. (If the sub-parts are too small, the overhead of doing parallelism overwhelms the benefits of parallelism.) Unfortunately, unlike a GUI application, the Postgres backend executes a query by performing many small tasks that must be executed in sequence, e.g. parser, planner, executor.
This means that databases allow parallelism only in limited situations, mostly for large queries that can become CPU or I/O bound. For example, it is unlikely that selecting a row based on a primary key would benefit from parallelism. In contrast, large queries can often benefit from parallelism.
Benefits
There are three possible benefits of parallelism:
- using multiple CPUs
- using multiple I/O channels
- using multiple CPUs and I/O channels
Approaches
There are several methods to add parallelism:
- user fork (or a thread on Windows) and only call libc and parallel-specific functions to do parallel computation or I/O. This avoids the problem of trying to make the existing backend code thread-safe.
- same as above, but modify some existing backend modules to be fork/thread-safe, with or without shared memory access; this might allow entire executor node trees to be run in parallel
- create full backends that can execute parts of a query in parallel and return results
Opportunties
Parallel opportunities include:
- sorting
- tablespaces
- partitions
- multi-table queries
