Parallel Query Execution

From PostgreSQL wiki

(Difference between revisions)
(Remove mention of 8.3 test branch.)
(Remove dead 8.3 branch mentions)
Line 18: Line 18:
 
** implement a producer-consumer structure in shared memory to allow sending data between processes
 
** implement final merge phase of slave results
 
==Progress==
 
* 04/09 - <code>PGTHREAD</code> structure which will hold information about locks
 
** they have to be stored in shared memory as <code>PGPROC</code>
 
** proc array remains unchanged because thread will not be creating its own transactions
 
* 04/09 - lwlock.c - lightweight locks (LWLocks) are granted to threads
 
* 05/09 - signal handling
 
** what to use to protect access to <code>num_held_lwlocks</code> and <code>held_lwlocks[]</code> - PostgreSQL spinlocks or <code>pthread_mutex_t</code>?
 
** whichever is chosen has to be initialized for each thread, and has to be fast and safe
 
* August–September 2009 - switching to an implementation with processes as a result of Zdenek’s discussion with Tom and Simon
 
* September 2009 - implementing a new shared memory context based on the cross-platform OSSP mm library
 
* October 2009 - figuring out the architecture of processes
 
* November 2009 - implementing a fully initialized slave backend process, created after the master process sends a signal to the postmaster
 
* December 2009 - distributing tuples to worker processes in nodeSort.c to perform the sort in them, with the final merge in the master process
 
  
 
==Process vs Thread==
 
Revision as of 00:41, 15 January 2013

This is currently under development. See the ToDo list.

Project Goal

  • Implement parallel query
  • The implementation will use one master process (the current backend) and multiple slave processes, forked from the postmaster in response to a signal that the master sends to the postmaster.

Issues

  • Shared memory
    • new shared memory context which uses the OSSP mm library
    • limitation – so far we do not bother with attaching to the shared memory in the EXEC_BACKEND case, so slaves can only be forked from the postmaster
  • Slave process
    • initialization is almost the same as for a standard backend, except that the user name and database are taken from the master process
    • limitation – additional PostgreSQL modules loaded in the master backend are not reloaded in the slave

ToDo

  • parallel sort using multiple processes
    • in nodeSort, distribute incoming tuples to slaves using a hash function
    • implement a producer-consumer structure in shared memory to allow sending data between processes
    • implement final merge phase of slave results
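
The three ToDo steps (hash distribution, per-slave sort, final merge) can be sketched in a single process to show the data flow. This is only an illustration under stated assumptions, not the project's code: `NWORKERS`, `MAXTUPLES`, `Partition`, `pick_worker` and `partitioned_sort` are hypothetical names, tuples are reduced to plain `int` keys, and the "slaves" are run one after another instead of as separate processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NWORKERS  4   /* hypothetical number of slave processes */
#define MAXTUPLES 64  /* per-partition capacity in this toy example */

/* The keys one slave receives, plus a read cursor for the final merge. */
typedef struct {
    int    keys[MAXTUPLES];
    size_t n;    /* number of keys stored */
    size_t pos;  /* next key to hand to the merge */
} Partition;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Hypothetical hash: any function that spreads keys evenly would do. */
static unsigned pick_worker(int key)
{
    return ((unsigned) key * 2654435761u) % NWORKERS;
}

/*
 * Sort n keys from in[] into out[].
 * Returns 0 on success, -1 if a partition overflows.
 */
static int partitioned_sort(const int *in, size_t n, int *out)
{
    Partition parts[NWORKERS] = {0};

    /* Step 1: distribute incoming keys to slaves by hash. */
    for (size_t i = 0; i < n; i++) {
        Partition *p = &parts[pick_worker(in[i])];
        if (p->n == MAXTUPLES)
            return -1;
        p->keys[p->n++] = in[i];
    }

    /* Step 2: each slave sorts its own partition (sequential here). */
    for (int w = 0; w < NWORKERS; w++)
        qsort(parts[w].keys, parts[w].n, sizeof(int), cmp_int);

    /* Step 3: the master repeatedly takes the smallest partition head. */
    for (size_t i = 0; i < n; i++) {
        int best = -1;
        for (int w = 0; w < NWORKERS; w++) {
            if (parts[w].pos == parts[w].n)
                continue;  /* this partition is exhausted */
            if (best < 0 ||
                parts[w].keys[parts[w].pos] <
                parts[best].keys[parts[best].pos])
                best = w;
        }
        out[i] = parts[best].keys[parts[best].pos++];
    }
    return 0;
}
```

Because a hash (rather than a range) decides which slave gets a key, the sorted partitions overlap in value, which is why a real merge step is needed at the end instead of simple concatenation.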

Process vs Thread

  • Process +
    • Existing code does not need to be rewritten to be thread safe
  • Thread +
    • No special effort to share data between threads
  • Process -
    • Context switching between processes is slower
  • Thread -
    • Existing code is not thread-safe
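
The data-sharing trade-off in this list can be made concrete with a small POSIX sketch. It is an assumption-laden illustration, not the project's mechanism (the page uses the OSSP mm library for shared memory): the helper name `fork_visibility_demo` is hypothetical, and an anonymous `mmap` region stands in for the shared memory context. After `fork`, a write to an ordinary variable stays in the child's private copy, while a write to the explicitly shared region is visible to the parent.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes 1 to both an ordinary variable and a shared
 * mapping, then report what the parent observes in each.
 * Returns 0 on success, -1 on failure.
 */
static int fork_visibility_demo(int *private_seen, int *shared_seen)
{
    int private_value = 0;

    /* A region that stays shared across fork(); must be set up explicitly. */
    int *shared_value = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_value == MAP_FAILED)
        return -1;
    *shared_value = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared_value, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        private_value = 1;  /* changes only the child's copy */
        *shared_value = 1;  /* visible to the parent */
        _exit(0);
    }

    /* Parent: wait for the child so its writes are complete. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(shared_value, sizeof(int));
        return -1;
    }

    *private_seen = private_value;
    *shared_seen = *shared_value;
    munmap(shared_value, sizeof(int));
    return 0;
}
```

With threads, both writes would be visible without any setup, which is the "no special effort" advantage listed above; the price is that every piece of existing code touching such data would have to be thread-safe.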