Parallel Query Execution

From PostgreSQL wiki

Revision as of 16:17, 19 May 2012 by Boshomi (Talk | contribs)

This is currently under development. See the ToDo list.

Project Goal

  • Implement parallel query execution
  • The implementation will use one master process (the current backend) and multiple slave processes, which the postmaster forks in response to a signal from the master.

Issues

  • Shared memory
    • a new shared memory context that uses the OSSP mm library
    • limitation – so far we do not bother with attaching to shared memory in the EXEC_BACKEND case, so slaves can only be forked from the postmaster
  • Slave process
    • initialization is almost the same as for a standard backend, except that the user name and database are taken from the master process
    • limitation – additional PostgreSQL modules loaded in the backend are not reloaded in the slave
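
The shared memory limitation above can be illustrated with a minimal sketch. It does not use the OSSP mm library or any PostgreSQL code: it swaps in a plain POSIX anonymous shared mapping, and the names SharedCounter, shared_counter_create and run_worker_and_read are hypothetical. The point it tries to show is that a forked child inherits the mapping at the same address without any attach step, which is why forking slaves from the process that owns the mapping is the easy case, while an exec'd process would have to re-attach explicitly.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state; this is not a PostgreSQL structure. */
typedef struct SharedCounter
{
    int         value;
} SharedCounter;

/*
 * Map an anonymous shared region. Only this process and children forked
 * after this call can see it; an exec'd process would not inherit it.
 */
static SharedCounter *
shared_counter_create(void)
{
    void       *p = mmap(NULL, sizeof(SharedCounter),
                         PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    ((SharedCounter *) p)->value = 0;
    return (SharedCounter *) p;
}

/*
 * Fork one worker that stores v in the shared region, wait for it, and
 * return the value the parent sees afterwards (-1 on any error).
 */
static int
run_worker_and_read(SharedCounter *shared, int v)
{
    int         status;
    pid_t       pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* Child: the mapping is already present, no attach is needed. */
        shared->value = v;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared->value;
}
```

Calling run_worker_and_read(shared, 42) on a region returned by shared_counter_create() lets the parent read back the value written by the child. A real implementation would also release the region with munmap() and would need locking once more than one process writes to it.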

ToDo

  • parallel sort using multiple processes
    • in nodeSort.c, distribute incoming tuples to slaves using a hash function
    • implement a producer-consumer structure in shared memory to allow sending data between processes
    • implement the final merge phase for the slaves' results
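
As a rough illustration of the producer-consumer structure mentioned above, the following is a minimal sketch of a single-producer, single-consumer ring buffer. It is not the structure used in this project: TupleRing, RING_CAPACITY and the function names are hypothetical, plain int values stand in for tuples, and C11 atomics are an assumption (the 8.3-era code base predates them). The struct contains no pointers, so it could be placed in a shared memory region that both processes have mapped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8         /* hypothetical size; must be a power of two */

typedef struct TupleRing
{
    atomic_size_t head;         /* next slot to read; advanced by consumer */
    atomic_size_t tail;         /* next slot to write; advanced by producer */
    int         slots[RING_CAPACITY];   /* ints stand in for tuples */
} TupleRing;

static void
ring_init(TupleRing *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool
ring_push(TupleRing *r, int item)
{
    size_t      tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t      head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_CAPACITY)
        return false;
    r->slots[tail % RING_CAPACITY] = item;
    /* Release: the slot write becomes visible before the new tail does. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool
ring_pop(TupleRing *r, int *item)
{
    size_t      head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t      tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;
    *item = r->slots[head % RING_CAPACITY];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

In this sketch the master would call ring_push() and one slave would call ring_pop(), each retrying or sleeping when the call returns false. One ring per slave keeps every ring single-producer and single-consumer, which is what allows it to work without a lock.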

Progress

  • 04/09 - PGTHREAD structure, which will hold information about locks
    • these structures have to be stored in shared memory, like PGPROC
    • the proc array remains unchanged because threads will not create their own transactions
  • 04/09 - lwlock.c - lightweight locks (LWLocks) are granted to threads
  • 05/09 - signal handling
    • what to use to protect access to num_held_lwlocks and held_lwlocks[] - pg spinlocks or pthread_mutex_t?
    • whatever is chosen has to be initialized for each thread, and be fast and safe
  • August–September 2009 - switching to an implementation with processes as a result of Zdenek’s discussion with Tom and Simon
  • September 2009 - implementing a new shared memory context based on the cross-platform OSSP mm library
  • October 2009 - figuring out the architecture of processes
  • November 2009 - implementing a fully initialized slave backend process, created after the master process sends a signal to the postmaster
  • December 2009 - distributing tuples to worker processes in nodeSort.c to perform the sort in them, with the final merge in the master process
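
The December 2009 step (hash distribution, per-worker sort, final merge in the master) can be sketched in a single process as follows. This is only an illustration under stated assumptions: int keys stand in for tuples, NWORKERS, worker_for and partitioned_sort are hypothetical names, the multiplicative hash is an arbitrary choice, and the "workers" run sequentially rather than as separate slave processes.

```c
#include <stdint.h>
#include <stdlib.h>

#define NWORKERS 3              /* hypothetical number of slave processes */

/* Pick a worker for a key with a simple multiplicative hash. */
static unsigned
worker_for(int key)
{
    uint32_t    h = (uint32_t) key * UINT32_C(2654435761);

    return (unsigned) ((h >> 16) % NWORKERS);
}

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Sort in[0..n) into out[0..n). Returns 0 on success, -1 if out of memory. */
static int
partitioned_sort(const int *in, size_t n, int *out)
{
    size_t      count[NWORKERS] = {0};
    size_t      start[NWORKERS];
    size_t      pos[NWORKERS];
    size_t      i, off = 0;
    unsigned    w;
    int        *runs = malloc((n ? n : 1) * sizeof(int));

    if (runs == NULL)
        return -1;

    /* Distribution: count per worker, then copy each key into its run. */
    for (i = 0; i < n; i++)
        count[worker_for(in[i])]++;
    for (w = 0; w < NWORKERS; w++)
    {
        start[w] = pos[w] = off;
        off += count[w];
    }
    for (i = 0; i < n; i++)
        runs[pos[worker_for(in[i])]++] = in[i];

    /* Each worker sorts only its own run (done sequentially here). */
    for (w = 0; w < NWORKERS; w++)
    {
        qsort(runs + start[w], count[w], sizeof(int), cmp_int);
        pos[w] = start[w];
    }

    /* Final merge in the master: always take the smallest run head. */
    for (i = 0; i < n; i++)
    {
        int         best = -1;

        for (w = 0; w < NWORKERS; w++)
        {
            if (pos[w] == start[w] + count[w])
                continue;       /* this run is exhausted */
            if (best < 0 || runs[pos[w]] < runs[pos[best]])
                best = (int) w;
        }
        out[i] = runs[pos[best]++];
    }
    free(runs);
    return 0;
}
```

Because the hash spreads keys without regard to their order, the runs overlap in value range and the merge step is required. A range-based split would avoid the merge but needs knowledge of the key distribution up front.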

Process vs Thread

  • Process +
    • Existing code does not need to be rewritten to be thread safe
  • Thread +
    • No special effort to share data between threads
  • Process -
    • Context switching is slower
  • Thread -
    • Existing code is not thread-safe

Notes

  • Development is based on the 8.3.7 source code (the plan is to migrate to the newest source version in the PostgreSQL git repository)