Backend flowchart
This material is referenced by a flowchart.
Initialization
bootstrap
Creates initial template database via initdb
Because PostgreSQL requires access to system tables for almost every operation, getting those system tables in place is a problem. You can't just create the tables and insert data into them in the normal way, because table creation and insertion require the system tables to already exist. This code jams the data directly into tables using a special syntax used only by the bootstrap procedure.
main
Passes control to postmaster or postgres
This checks the process name (argv[0]) and various flags, and passes control to the postmaster or postgres backend code.
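The dispatch described above can be pictured roughly as follows. This is a minimal sketch, not a transcription of the real main(): the function name, the flag names, and the returned labels are assumptions chosen for illustration.

```python
import os


def choose_entry_point(argv):
    """Pick a code path from the program name and the first flag.

    Flag names and return labels are illustrative assumptions.
    """
    program = os.path.basename(argv[0])
    first_flag = argv[1] if len(argv) > 1 else ""

    if first_flag == "--boot":
        return "bootstrap"      # build the initial system tables
    if first_flag == "--single":
        return "postgres"       # run a standalone backend
    if program in ("postmaster", "postgres"):
        return "postmaster"     # default: start the server
    raise ValueError(f"unrecognized program name: {program}")
```

For example, `choose_entry_point(["/usr/bin/postgres", "--single"])` selects the backend path, while a bare `["postmaster"]` selects server startup.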
postmaster
Controls postgres server startup/termination
This creates shared memory, and then goes into a loop waiting for connection requests. When a connection request arrives, a postgres backend is started, and the connection is passed to it.
libpq
Backend libpq library routines
This handles communication with the client processes.
Main Query Flow
tcop
Traffic cop, dispatches request to proper module
This contains the postgres backend main handler, as well as the code that makes calls to the parser, optimizer, executor, and commands functions.
parser
Converts SQL query to query tree
This converts SQL queries coming from libpq into command-specific structures to be used by the optimizer/executor, or commands routines. The SQL is lexically analyzed into keywords, identifiers, and constants, and passed to the parser. The parser creates command-specific structures to hold the elements of the query. The command-specific structures are then broken apart, checked, and passed to commands processing routines, or converted into Lists of Nodes to be handled by the optimizer and executor.
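The lexical analysis step can be sketched as below. This is a toy tokenizer under stated assumptions: the keyword set is a tiny made-up subset, the token categories simply mirror the "keywords, identifiers, and constants" wording above, and nothing here reflects the real scanner's grammar.

```python
import re

# Tiny illustrative keyword subset; the real keyword list is far larger.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

TOKEN_RE = re.compile(
    r"""
    \s*(?:
        (?P<number>\d+(?:\.\d+)?)
      | (?P<string>'(?:[^']|'')*')
      | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
      | (?P<symbol><=|>=|<>|[=<>*,();.])
    )
    """,
    re.VERBOSE,
)


def tokenize(sql):
    """Split SQL text into (category, text) pairs."""
    sql = sql.rstrip()
    tokens = []
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        text = m.group(kind)
        if kind == "word":
            if text.upper() in KEYWORDS:
                tokens.append(("keyword", text.upper()))
            else:
                tokens.append(("identifier", text.lower()))
        elif kind in ("number", "string"):
            tokens.append(("constant", text))
        else:
            tokens.append(("symbol", text))
    return tokens
```

The token list is what a parser would then consume to build the command-specific structures.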
rewrite
Rule and view support
optimizer
Creates path and plan
This uses the parser output to generate an optimal plan for the executor.
optimizer_path
Creates path from parser output
This takes the parser query output, and generates all possible methods of executing the request. It examines table join order, WHERE clause restrictions, and optimizer table statistics to evaluate each possible execution method, and assigns a cost to each.
optimizer_geqo
Genetic query optimizer
optimizer/path evaluates all possible ways to join the requested tables. When the number of tables becomes large, the number of tests made becomes large too. The Genetic Query Optimizer considers each table separately, then works out the best order in which to perform the join. For a few tables, this method takes longer, but for a large number of tables, it is faster. There is an option to control when this feature is used.
optimizer_plan
Optimizes path output
This takes the optimizer/path output, chooses the path with the least cost, and creates a plan for the executor.
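The path and plan steps can be pictured with a toy cost model. This is a minimal sketch, not the real optimizer: the function names, the nested-loop style cost formula, and the single fixed selectivity are all assumptions made for illustration.

```python
from itertools import permutations


def enumerate_paths(row_counts, selectivity):
    """Yield (join_order, cost) for every possible join order.

    row_counts maps a table name to its estimated row count.
    The cost formula is a made-up stand-in for real cost estimation.
    """
    for order in permutations(row_counts):
        rows = row_counts[order[0]]
        cost = rows                              # scan the first table
        for table in order[1:]:
            cost += rows * row_counts[table]     # compare every row pair
            rows = rows * row_counts[table] * selectivity
        yield order, cost


def choose_plan(row_counts, selectivity=0.01):
    """Return the (join_order, cost) pair with the least cost."""
    return min(enumerate_paths(row_counts, selectivity), key=lambda p: p[1])
```

With `{"a": 1000, "b": 10, "c": 100}` this model prefers joining the two small tables first. The number of join orders grows factorially with the number of tables, which is why exhaustive enumeration like this becomes expensive for large joins.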
optimizer_prep
Handle special plan cases
This does special plan processing.
optimizer_util
Optimizer support routines
This contains support routines used by other parts of the optimizer.
executor
Executes complex node plans from optimizer
This handles SELECT, INSERT, UPDATE, and DELETE statements. The operations required to handle these statement types include heap scans, index scans, sorting, joining tables, grouping, aggregates, and uniqueness.
Command Support
commands
Commands that do not require the executor
These process SQL commands that do not require complex handling. They include VACUUM, COPY, ALTER, CREATE TABLE, CREATE TYPE, and many others. The code is called with the structures generated by the parser. Most of the routines do some processing, then call lower-level functions in the catalog directory to do the actual work.
catalog
System catalog manipulation
This contains functions that manipulate the system tables or catalogs. Table, index, procedure, operator, type, and aggregate creation and manipulation routines are here. These are low-level routines, and are usually called by upper routines that pre-format user requests into a predefined format.
access
Various data access methods
These control the way data is accessed in heap, indexes, and transactions.
access_common
Common access routines
access_gin
Generalized inverted index access method
access_gist
generalized search tree access method
access_hash
hash access method
access_heap
heap is used to store data rows
access_index
used by all index types
access_nbtree
Lehman and Yao's btree management algorithm
access_spgist
Space-Partitioned GiST access method
access_transam
transaction manager (BEGIN/ABORT/COMMIT)
nodes
creation/manipulation of nodes and lists
PostgreSQL stores information about SQL queries in structures called nodes. Nodes are generic containers that have a type field and then a type-specific data section. Nodes are usually placed in Lists. A List is a container with an elem element, and a next field that points to the next List. These List structures are chained together in a forward linked list. In this way, a chain of Lists can contain an unlimited number of Node elements, and each Node can contain any data type. These are used extensively in the parser, optimizer, and executor to store requests and data.
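The Node and List layout described above can be modeled in a few lines. This is only a sketch of the idea: the class and helper names (`Node`, `List`, `prepend`, `walk`) and the example tags are hypothetical, and the backend's own structures are C structs rather than Python classes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    tag: str     # the type field
    data: Any    # the type-specific data section


@dataclass
class List:
    elem: Node                      # the element held by this cell
    next: Optional["List"] = None   # the next List in the chain, if any


def prepend(node, lst):
    """Return a new chain with node placed in front of lst."""
    return List(node, lst)


def walk(lst):
    """Yield each Node in a chain, following the next fields."""
    while lst is not None:
        yield lst.elem
        lst = lst.next
```

A chain built with `prepend(Node("Const", 42), prepend(Node("Var", "id"), None))` walks as a `Const` node followed by a `Var` node, and each node can carry a different kind of payload.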
storage
Manages various storage systems
These allow uniform resource access by the backend.
storage_buffer
Shared buffer pool manager
storage_file
File manager
storage_freespace
Free space map
storage_ipc
Semaphores and shared memory
storage_large_object
Large objects
storage_lmgr
Lock manager
storage_page
Page manager
storage_smgr
Storage/disk manager
utils
support routines
utils_adt
built-in data type routines
This contains all the PostgreSQL built-in data types.
utils_cache
system/relation/function cache routines
PostgreSQL supports arbitrary data types, so no data types are hard-coded into the core backend routines. When the backend needs to find out about a type, it does a lookup of a system table. Because these system tables are referred to often, a cache is maintained that speeds lookups. There is a system relation cache, a function/operator cache, and a relation information cache. This last cache maintains information about all recently-accessed tables, not just system ones.
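The lookup-then-cache pattern described above looks roughly like this. It is a minimal sketch under assumptions: `CatalogCache`, its methods, and the dictionary standing in for a system table are hypothetical names, and the real caches also have to handle invalidation driven by other backends, which is omitted here.

```python
class CatalogCache:
    """Remember the results of slow system-table lookups."""

    def __init__(self, lookup):
        self._lookup = lookup    # slow path: reads the system table
        self._entries = {}
        self.misses = 0

    def get(self, key):
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._lookup(key)
        return self._entries[key]

    def invalidate(self, key):
        """Forget one entry so the next get() reads the table again."""
        self._entries.pop(key, None)


# A dictionary standing in for a type catalog (illustrative data only).
FAKE_TYPE_TABLE = {"int4": {"length": 4}, "text": {"length": -1}}
type_cache = CatalogCache(FAKE_TYPE_TABLE.__getitem__)
```

Calling `type_cache.get("int4")` twice reads the stand-in table only once; the second call is served from the cache.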
error
error reporting routines
Reports backend errors to the front end.
utils_fmgr
function manager
This handles the calling of dynamically-loaded functions, and the calling of functions defined in the system tables.
utils_hash
hash routines for internal algorithms
These hash routines are used by the cache and memory-manager routines to do quick lookups of dynamic data storage structures maintained by the backend.
utils_init
various initialization stuff
utils_mb
single and multibyte encoding
utils_misc
miscellaneous stuff
utils_mmgr
memory manager (process-local memory)
When PostgreSQL allocates memory, it does so in an explicit context. Contexts can be statement-specific, transaction-specific, or persistent/global. By doing this, the backend can easily free memory once a statement or transaction completes.
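The idea of freeing everything in a context at once can be modeled as follows. This is a sketch, not the backend's allocator: the `MemoryContext` class, its method names, and the parent/child nesting used to tie a statement context to its transaction are assumptions made for illustration.

```python
class MemoryContext:
    """A named bucket of allocations that can be released together."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.allocations = []
        if parent is not None:
            parent.children.append(self)

    def alloc(self, obj):
        """Record obj as belonging to this context and return it."""
        self.allocations.append(obj)
        return obj

    def reset(self):
        """Release everything in this context and in its children."""
        for child in self.children:
            child.reset()
        self.allocations.clear()
```

Resetting a transaction-level context at commit or abort also releases whatever its statement-level children allocated, while a persistent context above it is left untouched.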
utils_resowner
resource owner tracking
utils_sort
sort routines for internal algorithms
When statement output must be sorted as part of a backend operation, this code sorts the tuples, either in memory or using disk files.
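The "in memory or using disk files" choice can be illustrated with a small external merge sort. This is a sketch under assumptions: the function names, the `max_in_memory` row limit, and the JSON-lines run format are all hypothetical, and the real sort code uses its own memory accounting and tape format.

```python
import heapq
import itertools
import json
import tempfile


def _write_run(sorted_chunk):
    """Spill one sorted run to a temporary file, one JSON row per line."""
    f = tempfile.TemporaryFile(mode="w+")
    for row in sorted_chunk:
        f.write(json.dumps(row) + "\n")
    f.seek(0)
    return f


def _read_run(f):
    """Read rows back from a run file in the order they were written."""
    for line in f:
        yield tuple(json.loads(line))


def sort_tuples(rows, max_in_memory=1000):
    """Sort rows in memory if they fit, else via sorted runs on disk."""
    it = iter(rows)
    first = sorted(itertools.islice(it, max_in_memory))
    chunk = list(itertools.islice(it, max_in_memory))
    if not chunk:
        return first                     # everything fit in memory
    run_files = [_write_run(first)]
    while chunk:
        run_files.append(_write_run(sorted(chunk)))
        chunk = list(itertools.islice(it, max_in_memory))
    try:
        # Merge the sorted runs back into one ordered result.
        return list(heapq.merge(*(_read_run(f) for f in run_files)))
    finally:
        for f in run_files:
            f.close()
```

Rows here are assumed to be tuples of JSON-serializable values; with a small `max_in_memory` the same input takes the disk path and produces the same ordering.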
utils_time
transaction time qualification routines
These routines check tuple internal columns to determine if the current row is still valid, or is part of a non-committed transaction or superseded by a new row.
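A heavily simplified version of such a check is sketched below. The internal columns are modeled as `xmin` (the transaction that created the row) and `xmax` (the transaction that deleted or superseded it, if any); the class, the function, and the plain set of committed transaction ids are assumptions, and the real rules also depend on snapshots and on the checking transaction's own changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapTuple:
    xmin: int                    # transaction that inserted the row
    xmax: Optional[int] = None   # transaction that removed it, if any


def tuple_is_visible(tup, committed):
    """Return True if the row is valid given the committed transaction ids."""
    if tup.xmin not in committed:
        return False             # inserted by a non-committed transaction
    if tup.xmax is not None and tup.xmax in committed:
        return False             # deleted or superseded by a committed one
    return True
```

With `committed = {10, 11}`, a row with `xmin=10` and no `xmax` is visible, a row with `xmin=10, xmax=11` is not, and a row with `xmin=12` is not.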
Support Facilities
include
include files
There are include directories for each subsystem.
lib
support library
This houses several generic routines.
port
compatibility routines
regex
regular expression library
This is used for regular expression handling in the backend, e.g. the '~' operator.
snowball
Snowball support
This supports the Snowball stemming library used for full text search.
replication
streaming replication
Supports streaming replication via log shipping.
tsearch
text search library
This is used to support full text searching.