From PostgreSQL wiki
This material is referenced by a flowchart.
Creates initial template database via initdb Because PostgreSQL requires access to system tables for almost every operation, getting those system tables in place is a bootstrapping problem: you can't create the tables and insert data into them in the normal way, because table creation and insertion require the tables to already exist. This code jams the data directly into the tables, using a special syntax understood only by the bootstrap procedure.
Passes control to postmaster or postgres This checks the process name (argv) and various flags, and passes control to the postmaster or to the postgres backend code.
Controls postgres server startup/termination This creates shared memory, and then goes into a loop waiting for connection requests. When a connection request arrives, a postgres backend is started, and the connection is passed to it.
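The accept-and-dispatch loop described above can be sketched as follows. This is a simplified, single-process Python simulation — names like handle_backend are illustrative; the real postmaster forks a separate operating-system process for each connection.

```python
# Simplified postmaster loop: accept each connection request and hand it
# to a freshly started backend. handle_backend is a stand-in; the real
# postmaster forks a new "postgres" process per connection.
def handle_backend(conn_id):
    # In PostgreSQL this would be the forked backend's main loop.
    return f"backend serving connection {conn_id}"

def postmaster_loop(pending_connections):
    served = []
    for conn_id in pending_connections:
        served.append(handle_backend(conn_id))  # real code: fork() here
    return served
```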
Backend libpq library routines This handles communication with the client processes.
Main Query Flow
Traffic cop, dispatches request to proper module This contains the postgres backend main handler, as well as the code that makes calls to the parser, optimizer, executor, and commands functions.
Converts SQL query to query tree This converts SQL queries coming from libpq into command-specific structures to be used by the optimizer/executor or commands routines. The SQL is lexically analyzed into keywords, identifiers, and constants, and passed to the parser. The parser creates command-specific structures to hold the elements of the query. The command-specific structures are then broken apart, checked, and passed to commands processing routines, or converted into Lists of Nodes to be handled by the optimizer and executor.
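As a rough illustration of the lexical-analysis step, here is a toy Python lexer that classifies tokens into keywords, identifiers, and constants. The keyword set and token rules are simplified stand-ins, not PostgreSQL's real scanner.

```python
import re

KEYWORDS = {"select", "from", "where"}  # tiny subset for illustration

def lex(sql):
    """Split a SQL string into (kind, text) tokens:
    keyword, identifier, constant, or operator."""
    tokens = []
    for tok in re.findall(r"\d+|'[^']*'|\w+|[^\s\w]", sql):
        if tok.lower() in KEYWORDS:
            tokens.append(("keyword", tok.lower()))
        elif tok[0].isdigit() or tok.startswith("'"):
            tokens.append(("constant", tok))
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(("identifier", tok))
        else:
            tokens.append(("operator", tok))
    return tokens
```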
Rule and view support
Creates path and plan
This uses the parser output to generate an optimal plan for the executor.
Creates path from parser output This takes the parser query output and generates all possible methods of executing the request. It examines table join order, WHERE clause restrictions, and optimizer table statistics to evaluate each possible execution method, and assigns a cost to each.
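The idea of generating candidate execution methods and costing each can be sketched like this. The row counts and cost formula are invented for illustration; PostgreSQL's real cost model is far more detailed.

```python
from itertools import permutations

# Hypothetical per-table row counts, standing in for optimizer statistics.
STATS = {"a": 10, "b": 1000, "c": 100}

def join_cost(order):
    """Toy cost model for a left-deep join in the given order: each join
    step costs (rows so far) * (rows of next table)."""
    cost, rows = 0, STATS[order[0]]
    for t in order[1:]:
        cost += rows * STATS[t]
        rows = max(rows, STATS[t])  # crude output-size estimate
    return cost

def all_paths(tables):
    """Enumerate every join order and assign each a cost."""
    return {order: join_cost(order) for order in permutations(tables)}
```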
Genetic query optimizer optimizer/path evaluates all possible ways to join the requested tables. As the number of tables grows, the number of join orders to test grows rapidly. The Genetic Query Optimizer considers each table separately, then determines the most efficient order in which to perform the joins. For a few tables this method takes longer, but for a large number of tables it is faster. The geqo_threshold setting controls when this feature is used.
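To see why a heuristic is needed, note that the number of left-deep join orders grows factorially with the table count. The greedy ordering below is only an illustrative stand-in for heuristic search — GEQO actually uses a genetic algorithm, not a greedy one.

```python
from math import factorial

def n_join_orders(n_tables):
    """Number of left-deep join orders: n! — exhaustive search stops scaling."""
    return factorial(n_tables)

def greedy_order(stats):
    """Greedy stand-in for heuristic search: always join the smallest
    remaining table next (NOT PostgreSQL's actual genetic algorithm)."""
    remaining = dict(stats)
    order = []
    while remaining:
        t = min(remaining, key=remaining.get)
        order.append(t)
        del remaining[t]
    return tuple(order)
```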
Optimizes path output This takes the optimizer/path output, chooses the path with the least cost, and creates a plan for the executor.
Handle special plan cases This handles preprocessing for special plan cases, such as UNION and table-inheritance queries.
Optimizer support routines This contains support routines used by other parts of the optimizer.
Executes complex node plans from optimizer This handles select, insert, update, and delete statements. The operations required to handle these statement types include heap scans, index scans, sorting, joining tables, grouping, aggregates, and uniqueness.
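The executor's one-tuple-at-a-time style can be sketched with Python generators: each plan node pulls rows from its child. This is a toy model, not the real executor node interface.

```python
# Toy pull-based executor: each node is a generator that yields one
# tuple at a time to its parent, as executor nodes do.
def seq_scan(heap):
    for row in heap:            # heap scan: read every row
        yield row

def filter_node(child, predicate):
    for row in child:           # qualification: keep matching rows only
        if predicate(row):
            yield row

def count_agg(child):
    return sum(1 for _ in child)  # aggregate: consume all input

heap = [{"id": 1, "x": 5}, {"id": 2, "x": 50}, {"id": 3, "x": 7}]
# Plan tree for: SELECT count(*) FROM heap WHERE x < 10
plan = filter_node(seq_scan(heap), lambda r: r["x"] < 10)
```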
Commands that do not require the executor These process SQL commands that do not require complex handling. It includes vacuum, copy, alter, create table, create type, and many others. The code is called with the structures generated by the parser. Most of the routines do some processing, then call lower-level functions in the catalog directory to do the actual work.
System catalog manipulation This contains functions that manipulate the system tables or catalogs. Table, index, procedure, operator, type, and aggregate creation and manipulation routines are here. These are low-level routines, and are usually called by upper routines that pre-format user requests into a predefined format.
Various data access methods These control the way data is accessed in heap, indexes, and transactions.
Common access routines
Generalized inverted index access method
Generalized search tree (GiST) access method
Hash access method
Heap is used to store data rows
Used by all index types
Lehman and Yao's btree management algorithm
Space-Partitioned GiST access method
Transaction manager (BEGIN/ABORT/COMMIT)
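The transaction manager's BEGIN/COMMIT/ABORT state transitions can be sketched as a small state machine. The class and method names here are invented for illustration; the real transaction code also drives logging, locking, and resource cleanup.

```python
class ToyTransactionManager:
    """Minimal sketch of transaction state transitions. Only committed
    transaction IDs enter the committed set; aborted ones never do."""
    def __init__(self):
        self.state = "idle"
        self.committed = set()
        self.next_xid = 1

    def begin(self):
        assert self.state == "idle"
        self.xid = self.next_xid      # assign a transaction ID
        self.next_xid += 1
        self.state = "in progress"
        return self.xid

    def commit(self):
        assert self.state == "in progress"
        self.committed.add(self.xid)  # effects become durable/visible
        self.state = "idle"

    def abort(self):
        assert self.state == "in progress"
        self.state = "idle"           # xid never enters the committed set
```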
creation/manipulation of nodes and lists PostgreSQL stores information about SQL queries in structures called nodes. Nodes are generic containers that have a type field and then a type-specific data section. Nodes are usually placed in Lists. A List is a container with an elem element and a next field that points to the next List. These List structures are chained together in a forward linked list. In this way, a chain of Lists can contain an unlimited number of Node elements, and each Node can contain any data type. These are used extensively in the parser, optimizer, and executor to store requests and data.
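A direct transcription of the structures described above (note that modern PostgreSQL replaced this cons-cell List with an array-based implementation, but this matches the text):

```python
class Node:
    """Generic container: a type tag plus type-specific data."""
    def __init__(self, node_type, data):
        self.type = node_type
        self.data = data

class List:
    """One cell of a forward linked list: holds a Node and points to
    the next cell (or None at the end of the chain)."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next

def list_to_python(lst):
    """Walk the chain and collect (type, data) pairs, for inspection."""
    out = []
    while lst is not None:
        out.append((lst.elem.type, lst.elem.data))
        lst = lst.next
    return out
```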
Manages various storage systems These allow uniform resource access by the backend.
Shared buffer pool manager
Free space map
Semaphores and shared memory
built-in data type routines This contains all the PostgreSQL builtin data types.
system/relation/function cache routines PostgreSQL supports arbitrary data types, so no data types are hard-coded into the core backend routines. When the backend needs to find out about a type, it does a lookup in a system table. Because these system tables are referred to often, a cache is maintained that speeds lookups. There is a system relation cache, a function/operator cache, and a relation information cache. This last cache maintains information about all recently accessed tables, not just system ones.
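The cache idea can be sketched as follows — the first lookup "scans" the catalog, later lookups hit the cache. The catalog contents and class names are illustrative, not PostgreSQL's syscache API.

```python
# Stand-in catalog: type name -> type info (as a system table would hold).
PG_TYPE = {"int4": {"len": 4}, "text": {"len": -1}}

class TypeCache:
    """Cache in front of a system table: repeated lookups of the same
    type avoid rescanning the catalog."""
    def __init__(self):
        self.cache = {}
        self.table_scans = 0   # counts the expensive catalog lookups

    def lookup(self, typename):
        if typename not in self.cache:
            self.table_scans += 1            # expensive path: scan catalog
            self.cache[typename] = PG_TYPE[typename]
        return self.cache[typename]          # cheap path: cache hit
```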
error reporting routines Reports backend errors to the front end.
function manager This handles the calling of dynamically-loaded functions, and the calling of functions defined in the system tables.
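The function-manager idea — looking up a function in a registry, as the system tables provide, and invoking it through one uniform call interface — can be sketched like this (the names are illustrative, not fmgr's real API):

```python
# Registry mapping function names to callables, playing the role the
# system tables play for the real function manager.
REGISTRY = {}

def register(name, fn):
    REGISTRY[name] = fn            # akin to a catalog entry for fn

def call_function(name, *args):
    """Single call interface: look the function up, then invoke it."""
    return REGISTRY[name](*args)

# "int4pl" is integer addition in PostgreSQL's catalogs.
register("int4pl", lambda a, b: a + b)
```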
hash routines for internal algorithms These hash routines are used by the cache and memory-manager routines to do quick lookups of dynamic data storage structures maintained by the backend.
Various initialization routines
Single- and multi-byte encoding support
memory manager (process-local memory) When PostgreSQL allocates memory, it does so in an explicit context. Contexts can be statement-specific, transaction-specific, or persistent/global. By doing this, the backend can easily free memory once a statement or transaction completes.
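A minimal sketch of the context idea, with invented names: allocations are tagged with a context, and resetting the context frees them all at once.

```python
class MemoryContext:
    """Toy memory context: track allocations so they can all be released
    in one sweep when the statement or transaction completes."""
    def __init__(self, name):
        self.name = name
        self.chunks = []

    def alloc(self, obj):
        self.chunks.append(obj)   # like palloc: remember the allocation
        return obj

    def reset(self):
        n = len(self.chunks)
        self.chunks.clear()       # free everything in this context at once
        return n
```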
resource owner tracking
sort routines for internal algorithms When statement output must be sorted as part of a backend operation, this code sorts the tuples, either in memory or using disk files.
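The in-memory-or-on-disk behavior can be sketched as a classic external merge sort: sort runs that fit within a memory budget, then stream-merge the runs (in-memory runs stand in for temp files).

```python
import heapq

def external_sort(tuples, work_mem=3):
    """Sort with limited memory: build sorted runs of at most work_mem
    items (each run would be a temp file on disk), then k-way merge."""
    runs = [sorted(tuples[i:i + work_mem])
            for i in range(0, len(tuples), work_mem)]
    return list(heapq.merge(*runs))   # streaming merge of sorted runs
```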
transaction time qualification routines These routines check a tuple's internal columns to determine whether the current row is still valid, part of an uncommitted transaction, or superseded by a newer row.
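A toy visibility check built on the tuple's xmin (inserting transaction) and xmax (deleting/updating transaction) columns — the real rules also consult snapshots, hint bits, and subtransaction state.

```python
def tuple_is_visible(xmin, xmax, committed):
    """Return True if the tuple version is visible: its inserter
    committed and no committed transaction has deleted/updated it."""
    if xmin not in committed:
        return False              # inserter never committed
    if xmax is not None and xmax in committed:
        return False              # a committed transaction superseded it
    return True
```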
include files There are include directories for each subsystem.
support library This houses several generic routines.
regular expression library This is used for regular expression handling in the backend, i.e. '~'.
Snowball support This is used to support the Snowball full-text stemming library.
streaming replication Supports streaming replication via write-ahead log (WAL) shipping.
text search library This is used to support full text searching.