Multithreading
Resources
pgsql-hackers thread: https://www.postgresql.org/message-id/31cc6df9-53fe-3cd9-af5b-ac0d801163f4%40iki.fi
PGConf.eu Presentation: https://www.postgresql.eu/events/pgconfeu2023/schedule/session/4845-multi-threaded-postgresql/
Work-in-progress git branch: https://github.com/hlinnaka/postgres/tree/threading
Ongoing work
These patches are contributing to the effort:
- SendProcSignal() → SendInterrupt(), using latches: https://commitfest.postgresql.org/49/5118/
Done:
- replace strtok(): https://commitfest.postgresql.org/48/5071/
- thread-safety: gmtime_r(), localtime_r() https://commitfest.postgresql.org/48/5084/
- Refactoring postmaster's code to clean up after child exit: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4%40iki.fi
- Remove dependency on setlocale() for collation. https://commitfest.postgresql.org/48/5023/
TODO
Global variables
Global variables in PostgreSQL fall into a few different categories:
- per-session state
- pointers to shared memory areas
- other variables that are initialized at postmaster startup and never change after that
- constants
Many global variables are used to hold GUCs, and they can fall into any of those categories depending on whether they're PGC_POSTMASTER or PGC_USERSET.
The plan:
- Add annotations to all global variables, to mark which category they fall into. Variables holding per-session state are turned into thread-local variables. The distinction between the other categories is just for documentation purposes.
- Provide a tool that can list all global variables that are missing the annotations (pgguclifetimes). Ideally, run it as a compile-time check. This will also be useful for extension authors to find missing annotations in extensions.
- alternative idea: could we use a simple perl script for this, like we do for the PGDLLIMPORT case?
- Potential standalone value:
- "modified once" annotated variables could be put in a separate binary section which would improve performance
- Could help with the engineering overhead of figuring out where to initialize at postmaster startup
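As a rough illustration of the annotation plan above, the sketch below marks one variable as per-session (and therefore thread-local) and one as "set once at startup". The macro names session_local, postmaster_once and shmem_pointer are hypothetical and are not taken from the PostgreSQL source tree; only the per-session annotation changes what the compiler generates, and the others expand to nothing and serve as documentation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical annotation macros; the names are illustrative only. */
#define session_local   _Thread_local   /* per-session state: one copy per thread */
#define postmaster_once                 /* set at startup, read-only afterwards */
#define shmem_pointer                   /* points into shared memory */

/* Per-session state becomes thread-local. */
static session_local int my_session_counter = 0;

/* Initialized once at startup, never changed: stays a plain global. */
static postmaster_once int max_sessions = 0;

static void *
session_main(void *arg)
{
    int increments = *(int *) arg;

    /* Each thread only ever sees and modifies its own copy. */
    for (int i = 0; i < increments; i++)
        my_session_counter++;

    /* Report the thread's private value back through the argument. */
    *(int *) arg = my_session_counter;
    return NULL;
}

/* Runs two "sessions"; returns 1 if their counters stayed independent. */
static int
run_two_sessions(void)
{
    pthread_t t1, t2;
    int a = 3, b = 5;

    max_sessions = 2;           /* stands in for postmaster startup */

    if (pthread_create(&t1, NULL, session_main, &a) != 0)
        return 0;
    if (pthread_create(&t2, NULL, session_main, &b) != 0)
    {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* The calling thread's own copy was never touched. */
    assert(max_sessions == 2);
    return a == 3 && b == 5 && my_session_counter == 0;
}
```

With a plain global instead of session_local, the two threads would race on a single counter and the final values would depend on scheduling.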
Extensions
Extensions will make the transition at their own pace
- Add a field to the control file or the PG_MODULE magic to mark whether an extension supports multi-process mode only, multi-threaded mode only, or both
- other considerations: good documentation will be key, also need to work with the extension community to build understanding and consensus
- Python has traditionally been single-threaded, but see https://peps.python.org/pep-0554/
- Python >= 3.12 can spawn non-shared interpreters
- What about Perl, other PLs?
- Robert volunteered to look into this at 2024.pgconf.dev
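One way to picture the "supported mode" marker proposed above is a bitmask that the server checks when loading a module. This is only a sketch: the ExecModelFlags enum and the ModuleMagicSketch struct are hypothetical, and PostgreSQL's existing module magic block has no such field today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags describing which execution models a module supports. */
typedef enum ExecModelFlags
{
    EXEC_MODEL_PROCESS = 1 << 0,    /* works when backends are processes */
    EXEC_MODEL_THREAD  = 1 << 1     /* works when backends are threads */
} ExecModelFlags;

/* Hypothetical stand-in for a field in the module magic block. */
typedef struct ModuleMagicSketch
{
    const char *name;
    unsigned    supported_models;   /* OR of ExecModelFlags */
} ModuleMagicSketch;

/* Returns true if the module declares support for the server's model. */
static bool
module_is_loadable(const ModuleMagicSketch *magic, ExecModelFlags server_model)
{
    assert(magic != NULL);
    return (magic->supported_models & (unsigned) server_model) != 0;
}
```

An extension that has not been audited would declare only EXEC_MODEL_PROCESS, and a server running in threaded mode could then refuse to load it with a clear error instead of failing later.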
PIDs in user-facing interfaces
A few places expose PIDs to users:
- pg_terminate_backend(<pid>)
- pg_stat_activity
- query cancellation
Plan:
- Replace PID with a thread id.
- Could use OS thread ID, but behavior isn't totally consistent across platforms
- Consensus at 2024.pgconf.dev seemed to be that inventing our own 32-bit thread ID would be better
- Possibly pgprocno + counter
- Need to make sure we don't reuse thread IDs too quickly, else we might terminate the wrong backend.
- Potential standalone value: decoupling the relationship between thread and session could enable other features
- Robert volunteered to look into this at 2024.pgconf.dev
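The "pgprocno + counter" idea above could look roughly like the following: the low bits of the 32-bit ID hold the slot number and the high bits hold a generation counter that is bumped each time the slot is reused. The 18/14 bit split and all function names are assumptions made for this sketch, not a decided layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split: 18 bits of slot number, 14 bits of generation. */
#define SLOT_BITS   18u
#define GEN_BITS    (32u - SLOT_BITS)
#define SLOT_MASK   ((UINT32_C(1) << SLOT_BITS) - 1)
#define GEN_MASK    ((UINT32_C(1) << GEN_BITS) - 1)

/* Pack a slot number and its generation into one 32-bit ID. */
static uint32_t
make_backend_id(uint32_t slot, uint32_t generation)
{
    assert(slot <= SLOT_MASK);
    return ((generation & GEN_MASK) << SLOT_BITS) | slot;
}

static uint32_t
backend_id_slot(uint32_t id)
{
    return id & SLOT_MASK;
}

static uint32_t
backend_id_generation(uint32_t id)
{
    return id >> SLOT_BITS;
}

/*
 * A terminate or cancel request is honored only if the ID's generation
 * still matches the slot's current generation.
 */
static int
backend_id_is_current(uint32_t id, const uint32_t *slot_generations)
{
    uint32_t slot = backend_id_slot(id);

    return (slot_generations[slot] & GEN_MASK) == backend_id_generation(id);
}
```

Under this particular split, an ID could only collide with a stale one after the same slot has been reused 16384 times, which is the kind of trade-off the "don't reuse thread IDs too quickly" concern is about.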
Unix Signals
Separate page with more details on this topic.
We use Unix signals between processes currently. It's hard to use them for inter-thread communication within the same process. When you send a signal, you send it to a process, and any thread in the process can handle it.
- Refactor inter-process signals into something like ProcSignal and latches
- With a single process, you can no longer easily "kill <pid>" to send SIGTERM or SIGUSR1 to a single backend. We can provide a "pg_ctl signal" command to replace that.
- Description/Tasks/Components:
- timers (which use SIGALRM)
- deadlock detection? (already done with WaitLatch, so nothing happens in signal handlers; possibly nothing to do here?)
- Sub-sub projects
- multiplex waiting on a latch and don't rely on signals (latchification project: replacing SetLatch with an interrupt-driven concept)
- Abstraction for the C11 thread functions that is portable (think about C11 features and C99 compiler)
- Thomas and Heikki volunteered to look into this at 2024.pgconf.dev
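To make the "interrupts instead of signals" direction above more concrete, here is a minimal sketch of per-thread interrupt delivery using a pending bitmask and a condition variable. It is not the design of the SendInterrupt() patch; the InterruptTarget struct, the interrupt kinds and the function names are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical interrupt kinds, one bit each. */
enum
{
    INTERRUPT_CANCEL    = 1u << 0,
    INTERRUPT_TERMINATE = 1u << 1
};

/* One of these would exist per backend thread. */
typedef struct InterruptTarget
{
    atomic_uint     pending;    /* bitmask of raised interrupts */
    pthread_mutex_t lock;
    pthread_cond_t  wakeup;
} InterruptTarget;

static void
interrupt_target_init(InterruptTarget *t)
{
    atomic_init(&t->pending, 0);
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->wakeup, NULL);
}

/* Sender side: raise a bit and wake the target; callable from any thread. */
static void
send_interrupt(InterruptTarget *t, unsigned kind)
{
    assert(kind != 0);
    atomic_fetch_or(&t->pending, kind);
    pthread_mutex_lock(&t->lock);
    pthread_cond_signal(&t->wakeup);
    pthread_mutex_unlock(&t->lock);
}

/* Receiver side: fetch and clear all pending interrupts without blocking. */
static unsigned
consume_interrupts(InterruptTarget *t)
{
    return atomic_exchange(&t->pending, 0);
}

/* Receiver side: sleep until at least one interrupt is pending. */
static unsigned
wait_for_interrupt(InterruptTarget *t)
{
    unsigned got;

    pthread_mutex_lock(&t->lock);
    while ((got = consume_interrupts(t)) == 0)
        pthread_cond_wait(&t->wakeup, &t->lock);
    pthread_mutex_unlock(&t->lock);
    return got;
}
```

Unlike a Unix signal, the interrupt is addressed to one specific thread, and the receiver handles it at a point of its own choosing rather than inside an asynchronous handler.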
Replace non-thread safe library functions with thread-safe variants
- strerror() -> strerror_r()
- setlocale() -> uselocale()
- getopt_long()
- the getopt_long implementation used for Windows could be used for other platforms
- rearchitect postmaster startup to change options passing
- Make pg_strtok() re-entrant
- not all string functions with _l have been implemented in glibc
- Peter and Nathan volunteered to look into this at 2024.pgconf.dev
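The common pattern behind these replacements is moving hidden static state into something the caller owns. The sketch below shows that pattern for a tokenizer: the cursor lives in a caller-provided struct, so two tokenizations can be interleaved or run on different threads. The TokenizerState type and function names are hypothetical, and the splitting rules are the simple strtok()-style ones, not those of pg_strtok().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Caller-owned state replaces the hidden static cursor. */
typedef struct TokenizerState
{
    char *cursor;   /* next unread character, or NULL when exhausted */
} TokenizerState;

static void
tokenizer_start(TokenizerState *st, char *input)
{
    assert(st != NULL);
    st->cursor = input;
}

/*
 * Returns the next token, or NULL when the input is exhausted.
 * Like strtok(), this writes '\0' terminators into the input buffer.
 */
static char *
tokenizer_next(TokenizerState *st, const char *delims)
{
    char *start;

    if (st->cursor == NULL)
        return NULL;

    /* Skip leading delimiters. */
    start = st->cursor + strspn(st->cursor, delims);
    if (*start == '\0')
    {
        st->cursor = NULL;
        return NULL;
    }

    /* Find the end of the token and terminate it in place. */
    st->cursor = start + strcspn(start, delims);
    if (*st->cursor != '\0')
        *st->cursor++ = '\0';
    else
        st->cursor = NULL;

    return start;
}
```

Because nothing is stored outside TokenizerState, there is no global to annotate or make thread-local for this code path.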
Misc
- Add a GUC to enable multi-threaded mode
- Replace fork() with pthread_create(). Yay!
- Make virtual filedescriptors (fd.c) work with threads
- Heikki volunteered to look into this at 2024.pgconf.dev (with Andres identifying performance fallout)
- increase the limit for max number of file descriptors
- diagnose and fix the performance fallout (due to sharing VFD cache and maybe because some OSes don't cope well with large numbers of file descriptors)
- Make bootparse.y and other bison/flex generated code re-entrant
- main parser is already done, but others need work, which is thought to be simple
- Peter volunteered to look into this at 2024.pgconf.dev
- Refactoring guc_tables.c
- currently uses memory address of global GUC variable (and that won't work anymore with threads)
- Refactor connection acceptance
- Currently the postmaster accepts connections and then spawns a process
- In the threaded model we'll want 2 processes, a supervisor that only restarts the other process, and the main process with all the threads
- But that means connection acceptance needs to be split out from supervisor tasks
- verify_cb in be-secure-openssl.c uses a (static) global cert_errdetail variable to pass information. Refactor to use X509_STORE_CTX_get_ex_data()
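Regarding the guc_tables.c item above: a table that stores the address of each setting's variable stops working once that variable is thread-local, because the address differs per thread and is not a constant that can appear in a static initializer. One possible shape of the refactoring, sketched here with hypothetical names (IntSettingSketch, work_mem_sketch), is to store an accessor function that returns the calling thread's copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-session setting, thread-local in a threaded server. */
static _Thread_local int work_mem_sketch = 4096;

/* Accessor returning the address of the calling thread's copy. */
static int *
work_mem_sketch_addr(void)
{
    return &work_mem_sketch;
}

/* Hypothetical table entry: an accessor replaces a stored "int *variable". */
typedef struct IntSettingSketch
{
    const char *name;
    int       *(*addr_fn) (void);
} IntSettingSketch;

static const IntSettingSketch settings[] = {
    {"work_mem_sketch", work_mem_sketch_addr},
};

/* Sets the named setting for the calling thread; returns 1 on success. */
static int
set_int_setting(const char *name, int value)
{
    assert(name != NULL);

    for (size_t i = 0; i < sizeof(settings) / sizeof(settings[0]); i++)
    {
        if (strcmp(settings[i].name, name) == 0)
        {
            *settings[i].addr_fn() = value;
            return 1;
        }
    }
    return 0;
}
```

Other designs are possible, for example storing an offset into a per-session struct; the accessor approach is shown only because it needs the fewest supporting definitions.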
Open Questions
- If an extension launches threads today, it might call SPI or other PostgreSQL functions from a different thread. That works today, if you're careful and only do that from one thread at a time. After replacing global variables with thread-local variables, that no longer works. Or did it ever work? We have things like longjmp() and stack depth checks that would already break that.