PGConf.dev 2025 Developer Unconference

From PostgreSQL wiki

Notes for PGConf.dev 2025 unconference sessions.

Scaling PostgreSQL Development

Autovacuum Pacing, Prioritization, and Parallelization Considerations

Can the Community Support an Additional Batch Executor?

Multithreaded PostgreSQL (2025 edition)

(notes by Peter Eisentraut)

The ongoing work is covered on the wiki page Multithreading. That page needs a bit of updating, focusing on specific tasks rather than problem areas.

A few things were mentioned in the session:

  • Many global variables should be combined into structs. There was some mild disagreement about whether it should be one big "session" struct or many smaller per-module structs, but everyone was in favor of more structs. It was also mentioned that this might help performance a bit in general by making related variables live closer together.
  • Still a bit unclear on the best way to handle variables backing GUC settings.
  • Some uncertainty about performance of all of this, will need testing.
    • In particular, concerns about possible performance impact of cancellation point handling in glibc (see this article for some explanation: https://lwn.net/Articles/683118/)
  • pg_duckdb is an example of an extension that currently uses threads on its own. We should keep that working.

A few more things were raised during discussions afterwards:

  • Garbage collection: typmod cache, syscache, relcache currently rely on process exit for cleanup. With threads we need an explicit garbage collection mechanism.
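The "combine globals into structs" idea can be sketched conceptually. The actual work is in C, but the shape of the change is easier to show in a few lines of Python: many independent file-scope globals become fields of one session object, and under threads a single thread-local slot replaces per-variable annotations. All names here are invented for illustration, not actual PostgreSQL identifiers.

```python
# Conceptual sketch only (Python for brevity; the real work is in C).
# Instead of many independent file-scope globals, per-session state is
# grouped into one struct-like object; with threads, one thread-local
# slot then replaces per-variable thread-local annotations.
import threading

class SessionState:
    """Groups variables that are separate file-scope globals today."""
    def __init__(self):
        self.work_mem = 4096      # illustrative values, not real GUC storage
        self.temp_buffers = 1024

_tls = threading.local()

def current_session():
    # Each thread lazily gets its own SessionState instance.
    if not hasattr(_tls, "state"):
        _tls.state = SessionState()
    return _tls.state
```

A side effect of this grouping, as noted above, is that related variables end up closer together in memory, which may help cache locality regardless of the threading work.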

The Future of pg_stat_statements

pg_stat_statements - lots of people are using it; let's discuss potential changes for the PG19 cycle

Query text storage will potentially change in 19

  • Sami is working on a patch for moving query text to shared memory
  • pg_stat_statements scalability problem: the query text is currently stored on disk (to avoid truncation of the query text)
    • pg_stat_statements constantly needs to garbage collect the query text file
    • Runs into an exclusive lock during truncation
    • Garbage collection is expensive
  • Want to remove disk from the performance critical path
    • Could be hybrid approach: both memory and disk
  • Keeping data across crashes/reboots
    • Proposed changes don’t change the current behavior
    • System crash -> lose the stats (the same as pg_stat_statements today and pg_stats)
  • What about truncating long query text?
    • In practical terms, if we were to truncate (examples provided by Lukas):
      • Today we take about 2MB of shared memory by default (5000 entries @ ~400 bytes per entry)
      • For just the query text alone, if we stored it in shared memory:
        • 5000 entries @ 1024 query length = 5MB shared memory
        • 5000 entries @ 10240 query length = 50MB shared memory
        • 100000 entries @ 1024 query length = 100MB shared memory
    • Sami notes he does not want to truncate query text with the proposed changes
      • Audience feedback agrees: from an operations perspective, it would make life a bit difficult (since we can't pinpoint the exact query)
      • Additional audience feedback agreed on needing the whole query (to copy the query text, to identify the query)
    • Could have a hybrid approach to have full query text on disk still in some way
    • Could have the limit be on the memory being used, and if someone has very large query texts, only a small number of query texts would be retained
    • When storing the query text in memory, we could also compress it
    • Audience question: Could we store them (query text) in a system table?
      • Postgres is a great database to store things!
      • => As the main approach this does not seem realistic, because of all the extra overhead it brings (i.e. we do need the shared memory storage for performance) - but maybe as part of a hybrid approach?
  • (Additional post session discussion in a smaller group)
    • For configuring how many statements to keep, we could either have a per-entry maximum, or a total maximum of memory used
      • Total maximum of memory used seems more intuitive for users (e.g. "allocate 10MB of shared memory to query text storage, in total")
    • Should we have the memory limit apply to both query texts and stats, or should we limit them separately?
      • There is benefit to keeping stats and dropping query texts if needed, e.g. when you have an external tool/script in place that can retain the queryid to query text mapping for query texts that were already removed
      • Agreement that this seems to be how other database systems have handled this (separating query texts and statistics)
    • Should we allow querying for statistics without text?
      • Still useful, because text might be large and not needed (e.g. for a tool that diffs the counters and only infrequently gets the text)
      • Retrieving text may also be CPU intensive if compression is used
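The back-of-the-envelope sizing figures above can be reproduced directly. A small sketch (using MB = 10^6 bytes, matching the rounded numbers in the notes):

```python
# Back-of-the-envelope sizing from the session notes: shared memory
# needed if query text moved fully into shared memory, at various
# entry counts and per-entry text lengths. MB here means 10**6 bytes,
# matching the rounded figures quoted above.
def query_text_shm_mb(entries, text_bytes):
    return entries * text_bytes / 10**6

stats_today_mb = query_text_shm_mb(5000, 400)   # ~2 MB of stats entries today
small_mb = query_text_shm_mb(5000, 1024)        # ~5 MB
medium_mb = query_text_shm_mb(5000, 10240)      # ~51 MB (notes round to 50)
large_mb = query_text_shm_mb(100000, 1024)      # ~102 MB (notes round to 100)
```

This is the trade-off behind the "limit total memory" idea: the budget grows linearly with both entry count and query text length, so capping total bytes is more predictable than capping entries.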

We have a lot of fields in pg_stat_statements, do we keep adding more?

  • pg_stat_statements has 55 fields right now, ~400 bytes per stats entry
  • Why do we care:
    • Additional fields each add a small amount of overhead
    • Additional fields make pg_stat_statements hard to use / understand when querying it directly (rest of the discussion focused on this aspect)
  • We recently added 10 fields for JIT, but do people actually use them?
    • => People generally agreed these don’t seem to be used, one comment noting more generally that they recommend turning off JIT completely
  • We could change the current view for easier use
    • e.g. categorize better, different views for different categories
    • or alternatively split out rarely used fields (e.g. JIT) for easier usability
  • Worth improving documentation here, to talk about use cases for the different fields
  • Do people use std dev in pg_stat_statements?
    • Only one person among ~40 people in the room
    • Percentiles would be more useful (P95, P98, P99)
    • Could implement via a sketch data structure (tdigest, etc)
  • Are pg_stat_statements and pg_stat_activity overlapping?
    • => Not really in terms of what they track
    • But slightly different names/meanings, etc. (datid vs dbid, query_id vs queryid, usesysid vs userid), which can be confusing
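The percentile point can be illustrated with a toy example. A production implementation inside pg_stat_statements would need a bounded-memory, mergeable sketch (t-digest, DDSketch, or similar, as mentioned above); the exact nearest-rank computation over a full sample below is for illustration only.

```python
# Why percentiles beat mean/stddev for skewed latencies: a single
# outlier dominates the mean, while P50/P99 still describe typical
# behavior. Not a sketch - an exact computation on a full sample.
import statistics

def percentile(samples, p):
    """Nearest-rank percentile over a full sample (not a sketch)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # nearest-rank method
    return ordered[rank - 1]

# 99 fast queries at 1 ms and one 500 ms outlier:
latencies_ms = [1.0] * 99 + [500.0]
mean = statistics.fmean(latencies_ms)   # 5.99 ms: dominated by the outlier
p50 = percentile(latencies_ms, 50)      # 1.0 ms: typical query
p99 = percentile(latencies_ms, 99)      # 1.0 ms: 99 of 100 samples are fast
p100 = percentile(latencies_ms, 100)    # 500.0 ms: the worst case
```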

What are other cases (besides query text) where pg_stat_statements doesn't work well today?

  • Spin lock on statistics
    • Tests on a beefy AWS machine can show visible lock contention
    • Not easy to resolve; Sami did some tests with atomics, but it's not clear how to make that work
  • Tracking planning time
    • Lots of overhead due to the same spinlock (which is worse for planning use case), turned off by default now
    • Problem with partitions
    • pg_stat_statements exec time doesn’t include planning time
  • Lots of tables and lots of schemas, or short-lived roles (creating lots of entries)
    • => Causes lots of dealloc cycles because of entry churn
    • (did not discuss any solutions to this particular problem)
  • Improved normalization
    • With ARRAY - creates many distinct pg_stat_statements entries
    • Fixed in 18 for IN lists (squash), but there are other cases that are not covered (e.g. VALUES)

Other feature requests for pg_stat_statements:

  • Could we have a time series of pg_stat_statements? (counters are hard to use)
    • Unclear how we’d persist it
    • Could imagine having a generic snapshot of stats mechanism, once pluggable cumulative statistics are used for pg_stat_statements
    • => Overall not much interest in the room in solving this in a pg_stat_statements specific way in the short term
  • Feature request: Example of unnormalized query, example of parameters to use
    • Oracle has this for the first entry of the query, with best effort, capture params
    • (not much discussion on this, but some agreement on the benefit)
  • Cancelled queries
    • No stats in pg_stat_statements, would be nice to have stats
    • Could we count up timeouts?
      • Should technically be possible, but need to investigate
    • Audience comment: every log line means something happened; we should count it somewhere
  • pg_stat_statements.max GUC requires a restart, could we make it reloadable?
    • => Yes, could do it when we move it to the new pluggable cumulative statistics infrastructure added in 18

Should pg_stat_statements be in core?

  • Why:
    • Core is already defining the queryid logic / node jumbling
    • Would make it easier for end users
    • Could help with having consistency (when adding new stats counters, also add them to pg_stat_statements)
  • But, generally project policy has been to treat contrib as being almost the same as core
    • Not moving things in core unless there are strong reasons to do so (e.g. text search)
    • adminpack, GIN, GiST indexes, full text search -> became core
  • Why not in core: There are some cases that don’t work well (see earlier)
  • => Overall tendency in the room seemed to not want to change this

Better Logging

- currently, logging is limited / not extensible

- multiple destinations, one hook, not working exactly the way we'd want

- more hooks?

- what to do with log lines, what format, etc.

- automation, allowing more stuff to the logs

- have been using log for various purposes, e.g. sending log lines over UDP somewhere else

- a ready-made log line may be discarded; maybe we should format it only after the decision to log, to save CPU

- easier way to determine from where the log line originates? we have source line, but not stable

- logging should not be expensive by default, e.g. plan logging/serialization is expensive

- pgaudit - lots of comments about improving logging

- some should be fixed in postgres: tagging of purpose (pg_stat_statements vs. admin stuff)

- some should be provided as a hook: more exotic stuff, sending over UDP somewhere, too many different possibilities, would need access to raw data, indicate what kind of data the hook needs (as a flag bitmap)

- improvement vs. extensibility point

- would be good to allow extensions to "opt in" into behavior (e.g. where to log stuff)

- we're also mixing pieces with different purposes (error messages + metadata + ...) in a single place

- structured logging - we're removing the structure and logging the flattened result

- long discussion Hannu vs. Andreas about when to materialize the JSON format, how to use pre-log hooks and pass the dynamic structured data to it, etc. unclear on details

- maybe we want to do what Java does, keeping the format + values until the very last moment
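The "keep the format string and values until the very last moment" idea can be illustrated with Python's stdlib logging, which already works this way: the format string and arguments are stored separately on the record and only interpolated when a handler actually emits, so filtered-out lines pay no formatting cost, and a structured sink can see the un-flattened pieces. This is only an analogy for the mechanism being discussed, not a proposal to use Python.

```python
# Deferred formatting: msg and args stay separate on the record until a
# handler emits; lines dropped by level filtering are never formatted.
import logging

class CapturingHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.records = []
    def emit(self, record):
        # record.msg and record.args are still separate here; a
        # structured sink could serialize them instead of flattening.
        self.records.append((record.msg, record.args, record.getMessage()))

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
handler = CapturingHandler()
logger.addHandler(handler)

logger.debug("expensive detail: %s", "never formatted")  # below level: dropped
logger.info("duration: %d ms", 42)                       # kept; formatted lazily
```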

- some of these ideas were proposed before, never got done, question is why, maybe we should investigate

- maybe: this is the oldest part of the code, would require a lot of refactoring

- but maybe we could do at least some of it without refactoring

- logging is tightly coupled with error handling; there may be some challenges in that

- sometimes confusing why errors / messages end up in the log file

- it is intentional that backend stderr goes to the same place as everything else (to catch errors from non-pg code - libraries etc.)

- format of log vs. format of log message / data - people have to grep stuff to get useful data

- facilities tagging vs. message tagging

- we have top-level format, but that still has the whole formatted string, without format

- maybe we could have some sort of mapping of places logging stuff to "format"

- there's a standard for this stuff / telemetry stuff, allowing to stitch together log streams, etc.

- with enough telemetry instrumentation, we don't need to go to logs that often

- encoding - we have problems with mixing encodings, maybe we should force log lines to be in utf-8

- not everything can be encoded in utf-8, has to fallback to hex-encoded bytes


Fixing TOAST

Commit Order on Standbys/CSN Snapshots

Expand HOT Updates

Making Planner Decisions Clear

(session led by Dian Fey, notes by Peter Eisentraut)

The request is to allow users to somehow understand planner decisions, what kinds of plans or scans or indexes the planner rejected or preferred over others. In general, this is hard because the planner does not construct all possible plans but does a lot of pruning. But there was a general agreement in the room that we could somehow produce more information when the planner makes its decisions along the way.

  • A simple example: print rejected indexes on a table (for example, to allow debugging type or collation mismatches)
  • Some new useful features in PG18:
    • pg_overexplain / extensible EXPLAIN
    • possible to experiment by loading different statistics (part of pg_upgrade stats upgrade feature)
  • There have been some failed attempts in the past, such as printing all generated subplans. This produces too much information to be useful.
  • There are two general reasons why a plan is not chosen: 1) the planner thought it was too expensive, 2) the plan could not be generated at all for implementation reasons; the difference between these two should be made clearer.
  • Tom Lane explained a recent case where a pushdown into a UNION failed for complex reasons. Those kinds of cases might be very hard to explain using the kinds of mechanisms contemplated here.
  • Indirect cases like that are hard. We can start with direct cases.
  • Various questions of interest:
    • Just show what was thrown away.
    • Trace why indexes were rejected.
    • Which access ways were considered.
    • Why is join order wrong.
    • Indexes and costs.
  • A facility for this would help users understand the planner better and avoid cargo-cult techniques.
  • How would you display this?
  • Server log is punishing the user
  • Can extensible EXPLAIN be used?
  • Maybe a user can indicate interest in a particular object (table, index) and tracing would be limited to that.
  • Also show misestimations.
  • Two interesting cases for indexes: 1) the index didn't match, 2) the index was too expensive, in which case show the dominating path
  • "No path was generated" is also useful info.
  • Use case: I added an index and it wasn't used - why?
  • Could there be some extension hooks? [Lukas from pganalyze]
  • hypopg extension allows analyzing hypothetical indexes, is there a way to do the opposite?
  • The hook that hypopg uses can do that.
  • Do we have a list of common questions, common problems?
  • A given query could have multiple problems with multiple reasons, output format etc. need to consider that.
  • Indexes can be not chosen for several reasons: parameterization, cost optimization
  • index scan vs. index-only scan
  • cardinality misestimation
  • Log more planner inputs, like the stats used, or what the planner thought the index min/max are.
  • There are too many different options.
  • Users can hint what they had in mind.
  • interesting index -> interesting path
  • Users are upset if the database doesn't do what they had in mind. Let them see possible bad paths/plans, so that they get what the database is thinking.
  • Tracing can be expensive.
  • Maybe not, would be opt-in anyway.
  • There are many places in the code where tracing would need to be added.
  • Sometimes costing is done before even making a path.
  • Suggest putting this into EXPLAIN, start with simple things.

Note from Peter: Generally everyone in the room was positive about this discussion, and there are probably some simple and useful ways to move some things forward.

Global Indexes

Things We Love about PGConf.dev + Ideas for the Future

23 people attended this unconference session

Things We Loved / Things to Keep (in no particular order)

  1. Food both at venue & at social event = +1
  2. Talks
  3. Meeting people
  4. Hallway Track
    • Lots of +1 for hallway track, from both Schedule & Venue angles
    • That we could HEAR each other in hallway track
    • Lots of different places to talk in hallways
  5. Schedule
    • 25 min talk timeslots (lots of +1s to this)
    • Plentiful breaks for hallway track
    • Thurs schedule started later in AM (given previous Wed night social)
  6. Tuesday activities
    • liked that conference was more than just talks
    • good to try new things
    • existence of workshops & summits
    • that it was a dedicated day before the full conference began
  7. Community Booth
    • liked Booth sessions, including Hacking/Building Postgres
    • liked using Community Booth as general meeting area
  8. Social @ Time-Out Montreal (lots of +1s to this)
    • variety of food & drink
    • ability to move around & talk to different groups
    • best-ever social at a conference

Ideas for Ways to Improve (in no particular order)

  1. Games to interact with people @ social event
  2. Announce Tue Schedule earlier, to influence attendee travel plans
  3. Meet & Eat Dinner groups on Tue
    • would be better if group fits on 1 table & is not split up
    • there should be a dedicated owner to organize this in advance
    • some people found out about this too late
  4. Community Booth could benefit from
    • clear mission & owner
    • promotion before/digitally & during/print
    • ease of discovery (especially on Schedule page)
  5. Community Booth new types of content ideas
    • having "open space" sessions
    • AMAs in Community Booth with PG experts & committers
    • 15-min short deep-dive trainings by experts
  6. Poster Session could benefit from
    • office hours dedicated time where makers present their posters
  7. Lightning Talks
    • should have MORE of them!
    • perhaps have lightning talks on both Wed & Thu afternoons
    • lightning talks earlier in conference week can serve as conversation starters
  8. Kickoff Ideas:
    • Display Visuals (Bio pic + Talk Titles) for Each Day's Talks/Speakers at Morning Kickoffs
  9. Breakfast Ideas to Serve as Icebreakers for New Folks
    • Speaker breakfast—or New Speaker Breakfast
    • First-Time Attendee breakfast—or New to Postgres breakfast
  10. Unconference Schedule
    • ideally don't smash unconference sessions together
    • allow for coffeebreaks in between & hallway conversations
  11. Microphones
    • please have lavalier mics (not just handheld mics)
  12. Survey Enticements
    • Maybe raffle off a 50% discount to next year's ticket
  13. Ideas about Talks
    • Bit more Q&A time for Talks
    • Maybe more Introductory sessions?
    • Investor session, with someone presenting on Industry Trends
  14. Schedule Announcement
    • notes say "get run on Schedule early (to influence flights)"
    • ^^^ if the Tuesday schedule came out later than the regular schedule, maybe that was the concern? that people had booked flights before discovering Tuesday content that interested them?

Vote Results from "Show of Hands" Votes

  • Majority Voted that current 4 days is optimal length of conf

Questions Discussed (with no resolution recorded)

  • Should Tuesday be targeted for all attendees OR just subset?
  • What is the target goal (for community booth?), should it be new content?