PGConf.dev 2024 Extension Summit

Notes from Break-out Sessions

The 2024 Extension Ecosystem Summit took place at PGConf.dev in Vancouver, BC, on May 28, 2024. It was an Open Space Technology/Unconference-style session in which attendees identified topics to address, self-selected into teams, and discussed them. This page records the notes taken from each session.

Extension Metadata

Participants

  • Abigale Kim, Carnegie Mellon University
  • David Wagoner, EnterpriseDB
  • Jeremy Schneider, AWS
  • John Chen, AWS
  • Samay Sharma, Tembo


Topics Discussed

  • Taxonomy
  • Versioning
  • Metadata
  • System Dependencies


Key Points

  • VSCode Model for Extensions
    • Advantages:
      • Extension marketplace
      • Ratings
      • Developer engagement
      • Searchability

  • Another example: www.pgt.dev (trunk)


Packaging and Discoverability

  • Packaging: At Tembo, packaging is focused on Debian for cloud usage. Responsibility for platform management (builds for different operating systems) needs clarification, potentially with input from Devrim
  • User Comments: This was the most highly valued feature in user research on extensions. People wanted to read other users' comments.
  • Rating Analytics: Include rating analytics in extension metadata? Or keep ratings on the repo website?
  • Discoverability Goal: Encourage users to use two more extensions within a specified time frame
  • Proposed Feature: Make extensions easy to test by using database branching to quickly and easily create a Postgres instance with the extension on the repo website, perhaps with a demo or tutorial of how to use the extension


Extension Development and Compatibility

  • PGXS: Standard build system for extensions
  • Compatibility Testing: Abigale Kim is working on an automated extension test system at Carnegie Mellon University
  • Usage Insights: Many extensions are installed but rarely used
  • Metadata Sources:
    • Developer-provided meta.json files with tags
    • User Ratings, potentially managed via a website


Taxonomies

  • Tag Utilization:
    • Emphasis on tags over extension names for use-case identification


Versioning

  • Compatibility Information: Users should know which extension versions build against which PG versions
  • Standards:
    • Prefer semantic versioning (semver)
    • Define source registry standards
  • Existing Extensions: May need adjustments, such as appending zeros to the version number, to fit the PGXN version format
  • Version Tracking: PostgreSQL tracks extension versions in the catalog
  • Version Comparison: Necessary to determine newer or older versions
  • Downgrade Function: Timescale can downgrade extensions, for example


Binary Distribution Format

  • Participants
    • David W. (Tembo)
    • Devrim G. (EDB, RPM Packaging)
    • Tomasz R. (Debian Developer, Package Maintainer)
    • Ruohang F.
    • Andreas S.
  • login hook became event triggers
    • older extensions stop working
  • On Debian, one distribution aims to have one version of each
  • On RPM, multiple versions are available
  • PostgreSQL doesn't offer a way to have multiple versions of one extension
  • Use the Python Wheel format
  • PostgreSQL has no way to have two versions of an extension installed
    • And use a newer library version during extension upgrade
    • And package managers can't install two versions
  • OCI is using multiple tar files and overlays to build an image
    • Either repositories subscribe to feeds, and build the package from it
    • Or depend on the OCI files to build the packages
  • Currently there is no way to determine which version number is newer
    • Something like semver requirement will solve this
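
To illustrate why an agreed format matters for ordering, the sketch below compares versions as plain strings and then as numeric tuples. It assumes versions are strictly three numeric parts (major.minor.patch); semver pre-release and build suffixes are deliberately left out, and the helper names are made up for this example.

```python
def semver_key(version):
    """Turn "major.minor.patch" into a tuple of ints suitable for comparison."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_newer(candidate, installed):
    """Return True when candidate is strictly newer than installed."""
    return semver_key(candidate) > semver_key(installed)


if __name__ == "__main__":
    # Plain string comparison goes character by character, so "1.10.0"
    # sorts before "1.9.0" even though it is the newer release.
    print("string compare:", "1.10.0" > "1.9.0")
    print("numeric compare:", is_newer("1.10.0", "1.9.0"))
```

Without a required format, a packager cannot tell which of the two rules (if either) applies to a given extension's version strings.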

ABI/API discussion

Executive summary

Some time was spent on what the definition of the Postgres ABI/API is: specific API calls, headers, data structures. Minor versions are effectively their own semantic major versions, given the way specific things change.

Postgres internals are generally developed for itself; extensions are secondary. "Dirty tricks" are symptoms of needed functionality for extensions and a good indicator of where we might need additional work.

Some code archaeology could provide useful information about which APIs can be called and which of them perform particular actions, for example allocating memory or throwing errors.
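
As a sketch of what such archaeology could produce, the snippet below walks a call graph backwards to find every function that can, directly or indirectly, reach a function known to allocate memory or raise an error. The call graph and all function names are invented for illustration; extracting a real graph from the Postgres source is the hard part and is not shown.

```python
def functions_reaching(call_graph, targets):
    """Return every function that can reach any target via the call graph.

    call_graph maps a caller name to the list of functions it calls.
    The result includes the targets themselves.
    """
    # Invert the edges so we can walk from callee back to callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    reached = set(targets)
    pending = list(targets)
    while pending:
        function = pending.pop()
        for caller in callers.get(function, ()):
            if caller not in reached:
                reached.add(caller)
                pending.append(caller)
    return reached


# Hypothetical call graph; none of these names are real Postgres functions.
CALL_GRAPH = {
    "ext_entry_point": ["build_result", "lookup_cache"],
    "build_result": ["alloc_buffer"],
    "lookup_cache": ["hash_probe"],
    "alloc_buffer": ["raise_error"],
    "hash_probe": [],
}

if __name__ == "__main__":
    print(sorted(functions_reaching(CALL_GRAPH, {"alloc_buffer"})))
```

The same traversal with `{"raise_error"}` as the target set would list the functions that may throw, which is the kind of per-function fact the discussion wanted to move from tribal knowledge into something checkable.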

Individual function calls could be qualified by how "safe" they are: green, red, and in-between. Sanctioned APIs are useful, but extensions should also be able to call deeper hooks where required.

Current hooks are a less-than-complete system for extensions: many interactions depend on trusting individual extensions to behave well. A better solution might use hook registration, with more complex behaviors possible outside of the extension's own knowledge.
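
The registration idea can be sketched as a small registry that owns the list of hooks, calls them in order, lets a single hook be removed, and records how much time each one used. This is only an illustration of the concept in Python: it is not the Omnigres design, not a Postgres API, and the `HookRegistry` class and its verdict convention (True/False to decide, None to abstain) are assumptions made for this example.

```python
import time


class HookRegistry:
    """Keeps hooks in a flat, ordered list instead of a chain of saved pointers."""

    def __init__(self):
        self._hooks = []   # (owner, callback) pairs in registration order
        self.elapsed = {}  # owner -> total seconds spent in that owner's hook

    def register(self, owner, callback):
        if owner in self.owners():
            raise ValueError(f"{owner!r} already registered a hook")
        self._hooks.append((owner, callback))
        self.elapsed[owner] = 0.0

    def unregister(self, owner):
        # Removing one hook does not disturb the others.
        self._hooks = [(o, cb) for o, cb in self._hooks if o != owner]

    def owners(self):
        return [owner for owner, _ in self._hooks]

    def run(self, event):
        """Call hooks in order until one returns a definite verdict.

        Returns (owner, verdict), or (None, None) if every hook abstained.
        """
        for owner, callback in self._hooks:
            start = time.perf_counter()
            try:
                verdict = callback(event)
            finally:
                self.elapsed[owner] += time.perf_counter() - start
            if verdict is not None:
                return owner, verdict
        return None, None


if __name__ == "__main__":
    registry = HookRegistry()
    registry.register("audit", lambda event: None)  # observes, never decides
    registry.register("policy", lambda event: False if event == "drop" else None)
    print(registry.run("drop"))
    registry.unregister("audit")
    print(registry.owners())
```

Because the registry, not the extension, drives the calls, it can also answer questions such as which extension decided an outcome and how long each hook took.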

Additional improvements to extension capabilities would include more hooks in more places, custom relation type handling, better per-database background worker support, improved SPI inside background workers, better support for a generic Node walking/visitor pattern, and better dependency handling, even including dependencies on a specific minor Postgres version.
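
A minimal sketch of a minor-version gate, assuming the check is driven by the integer form of the server version (the `server_version_num` setting, which since PostgreSQL 10 encodes the version as major * 10000 + minor). The `SUPPORTED` table and helper names are hypothetical, and a real extension would more likely enforce this in its build or migration script than in Python.

```python
def split_version_num(version_num):
    """Split a PostgreSQL 10+ version number such as 160003 into (16, 3)."""
    if version_num < 100000:
        raise ValueError("pre-10 version numbering is not handled in this sketch")
    return divmod(version_num, 10000)


def check_supported(version_num, supported):
    """Raise RuntimeError unless the server version is one we built for.

    supported maps a major version to the minimum minor release required.
    """
    major, minor = split_version_num(version_num)
    if major not in supported:
        raise RuntimeError(f"PostgreSQL {major} is not supported")
    if minor < supported[major]:
        raise RuntimeError(
            f"PostgreSQL {major}.{minor} is too old; need {major}.{supported[major]}+"
        )
    return major, minor


# Hypothetical requirement: major version -> minimum minor release.
SUPPORTED = {15: 4, 16: 1}

if __name__ == "__main__":
    print(check_supported(160003, SUPPORTED))
```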

Detailed Discussion Notes

 * Raw notes: ABI/API discussion
 - "Am I allowed to do that"
   - generally no, but no other way
   - postgres is developed for postgres, not for extensions
     - a problem for extensions, not postgres
     - but also for postgres
   - is .h file the API?
   - not static, a lot of hacks needed to get things going
     - might as well use it
 - major release vs minor release as far as API support
   - is it really needed in core?
     - using a lot of DSM/DSA APIs currently, but is completely replaceable, doesn't depend on core
     - can use APIs today, but if things change we can rework things in the future
   - doesn't work for things that are core to the database: relation, catalogs, etc
 - data structures
   - some minor versions can change structures
     - have to compile against minor versions for each release
     - can copy things that have releases, those won't change
 - some changes in major versions, #ifdefs kind of unavoidable
 - some specific difficulties seen in JIT: opcodes renumbering in the JIT compiler
 - a need for postgres archaeology:
   - which functions call which other functions, what might allocate memory, throw errors, etc
   - formal verification methods
   - how to get knowledge out of tribal knowledge and into formal knowledge
 - core is necessarily focussed on their own problems, but have a need for extensions pushing back
 - what about large number of header files that support specific ABI for 5 versions in a similar fashion
   - any C extensions of any complexity end up working on their own compatibility layer for many pieces
   - "there are no best practices, just hard work" -- timescale guy
 - what does the API mean?
   - there is a lot of legacy; good and bad outcomes
   - if there is a sanctioned API, that's good, but should also be okay for other extensions to do deeper work/hooks, etc
     - also don't want to break things unnecessarily
   - some level of data gathering related to moving code from red zone to green zone
   - extensions need help promoting changes into core; need popularity/enough code/common ground with multiple extensions to push missing layer into core
 - axes for extension code awareness
   - popularity
   - utility
   - will give data
 - qualifying function/API calls via safety/preferredness
 - testing/buildfarm for extensions
   - compare/build extension scripts to be able to automate testing against every commit
   - opt-in to extension integrations
     - can build on a regular cadence
     - maybe weekly
     - at least before release, can see about resolution
     - might be in everyone's best interests to work together
 - hooks
   - Timescale says missing a lot, specific ones
     - partitioning, etc
      - some issues: no real contract
       - see about establishing a contract for each hook
       - signature of the function
   - are extensions supposed to be ABI-compatible?
     - can't really treat them as such; need to compile separately
     - see Yurii's talk about details
     - generally minor changes, but still major in terms of semantic versioning
   - the current system is very insufficient
     - stack-based is very limiting
     - unmanageable, cannot remove just one hook
       - hook private
     - Omnigres worked on linear system for hooks approach, aye/nay, etc, moving the call outside of the extension itself
       - also want ability to show which hook was responsible for what amount of time being spent, etc
       - currently solved outside of postgres, but proposing this solution for core as-of specific major version
 - other ABI extension issues:
   - no way to tie an extension version specific to the server version
     - a way to blow up at build or runtime without
   - need interim and long-term plans for changes for core
   - is this really just a better build solver?
     - can solve today by erroring out in the migration script
     - or breaking in Makefile
 - what is better than hooks?
   - should you register your hooks instead of just shared library load
     - pg could take care of the details when multiple hooks are provided
     - could have extensions loaded, introspection, say yes/no
     - (omni extension says it does a lot of that)
     - are already some number of extensions with custom scans, etc
   - why aren't all scans CustomScans by default?
 - other compatibility:
   - nodes are changing between versions
     - visitor pattern doesn't work
   - general code readability: naming functions lfirst() vs linitial(), say
 - what channels for feedback for specific subsystems?
   - good question... :D
   - which core member to sacrifice to find out info?
   - when they are aware of the pain that extensions developers are going through, they can be making different choices
 - want to get away with less tricky things in the future
   - no hook for when the xact is committed and when visible to other backends
   - dirty tricks are a measure of the level of the pain
 - missing extension API:
   - (timescale wishlist)
     - partitioning
     - extending relation types to be able to have things
   - partition boundaries data types and end points
   - background workers:
     - some issues with associating per database
     - can't work across cluster/can't "reconnect"
     - want better ways to communicate/see boundaries
      - using the SPI interface is janky; not everything is properly set up

Including/Excluding Extensions in Core

  • Participants
    • Keith F. (Crunchy Data)
    • Yurii R. (Omnigres Corp.)
    • Alexandra W.
    • Abigale K. (TileDB)
    • Matt B. (Evently)
    • David C. (Crunchy Data)
    • Chen H. (AWS)
    • Pierre Du. (Entr'ouvert)
    • Mats K. (Timescale)
    • Ruohang F.
    • Yugo N. (SRA OSS LLC)
    • David W. (Tembo)
    • Atsushi T. (NTTDATA Group)
    • Andreas S.
    • Jeremy S. (AWS)

Should an extension be brought into core?

   When/Why?
       * Not necessarily bring the extension into core raw. Bring the functionality that makes the extension irrelevant
       * Make things easier for users so they don't have to search for third party solutions
       * "just make it an extension" - core might never implement that extension's feature simply because it exists as an extension
       
   Why not?
       * Need more maintainers in core for those features
       * Faster development cycle
       * Developing extensions encourages more third-party extensibility in Postgres (better core API/ABI)
       

Should contrib just be brought into core?

   Why?
       * Tests interfaces and API
           * Why not just make a test for it then vs keeping a contrib module?
           * Someone from this meeting may look into doing a patch to move some of these tests out of contrib to testing modules
   Which ones?
       * Example: Why is pg_stat_statements not in core? The legacy reasoning about resource usage isn't as relevant anymore. Make it like track_io_timing, so that it's just a switch.
       * Range types are in core, but the btree_gist indexes they need are in contrib. Why?
       * Some may still have dependencies on Perl. Definitely do not want to require Perl in core PG. Rewrite in C like pgBackRest did
       * Proposal to bring the extensions listed below into core as a first step. Been attending PGCon for years and constantly heard that people want to get rid of contrib but very little work has really been done to do that. Maybe this can get the snowball rolling
       * These extensions are often referenced for common maintenance operations and have no shared_preload_libraries requirements. Why can't their definitions simply be brought into the catalog with proper default permissions set?
       
       amcheck
       pageinspect
       pg_buffercache
       pg_freespacemap
       pg_visibility
       pg_walinspect
       pgstattuple
  • Most metadata (control file, update scripts, ...) should be in catalog
    • Libraries still on disk
    • Even extension catalog can be in catalog
    • Old model:
      • Different roles (DBA, Sysadmin, Packager)
    • Multiple directories on disk for files (Second dir)
      • Or make it a tarball
  • Multiple versions
    • Multiple extension versions in the same database (cluster)
    • Upgrade from one version to another
    • Nested namespaces solve this problem
      • Very common extension names (core, contrib) don't need a namespace prefix

Potential core changes for extensions, namespaces, etc

We discussed the organization/layout of the physical files of an extension, and what changes we could make to core extension support going forward. Examples of alternate approaches include SQL-only scripts being stored in a catalog (so not on the filesystem itself), or alternate distribution methods of such extensions other than basic packaging (consider a registry for "safe" extensions which could be loaded directly into a database without requiring administrator action).

Other core alternatives/use cases include building read-only images of Postgres (Docker, etc.) while still allowing user-provided extensions, by adding a second extensions-directory GUC that points to an additional place where the files can live, other than the read-only mount points for share, etc. While this could be seen as a more short-term fix, it could still be considerably useful.
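
The second-directory idea amounts to a search path for extension files. The sketch below looks for an extension's control file in an ordered list of directories, so a read-only image directory can be consulted first and a writable user directory second. The `find_control_file` helper and the directory layout are assumptions for illustration; no such GUC is described here beyond the proposal itself.

```python
import tempfile
from pathlib import Path


def find_control_file(name, search_dirs):
    """Return the first "<name>.control" found in search_dirs, or None.

    Directories are tried in order, so earlier entries take precedence.
    """
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.control"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Stand-ins for a read-only image directory and a user-writable one.
    with tempfile.TemporaryDirectory() as image_dir, \
            tempfile.TemporaryDirectory() as user_dir:
        control = Path(user_dir) / "my_ext.control"
        control.write_text("default_version = '1.0'\n")
        print(find_control_file("my_ext", [image_dir, user_dir]))
        print(find_control_file("missing_ext", [image_dir, user_dir]))
```

Trying the image directory first keeps packaged extensions authoritative; reversing the order would let user-provided files shadow them, which is a policy choice the proposal would need to settle.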

We also spent a considerable amount of time discussing support for multiple versions of extensions in the same database. At the very least, this would require hierarchical namespaces, so that multiple database objects can be found in different search_paths.

Also discussed: namespaces and collisions. It would be nice to support extensions that share a common name under differing user/company namespaces, so that one could install hydra/columnar or citus/columnar with the same extension name, but fundamentally different extensions. Specific schemes for this were discussed. The problem of namespace collision effectively comes down to issues with modules that cannot be relocated.
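
One possible resolution rule can be sketched as follows: a qualified name such as hydra/columnar must match exactly, while an unqualified name is looked up only in a short list of default namespaces (core, contrib), as suggested in the notes above. The `resolve` helper, the use of "/" as separator, and the installed set are assumptions for this example; the session did not settle on a scheme.

```python
DEFAULT_NAMESPACES = ("core", "contrib")


def resolve(name, installed):
    """Resolve an extension name to a (namespace, name) pair.

    installed is a set of (namespace, name) pairs. Qualified names must
    match exactly; unqualified names are searched in DEFAULT_NAMESPACES only.
    """
    if "/" in name:
        namespace, _, short_name = name.partition("/")
        if (namespace, short_name) in installed:
            return (namespace, short_name)
        raise LookupError(f"extension {name!r} is not installed")

    for namespace in DEFAULT_NAMESPACES:
        if (namespace, name) in installed:
            return (namespace, name)

    candidates = sorted(f"{ns}/{n}" for ns, n in installed if n == name)
    hint = f"; did you mean one of {candidates}?" if candidates else ""
    raise LookupError(f"extension {name!r} needs a namespace prefix{hint}")


# Hypothetical installed set using the names mentioned in the discussion.
INSTALLED = {
    ("hydra", "columnar"),
    ("citus", "columnar"),
    ("contrib", "pg_buffercache"),
}

if __name__ == "__main__":
    print(resolve("hydra/columnar", INSTALLED))
    print(resolve("pg_buffercache", INSTALLED))
```

Under this rule a bare "columnar" is rejected as ambiguous rather than silently picking one vendor's extension.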