Extension Ecosystem: Jobs and Tools

From PostgreSQL wiki
Jump to navigationJump to search

Extension Ecosystem: Jobs and Tools

This document sketches an idealized Postgres extension ecosystem.

Why?

The past year has seen a surge in interest in Postgres extension distribution. A number of people have noted in particular the challenges in finding and installing extensions.

PGXN, released way back in 2011, aimed to be the canonical registry for the community, but currently indexes only the 350 distributions that developers have made the effort to publish --- perhaps a third of all [known extensions][extension-gist]

Furthermore, PGXN distributes source code packages, requiring developers to download, compile, install, and test them. Binary packagers and cloud providers have added a limited number of extensions to their registries, such as apt.postgresql.org, yum.postgresql.org, and the like, but neither PGXN nor the pgxn client provide integration into such tooling.

As a result of these challenges, a number of new entrants have emerged in recent months, all aiming to be the source of record for extensions and a focus on discovery and ease of installation. These include dbdev, trunk, pgxman, pgpm. They too exhibit limitations, such as minimal OS and platform availability and, notably, the manual process to add extensions, further constraining inclusion. None has even reached the number provided by PGXN.

These challenges and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea of the PostgreSQL extension ecosystem: to work though the jobs to be done, specify the tools to do those jobs, and outline a plan for the broader Postgres community to design and build them.

🎬 Let's get started.

Jobs to be Done

Theodore Levitt cites a famous adage:

"Last year 1 million quarter-inch drills were sold," Leo McGivena once said, "not because people wanted quarter-inch drills but because they wanted quarter-inch holes."

People don't buy products; they buy the expectation of benefits.

Today the most cited descriptor underpinning Jobs to be Done Theory, thanks to Clayton Christensen, we bear it in mind to think through the "jobs" of an idealized Postgres extension ecosystem --- without reference to existing Postgres solutions. However, we cite examples for the jobs from other projects and communities, both to clarify the jobs to be done and to reference prior art from which to borrow.

So what are the jobs to be done by the ideal Postgres extension ecosystem?

Authoritative

Job: Be the official, canonical source of record for all Postgres extensions

Developers need to know the one place to reliably release and find all publicly-available Postgres extensions. It can't be a subset, but must be comprehensive, the assumed default recognized and used by the community and included in the Postgres documentation.

Exemplars:

Beneficial

Job: It's expected for extension developers to develop and publish their extensions in the ecosystem.

Extension developers want their extensions to be a part of the ecosystem, to gain all the advantages and privileges of participation, without undo effort.

Publishing an extension should be as straightforward as possible. Ideally it simply requires maintaining a metadata file and publishing releases to the authoritative index.

Exemplars:

Integrated

Job: It's easy for developers to start writing and publishing extensions.

Extension developers need excellent tooling to simplify the process of creating, maintaining, testing, and distributing extensions. Such tools should be well-integrated into the extension ecosystem, and empower developers to get started and easily follow best practices throughout the extension development lifecycle: from beginning development, managing dependencies and other metadata, testing it, and releasing it.

Exemplars:

Discoverable

Job: Make extensions easy to find and understand

There's no use in being the canonical index of available extensions if no one can find them. There needs to be a well-known web site where people find extensions, learn their purposes, distinguish them from each other, and read their docs. Ideally most people use integrated search rather than a search engine, but extension pages should appear high up in search engine results, too.

Exemplars:

Informative

Job: Make extension documentation consistent, standardized, and comprehensive

Extensions are only as easy to use as their documentation enables. It's therefore important to provide well-organized, standardized, and accessible documentation --- and that it be expedient for developers to document their extension packages. Ideally development tools encourage the inclusion of documentation in an extension. With standardized documentation, it's easy to publish docs in multiple formats, especially on the canonical discovery site.

Exemplars:

Programmable

Job: Provide stable, easy-to-use, and comprehensive APIs with which downstream developers can build other tools and products

Provide excellent, well-documented, intuitive web APIs that allow anyone to build useful tools and services on the index. The jobs below require it, but creative coders will invent and mashup APIs in ways we haven't thought of. Web APIs enable imaginative solutions within the ecosystem and without. Some example APIs:

  • Recent releases feeds
  • Extension metadata
  • Full text search
  • Event webhooks to notify registered services

Exemplars:

Installable

Job: Provide automatic binary packaging for wide variety of platforms

It must be as easy as possible for users to install all indexed extensions. Ideally, independent services consume the index APIs to be notified of new releases and automatically build binary packages for as wide a variety of OSes, OS versions, architectures (arm64, amd64, etc.), and PostgreSQL versions as possible.

Some example repositories to support or enable to build and distribute extensions from the core registry:

Exemplars: ❓❓

Trusted

Job: Validate extensions and protect from supply chain vulnerabilities

Provide tools and features to earn and retain the user trust by ensuring that that extensions indexed by the canonical registry have been validated, come from trustworthy sources, and are unlikely to be a vector for supply chain attacks. This means some combination of author validation (e.g. badging projects from well-known sources), checksums, public key signing, and perhaps static code analysis.

Exemplars:

Manageable

Job: Provide intuitive, powerful interfaces for installing and managing extensions

Provide applications to manage the extensions in one or more Postgres clusters. At a minimum provide a command line interface that manages extensions for a locally-installed Postgres --- as currently required for it to install packages, for example.8

Users need to easily manage their Postgres extensions in a single, consistent manner. The extensions manager should be aware of installed extensions, available extensions, dependency graphs, and known vulnerabilities; allow for search (via the APIs); and of course install or upgrade extensions.

Exemplars:

Tools

Tools fulfill the jobs to be done. Or rather categories of tools, since some may be deployed a single, well-defined service, while others denote a type of service or tool that may be implemented by multiple literal apps (many of which we have yet to dream up!).

Core Registry

Jobs:

The core registry is the single, canonical service that indexes Postgres extensions (ideally replacing hacks like this). It provides the APIs for fetching the complete index, individual extension metadata, release notifications, and more.

Some requirements to get to that point:

  • A community-defined and maintained metadata standard to enable indexing, building, and installing extensions. Contents may include, among other things:
    • PGXN metadata
    • control file metadata
    • Binary dependencies (configure, build, test, and runtime)
    • Unpackaged dependencies (Python packages, Perl modules, Ruby gems, Java JARs, etc.)
    • Platform requirements (OS, architecture)
    • Conflicts
    • Required configurations (e.g., "must be in shared_preload", "requires restart", etc.)
    • New metadata, e.g., TLE-enabled, extension type, etc.
    • Other useful metadata pioneered by the pgxman buildkit, trunk.toml, and other registries
  • A versioning standard, such as SemVer, to recognize new releases and index them appropriately and in meaningful order
  • An extension registration and hosting pattern; options to consider:3
    • Automatically index releases hosted on and namespaced to GitHub, BitBucket, GitLab, etc., akin to the Go Module Index
    • Provide developer registration and host all indexed extensions internally, with extension names defining namespaces; akin to PGXN, crates.io, PAUSE, RubyGems, and PyPI
  • Curation and ownership: always use the distribution metadata provided by the developer, but consider generating default metadata for extensions that don't provide the metadata. But allow it to be superseded by the extension owner if they later add it.

Trusted, registered clients provide metadata and can use the index for additional use cases, such as:

  • Incrementing download counts
  • URI templates for additional links in the discovery site (e.g., binary downloads)
  • Registering a webhook to be notified of new releases or other events

Notionally the Core Registry is a single service, but in reality it may encompass multiple services. Some examples:5

  • Extension discovery
  • Event notification (webhook queuing and dispatch)
  • User management4 (registration, permissions)
  • Release management (indexing, namespacing)
  • Feed service
  • Sum service (auditable checksum database which will be used by the go command to authenticate modules)

Prior Art:

API Service

Jobs:

A service to provide all the web APIs for the index and search site (in fact, the search site itself should be a thin wrapper around this API). This would be the canonical programmable resource for developers building services and mashups with Postgres extension data. It might also provide an API to fetch clean HTML rendered from extension documentation.

All versions of extensions will be present and queryable via the API.

Over time it will expand to support additional features, such as:

  • Metrics collection from trusted third parties: download counts, test, build, install successes and failures, etc.
  • Repository metadata (stars, commit count, last update, etc.)
  • Version comparisons (diffs)
  • User comments and/or reviews
  • Categorization/Tagging/Bundling

Prior Art:

Search Site

Jobs:

A web site that presents a compelling interface over the API, providing effective search, categorization, standardized presentation of docs, and links to other resources. Provides extension home pages generally linked by third party search engines in the first few results when searching for extension names.6

The web site is likely to be the most feature rich service; many features can be added over time to the API and rendered here, such as detailed metadata, version history, user reviews, user-provided documentation, bundling, etc.

Required features:

  • Search engine indexing (Google, Duck Duck Go, Kari, etc.)
  • Full text search akin to pkg.go.dev and crates.io
  • Release home pages
  • User home pages
  • Beautifully-formatted extension documentation
  • Extension categorization
  • User feedback and reviews
  • Metrics for downloads, usage, most visited extensions, build/test/install failures, vulnerabilities, etc.
  • User management UX (registration, user profile)
  • Release management UX (ownership, permissions, archiving)

Prior Art:

Packaging

Jobs:

Downstream services that consume from the core registry to automatically, download and verify source code, then build and sign binary distributions. Initial targets include apt.postgresql.org, yum.postgresql.org, and packaging required for Tembo.

Such a service can be manually managed in the short term, as it is today: a curated list of extensions with hand-edited specifications (RPM spec files, Apt files, Homebrew Bottles, Chocolatey packages, etc.).

But work toward the ideal end state, where community packaging repositories support all major platforms, OSes, PostgreSQL, and major versions of all, automatically built from extension metadata provided by the core registry upon release. The flow would look something like this:

  • Developers release versioned extension source files and metadata to the core registry, which notifies trusted downstream services
  • Such trusted services install dependencies documented in the metadata, then download, verify, build, test, sign, and upload to apt.postgresql.org, yum.postgresql.org, Trunk, pgxman, etc.

A complementary service provides an API to represent each release and binary packaging distribution, allowing clients to query for the appropriate packaging repository for a given OS, architecture, and Postgres cluster (as well as source distributions to be compiled into non-packaged PostgreSQL clusters).

Name Mapping Service

Jobs:

Automated packaging likely requires some sort of canonical name mapping service, so an extension author can just require, say, libxml2 and the automated packaging system can find out what things need to be installed to meet that canonical name on each OS and version.

Prior Art:

  • ❓❓

Installation

Jobs:

An SDK and CLI and perhaps other interfaces that use the API and Packaging services to quickly and easily verify and install extensions on supported platforms, architectures, and Postgres versions. Should also support source installation (perhaps borrowing from the packaging service's techniques).

Ideally implemented as an SDK so that thin wrappers can be provided for various environments: A CLI, a native Mac or Windows app, a web app, etc.

  • SDK to build interfaces for downloading and verify binaries from packaging services --- or source from canonical index and compiling --- and installing1 on wide variety of OSes (and architectures) and Postgres versions
  • CLI for supported OSes (Linux, MacOS, Windows, *BSD) and architectures (amd64, arm64, i386, etc.), with an extensible architecture to integrate development features and allow third parties to quickly features
  • Native apps or web apps?
  • Catalog of extensions for specific Postgres clusters, so the installer/extension2 manager knows what is installed

Prior Art:

Development

Jobs:

An SDK and CLI and interfaces that use the API to manage the extension development lifecycle, from starting an extension, managing dependencies and other metadata, testing it, and releasing it to the core registry.

Ideally an SDK and CLI that complements or, better still, extends the management CLI by adding the appropriate subcommands. Features:

  • Initialize new extension project, scaffolding the essential files, metadata, and community-recommended project organization.
  • Commands to edit metadata, eventually format and lint source files and docs, add new files (for upgrades, tests, etc.), and run tests.
  • Commands to bundle, sign, and publish a release.
  • Continuous integration tools to enable testing on multiple platforms, OSes, and Postgres versions.
  • Continuous deployment tools to bundle and release an extension.

Prior Art:

Other Tool Ideas

Brainstorming other tool ideas, to be developed by anyone with the itch to scratch.

Testing

Downstream service that consumes from the core index and automatically tests extensions on all major platforms, OSes, PostgreSQL, and major versions of all. May be part of the ideal end state for packaging, since there's do much overleap? Provides a matrix for each extension7 as data and perhaps a web page linked from the search site. Perl example.

Tooling

Editor plugins or a language server to support extension auto-completion.

Certification of Validation

A service similar to Testing but focused on additional quality metrics, such as build, test, and install successes and failures; static code analysis warnings and errors; vulnerabilities, vulnerable dependencies, coding errors, etc. Reports issues back to the API service to appear on site extension home pages.

Questions

Given the efforts made toward building extension Postgres distribution services and tools in the past, not to mention the ambitions of some the project.

  • What will PGXN's role be in this ideal extension ecosystem?
  • What will be the role of other extension registries?
  • What should be the format of standard metadata, and what will become of existing metadata (control file, PGXN META.json, Trunk.toml, pgxman extension.yaml, pgrx Cargo.toml)?
  • What should be the source of record for identity? The PostgreSQL community account might be ideal
  • Do we want to auto-discover and -index extensions akin to how Go package discovery? If not, how can we make participation as simple as possible? Includes metadata management, identity and authentication, publishing and permissions.
  • Auto-discovery would also raise the question of namespacing. How would we tie discovered extensions like github.com/theory/pg-semver to Postgres extension naming conventions? If we just allow the last part of the name, how would we deal with naming conflicts?
  • For new components, what language or languages should we use? Defer to whatever each contributor is comfortable with? Establish some community standards to increase credibility, and attract developers, and/or simplify deployment?
  • Who's going to do what and how can be best collaborate? Do we nail down the core registry and go from there? Something else?

Once we have some answers or move in the direction of solid answers to these (and surely other!) questions, we can start to build a project plan.


Footnotes

1

Do permissions need thinking through?

2

Integrate with Postgres core config or make a separate thing?

3

This really is the key decision to make. Go-style auto-discovery is super cool and ensures comprehensive indexing without the overhead of developer registration or source storage. On the other hand, it requires Go package-style $host/$user/$package namespacing to avoid naming Conflicts between hosting services. On the other other hand, it directly allows for alternative and private source distribution. But see also the explanation for why crates.io doesn't use GitHub for its registry.

4

We'd also need to consider authentication and identity management. Do we trust relevant OAuth2 providers like GitHub and BitBucket? Or perhaps we require use of the postgresql.org account?

5

Inspired by Go registry services, among others.

6

This will be an interesting challenge, as we don't want to overdo SEO (which is gross and usually wrong), but need some method to get results higher in the search results. Maybe best to host it on PostgreSQL.org and reach out to search engine companies?

7

Years ago someone actually implemented this! The service consumed from PGXN and tested extensions on a matrix of Postgres versions and OSes and made a very nice report. I'd meant to work with them to standardize it, host it on a PGXN subdomain, and link from each page on PGXN, but didn't get around to it and the project stalled. Will try to dig it up and see if we can build on it.

8

Would be interesting to provide some method for bundling extensions, so that one could, say, build a bundle in a CI/CD pipeline and not need the CLI to install it on a production host. But on the other hand, many will do this via Docker builds or similar patterns anyway.