User:Gurjeet
From PostgreSQL wiki
About
See http://Gurje.et
Patch Reviews
Patches that I have reviewed, or am reviewing: Patch Reviews.
Projects
Completed
In Development
- Mark a transaction (or savepoint) uncommittable
- Many times we start a transaction that we never intend to commit; e.g. testing, EXPLAIN ANALYZE, etc.
- Allow the user to mark a transaction that can never be committed.
- Allow such marker to be placed when the transaction is being started, or while it's in progress.
- Once marked uncommittable, do not allow the mark to be removed.
- Any command that ends a transaction (END, COMMIT, ROLLBACK) must result in a rollback.
- All of these apply to savepoints, too.
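A sketch of what the syntax might look like; the `ROLLBACK ONLY` wording here is an assumption for illustration, not a settled design:

```sql
BEGIN TRANSACTION ROLLBACK ONLY;         -- mark the transaction at start
EXPLAIN ANALYZE UPDATE accounts SET balance = balance * 1.01;
COMMIT;                                  -- must behave as ROLLBACK

BEGIN;
SET TRANSACTION ROLLBACK ONLY;           -- or mark it while in progress
SAVEPOINT s1 ROLLBACK ONLY;              -- savepoints can be marked, too
END;                                     -- any transaction-ending command rolls back
```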
- Named Operators
- The operators in Postgres can get very cryptic; guess what `<@` means?
- The proposal is to allow any identifier to be used as a name for an operator.
- For example, the named operator `:contained_in:` can be used to represent the `tsquery <@ tsquery` operator.
- See this comment on HN for at least one person's desire to identify an operator by name.
- Patch submitted. See discussion here
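Under the proposal, the `tsquery` example might read as follows; `:contained_in:` is a user-chosen name, not existing syntax:

```sql
-- Hypothetical: a named alias for the tsquery containment operator
SELECT to_tsquery('cat') :contained_in: to_tsquery('cat & dog');
-- equivalent to the existing:
SELECT to_tsquery('cat') <@ to_tsquery('cat & dog');
```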
- Websockets
- See also HTTP API
- Add support for RISC-V memory-barriers
- Actively in the process of acquiring a SiFive HiFive Unmatched RISC-V development board
- Active as of 2022-08-10: See https://www.ebay.com/itm/385053303589
- Won auction, waiting for delivery (2022-08-16)
- Received, unpacked. (2022-08-23)
- Used this video for guidance on which components to buy.
- Ordered a PSU (2022-08-30); other items, such as the SSD and WiFi card, are optional, so will buy those after a boot-up test.
- Received the PSU (2022-09-01). Boot-up test completed. Logged in via the serial console and verified Linux is working. See pictures
- Ordered 1TB SSD. (2022-09-02)
- Ordered quieter CPU fans, and Mini-ITX enclosure. (2022-09-03)
- Received SSD, and CPU fans. (2022-09-05)
- 2022-09-06:
- Received Mini-ITX enclosure.
- Assembled all the components into Mini-ITX enclosure; took about 3 hours.
- Booted up again from the flash card, and observed boot messages over the serial-to-usb connection.
- Created partition on the SSD.
- Ordered Intel WiFi card (2022-09-07). See https://a.co/a09xLOq
- Intel WiFi card arrived; installed, connected to Home WiFi (2022-09-09)
- Latest Linux distro to install: Ubuntu. See https://ubuntu.com/download/risc-v
- TODO: Install Ubuntu 22.04.1 on SSD. See https://wiki.ubuntu.com/RISC-V
- TODO: Install Nix package manager (to support various compilers simultaneously)
- TODO: Set up Postgres buildfarm
- TODO: Register as a new animal in the Buildfarm
- TODO: Add RISC-V to supported platforms
Future / Potential / Interested
- Check if Postgres' UTF-8 implementation suffers from any of the concerns mentioned here:
- SQL Pipe syntax
- SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL
- Paper: https://storage.googleapis.com/gweb-research2023-media/pubtools/1004848.pdf
- HN Discussion: https://news.ycombinator.com/item?id=41338877
- Spitballing: If the command starts with keyword FROM, then use a separate parser, that emits the same ParseTree
- Division operator for (interval / interval => double precision)
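Today dividing an interval by an interval raises an error; under the proposal the result would be double precision. A hypothetical example:

```sql
SELECT interval '1 day' / interval '1 hour';   -- proposed result: 24.0
```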
- Allow cloning / create-database-from-template while source database has active connections.
- Initial version may allow only read-only connections
- Eventually allow read-write operations while the template/source database is being copied
- Changes to the source database's contents (INSERT, UPDATE, VACUUM, etc.) are the main problem.
- Possibly use redo logs of the source database to patch-up the destination
- Allow users to attach and detach database
- drop database X export to '/dest/path';
- maybe just `export database X to '/dest/path'`, without dropping the source DB
- may need to block all physical write operations
- drop the database, but don't delete the database directory
- freeze all relations in the database, to mark all committed rows always visible
- alternatively, let the import command do this work
- hence less work for a presumably busy exporting database
- importing instance will need the transaction snapshot information to determine which transaction IDs had committed
- move the directory to given path in the filesystem
- create database X import from '/source/path';
- Move the folder into $PGDATA
- Give it a new OID, just like a normal `create database` does.
- Verify that all relations are frozen.
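End to end, the two sides of the operation above might look like this; the exact keywords are an assumption:

```sql
-- On the source instance (freezes relations, moves the directory out of $PGDATA):
DROP DATABASE sales EXPORT TO '/mnt/transfer/sales';

-- On the target instance (moves the directory into $PGDATA, assigns a new OID):
CREATE DATABASE sales IMPORT FROM '/mnt/transfer/sales';
```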
- Allow setting limits on how much of shared_buffers each database can populate
- Allow setting this limit as memory size (MB, GB, etc.) or a percent of shared_buffers
- In multi-tenant environments, this will help limit a single database from hogging the whole shared_buffers.
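A hypothetical interface; neither the setting name `max_shared_buffers` nor the percent form exists today:

```sql
ALTER DATABASE tenant1 SET max_shared_buffers = '512MB';  -- absolute size
ALTER DATABASE tenant2 SET max_shared_buffers = '10%';    -- percent of shared_buffers
```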
- Make VACUUM FULL purge remnants of dropped columns
- The 'limits' section, linked and quoted below, says dropped columns continue to occupy space
- Make 'vacuum full', perhaps with an additional flag, remove all remnants of the column
- Remove the column from catalog tables, and compress the bitmap in every row to reflect this removal.
- Limits section: https://www.postgresql.org/docs/current/limits.html
- > Columns that have been dropped from the table also contribute to the maximum column limit. Moreover,
- > although the dropped column values for newly created tuples are internally marked as null in the
- > tuple's null bitmap, the null bitmap also occupies space.
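The current behavior is easy to observe; the purge flag at the end is hypothetical:

```sql
CREATE TABLE t (a int, b int);
ALTER TABLE t DROP COLUMN b;

-- The dropped column still occupies a pg_attribute slot:
SELECT attname, attisdropped
  FROM pg_attribute
 WHERE attrelid = 't'::regclass AND attnum > 0;
-- attname shows '........pg.dropped.2........' with attisdropped = true

VACUUM (FULL, PURGE_DROPPED_COLUMNS) t;   -- hypothetical flag
```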
- Implement traffic mirroring in pgbouncer (and other connection poolers)
- For many tests you need an accurate replication of production traffic
- Implement a mirroring feature in the pooler that sends identical traffic to the mirror
- By default, throw away _all_ the responses from the mirror.
- To hear about a real-world use case for such a need, see the following presentation
- Why Machine Learning for Automatically Optimizing Databases Doesn't Work by Andy Pavlo
- @ time 21:57 https://youtu.be/-p0fX1rsoxY?t=1317
- Change psql's \! command to perform variable interpolation
- \! currently passes all its input to a subshell, without any variable interpolation
- Allowing variable interpolation will allow users to use query-results to perform interesting operations in subshell
- Introduce a new boolean psql variable, off by default, that enables this new behaviour
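A sketch of the intended behavior; the enabling-variable name `SHELL_INTERPOLATION` is an assumption:

```sql
\set SHELL_INTERPOLATION on            -- hypothetical psql variable, off by default
SELECT current_database() AS db \gset  -- store the query result in :db
\! pg_dump :db > /backups/:db.sql      -- today, :db would reach the shell uninterpolated
```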
- Add psql command-line option to start without attempting connection
- Allows the user to use psql's features without first connecting to a database
- Name the option --no-connect
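A sketch of the intended usage; the option name and prompt behavior are assumptions:

```sql
$ psql --no-connect                               -- hypothetical option
psql> \set ON_ERROR_STOP on                       -- use psql features without a connection
psql> \c "host=db.example.com dbname=postgres"    -- connect when ready
```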
- Add shell-like suspend command to psql
- Keyboard sequence ctrl-Z suspends psql, and returns control to invoking shell
- OS shells, like Bash, have a command `suspend` to do the same thing
- Exposing \suspend as a scriptable command allows users to create complex interactions with the invoking shell
- e.g. The shell script can expect psql to suspend, and do some stuff before executing `fg` to resume psql's operations.
- Create a built-in enum guc_context
- It would help in queries that want to compare enums relative to each other
- E.g. see last query in this blog post: Understanding Postgres Parameter Context
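For illustration, assuming a built-in `guc_context` enum whose values are declared in order of restrictiveness (`internal`, `postmaster`, `sighup`, `superuser-backend`, `backend`, `superuser`, `user`), such queries become natural:

```sql
-- Find all settings that can be changed without a server restart:
SELECT name, context
  FROM pg_settings
 WHERE context::guc_context > 'postmaster';   -- enum comparison by declaration order
```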
- Raw Export and Import of Tables
- Support unplugging and plugging-in of tables across DB Instances
- All current interfaces to export and import a table are very slow
- SQL (SELECT - INSERT), COPY (OUT - IN)
- These are slow since they traverse the whole software stack, and sometimes require a network round-trip to the client
- Fastest method would be to simply copy the table segments
- But that can't be done since the row-headers include MVCC info
- So perform the following trickery, to raw-export a table
- Copy all file segments of a table to make a duplicate table
- Run VACUUM FREEZE on the duplicate table
- Move the segments of the duplicate table to the export location (outside $PGDATA)
- Place an appropriate CREATE TABLE command in a sibling file.
- And perform the following to raw-import the table into the target DB Instance
- Create the table; Use the CREATE TABLE command placed by the exporting DB Instance
- Move the table segments from the staging area to $PGDATA/base/nnn/, with appropriate names
- Support only individual tables, not complete hierarchies of partitions, etc.
- The exported CREATE TABLE command should contain only the physical attributes of the stored data
- Specifically, should not create any indexes, foreign keys, etc.
- Export the associated TOAST table, as well.
- Perform these operations in transactions, so that concurrent transactions don't see any effects
- The exporting transaction must rollback
- The importing transaction must commit
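Hypothetical top-level commands wrapping the steps above; none of this syntax exists:

```sql
-- On the exporting instance:
BEGIN;
EXPORT TABLE measurements TO '/staging/measurements';  -- duplicate segments, freeze, write CREATE TABLE file
ROLLBACK;   -- the exporting transaction must roll back

-- On the importing instance:
BEGIN;
IMPORT TABLE FROM '/staging/measurements';  -- run stored CREATE TABLE, move segments into $PGDATA
COMMIT;     -- the importing transaction must commit
```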
- Integrate Linux landlock in Postgres
- A different approach was tried and rejected
- https://www.postgresql.org/message-id/flat/CAMT0RQRX9j69GnrjJ2tkdy7zN_yofXUvU7iCOGdZE%3DjYiqX3wg%40mail.gmail.com
- A different feature, but restricting filesystem access was one of the eventual goals
- https://www.postgresql.org/message-id/flat/20220520225619.GA876272%40nathanxps13
- pg_replslotdata; alternate name: pg_repslotinfo
- Make autovacuum_max_workers changeable on server reload
- This is useful for autovacuuming customers' databases at work.
- Eventually Consistent (EC) Tables
- EC Tables bring some of the advantages of NoSQL databases to the Postgres users.
- Users keep their main table simple, with just a PKey
- Hence they get fast responses to SQL commands
- User creates an EC Table based off a normal table
- System creates a table identical in structure to the original table
- System creates a trigger on the EC Table to reject any direct DML commands.
- ?? Does the system recreate the partitions, too, if the original table is partitioned?
- No, for now. The intended goal is to keep the main table fast; partitioned main tables are not fast.
- A Background (BG) process hooks into the WAL stream
- The BG Process replays the changes of main table on the EC Table.
- It considers WAL records only from successful transactions
- Hence the 'eventual' in the feature's name.
- It converts WAL records into DML commands, and applies them to the EC Table
- User can create additional indexes on the EC Table to speed up commands on that table.
- Potential problems
- What to do when a DDL command changes the main table?
- Advantages
- The commands and queries that rely on PKey access are the fastest they can be.
- The commands that can use slightly stale data can use the EC Table, and its indexes
- Disadvantages
- At least double the space consumption
- No different than how NoSQL databases deal with it
- Long running transactions can fill up WAL storage
- Possible improvements that can help
- Index-only Tables
- EC Tables only on a replica.
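A hypothetical user-facing flow for EC Tables; the `CREATE EVENTUALLY CONSISTENT TABLE` syntax is invented for illustration:

```sql
CREATE TABLE orders (id bigint PRIMARY KEY, payload jsonb);  -- lean main table, PKey only

CREATE EVENTUALLY CONSISTENT TABLE orders_ec FOR orders;     -- hypothetical syntax

CREATE INDEX ON orders_ec ((payload->>'status'));  -- extra indexes on the EC copy only
-- Direct DML against orders_ec is rejected by a system trigger; a background
-- process replays committed WAL changes of orders into orders_ec.
```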
- tz_name data-type
- Before: functions accept a text param and then try to convert it to timezone name/abbreviation
- After: functions can simply declare they accept a tz_name, and the type-I/O functions will do the validation.
- Example: make_timestamptz_at_timezone function accepts 'text' and then converts it into TZ-name
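With a hypothetical `tz_name` type, the validation moves into the type-input function:

```sql
SELECT 'America/New_York'::tz_name;   -- accepted
SELECT 'Not/A_Zone'::tz_name;         -- rejected by the type-input function

-- Functions can then declare the parameter type directly, e.g.:
--   make_timestamptz_at_timezone(ts timestamp, zone tz_name)
```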
- pl/wasm
- Postgres extension to run WASM code
- plpgsql-to-WASM
- Convert plpgsql to WASM, for speed, while retaining safety.
- pg_wasm
- Postgres compiled to WebAssembly; to potentially run in Web Browsers
- emPostgres
- Postgres as a library, embedded in other applications
- csvPostgres
- emPostgres embedded in an application, with file_fdw enabled, to process CSV files using SQL.
- It could be an alternative to Q.
- See also inline-sql for Python
- file.postgres
- Data directory in a single-file (think: SQLite replacement, when combined with emPostgres)
- PGLite
- emPostgres and file.postgres, combined, will provide a SQLite alternative.
- Export/Import
- Enable copying relations as files between Postgres Instances
- SQL - Engine Support
- List of SQL features and the support level in various databases; à la C++ compiler support
- Consistent read-after-write on replica
- Enable read-after-write consistency when using read-only replicas for read-scalability
- Notify-XID + libpq + backend-wait-for-XID
- Application writes to primary
- Application gets XID upon commit
- Application sends that XID to start a new transaction on the replica
- Replica backend waits until that XID is applied
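A sketch of the same idea using LSNs instead of XIDs; `pg_wal_replay_wait()` exists as of PostgreSQL 17, and this assumes LSN-waiting is an acceptable substitute:

```sql
-- On the primary, after the application's write commits:
SELECT pg_current_wal_insert_lsn();      -- the application remembers this LSN

-- On the replica, before the dependent read:
CALL pg_wal_replay_wait('0/3000060');    -- block until that LSN has been replayed
SELECT * FROM orders WHERE id = 42;      -- now guaranteed to see the primary's write
```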
- Copy-on-Write system/object snapshots
- Similar to how Filesystem (e.g. ZFS) snapshots work
- Contribute to PostgresML
- Optimized Mode: Chained, One-shot Transactions
- In this mode of server operation, only one transaction is allowed to run at a time, theoretically.
- This allows the transaction to run without taking any locks, hence improving latency.
- The transaction must be fully available to the server; interactivity not allowed; savepoints not allowed.
- For better throughput, multiple transactions run at the same time.
- Each transaction knows exactly which transaction is ahead of it.
- A transaction cannot be committed before its predecessor transaction.
- Optimistically, each transaction assumes the transaction ahead of it is going to COMMIT; and _not_ ROLLBACK.
- Based on that optimism
- * the transaction ignores row-versions deleted by its predecessor, and
- * considers the new row-versions created by the predecessor as committed
- - Problem: the predecessor may create many row-versions of a row;
- - How to identify which is the last row-version, so we can ignore the non-last row-versions.
- - This is currently a blocker.
- If a transaction fails (on ERROR, etc.), then all following transactions that are in-progress also fail.
- - This is the cost of optimistic transactions.
- Any transaction that starts after such failure, starts a fresh list/chain of transactions.
- When done with a row, a transaction passes the row on to its follower transaction.
- - This may be our antidote to the problem of identifying the non-last row-version.
- - A transaction _must_ only pass a row's last row-version to the follower transaction.
- - ... it _must_not_ pass any non-last row-versions of a row to the follower transaction.
- If not interested in a relation, the transaction must pass the whole relation to the next transaction.
- - This would help increase throughput, since the transactions working on unrelated sets of relations may proceed in parallel.
- As an optimization, a predecessor transaction would pass the negative-list of relations
- - Any relations not in the list are freely available to the follower to operate on.
- - The follower may add its own relations to the negative-list, and pass on the list to its follower.