GDQIssues
Please add / re-prioritize talking points.
Current state
There are two implementations that use the "snapshot-based grouping" technique for queueing: Slony and PgQ.
- Slony - queueing is not generic, usable only by Slony itself. Historically it had problems in its queueing logic, but it should be OK now. Still, that may have given some people a bad impression of "queue tables".
- PgQ - queueing is available through a generic API. Few people have heard about it, and fewer still are familiar with it. The original authors have no advertising budget. (quick overview, full docs)
There are a few alternative ideas going around, but none of them seems to stand up to scrutiny:
- use the WAL for the queue
- use a Group Communication System for the queue
- use non-transactional storage for the queue.
There is at least one product that uses a non-generic queue - Bucardo. It does not use snapshot-based grouping, and thus cannot guarantee that a batch stays unchanged when read repeatedly. This is fine when you only signal "this PK had an event", but not for anything else.
Requirements
A generic queue implementation should fulfill the following requirements:
- Usable by a number of PostgreSQL-based replication systems (Londiste, Slony, Bucardo?)
- Must not require the producer and the consumer to run the same major version of PostgreSQL
- Allow multiple readers
- Allow multiple writers (libpq clients)
- Must allow reader downtime, without event loss, and without affecting writers
- Stable batches - batch contents must not change when read repeatedly. This is required for transactional processing on remote servers.
Non-issues
Here are topics that seem already decided. Reopening them requires really good, practical arguments.
Josh: there's been less than a week of discussion on the GDQ. It seems early for major issues to be already decided.
Marko: Indeed. But having everything open is an invitation to bike-shedding. The initial categorization was my personal opinion, based on the existing queue implementation - PgQ. Requiring practical arguments for doing something differently from already-working code therefore seems like a good idea; deciding on a design without practical reasons seems impractical... In any case, the page seems to have initiated discussion on the topic, so let's make it reflect community consensus now.
Transactional vs. non-transactional
Transactional. Otherwise it is unusable for replication and for applications that expect their events to be persistent.
We could have an option of turning a transactional queue into a non-transactional one, with Global Temp Tables (see below).
Designing for non-transactional usage only seems wrong. Also, there are dedicated solutions around that do a much better job at that.
Outside Postgres vs. inside
If you have a more efficient transactional storage engine available than the current Postgres one, please replace the Postgres one...
I remember some arguments for writing over the network instead, because of the "write overhead":
- An overloaded server, where the additional queue write would kill it. This is a plain case of "Doctor, it hurts".
- A mostly-write load (e.g. some sort of transactional logging), where the additional queue write would double the write load. The solution is simple: write directly to the queue, as sketched below. But this solution is only visible if a generic queue - one not limited to replication - is available.
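For illustration, writing directly to a generic queue looks roughly like this with current PgQ (the queue name and payload here are made up):

  SELECT pgq.create_queue('app_log');
  -- the application writes its log record as a queue event,
  -- instead of also writing it to an ordinary logged table:
  SELECT pgq.insert_event('app_log', 'user_action', 'uid=42&action=login');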
Josh: a suggestion would be a special type of table which can be fsync'd on write, or not, per settings. This would allow us to eliminate the WAL overhead for the GDQ, a considerable saving. It would also support applications which don't care about retaining the queue after a restart.
Marko: yes, having an option to turn the transactional overhead off could be a good idea.
Snapshot-based grouping vs. something else
No alternatives that can be taken seriously seem to exist.
Requirements:
- Multiple readers
- Multiple writers
- Writing is transactional
- Reading is transactional
- Must allow reader downtime, without event loss, and without affecting writers
- Stable batches - batch contents must not change when read repeatedly. This is required for transactional processing on remote servers.
Poll vs. push
Poll.
On a busy server there are always events available. There is no point in optimizing for an unloaded server.
Josh: disagree. Polling introduces an automatic delay into all update processing, which means that near-sync replication, or integration into event-based systems, is never possible. Also, in systems where only part of the data is being replicated, updates may be sporadic but their timing nonetheless critical.
Marko: you cannot have a queue that is both high-throughput and low-latency while also attempting to be transactional. 'Near-sync' replication seems to be an argument for having different queue implementations around, for different usage scenarios.
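For reference, the push-style wakeup both sides allude to is easy to layer on top of polling with LISTEN/NOTIFY (the channel name here is made up); it removes the polling delay, but not the batching latency Marko refers to:

  -- writer side, e.g. fired once per transaction that inserts events:
  NOTIFY queue_activity;
  -- reader side: instead of sleeping between polls, wait for a
  -- notification, then fetch the next batch as usual:
  LISTEN queue_activity;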
Limited to replication vs. generic
Generic.
This also makes the "modification trigger format" non-issue.
Itagaki: "Storing all data in text format" might not be the best solution for a "generic" queue. Do we need to consider "type-safe" generic solutions? For example: CREATE GDQ name (seq_id bigint, dml_type "char", pk pk_type, tup table_type)
Marko: You seem to be talking about the "user-defined event table structure" mentioned below. I would suggest against it, as it does not allow different event types in one queue. With current PgQ (fixed table format), we have per-event-type "interface tables": empty tables with the proper structure and types, and with a BEFORE trigger (pgq.logutriga) which formats the event, inserts it into the queue, and then tells Postgres to skip the actual insert. Thus we can insert an event from plain SQL with an INSERT into the interface table, with full type safety. With GDQ we could make creating such per-event-type interface tables easier, although it is quite easy already.
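A sketch of such an interface table with current PgQ (table, queue and column names are made up):

  -- empty "interface table" describing one event type:
  CREATE TABLE user_login_event (
      user_id  integer     NOT NULL,
      login_ip inet,
      ev_time  timestamptz NOT NULL DEFAULT now()
  );
  -- pgq.logutriga url-encodes the row into ev_data and inserts it
  -- into the queue; the SKIP argument suppresses the actual insert,
  -- so the interface table stays empty:
  CREATE TRIGGER send_to_queue
      BEFORE INSERT ON user_login_event
      FOR EACH ROW
      EXECUTE PROCEDURE pgq.logutriga('login_queue', 'SKIP');
  -- writers get full type safety from plain SQL:
  INSERT INTO user_login_event (user_id, login_ip)
      VALUES (42, '10.0.0.1');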
Ordinary table vs. custom storage
Table. If custom storage takes away the ability to do ordinary SELECT/INSERT/UPDATE/DELETE, it would be too limiting.
Josh: how do you reconcile this with the desire to reduce write overhead for the queue?
Marko: The optional use of Global Temp Tables seems to be a good balance. Anything more special, that also drops normal table access for efficiency, seems like designing for a very narrow use case while ignoring how the queue is supposed to be developed, maintained, administered and used... I seriously suggest not competing with dedicated, non-transactional, in-memory queueing solutions.
Custom access protocol vs. plain SQL
Plain SQL.
E.g. in PgQ we have a plain PL/pgSQL function acting as a queue consumer.
Losing such flexibility would be bad.
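A minimal sketch of such a consumer with current PgQ's function API (queue and consumer names are made up; the consumer is assumed to have been registered with pgq.register_consumer() beforehand):

  CREATE OR REPLACE FUNCTION process_one_batch() RETURNS integer AS $$
  DECLARE
      batch bigint;
      ev    record;
      n     integer := 0;
  BEGIN
      batch := pgq.next_batch('app_log', 'my_consumer');
      IF batch IS NULL THEN
          RETURN 0;  -- no new batch yet; caller should sleep and retry
      END IF;
      FOR ev IN SELECT * FROM pgq.get_batch_events(batch) LOOP
          -- event handling goes here, using ev.ev_type, ev.ev_data, ...
          n := n + 1;
      END LOOP;
      PERFORM pgq.finish_batch(batch);
      RETURN n;
  END;
  $$ LANGUAGE plpgsql;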
With or without middle-ware
Without.
With middle-ware here means the solution is tied to the code that does event processing. [User code acts as a middle-ware plugin.]
Without middle-ware means the solution makes events available for reading but does not decide how they are processed. [User code is an SQL client.]
Potential issues
Here are a few decisions that could go either way if one were writing a queue from scratch.
No option seems clearly better than the alternative.
But realistically, deciding differently than current PgQ has done means writing a lot of code.
Fixed event table structure vs. user-defined
Fixed format here means that event data is multiplexed into a couple of pre-determined fields. E.g. PgQ:
- ev_type - short id which classifies the event.
- ev_data - urlencoded / JSON / XML container for event data.
- ev_extra1 - meta-data about the event. Although there are several extraN fields in PgQ, one is enough if it is defined to be an extensible format (e.g. urlencoded key-value pairs).
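For concreteness, a simplified sketch of the fixed format (current PgQ's event tables have roughly this shape, with a few more fields):

  CREATE TABLE queue_event (
      ev_id     bigint      NOT NULL,
      ev_txid   bigint      NOT NULL DEFAULT txid_current(),
      ev_time   timestamptz NOT NULL DEFAULT now(),
      ev_type   text,  -- short id which classifies the event
      ev_data   text,  -- urlencoded / JSON / XML payload
      ev_extra1 text   -- extensible event meta-data
  );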
User-defined structure means free-form table definitions, except for a few fixed (or hidden) fields needed for queueing.
Fixed format
- Can be implemented without core changes
- Allows several event types in one queue.
- Allows generic frameworks that may want to use their own event types (Cascading in Skytools 3.0)
Customizable structure
- Academically nicer
- Plays badly with table rotation
- Plays badly with multiple event types in one queue
- Plays badly with generic frameworks.
- Requires core support for the above?
Function-based API vs. special syntax
Special syntax (queue logic in core) is justified when it is:
- The only way to implement a working queue (Oracle)
- The only way to popularize the new technology (Postgres?)
- The only way to have non-plain-table storage.
- The best way to allow a customizable event table format.
Although a customizable event table format is implementable with a function-based API, there are various places where that would be clumsy. If that feature is needed, it may be better to implement it in core.
So unless the implementer decides on a customizable structure or non-table storage, queue-in-core is not needed.
Interesting Issues
What is the goal?
- Something shareable between Slony/Londiste, but also usable as generic queue?
- Compatibility with Oracle syntax?
- Minimal overhead at any price?
- Must be in core?
- Something perfect in every dimension? Probably involving pluggable backends?
- Something which will work with other databases and caching systems?
How about PgQ into /contrib?
Why not have PgQ or a cleaned up variant of it in /contrib?
Arguments against it would considerably clarify what people want from a queue implementation. And if there are no good arguments, then let's do it.
Potential areas that could be simplified:
- "Event retry" functionality is very lightly tied to main queueing, it can be removed from base implementation and built on top of it.
- Internal table structure could be cleaned up.
Minimal core changes
Although Slony and PgQ prove that it is possible to implement a queue with current core features, independent re-implementations are practically impossible because some parts of the logic are quite complex. We could move only the complex parts into core, thus making queue re-implementations easier.
WHERE txid_field BETWEEN snapshot1 AND snapshot2
Making such a query perform well is quite hard. The optimization logic could be added to core.
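Concretely, the condition expands to something like this with the txid_* functions (here :snap1 and :snap2 are placeholders for txid_snapshot values saved earlier with txid_current_snapshot()):

  -- events committed after snap1 was taken, but visible in snap2:
  SELECT *
    FROM queue_event
   WHERE NOT txid_visible_in_snapshot(ev_txid, :snap1)
     AND txid_visible_in_snapshot(ev_txid, :snap2);

The hard part is getting an index range scan on ev_txid out of this; today that means manually rewriting the predicate with txid_snapshot_xmin()/txid_snapshot_xmax() bounds and handling in-progress transactions specially.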
Rotating tables
There is some trickery needed to make reading, writing and truncating rotated tables work while avoiding locking problems.
Make that easier.
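A rough sketch of the rotation step, with made-up names (current PgQ wraps this in maintenance functions):

  -- writers insert into the partition named by queue_info.cur_table,
  -- readers scan all partitions:
  UPDATE queue_info SET cur_table = (cur_table + 1) % 3
   WHERE queue_name = 'app_log';
  -- ... later, once every registered consumer has confirmed batches
  -- past the oldest partition ...
  TRUNCATE queue_event_0;  -- takes ACCESS EXCLUSIVE: hence the trickery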
Global Temp Tables
A GTT would be a table that:
- Is visible to all, like an ordinary table
- Has its data written directly to the table, skipping WAL writes
- Is never fsynced
- Is truncated on crash, perhaps on reboot too.
GTTs would allow us to easily turn a transactional queue into a non-transactional one, for people who want to trade event persistence for maximum efficiency.
The assumption is that synchronous replication should also ignore such a table.
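Purely hypothetical syntax, shown only to make the intended behaviour concrete (no such option exists in core today):

  CREATE GLOBAL TEMP TABLE queue_event_fast (LIKE queue_event);
  -- writes would skip WAL and never be fsynced; after a crash the
  -- table comes back empty, so queued events are lost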
Examples of different queueing techniques
The above-mentioned snapshot-based grouping (SBG?) can be used in both high-throughput and low-latency situations.
It is questionable whether one implementation can be made tunable between low latency and high throughput, which seems to hint that we may want different implementations for the two cases.
SBG, high-throughput, transactional
The key is to have large batches. A separate process writes snapshots; readers poll for new ones and read them when they become available. Delay comes from two factors: the period at which the snapshot-writer writes new batches, and the polling period of the reader. We don't want to eliminate the first one, because that is exactly what makes processing large amounts of data efficient. The latter could be eliminated with LISTEN/NOTIFY, but that does not seem to be a good idea: with a larger number of readers, letting them all request reads at the exact same moment is asking for trouble. A sketch of the snapshot-writer side follows the lists below.
Plus:
- Reader downtime is not a problem
- A large number of events is not a problem.
Minus:
- Requires noticeable latency to minimize per-event overhead.
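A sketch of the snapshot-writer ("ticker") side, with made-up names. One background process runs this periodically; each pair of consecutive ticks then defines one stable batch for all readers, who simply poll for tick rows newer than the last one they finished:

  INSERT INTO queue_tick (tick_id, tick_snapshot)
      VALUES (nextval('queue_tick_seq'), txid_current_snapshot());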
SBG, low-latency, transactional
The key is immediate batches. The reader writes the snapshot itself and immediately reads the events between this snapshot and the previous one. Compared to simply reading from a table, SBG still allows stable batches and an INSERT-only table. A sketch of the reader side follows the lists below.
Plus:
- Latency only depends on reader polling period.
- Latency can be eliminated with LISTEN/NOTIFY.
Minus:
- Reader downtime is not allowed.
- A large number of events could be a problem.
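A sketch of the reader-driven variant, with made-up names; :prev_snap stands for the txid_snapshot value this reader saved on its previous call:

  SELECT txid_current_snapshot();  -- save as the new cur_snap
  -- the immediate batch is everything between the two snapshots:
  SELECT *
    FROM queue_event
   WHERE NOT txid_visible_in_snapshot(ev_txid, :prev_snap)
     AND txid_visible_in_snapshot(ev_txid, :cur_snap);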
Non-transactional
Both of the above can be turned non-transactional with GTT.
When somebody wants an even more efficient non-transactional queue, isn't it better to turn to dedicated solutions?