Skytools Specific Consumers

From PostgreSQL wiki

Revision as of 21:26, 18 May 2012 by Boshomi

PGQ is the foundation of the Londiste replication solution, and it can also be used to write more specific consumers with little effort. Here is a description of the consumers included in the Skytools package.

bulk_loader

bulk_loader is a PgQ consumer that reads URL-encoded records from a source queue and writes them into tables according to a configuration file. It is targeted at slow databases that cannot handle applying each row as a separate statement. It was originally written for Bizgres MPP / Greenplum DB, which have very high per-statement overhead, but it can also be used to load a regular PostgreSQL database that cannot manage regular replication.

Behavior properties:

  • reads URL-encoded "logutriga" records.
  • does not do partitioning, but optionally allows redirecting events to a different table.
  • does not keep event order.
  • always loads data with COPY, either directly into the main table (INSERTs) or into temp tables (UPDATE/COPY), applying from there.

Events are usually produced by pgq.logutriga(). Logutriga adds all of the record's data to the event (also for updates and deletes).
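A minimal configuration sketch for bulk_loader might look like the following. The job, database, and queue names are placeholders, and the exact option names may differ between Skytools versions; check the sample configuration files shipped with the package.

```ini
[bulk_loader]
# placeholder job name
job_name = bulk_loader_targetdb
src_db   = dbname=sourcedb
dst_db   = dbname=targetdb
# queue filled by pgq.logutriga() triggers on the source side
pgq_queue_name = bulk_queue
pidfile  = pid/%(job_name)s.pid
logfile  = log/%(job_name)s.log
```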

cube_dispatcher

cube_dispatcher is a PgQ consumer that reads URL-encoded records from a source queue and writes them into partitioned tables according to a configuration file. It is used to prepare data for business intelligence. The name of the table is read from the producer field of the event, and the batch creation time is used for partitioning: all records created on the same day go into the same table partition. If a partition does not exist, cube_dispatcher creates it according to a template.

Events are usually produced by pgq.logutriga(). Logutriga adds all of the record's data to the event (also for updates and deletes).

cube_dispatcher can be used in two modes:

keep_all
keeps all incoming data. If a record is updated several times during one day, the table partition for that day will contain several instances of the record.
keep_latest
keeps only the latest instance of each record for each day. This means all tables must have primary keys, so cube_dispatcher can delete previous versions of a record before inserting new data.
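A hypothetical cube_dispatcher configuration could be sketched as follows; the mode option corresponds to the two modes above, while the other names are placeholders and may not match your Skytools version exactly.

```ini
[cube_dispatcher]
job_name = cube_dispatcher_dw
src_db   = dbname=sourcedb
dst_db   = dbname=dw
pgq_queue_name = cube_queue
# keep_all: keep every instance of a record per day
# keep_latest: keep only the latest instance (requires primary keys)
mode = keep_latest
pidfile  = pid/%(job_name)s.pid
logfile  = log/%(job_name)s.log
```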

queue_mover

queue_mover is a PgQ consumer that transports events from a source queue into a target queue. One use case is when events are produced in several databases: queue_mover can then consolidate them into a single queue that is processed by the consumers that need to handle these events. For example, with partitioned databases it is convenient to move events from each partition into one central queue database and process them there. That way the configuration and dependencies of the partition databases stay simpler and more robust. Another use case is moving events from an OLTP database to a batch processing server.

Transactionality: events are inserted as one transaction on the target side. That means only the batch_id needs to be tracked on the target side.
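For the partitioned-database scenario above, a queue_mover configuration sketch might look like this (one such job per partition database; names are placeholders and option spellings may vary by Skytools version):

```ini
[queue_mover]
job_name = eventlog_mover_partition1
# source: one partition database; target: the central queue database
src_db   = dbname=partition1
dst_db   = dbname=central
# queue to read on the source side
pgq_queue_name = event_queue
# queue to insert into on the target side
dst_queue_name = event_queue
pidfile  = pid/%(job_name)s.pid
logfile  = log/%(job_name)s.log
```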

queue_splitter

queue_splitter is a PgQ consumer that transports events from one source queue into several target queues. The ev_extra1 field of each event tells which target queue it must go to (pgq.logutriga() puts the table name there).

One use case is to move events from an OLTP database to a batch processing server. By using queue_splitter it is possible to move all kinds of events for batch processing with one consumer, thus keeping the OLTP database less crowded.
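Because routing is driven by ev_extra1, a queue_splitter configuration sketch needs no target queue option; as before, the names here are placeholders and may differ from your Skytools version.

```ini
[queue_splitter]
job_name = oltp_to_batch_splitter
src_db   = dbname=oltpdb
dst_db   = dbname=batchdb
pgq_queue_name = event_queue
# no target queue setting: each event goes to the target queue named
# in its ev_extra1 field (pgq.logutriga() puts the table name there)
pidfile  = pid/%(job_name)s.pid
logfile  = log/%(job_name)s.log
```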

table_dispatcher

table_dispatcher is a PgQ consumer that reads URL-encoded records from a source queue and writes them into partitioned tables according to a configuration file. It is used to partition data: for example, change logs that need to be kept online only briefly can be written to daily tables and dropped once they become irrelevant. It also allows selecting which columns are written to the target database, and it creates target tables as needed according to the configuration file.
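A table_dispatcher configuration for the daily change-log example could be sketched as below; dest_table, part_field, and fields are illustrative option names based on common Skytools configuration conventions and may not match your version exactly.

```ini
[table_dispatcher]
job_name = history_dispatcher
src_db   = dbname=sourcedb
dst_db   = dbname=targetdb
pgq_queue_name = history_queue
# base name for the daily partitions
dest_table = history_data
# timestamp column used to pick the partition
part_field = created
# which columns to write: * for all, or an explicit list
fields = *
pidfile  = pid/%(job_name)s.pid
logfile  = log/%(job_name)s.log
```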
