PGConf.ASIA2016 Developer Unconference


An Unconference-style event for active PostgreSQL developers will be held on the afternoon of 1 Dec 2016 at Akihabara Conference Center, as part of PGConf.ASIA 2016.

This Unconference will focus on technical PostgreSQL development discussions, ranging from clustering and replication to infrastructure.

Unconference Time Table

Time | Room 1 | Room 2
1 Dec 12:00-13:00 | Theme proposal and registration
1 Dec 13:00-13:30 | Arrangement & theme decision
1 Dec 13:30-14:20 | Recovery.conf | In-memory Columnar Store
1 Dec 14:30-15:20 | Commit Fest app Improvement | Variable length fields over 1GB
1 Dec 15:30-16:20 | Pgpool-II & Clustering | Error code
1 Dec 16:30-17:20 | FDW Sharding
1 Dec 17:45- | Social event

Unconference minutes

Recovery.conf

In-memory Columnar Store

Commit Fest app Improvement

Variable length fields over 1GB (KaiGai)

Motivation: we want to pass large (>1GB in this context) data blocks to SQL functions, operators and so on. A typical use case is an analytic workload implemented in a procedural language such as PL/CUDA, which takes a 2D array as an alternative representation of a matrix.

The source of the problem is the varlena header format. The least significant bit of the first byte indicates the header type: short (up to 126B) or long (up to 1GB). The second bit indicates whether a long varlena is compressed. Thus, only 30 of the 32 bits are available to hold the length of a variable-length field, so its maximum length is 1GB.

xxxxxx00 4-byte length word, aligned, uncompressed data (up to 1G)
xxxxxx10 4-byte length word, aligned, *compressed* data (up to 1G)
00000001 1-byte length word, unaligned, TOAST pointer
xxxxxxx1 1-byte length word, unaligned, uncompressed data (up to 126b)
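
The bit patterns above can be sketched in code. The following is a minimal illustration, not PostgreSQL's actual macros: it assumes the layout exactly as listed (flag bits in the low bits of the first byte, with the length in the remaining 30 bits of a 32-bit word), and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Header kinds, one per bit pattern listed above (names are hypothetical). */
typedef enum {
    HDR_4B_UNCOMPRESSED,   /* xxxxxx00 */
    HDR_4B_COMPRESSED,     /* xxxxxx10 */
    HDR_1B_TOAST_POINTER,  /* 00000001 */
    HDR_1B_SHORT           /* xxxxxxx1 */
} header_kind;

/* Classify a datum by inspecting only its first byte. */
static header_kind classify_first_byte(uint8_t b)
{
    if (b == 0x01)
        return HDR_1B_TOAST_POINTER;  /* exact match comes first */
    if (b & 0x01)
        return HDR_1B_SHORT;          /* low bit set: 1-byte header */
    if (b & 0x02)
        return HDR_4B_COMPRESSED;     /* second bit set: compressed */
    return HDR_4B_UNCOMPRESSED;
}

/* Largest length a 4-byte header can express: 30 bits, i.e. just under 1GB. */
static uint32_t max_4b_length(void)
{
    return (UINT32_C(1) << 30) - 1;
}

/* Pack a length and the compression flag into a 32-bit header word. */
static uint32_t make_4b_header(uint32_t length, int compressed)
{
    assert(length <= max_4b_length());  /* anything larger does not fit */
    return (length << 2) | (compressed ? UINT32_C(0x02) : UINT32_C(0x00));
}

/* Recover the length by discarding the two flag bits. */
static uint32_t length_from_4b_header(uint32_t header)
{
    return header >> 2;
}
```

Because two bits of the word are spent on flags, a length of 2^30 or more cannot be encoded, which is the limit the session set out to work around.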

In this session, we discussed three ideas for supporting large variable-length fields, focusing on the second and third options below.

  • Enhancement of the varlena header
    • Good: A flat data format is available.
    • Bad: We cannot assume that a 4B header is supplied, even for a purely in-memory structure.
  • Data-type-specific solution
    • Good: Harmless to the existing code.
    • Bad: Data is segmented at boundaries of around 1GB.
    • Action item: infrastructure enhancement to support type-specific formats (incl. indirect references) on toast/detoast.
  • Utilization of large objects
    • Good: No infrastructure enhancement is needed.
    • Bad: Users have to build large objects beforehand, so it is not convenient to construct a large matrix on the fly.

The overall consensus from the discussion was the type-specific solution, because most data types are well served by the current varlena limit (<1GB). Only some data types for specific workloads (a matrix type, for example) would get special support for larger data sizes. For example, if an 8GB matrix is split into 3x3 chunks, a special matrix data type can hold indirect references to the 9 chunks, and the functions and operators that support the matrix type can work with this internal data structure. A few infrastructure enhancements are expected in the toast/detoast routines, because toast_save_datum() expects a variable-length field to have its contents in a flat data structure. Type-specific callbacks will likely be needed to serialize and deserialize the large variable-length field. KaiGai will investigate further and then propose the idea to pgsql-hackers.
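
The chunking arithmetic behind the 8GB example can be sketched as follows. This is only an illustration under stated assumptions: a square grid of chunks, row-major chunk numbering, and no allowance for per-chunk header overhead. None of the type or function names come from PostgreSQL or from any proposal; they are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each chunk must stay below the 1GB varlena limit. */
#define CHUNK_LIMIT_BYTES (UINT64_C(1) << 30)

/* Logical shape of a matrix stored as grid x grid chunks (hypothetical). */
typedef struct {
    uint64_t rows;
    uint64_t cols;
    uint32_t grid;
} chunked_matrix_layout;

/* Where one element lives: which chunk, and its position inside it. */
typedef struct {
    uint32_t chunk;      /* row-major index into the chunk references */
    uint64_t local_row;
    uint64_t local_col;
} chunk_position;

static uint64_t ceil_div(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/* Smallest n such that every chunk of an n x n grid fits under the limit. */
static uint32_t min_grid(uint64_t rows, uint64_t cols, size_t elem_size)
{
    uint32_t n = 1;
    while (ceil_div(rows, n) * ceil_div(cols, n) * elem_size >= CHUNK_LIMIT_BYTES)
        n++;
    return n;
}

/* Map a logical (row, col) to a chunk index and an offset inside the chunk. */
static chunk_position locate(const chunked_matrix_layout *m,
                             uint64_t row, uint64_t col)
{
    assert(row < m->rows && col < m->cols);
    uint64_t chunk_rows = ceil_div(m->rows, m->grid);
    uint64_t chunk_cols = ceil_div(m->cols, m->grid);
    chunk_position p;
    p.chunk = (uint32_t)((row / chunk_rows) * m->grid + (col / chunk_cols));
    p.local_row = row % chunk_rows;
    p.local_col = col % chunk_cols;
    return p;
}
```

For a 32768 x 32768 matrix of 8-byte elements (8GB in total), a 2x2 grid still leaves 2GB chunks, so the smallest square grid under these assumptions is 3x3, which matches the 9 chunks in the example above.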

Pgpool-II & Clustering

Error code

FDW Sharding

Social Event

  • All attendees can join the social event.
  • Please pick up 2 drink tickets at the entrance.
  • Venue
    • PRONTO IL BAR UDX
    • https://goo.gl/maps/do5XSrgvRbp

Notes

  • Registration opens at 12:30pm.
  • All attendees will be invited to the social event.
