FreeBSD

From PostgreSQL wiki
Jump to navigationJump to search

The secret FreeBSD/PostgreSQL cabal. Mission: Make PostgreSQL + FreeBSD the best open source relational database stack. Please feel free to edit, add ideas, work on features, point out problems etc. Expect progress to be very slow, as this is most likely spare time work...

On the PostgreSQL side

  • DONE: Yugi Gu from ARM sent in a patch to teach PostgreSQL to use ARM CRC32 hardware, but it uses Linux-only interfaces to detect the CPU feature; let's fix that so that FreeBSD-on-ARM benefits too - Done!
  • DONE: The ability to use kern.ipc.shm_use_phys for a small gain was lost when we switched to anon shared mmap in 9.3 -- bring back the option to use SysV shared memory for the main buffer pool! Done as shared_memory_type=sysv, in PostgreSQL 12 (primarily for the benefit of AIX users but some FreeBSD users might care).
  • DONE: We should use kqueue -- patch exists but more testing would be good, including on other BSDs, to understand regressions reported by a couple of people (were they due to using Unix domain sockets?)
  • DONE: FreeBSD UFS (and NFS?) is slow when there is a striding pattern caused by parallel sequential scan across multiple processes. Fixed! (problem also affects at least Windows)

On the FreeBSD side

  • DONE: It sucks that you can't actually list the POSIX shm segments that PostgreSQL relies on so heavily. kib proposed posixshmcontrol and it has landed in 13.
  • DONE: Andres Freund proposed that we should use PR_SET_PDEATHSIG on Linux so that we can skip polling the postmaster pipe during busy recovery and rely on the kernel to give us a signal if the postmaster goes away. https://commitfest.postgresql.org/18/1617/ for the PostgreSQL side of that. We should add a similar feature to FreeBSD so it doesn't miss out on this optimisation. Done! As PROC_PDEATHSIG_CTL, released in FreeBSD 11.2.
  • DONE: PostgreSQL updates the process title so that you see eg SELECT, COMMIT etc in "top"; on Linux that's just writing into memory but on FreeBSD it's a sysctl() call (see setproctitle.c). That's really slow so people set update_process_title = off on busy PostgreSQL servers. That sucks. Could we find a way to make it just write to memory from userspace, while keeping some kind of sane concurrency semantics? patch posted to freebsd-hackers!, Done! as setproctitle_fast() in FreeBSD 12-CURRENT, https://commitfest.postgresql.org/19/1715/ for the PostgreSQL side
  • DONE: FreeBSD UFS was slow when there are interleaving reads and writes from two sequential streams. Fixed.
  • DONE: It would be really cool to get version information for collations from the standard library, so that we could log warnings when they change and your indexes might be borked (as we can already do if you use ICU collations). It would be ideal if all operating systems could support that via some standardised interface, but as a start, here's a proposal to add that to FreeBSD libc (and here is a thread about how to deal with that data on the PostgreSQL side, in particular in the case of the database default locale which is probably the most interesting case to get working)
  • It would be cool to get fadvise(POSIX_FADV_WILLNEED) working. Thomas has experimental patches for UFS and ZFS, watch this space.
  • DONE: FreeBSD 11.1 gained fdatasync(2). FreeBSD 13 gained aio_fsync(O_DSYNC) (the asynchronous equivalent) will be important for the new async I/O mode) and open(O_DSYNC).
  • Linux has sync_file_range() which can be used to do a kind of partial fdatasync(); PostgreSQL uses that to control writeback (see pg_flush_data()). On platforms that don't have it, it does other tricks like mmap(), msync(), munmap() which may or may not be as effective (what does it really do on ZFS?). request for FreeBSD implementation
  • It would be nice if strcoll()/strxfrm() could work directly on UTF-8, instead of allocating a temporary wide-character string to work on!
  • (This is really a POSIX/C problem, but that doesn't mean we can't solve it out-of-spec) It would be nice if there were a variant of strcoll() that took a length, "strncoll()", or didn't use null-termination at all. Then PostgreSQL wouldn't have to allocate extra copies of strings just to call strcoll() in some scenarios where it has a non-NT string. ICU can do this.
  • It sucks that you can't actually see the pathnames of POSIX shm segments mapped into PostgreSQL processes with procstat -v (whereas Linux pmap shows them). Here's a proposal to fix that: https://reviews.freebsd.org/D20390
  • mjg@freebsd.org points out that __vfrprintf is memsetting 312 bytes when we call setproctitle_fast(), which shows up in profiles
  • A whole brain dump of ideas relating to asynchronous I/O is in a separate page at FreeBSD/AIO.
  • It would be nice to know when it is safe to run without full page writes, due to atomic write size and alignment. See FreeBSD/AtomicIO for one idea on how to achieve that.

On the FreeBSD ports side

  • Currently you can't install major versions of PostgreSQL at the same time; we should fix that by using different paths, like Debian does (FreeBSD users typically use jails to get around that)
  • This is a problem for pg_upgrade, which can perform very fast upgrades from the data directory of one major version to another (optionally preserving the old copy unchanged), but it needs access to the binaries from the old major version and the new one
  • Currently there is no easy friendly way to list, create, start, stop multiple database clusters; people coming from Linux distros are used to that and it would be nice to have it on FreeBSD too. Steal postgresql-common from Debian?
  • Currently, at least when using pkg, ports finish up depending on a specific PostgreSQL client version, leading to absurdities like: attempting to install py27-psycopg2 (the Python PostgreSQL client package) tells you that you'll have to uninstall postgresql96-server and install postgresql95-server. Fix that by doing a separate libpq port, that just uses the latest libpq major version, that clients can depend on?
  • Currently, there isn't a separate per-PG-major-version port for server packages like postgis. Probably you'd want combinations of PostGIS and PostgreSQL versions, as they have on Debian.... so postgis24-pg10, postgis24-pg11 etc?
  • Currently, the pkg for postgresql is built with the option to include our own copy of the tz stuff. That means that it's possible for PostgreSQL to disagree with libc about timezone definitions. That seems like a bad idea, and other distributions don't do that as a matter of policy.

Other tools

  • DONE -- PostgreSQL uses GNU make. When running tests under -jMANY, the output is garbled. GNU make has -Otarget that will synchronise output so that output for each target is collected up and written out without getting mixed up. It's broken on all BSDs and Macs when output is a pipe, for example on CI and build farm systems. Boo. Here's an attempt to get that fixed upstream in GNU make. This was fixed (slightly different patch, whatever), but we don't really care anymore as we switched to Meson/Ninja.

Other recent developments that are good for PostgreSQL users

  • kib added fdatasync(), with support on UFS, available in FreeBSD 11.1 (previously PostgreSQL had to call fsync(), which might be slower because it might sync filesystem meta-data like modified time that we don't care about)
  • bapt added Unicode collation support (ported from Illumos), available in FreeBSD 11.0 (previously Unicode could only be sorted in binary order in FreeBSD's libc, though other encodings like Latin1 supported proper collating)
  • recent improvements in Unix domain socket code which was known to show up in some PostgreSQL benchmarks
  • kib's work from 2014 on PostgreSQL/FreeBSD scalability on many core machines
  • On the Impact of Instruction Address Translation Overhead (2019 paper) tested three FreeBSD changes that improved PostgreSQL performance significantly by adjusting super page promotion, padding and sharing logic.

Related resources

  • A FOSDEM 2019 BSD devroom talk about this stuff (slides + video recording).
  • A BSDCan 2020 talk about this stuff (slides + video recording to follow).