Python PostgreSQL Driver TODO

From PostgreSQL wiki
Jump to navigationJump to search

Introduction

The PostgreSQL community has a wide array of choices for Python drivers, with no less than 8 different projects in this area. There is similar page on the Python wiki. Having so many choices, each with a very different set of trade-offs, is very confusing to newcomers and ultimately hinders some PostgreSQL+Python adoption. A recent long discussion on the pgsql-hackers list helped focus on the major issues in this area from the perspective of the PostgreSQL community.

A few general themes emerged from that discussion that considerably narrow the list of candidate drivers that seem worth focusing on:

  • There is not much value to the people who most want Python+PostgreSQL support in working on any of the pure Python implementations. Best case performance appears half or less of the ones that more directly wrap libpq. Even the best case with a faster driver like psycopg is enough of a performance hit as it is.
  • Targeting Python 2.X is a requirement; a roadmap featuring 3.X is nice; only supporting 3.X is of little value right now.
  • Projects using GPL or non-standard licenses are less attractive, due to a strong preference for the simpler license issues that come from a BSD or MIT style license in this community.

At the moment Psycopg and PyGreSQL appear to be the best positioned alternatives to match these goals, but each would need a number of changes to get completely there.

Improvements desired for Psycopg

A relicensed Psycopg seems like the closest match to the overall goals of the PostgreSQL project, in terms of coding work needed both in the driver and on the application side (because so many apps already use this driver). The following are the main open concerns, approximately in order of perceived value:

  • Confirm/add support for the most common standard types (array, tuple, record)
  • Consider refactoring to better follow standard driver practices; using PQExecParams in particular might be helpful.
    • There are discussion and test ongoing in this direction. Unfortunately this would break compatibility with existing applications (e.g. PQExecParams doesn't support sending multiple statements in a single query, a feature currently exposed by psycopg; user-defined typecasters may break too). Because many people see value in using the *Params path we are looking for a solution.
  • Unless you explicitly use server-side cursors, running a query with a large result set can result in memory errors even if you use fetchone because psycopg2 reads the entire results set internally. This seems to be different than the default behavior of most other drivers.

Several issues have already been resolved after they were raised with the project developers:

  • License change. BSD or MIT style license would be preferred. Switching to the LGPL would be less ideal but still a big improvement, so long as the OpenSSL license issues were addressed (and there is already such a clause in the existing license).
  • Add formal documentation page. The PostgreSQL community can do that on this wiki rather than expecting initd.org to handle it, and already started Psycopg documentation here.
  • Review/test implementation of V2.0 async changes.
    • Async support has been reviewed and tested: version 2.2 provides a very solid support.

Improvements desired for PyGreSQL

PyGreSQL is even older than Python's DB-API, and its DB-API interface appears less comprehensive. But it does have the right license, and its "classic" pg interface is a pretty direct mapping of libpq which can have some benefits. Features that would make it much more attractive, again in decreasing order of importance, include:

  • Test/complete/refactor for full DB-API 2.0 support. Some of the issues here seem to be documentation based rather than code--it's somewhat unclear what is and isn't supported.
  • Add some sort of extension support, perhaps modeled on what Psycopg does
  • Build a full COPY extension
  • Confirm/add support for the most common standard types (array, tuple, record, timestamp/timestamptz)
  • Confirm/add parameterized query support
  • Confirm/add multi-threaded support
  • Investigate integrating bytea support more cleanly into the DB-API interface. It's available in the classic interface already.
  • Consider refactoring to better follow standard driver practices

Note that many of these are already listed on the project's own future directions list.

Advanced feature checklist

There are a few features available in the database or to getting good performance from commonly expected Python apps which aren't necessary to build a functional driver, but that would be helpful. There are also some things that are missing from the Python DB-API that would be nice to have:

  • Optimizations to COPY pipeline to improve speed for operations like ETL (like avoiding redundant memory allocation)
  • Supporting advisory locks
  • Add direct support of prepared statements
  • Resultset cursors
  • DB structure examination

Driver Application Integration

In addition to the basic database wrapper, building easier PostgreSQL based Python applications requires integrating into the major development frameworks too:

Framework psycopg PyGreSQL pg8000 bgpsql py-postgresql
SQLAlchemy yes Mixed reports 0.6 beta no No 3.0
Django yes no no experimental No 3.0
Twisted yes yes no no No 3.0
web2py yes no yes, since 1.99.5+ no No 3.0
Peewee yes no no no No 3.0


Having the PostgreSQL community more unified around a single Python driver (or at least a smaller set) would seem to make it more likely those preferred driver(s) would be targeted for application support.