JDBC
JDBC is a core API of Java 1.1 and later. It provides a standard set of interfaces to SQL-compliant databases.
PostgreSQL provides a type 4 JDBC driver. Type 4 indicates that the driver is written in Pure Java, and communicates in the database system's own network protocol. Because of this, the driver is platform independent; once compiled, the driver can be used on any system.
User documentation
Development projects
Current State
The driver was written some years ago (before NIO) and is starting to show its age.
There is an issue with encrypted connections: the only way to determine whether there is any more data on an encrypted connection is to perform a blocking read.
We would like to avoid blocking I/O. One solution would be to introduce a monitor thread that watches all of the connections for events, but this feels a lot like reinventing NIO. The other solution is to rewrite the driver using NIO. Given the maturity of NIO, I think this is the desired direction.
That said, since Netty has abstracted away much of the NIO complexity, I believe it would make sense to use Netty to rewrite the driver.
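To make the contrast with blocking reads concrete, here is a minimal, hypothetical sketch of NIO's readiness model using only JDK classes. A Pipe stands in for a database connection, and ReadinessSketch is purely illustrative, not driver code: the point is that a Selector reports when a channel has data, so no blocking read is needed just to find out.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Minimal sketch of NIO's readiness model: instead of issuing a blocking
// read to discover whether data is available, a Selector reports which
// channels are readable. A Pipe stands in for a database connection here.
public class ReadinessSketch {

    // Returns how many additional channels became ready after the write.
    public static int readyAfterWrite() {
        try {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);

            Selector selector = Selector.open();
            pipe.source().register(selector, SelectionKey.OP_READ);

            // Nothing written yet: a non-blocking poll finds no ready channels.
            int before = selector.selectNow();

            // Write a byte on the sink end; the source end becomes readable.
            pipe.sink().write(ByteBuffer.wrap(new byte[] { 1 }));
            int after = selector.select(1000);

            selector.close();
            pipe.sink().close();
            pipe.source().close();
            return after - before; // readiness was signalled without a blocking read
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready channels: " + readyAfterWrite());
    }
}
```

Netty builds its event loop on exactly this selector mechanism, which is why moving to Netty avoids both the blocking read and a hand-rolled monitor thread.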
Driver new architecture proposal by aht
First of all, I definitely agree with the use of Netty. Netty is a very mature, robust, performant and battle-tested network I/O framework. It takes significant effort away from the driver while providing a simpler programming model almost fully abstracted away from the socket world; it can use off-heap buffers and is used extensively in really high-performance environments (read: Twitter). +1 to Netty.
What I'd like to suggest is building the layer in a very modular way, splitting it into modules such that most --if not all-- of them can be used independently by third parties. This not only favors a cleaner design for the driver and a separation of concerns, but also widens the code's user base, which can only help improve its quality and correctness. For example, the protocol layer (message formats, encoders, decoders, etc.) could be used by other drivers, making Java's goal of "code reuse" effectively possible.
Irrespective of whether these modules live in separate repositories within pgjdbc or just as separate Maven artifacts, I propose the following modules:
- "FEBE" modules
- common: Contains the definitions (read: POJOs, or equivalents) of all the messages of the Fe/Be protocol. It makes no assumptions about how they are sent/received or used; it just contains their definitions and encapsulates knowledge of their byte structure.
- client: Contains encoders for the messages to be sent by FeBe clients (such as the driver) and decoders for messages sent by the server. Has "common" as a dependency.
- server: Contains encoders for messages sent by a hypothetical Java PostgreSQL server, and decoders for messages sent by a client. While this module is optional, it can serve two purposes: integration testing of the driver, by creating a simple "mock" PostgreSQL server in Java; and serving as the basis for server-side PostgreSQL tooling, such as mock servers or even proxies/connection pools. Has "common" as a dependency.
- netty-client: a module building on top of "common" and "client" that uses Netty to send/receive messages. It is driver-agnostic, i.e., you could build a JDBC or fully async driver on top of it. Provides interface(s) with callbacks/IoC to plug in drivers functionality.
- pgjdbc: builds on top of netty-client to implement the JDBC standard. This is where significant parts of existing code would be ported.
- Optional modules. Other modules containing side functionality could also be developed separately from the driver, and could be the basis of other projects.
- For example, the current driver contains a basic "SQL parser". However, a full-blown SQL parser in Java (perhaps implemented with ANTLR) could be very useful not only for the driver, but also for other separate projects.
- SCRAM. Already an external dependency (https://github.com/ongres/scram).
- CDC (Change Data Capture): an abstraction layer on top of the current driver, abstracting away the use of the Logical Decoding plugin. Could be reused by other software, such as Debezium.
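As a rough illustration of the "common"/"client" split, the hypothetical sketch below defines a Query message POJO ("common") and encodes/decodes it per the Fe/Be wire format (Byte1('Q'), Int32 length that counts itself, null-terminated query string). All class names are made up for this example; they are not proposed API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the proposed "common"/"client" split, using the Fe/Be Query
// message as an example. QueryMessage is the kind of plain POJO the
// "common" module would hold; the encoder/decoder represent the "client"
// and "server" modules' wire knowledge. Names are illustrative only.
public class FebeSketch {

    // "common": the message definition, with no I/O assumptions.
    public static final class QueryMessage {
        public final String query;
        public QueryMessage(String query) { this.query = query; }
    }

    // "client": encodes a QueryMessage per the Fe/Be protocol:
    // Byte1('Q'), Int32 length (includes itself), null-terminated query.
    public static byte[] encode(QueryMessage msg) {
        byte[] body = msg.query.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + body.length + 1);
        buf.put((byte) 'Q');
        buf.putInt(4 + body.length + 1); // length counts itself and the terminator
        buf.put(body);
        buf.put((byte) 0);
        return buf.array();
    }

    // "server"/test side: decodes the same bytes back into a POJO.
    public static QueryMessage decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        if (buf.get() != 'Q') throw new IllegalArgumentException("not a Query message");
        int len = buf.getInt();
        byte[] body = new byte[len - 4 - 1]; // strip length field and terminator
        buf.get(body);
        return new QueryMessage(new String(body, StandardCharsets.UTF_8));
    }
}
```

Because the POJO carries no I/O assumptions, the same "common" classes could back the driver, a mock server, or a proxy, which is the reuse argument above.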
This architecture is not theoretical. A very similar approach was taken in a project I started, Phoebe, which became abandonware shortly afterwards (time constraints), but it reflects these principles, at least for the FEBE part. There would be no problem rescuing parts of that project and incorporating them as the basis of this new architecture, if that would be helpful. The implementation does not follow the above architecture 100%, as the FEBE modules depend on netty-buffer; that is not really Netty itself, though, just a convenient way of representing an array of bytes that can be sliced, and it could be further abstracted away if necessary. Other than that, there are probably useful bits in the project, like MessageType, an enum that encodes the properties of every message in the FEBE protocol and serves as the basis for every message implementation. Any Phoebe code may be 100% donated to pgjdbc if that would be useful.
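For readers unfamiliar with Phoebe, a MessageType-style enum might look roughly like the sketch below. The type bytes are taken from the FEBE protocol, but the enum's shape is a guess at the idea, not Phoebe's actual code.

```java
// A guess at the shape of a MessageType-style enum: one entry per FEBE
// message, carrying the protocol's type byte and the direction it flows.
// The type bytes are from the FEBE protocol; the enum itself is a sketch.
public enum MessageType {
    QUERY('Q', Direction.FRONTEND),
    TERMINATE('X', Direction.FRONTEND),
    ROW_DESCRIPTION('T', Direction.BACKEND),
    DATA_ROW('D', Direction.BACKEND),
    COMMAND_COMPLETE('C', Direction.BACKEND),
    READY_FOR_QUERY('Z', Direction.BACKEND);

    public enum Direction { FRONTEND, BACKEND }

    private final char typeByte;
    private final Direction direction;

    MessageType(char typeByte, Direction direction) {
        this.typeByte = typeByte;
        this.direction = direction;
    }

    public char typeByte() { return typeByte; }

    public boolean sentByServer() { return direction == Direction.BACKEND; }
}
```

An enum like this gives decoders a single dispatch table from the leading type byte to the message definition, which is why it can serve as the basis for every message implementation.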
Features / Goals
- Per-row processing of results without requiring a transaction or a fetch size to be set. This does not have to be exposed at the JDBC layer, but the core driver should allow something like an interface or callback to be specified for per-row result handling.
- Allow arbitrary PostgreSQL commands without knowing in advance whether the result is a standard result-set response or a CopyOut response. This may not be necessary for CopyIn, since the user would need to supply data and would therefore know the command type in advance.
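As a hypothetical sketch of the per-row callback goal above (all names invented for illustration), the core driver could hand each decoded DataRow to a handler as it arrives, instead of accumulating a whole ResultSet:

```java
import java.util.List;

// Sketch of a per-row callback the core driver could expose below the JDBC
// layer: each decoded DataRow is handed to the handler as it arrives, so no
// transaction or fetch size is needed to avoid buffering the whole result.
// All names here are hypothetical.
public class RowCallbackSketch {

    public interface RowHandler {
        void onRow(List<String> columns);   // one decoded DataRow
        void onComplete(String commandTag); // CommandComplete tag, e.g. "SELECT 2"
    }

    // Stand-in for the driver's result loop: streams rows to the handler
    // instead of accumulating them into a ResultSet.
    public static void streamRows(List<List<String>> incomingRows, RowHandler handler) {
        for (List<String> row : incomingRows) {
            handler.onRow(row);
        }
        handler.onComplete("SELECT " + incomingRows.size());
    }
}
```

A JDBC ResultSet could then be one consumer of this interface, while fully async drivers plug in their own handlers.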