GSoC 2020

From PostgreSQL wiki
Jump to: navigation, search

This page is for collecting ideas for future Summer of Code projects.

Regarding Project Ideas

Project ideas are to be added here by community members.

NOTE: Google wants each idea to be supported by a few sentences and then a list of skills/reqs, difficulty level, potential mentors, expected outcomes, etc. Please add sections in this format.

Mentors (2020)

The following individuals have been listed as possible mentors on the below projects, and/or offered to be mentors for student-proposed projects:

  • Stephen Frost
  • David Steele
  • Andrey Borodin
  • Andreas Scherbaum

WAL-G performance improvements (2020)

Project Description

WAL-G is the simple backup tool for PostgreSQL. Essentially, it supports backup and WAL archiving to cloud storages. https://github.com/wal-g/wal-g

We would like to see implemented following features:

  • unpack delta-backups in reverse order to avoid rewriting the same pages
  • reorganize backup structure to avoid downloading unnecessary files(which appears unchanged after the previous feature)

Also, it would be good to implement some safety features:

  • WAL's history consistency check -- we want to ensure, that we can provide PITR (point in time recovery) for the user.
  • Page checksum verification -- it's one of the key features needed for backup verification.
Skills needed
  • Go language
  • Ability to read technical papers and documentation to understand PostgreSQL inner structure.
Difficulty level

Moderate

Potential Mentors
  • Andrey Borodin
  • Georgy Rylov
  • Kirill Reshke
  • Svyatoslav Ermilin
Expected Outcomes

All before mentioned features properly implemented, covered with tests and Pull-Requested to WAL-G.


Improve PostgreSQL Regression Test Coverage (2020)

Project Description

The current regression test coverage for PostgreSQL isn't great, to the point where some areas of the code are covered only at single-digit-percent levels.

Having good regression tests for such an important project as PostgreSQL is really key to minimizing the chance that any regressions are introduced. While this might seem like a small project, it isn't, particularly when it comes to PostgreSQL. PostgreSQL is over 1.3M lines of code and some of the code paths can be tricky to reach.

The current regression test coverage can be see here: https://coverage.postgresql.org

PostgreSQL's build system includes a "make coverage-html" to generate the report.

Please note that this project involves writing SQL code and Perl code, at a minimum, to implement the tests necessary to increase the code coverage of the PostgreSQL regression tests. There is a lot of opportunity for improvement here as PostgreSQL is currently at only about 73% LOC coverage.

Skills needed
  • Perl, as many PostgreSQL regression tests are written using the Perl TAP system and new ones will likely need to be.
  • SQL, to craft tests that hit certain code paths
  • Ability to read C code enough to work through a way to get a particular line or lines of code tested.
Difficulty Level

For someone with the skills listed, even at a relatively beginner level, should make this a very straight-forward if tedious project.

Potential Mentors
  • Stephen Frost can mentor. Stephen is a major contributor and committer and has been working on improving regression tests in PG for quite a while.
  • Andreas Scherbaum
Expected Outcomes
  • Significantly improved code coverage for the PostgreSQL regression test suite, ie: 73% -> 80%.

Develop Performance Farm Benchmarks and Website (2020)

Project Description

The PostgreSQL Performance Farm project is a community project to collect and review performance data from tests as code changes are made to PostgreSQL, working on Linux and OSX. The infrastructure uses a VueJS website, a Django-REST API and a Python script to run benchmarks and collect analytics data from PostgreSQL instances and hosts. Right now, the site and the API are functioning, yet the Python benchmark requires extensive debugging to remedy to recent end-of-life of Python 2.7, operating systems developments and support newer PostgreSQL releases. Test results have to be submitted as JSON or text files, and the Django API needs to be adapted to any change in results formatting. Furthermore, the full infrastructure is still not compliant to Postgresql.org system authentication standards: the system login needs to be rewritten to be able to interface with custom authentication modules, allowing users to log in using their PostgreSQL Community accounts, or external accounts. For reference, the code that will be supplying test results is https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary.

As an example, the PostgreSQL Build Farm site [1] is a central repository for the results of testing source code changes for PostgreSQL as they occur, on a wide variety of platforms.

Skills needed
  • Python programming (both scripting and Django-REST)
  • Web development (JSON, VueJS)
  • Understanding of PostgreSQL benchmarks and website infrastructure
Difficulty Level

This project requires familiarity with Python programming and database experience.

Potential Mentors
  • Andreas Scherbaum
Expected Outcomes
  • A functional Web site where clients can upload test results, and users can search and review uploaded results.
References

[1] https://buildfarm.postgresql.org/


Read/write transaction-level routing in Odyssey (2020)

Project Description

Odyssey is an advanced multi-threaded PostgreSQL connection pooler and request router. Odyssey allows defining client routing rules by specifying client database, client user name and storage connection. In some HA scenarios, it is necessary to define different routing for read-only and read-write transactions. Odyssey can parse transaction opening statements and assign different routes depending on transaction types.

It would be cool to support many consistency models for reads on standby: 1. Redirecting read query on standby only if standby is lagging no more than X seconds 2. Redirecting read query on standby unless there was recent write (writer should see his writes) 3. Redirecting some read queries on syn or async standby

Skills needed
  • C programming
Difficulty Level

Moderate.

Potential Mentors
  • Andrey Borodin can mentor.
Expected Outcomes
  • Pull request to Odyssey with described feature
References

[1] https://github.com/yandex/odyssey


pgBackRest port to Windows (2020)

Project Description

This project would aim to port pgBackRest, which is now fully written in C, over to Windows. This will require some changes in how pgBackRest operates. The necessary changes for running pgBackRest on Windows have been contemplated during the architecture and design of pgBackRest and therefore should be reasonably straight-forward to implement.

Skills needed
  • C
  • Working in a Windows development environment
Difficulty Level

Medium.

Potential Mentors
  • Stephen Frost
  • David Steele
Expected Outcomes
  • pgBackRest able to be built and run on Windows.
References

[1] https://www.pgbackrest.org/

[2] https://github.com/pgbackrest


PL/Java build system improvements (2020)

Project Description

PL/Java is maintained externally but referred to in PostgreSQL's SQL Conformance statement as the component that provides Java routines and types to meet part 13 of the SQL spec. It is implemented mostly in Java and C.

PL/Java's build system was reworked in 2013 to be based on Maven.

Subproject details 1: continuous integration

Several services for continuous integration are readily available on GitHub, where PL/Java's repository is hosted. The PostgreSQL JDBC project, also hosted on GitHub, is already using Travis for CI.

Enabling CI for PL/Java, and building for a good selection of platforms and PostgreSQL versions, would be extremely helpful to avoid surprises at release time.

Subproject details 2: Maven repository deployment of pljava-api artifact

PL/Java's Maven build produces several artifacts; the one called pljava-api is needed by users of PL/Java when compiling their own code. The pljava-api artifact should be available in the Maven Central repository so it can simply be named as a dependency when building client code. There has been an issue opened to request this.

Deployment can be automated at release time using Travis; the PgJDBC project already does so. Coordination will be necessary with PgJDBC's maintainer, who controls the org.postgresql Maven coordinates.

Subproject details 3: the C shared-object build

The C portion of PL/Java is built with the nar-maven-plugin. The plugin is useful in handling many details of C compilation and shared-library building across a variety of platforms. It is otherwise not a perfect fit: it is made to build a particular kind of "native archive" that PL/Java does not need, and it does not pick up the same compiler options that were used to build PostgreSQL itself (there is an open PL/Java issue for this). And the nar plugin uses different warning options than the main PostgreSQL build, which makes the PL/Java build noisy with unimportant warnings so that any real problems are hard to find.

The project could develop a new Maven plugin simpler than the nar-maven-plugin and directly geared to building for PostgreSQL, picking up the necessary options from pg_config. A less-ambitious approach could simply use the maven-antrun-plugin (already used in PL/Java's build anyway), and extend the Ant scripting to take over the rest of what the nar-maven-plugin now does. Both approaches are described briefly in issue 152.

Skills needed
  • Java
  • Maven build system
  • Subproject 3: C shared-library building on more than one platform/toolchain
  • Subproject 3: Maven plugin development, possibly Ant and JavaScript
Difficulty level
  • Subprojects 1, 2: easy
  • Subproject 3: moderate plus (build requirements on multiple platforms must be accounted for)
Mentors
  • Chapman Flack can mentor. Chap is the current PL/Java maintainer.
Expected outcomes
  • Subproject 1: CI builds and reports occur on pull requests, for multiple versions of PostgreSQL and Java, and on Linux, macOS, and Windows.
  • Subproject 2: pljava-api jar artifact deployed to Maven Central when a new release is made.
  • Subproject 3: Build system no longer uses nar-maven-plugin, correctly uses build settings from pg_config, and is tested on at least Linux, macOS, and Windows (with Visual Studio and with MinGW).

PL/Java online documentation improvements (withdrawn for 2020)

Project Description

Note: this project idea has been withdrawn in the 2020 cycle.

PL/Java is maintained externally but referred to in PostgreSQL's SQL Conformance statement as the component that provides Java routines and types to meet part 13 of the SQL spec. It is implemented mostly in Java and C.

PL/Java's online documentation is currently generated, from Doxia Markdown and Velocity sources and JavaDoc, by the maven-site-plugin. It is hosted via GitHub Pages.

Project details
  • The generated site currently shows only one version of the documentation, for the most recent release. With a new PL/Java major release anticipated, the site should offer documentation trees for more than one release, much as the PostgreSQL documentation itself does.
  • Very little has been done to style the generated documentation beyond the maven-site-plugin defaults, with a very rudimentary look. Without any invasive restructuring of the documentation sources, it would be good to have typographical and color conventions more resembling the documentation of PostgreSQL itself or of PgJDBC. (The PostgreSQL project does not use Markdown as its documentation source format, but PgJDBC does.)
Skills needed
  • Maven build system and the maven-site-plugin
  • HTML, CSS, Velocity templates
Difficulty level

Easy, though constrained by the behavior of the maven-site-plugin and Velocity.

Mentors
  • Chapman Flack can mentor. Chap is the current PL/Java maintainer.
Expected outcomes

A pull request, with code changed and documentation, such that when tested with mvn compile site site:stage, the staging area can be viewed in a browser and:

  • Offers release-specific documentation trees
  • Appearance (typography, color, etc.) need not be identical to either PostgreSQL or PgJDBC documentation, but should look recognizably similar; glaring problems with the current style (such as literal text being much smaller than the surrounding type) should be eliminated.