GSoC 2022

From PostgreSQL wiki
Jump to: navigation, search

This page is for collecting ideas for Google Summer of Code 2022 projects.

Note that there is a top-level GSoC page for PostgreSQL here: PostgreSQL General GSoC Page - please read this first before proceeding to contact mentors!

Contents

Proposals

The GSoC contributor application deadline has been reached.

Approved GSoC projects will be announced May 20th 2022.

Mentors (2022)

The following individuals have been listed as possible mentors on the below projects, and/or offered to be mentors for student-proposed projects:

  • Stephen Frost
  • Ilaria Battiston
  • Dave Cramer
  • Andreas Scherbaum
  • Jesper Pedersen
  • Chapman Flack
  • Mark Wong

Improve PostgreSQL Regression Test Coverage (2022)

Project Description

The current regression test coverage for PostgreSQL isn't great, to the point where some areas of the code are covered only at single-digit-percent levels.

Having good regression tests for such an important project as PostgreSQL is really key to minimizing the chance that any regressions are introduced. While this might seem like a small project, it isn't, particularly when it comes to PostgreSQL. PostgreSQL is over 1.3M lines of code and some of the code paths can be tricky to reach.

The current regression test coverage can be see here: https://coverage.postgresql.org

PostgreSQL's build system includes a "make coverage-html" to generate the report.

Please note that this project involves writing SQL code and Perl code, at a minimum, to implement the tests necessary to increase the code coverage of the PostgreSQL regression tests. There is a lot of opportunity for improvement here as PostgreSQL is currently at only about 73% LOC coverage.

Skills needed
  • Perl, as many PostgreSQL regression tests are written using the Perl TAP system and new ones will likely need to be.
  • SQL, to craft tests that hit certain code paths
  • Ability to read C code enough to work through a way to get a particular line or lines of code tested.
Difficulty Level

For someone with the skills listed, even at a relatively beginner level, should make this a very straight-forward if tedious project.

Project Size

This can be either a medium (175 hours) or a large (350 hours) project, depending on deliverables proposed by contributors.


Potential Mentors
  • Stephen Frost can mentor. Stephen is a major contributor and committer and has been working on improving regression tests in PG for quite a while.


Expected Outcomes
  • Significantly improved code coverage for the PostgreSQL regression test suite, ie: 73% -> 80%.

Develop Performance Farm Benchmarks and Website (2022)

Project Description

The PostgreSQL Performance Farm project is a community project to collect and review performance data from tests as code changes are made to PostgreSQL, working on Linux and OSX. The infrastructure uses a Django API and a Python script to run benchmarks and collect analytics data from PostgreSQL instances and hosts. Right now, the infrastructure is fully functional, however it could benefit from some additional features. Specifically, these are some ideas which could be implemented:

  • Extension of PgBench with additional benchmarks (TPC-H, YCSB, ...)
  • Add custom queries
  • Integrate profiling tools
  • Adding email notification support

This list can eventually be subject to changes, and is open to students' suggestions. Current code is at: https://github.com/PGPerfFarm/pgperffarm_server and https://github.com/PGPerfFarm/pgperffarm. Mentors are either past developers or past GSoC mentors for the same project.

As an example, the PostgreSQL Build Farm site [1] is a central repository for the results of testing source code changes for PostgreSQL as they occur, on a wide variety of platforms.

Skills needed
  • Python programming (both scripting and Django)
  • Web development (plain Javascript, HTML, CSS)
  • Understanding of PostgreSQL benchmarks and website infrastructure
Difficulty Level

This project should not be particularly challenging having basic full-stack development and Python skills.

Project Size

Medium sized project (175 hours).

Potential Mentors
  • Stephen Frost
  • Ilaria Battiston
Expected Outcomes
  • An expanded infrastructure where clients can upload test results which can be hosted on PostgreSQL infrastructure and users can search and review uploaded results.
References

[1] https://buildfarm.postgresql.org/


Creating a pgweb testing harness (2022)

Project Description

pgweb [1] is the official repository containing code of the website [http://www.postgresql.org/ ]. The infrastructure is deployed and maintained by a team of developers, however lacks of a testing suite to check for a correct functioning after each modification. The team could benefit of a framework to be integrated in the current CI pipeline, to support both tests for the current code and the addition of new tests for any upcoming new code.

What to be tested specifically is to be defined and will be part of the GSoC project, but there are some desired features:

  • automatic testing of existing code:
    • proper functionality of links, forms, cookies
    • browser compatibility and performance
    • accessibility
    • integration tests
  • possibility to easily implement new tests
  • other things can be suggested by students

The testing suite should be a standalone module, to eventually be integrated in CI pipeline after development. Since the web framework of pgweb is in Django and Python, these would be the preferred tools. Furthermore, a student should know a bit of web development and testing to be able to properly define requirements and implement them.

Skills needed
  • Django
  • Python
  • CI/CD pipelines
  • Web development and testing
  • HTML, CSS, Bootstrap
  • Javascript, jQuery
Difficulty Level

Medium.

Project Size

Medium sized project (175 hours).

Potential Mentors
  • Ilaria Battiston
  • Stephen Frost


Expected Outcomes
  • We expect the student to define a set of features and requirements to test, to then implement these tests and be able to run them for the existing codebase, automatically notifying errors on a periodic basis.
References

[1] https://github.com/postgres/pgweb


Improve pgarchives (2022)

Project Description

pgarchives [1] is the official archive of Postgres mailing lists. Users are able to access the interface from the website [2] and browse it, searching according to mailing list, keywords and date. Nevertheless, archives are not quite user friendly and lack some potentially useful functionalities.

Ideally, pgarchives should allow users to interact with mailing lists directly from its webpage, without recurring to external clients or websites. This could be implemented through a form, but it also needs to be supported by backend code, providing the necessary checks of credentials and permissions.

Furthermore, the graphical interface could benefit of some changes, in particular its search results page. There are some threads in the pgweb mailing lists stating desired improvements, which include:

  • better pagination
  • search results focussed on entire threads, rather than single separate emails
  • mobile-friendly results
  • general GUI restyling

Code is written in Python, with a Django API and some HTML/CSS on top.

Skills needed
  • Django
  • Python
  • Web development
Difficulty Level

Easy/medium knowing web development basics.

Project Size

Medium sized project (175 hours).

Potential Mentors
  • Ilaria Battiston
  • Stephen Frost


Expected Outcomes
  • We expect the student to improve search functionality, user interface usability and integrate mailing list interaction through the website.
References

[1] https://github.com/postgres/pgarchives [2] https://www.postgresql.org/list/


pgBackRest port to Windows (2022)

Project Description

This project would aim to port pgBackRest, which is now fully written in C, over to Windows. This will require some changes in how pgBackRest operates. The necessary changes for running pgBackRest on Windows have been contemplated during the architecture and design of pgBackRest and therefore should be reasonably straight-forward to implement.

Skills needed
  • C
  • Working in a Windows development environment
Difficulty Level

Medium.

Project Size

Medium (175 hours) or large (350 hours) sized project, flexibility according to deliverables.

Potential Mentors
  • Stephen Frost


Expected Outcomes
  • pgBackRest able to be built and run on Windows.
References

[1] https://www.pgbackrest.org/

[2] https://github.com/pgbackrest


New and improved website for pgjdbc (JDBC) (2022)

Project Description

This project would aim to update the pgjdbc website. Currently it is based on a very old PostgreSQL template and uses jekyll to build it. Ideally something more modern. Ideally buildable from github actions so that updating it was automated

Skills needed
  • Web development
Difficulty Level

Medium.

Project Size

Medium (175 hours).

Potential Mentors
  • Dave Cramer


Expected Outcomes
  • new and improved awesome website for the pgjdbc project
References

[1] https://jdbc.postgresql.org/

[2] https://github.com/pgjdbc/pgjdbc/tree/master/docs


pgagroal: Persistent SSL session (2022)

Project Description

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

This project aims to add a persistent SSL session to the project such that the data of SSLContext inside OpenSSL can be shared between processes. pgagroal will use this new data structure to enhance its TLS support.

Skills needed
  • C
  • PostgreSQL
  • Transport Layer Security
  • OpenSSL
Difficulty Level

Hard.

The tasks in this project requires C programming skills, knowledge about Transport Layer Security (TLS) and OpenSSL internals.

This work will require some knowledge about basic PostgreSQL administration.

Project Size

Large (350 hours).

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Have a persistent SSL session that can be shared between processes.
References

[1] https://agroal.github.io/pgagroal/

[2] https://github.com/agroal/pgagroal

[3] https://github.com/agroal/pgagroal/blob/master/src/libpgagroal/security.c

[4] https://www.openssl.org/


pgagroal: SCRAM-SHA-256-PLUS support (2022)

Project Description

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

This project aims to add SCRAM-SHA-256-PLUS support in order to enhance security when communicating with the connection pool.

These are some tasks that need to be completed (additional suggestions welcome):

  • Study the relevant specifications
  • Implement SCRAM-SHA-256-PLUS support
  • Implement SCRAM-SHA-256-PLUS between pgagroal and PostgreSQL
Skills needed
  • C
  • PostgreSQL
  • Transport Layer Security
  • OpenSSL
Difficulty Level

Medium.

The tasks in this project requires C programming skills, knowledge about Transport Layer Security (TLS) and the OpenSSL API.

This work will require some knowledge about basic PostgreSQL administration.

Project Size

Medium (175 hours) or large (350 hours) depending on the deliverables.

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Implement SCRAM-SHA-256-PLUS support (Medium)
  • Implement SCRAM-SHA-256-PLUS between pgagroal and PostgreSQL (Large)
References

[1] https://agroal.github.io/pgagroal/

[2] https://github.com/agroal/pgagroal

[3] https://github.com/agroal/pgagroal/blob/master/src/libpgagroal/security.c

[4] https://tools.ietf.org/html/rfc5802

[5] https://tools.ietf.org/html/rfc7677

[6] https://tools.ietf.org/html/rfc5246

[7] https://www.openssl.org/


pgmoneta: Write-Ahead Log (WAL) infrastructure (2022)

Project Description

pgmoneta is a backup / restore solution for PostgreSQL.

This project aims to integrate the PostgreSQL Write-Ahead Log (WAL) data structures into pgmoneta.

These are some tasks that need to be completed (additional suggestions welcome):

  • Study the PostgreSQL Write-Ahead Log (WAL) format
  • Replace usage of pg_receivewal inside pgmoneta with a pgmoneta native solution
  • Create a foundation for a pgmoneta library that can be used for WAL interaction
Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Medium (175 hours) or large (350 hours) depending on the deliverables.

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Replace usage of pg_receivewal inside pgmoneta with a native solution (Medium)
  • Create a foundation for a pgmoneta library that can be used for WAL interaction (Large)
References

[1] https://pgmoneta.github.io/

[2] https://github.com/pgmoneta/pgmoneta

[3] https://www.postgresql.org/


pgmoneta: Storage API (2022)

Project Description

pgmoneta is a backup / restore solution for PostgreSQL.

This project aims to abstract how backups are stored in various locations such as locally and on remote machines.

These are some tasks that need to be completed (additional suggestions welcome):

  • Create an internal API for storage engines
  • Create an implementation of various storage engines (local, SSH, ...)
Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Medium (175 hours) or large (350 hours) depending on the deliverables.

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Create an internal API for storage engines (Medium)
  • Create an implementation of various storage engines (local and SSH) (Medium)
  • Additional storage engines (Large)
References

[1] https://pgmoneta.github.io/

[2] https://github.com/pgmoneta/pgmoneta


pgmoneta: On-disk encryption (2022)

Project Description

pgmoneta is a backup / restore solution for PostgreSQL.

This project aims to add support for on-disk encryption of backups.

These are some tasks that need to be completed (additional suggestions welcome):

  • Create an internal API for encryption / decryption
  • Create an implementation using AES-256
Skills needed
  • C
  • PostgreSQL
  • OpenSSL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Medium (175 hours).

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Create an internal API for encryption / decryption
  • Create an implementation using AES-256
References

[1] https://pgmoneta.github.io/

[2] https://github.com/pgmoneta/pgmoneta


pgexporter: Custom file (2022)

Project Description

pgexporter is a Prometheus exporter for PostgreSQL.

This project aims to add the option to have a custom file specify the queries run against the PostgreSQL instances.

These are some tasks that need to be completed (additional suggestions welcome):

  • Define and integrate loading of the format - maybe YAML - into pgexporter
  • Create Prometheus metric for each of the defined queries
  • Improve infrastructure on creating new queries and their representation in the Prometheus interface
Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Medium (175 hours) or large (350 hours) depending on the deliverables.

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Define and integrate loading of the format - maybe YAML - into pgexporter (Medium)
  • Create Prometheus metric for each of the defined queries (Medium)
  • Improve infrastructure on creating new queries and their representation in the Prometheus interface (Large)
References

[1] https://pgexporter.github.io/

[2] https://github.com/pgexporter/pgexporter

[3] https://prometheus.io/

[4] https://grafana.com/oss/


PL/Java refactoring for support of other JVM and polyglot languages (2022)

Project Description

PL/Java is maintained externally but referred to in PostgreSQL's SQL Conformance statement as the component that provides Java routines and types to meet part 13 of the SQL spec. PostgreSQL is extensible in many ways beyond what that spec provides, such as with new operators and aggregates, and PL/Java can be used to build those too. It is implemented mostly in Java and C.

Having supported only the Java language since 2004, PL/Java is getting a thorough refactoring to make it a common base for building PostgreSQL server-side PLs using any of the current large ecosystem of languages that run on the JVM or with the polyglot features of GraalVM. A contributor on this project will be shaping a PL/Java that looks forward to its next decade or two.

These are some tasks that need to be completed:

  • Implement data type adapters for PostgreSQL types not yet covered.
  • Develop regression tests for data type adapters.
  • Complete the implementation of isSubtype and use to implement an adapter manager as described in the pull request.
  • Finish the Java-to-PostgreSQL side of APIs (Adapter, Datum.Accessor, TupleTableSlot) that currently only implement PostgreSQL-to-Java.

A more complete list of tasks can be found in pull request 399. An interested contributor should become familiar with that pull request and with building PL/Java from the associated branch. A project proposal may include a combination of items from that list, and additional ideas are also welcome.

Skills needed

  • Primarily Java
  • Familiarity with C and acquaintance with the Java Native Interface (JNI)
  • Some familiarity with PostgreSQL internals, such as experience creating some simple C-language PostgreSQL extension.

Historically, hacking on PL/Java has been C- and JNI-heavy, but this refactoring aims to move much more of the interesting stuff to pure Java, so contributors with primarily Java experience can make substantial contributions. Some familiarity with C and JNI may still be helpful for looking underneath to see what is going on.

  • Ideally, a favorite JVM or GraalVM language other than Java, to use in the final "expected outcome" below.

Difficulty level

Moderately easy to moderately hard, depending on the items included.

Project size

Medium (175 hours) or large (350 hours) depending on the proposed deliverables.

Mentors

  • Chapman Flack can mentor. Chap is the current PL/Java maintainer.

Expected outcomes

  • Accepted pull requests in the PL/Java GitHub repository for the items proposed.
  • For your favorite JVM or GraalVM programming language Foo, other than Java, an implementation on GitHub of a simple PostgreSQL PL/Foo (with perhaps limited functionality), implemented with this branch of PL/Java.

Database Load Stress Benchmark (2022)

Project Description

This project involves developing a new benchmarking stress tool for testing PostgreSQL's durability and resilience. This will help database engineers find the limits of their current server and database configuration.

Scripts need to be developed to:

  • create a database (including schema for tables and indexes)
  • load data to a database
  • create transactional load with read and write queries (INSERTs, UPDATEs, and DELETEs)
  • perform additional maintenance operations such as creating indexes and vacuuming

The Touchstone project provides C and Rust code for generating random data such as strings, numbers and dates. If time allows, it would be useful to also generate random data for other data types PosgreSQL supports, such as arrays and JSON.

Some aspects we may want to control include:

1. The amount of data to generate 2. The number of tables to generate 3. The duration of the test 4. Which additional SQL statements to run (e.g. SELECT, INSERT, UPDATE, DELETE) and how many users to run them

Skills needed

  • C or Rust
  • PostgreSQL
  • Shell scripting

Difficulty level

Moderately easy to moderately hard, depending on the items included.

Project size

Medium (175 hours) or large (350 hours) depending on the proposed deliverables.

Mentors

  • Mark Wong can mentor. Mark is a major contributor, wrote the data generation routines in the Touchstone project and has experience writing benchmarking kits.

Expected outcomes

  • Implement a test that will load a PostgreSQL database with random data.
  • Benchmark the data loading for different data sizes and data types.
  • Create a comparable test scenario for I/O performance measurements.
  • Generate a benchmark report summarizing the test just run, including test time, system statistics and database statistics.

DBT-5 Stored Procedure Development (2022)

Project Description

This project involves updating DBT-5, a fair use benchmarking kit based on the TPC Benchmark(TM) E. The server side business logic was developed at a time before stored procedures and sub-transaction support was available for PostgreSQL.

The goal is to update the current server side business logic, up to 12 transactions, to follow the benchmark specification to use stored procedures and sub-transaction as described by the specification.


Skills needed

  • C
  • C++
  • pl/PGSQL
  • PostgreSQL

Difficulty level

Moderately easy to moderately hard, depending on proficiency with C coding.

Project size

Medium (175 hours) or large (350 hours) depending on the skill level with C for use with libpq libraries and writing C stored procedures.

Mentors

  • Mark Wong can mentor. Mark is a major contributor, has experience writing benchmarking kits, and mentored the original GSOC project that developed this kit.

Expected outcomes

  • Review of the 12 transaction to determine which should be converted to procedures and updated with appropriate sub-transaction handling
  • Updated C code for all transactions that need to be updated.
  • Updated pl/PGSQL code for all transactions that need to be updated.
  • The C++ code may need to be updated in conjunction with updates to the server side business logic.

Database EAV Performance Benchmark (2022)

Project Description

This project involves developing a new benchmarking demonstrating the performance issues of the Entity-Attribute-Value (EAV) anti-pattern.

Scripts need to be developed to:

  • create a database (including schema for tables and indexes)
  • load data to a database
  • create a load with read queries

The Touchstone project provides C and Rust code for generating random data such as strings, numbers and dates.

Some aspects we may want to control include:

1. The amount of data to generate 2. How many and which queries to run

Skills needed

  • C or Rust
  • PostgreSQL
  • Shell scripting
  • Database design

Difficulty level

Moderately easy to moderately hard, depending on familiarity with EAV data models and identifying common use cases.

Project size

Medium (175 hours) or large (350 hours) depending on the proposed deliverables.

Mentors

  • Mark Wong can mentor. Mark is a major contributor, wrote the data generation routines in the Touchstone project and has experience writing benchmarking kits.

Expected outcomes

  • Database schema design.
  • Set of common queries used to in EAV data model.
  • Generate a benchmark report summarizing the test just run, including test time, system statistics and database statistics.

pl/Julia - PostgreSQL Procedural Language Development (2022)

Project Description

pl/julia is the a PostgreSQL extension to support the Julia programming language in PostgreSQL. While currently functional there are some features that are still missing.

Here is a summary of accomplishments from the previous project.

These are some tasks that need to be completed (additional suggestions welcome):

  • Support additional PostgreSQL data types, including but not limited to, bytea, Date, Time, JSON
  • Transaction support
  • Additional examples demonstrating the capabilities of Julia, (e.g. examples using data frames, high powered statistical functions)

Note that all of these do not need to be completed and that proposals should consider what could be be some and stretch goals of what else could be attempted if time permits.

Skills needed

  • Julia
  • C
  • SQL
  • PostgreSQL

Difficulty level

Moderately easy to moderately hard, depending on familiarity with EAV data models and identifying common use cases.

Project size

Medium (175 hours) or large (350 hours) depending on the proposed deliverables.

Mentors

  • Mark Wong can mentor.

Expected outcomes

  • Any of the proposed work to be functional.

GUI representation of monitoring System Activity with the system_stats Extension in pgAdmin 4

Project Description

This project involves design and developing dashboard graphs/charts in pgAdmin 4 which shows the system level statistics for monitoring Postgres activity with the help of the system_stats extension. The system_stats extension is a library of stored procedures that allow users to access system-level statistics for monitoring Postgres activity. These procedures reveal a variety of system metrics to the database server.

Skills Needed

  • Python/Flask
  • Javascript
  • React JS
  • CSS

Difficulty level

  • Medium

Project Size

  • Medium (175 hours).

Potential Mentors

  • Khushboo Vashi

Expected Outcomes

  • To have a GUI representation (charts/graphs) of the system level statistics on Dashboards tab in pgAdmin 4