GSoC 2021

From PostgreSQL wiki
Jump to navigationJump to search

This page is for collecting ideas for future Summer of Code projects.

Note that there is a top-level GSoC page for PostgreSQL here: PostgreSQL General GSoC Page

Regarding Project Ideas

Project ideas are to be added here by community members.

NOTE: Google wants each idea to be supported by a few sentences and then a list of skills/reqs, difficulty level, potential mentors, expected outcomes, etc. Please add sections in this format.

Mentors (2021)

The following individuals have been listed as possible mentors on the below projects, and/or offered to be mentors for student-proposed projects:

  • Stephen Frost
  • David Steele
  • Andreas Scherbaum
  • Mark Wong
  • Ilaria Battiston
  • Andrey Borodin
  • Jesper Pedersen
  • Mark Rofail
  • Gabrielle Roth
  • Fabrízio de Royes Mello

Improve PostgreSQL Regression Test Coverage (2021)

Project Description

The current regression test coverage for PostgreSQL isn't great, to the point where some areas of the code are covered only at single-digit-percent levels.

Having good regression tests for such an important project as PostgreSQL is really key to minimizing the chance that any regressions are introduced. While this might seem like a small project, it isn't, particularly when it comes to PostgreSQL. PostgreSQL is over 1.3M lines of code and some of the code paths can be tricky to reach.

The current regression test coverage can be see here: https://coverage.postgresql.org

PostgreSQL's build system includes a "make coverage-html" to generate the report.

Please note that this project involves writing SQL code and Perl code, at a minimum, to implement the tests necessary to increase the code coverage of the PostgreSQL regression tests. There is a lot of opportunity for improvement here as PostgreSQL is currently at only about 73% LOC coverage.

Skills needed
  • Perl, as many PostgreSQL regression tests are written using the Perl TAP system and new ones will likely need to be.
  • SQL, to craft tests that hit certain code paths
  • Ability to read C code enough to work through a way to get a particular line or lines of code tested.
Difficulty Level

For someone with the skills listed, even at a relatively beginner level, should make this a very straight-forward if tedious project.

Potential Mentors
  • Stephen Frost can mentor. Stephen is a major contributor and committer and has been working on improving regression tests in PG for quite a while.
  • Andreas Scherbaum
  • Mark Rofail, former GSoC student.
  • Fabrízio de Royes Mello, former GSoC student.
Expected Outcomes
  • Significantly improved code coverage for the PostgreSQL regression test suite, ie: 73% -> 80%.

Develop Performance Farm Benchmarks and Website (2021)

Project Description

The PostgreSQL Performance Farm project is a community project to collect and review performance data from tests as code changes are made to PostgreSQL, working on Linux and OSX. The infrastructure uses a Django API and a Python script to run benchmarks and collect analytics data from PostgreSQL instances and hosts. Right now, the infrastructure is functional but some improvements would be beneficial, especially for better portability on the PostgreSQL server. Specifically, these are some features which could be implemented:

  • Integration of collectd and pg_stat_statement data
  • Creation of an admin panel accessible through the website
  • General GUI improvements
  • Eventual migration from Vue.js to vanilla Javascript
  • Removal of dependencies

This list can eventually be subject to changes, and is open to students' suggestions. Current code is at: https://git.postgresql.org/gitweb/?p=pgperffarm.git;a=summary. Mentors are either past developers or past GSoC mentors for the same project.

As an example, the PostgreSQL Build Farm site [1] is a central repository for the results of testing source code changes for PostgreSQL as they occur, on a wide variety of platforms.

Skills needed
  • Python programming (both scripting and Django)
  • Web development (Javascript, Vue.js)
  • Understanding of PostgreSQL benchmarks and website infrastructure
Difficulty Level

This project should not be particularly challenging having basic full-stack development skills.

Potential Mentors
  • Stephen Frost
  • Ilaria Battiston
  • Mark Rofail, former GSoC student
Expected Outcomes
  • A functional Web site where clients can upload test results which can be hosted on PostgreSQL infrastructure and users can search and review uploaded results.
References

[1] https://buildfarm.postgresql.org/


Improve pgeu-system for Conference Management (2021)

Project Description

pg-eu system is an infrastructure used by Postgres for managing non-profit organizations along with an extensive conference management system. It contains features such as invoices, administration of conferences, memberships and elections, along with social media integration. There are some improvements which could be made to the system, which are being proposed in the form of GitHub issues [1], such as adding HTML emails and adding a generalised form.

Specific sub-projects which we would like to see a student propose to work on:

  • Add support for sending emails with HTML. Students interested in this would do well to review how the pgweb [2] system supports sending HTML emails in the News system. [3] Github issue for this. [4]
  • Add support for multiple vote tallying methodologies. There is currently one built-in method of voting but we would like to see that extended, in particular adding support for STV [6]. The github issue for this [5]
  • Generalized form capability. This task is definitely larger than the others included and a full proposal for this item by itself could be a GSoC project, depending on the student's skill. The idea is to have a generalized form-building capability which administrators could use to craft surveys to send to members and others. There is some support already for this in the conference system but we'd like to generalize it and make it available outside of the conference part of the system. Github issue for this [7]
  • Other projects could be suggested by a student based on the github issues list.
Skills needed
  • Django
  • Web development
Difficulty Level

Easy/medium.

Potential Mentors
  • Ilaria Battiston
  • Stephen Frost
  • Mark Rofail, former GSoC student
Expected Outcomes
  • We expect the student to go through the list of issues, and work on at least a few of them, especially the major ones.
References

[1] https://github.com/pgeu/pgeu-system/issues [2] https://git.postgresql.org/gitweb/?p=pgweb.git;a=summary [3] https://git.postgresql.org/gitweb/?p=pgweb.git;a=tree;f=pgweb/news;h=6fd47283f0e39dfe6c7b31fded55ff79ae521767;hb=HEAD [4] https://github.com/pgeu/pgeu-system/issues/53 [5] https://github.com/pgeu/pgeu-system/issues/54 [6] https://github.com/RhodiumToad/sql_stv [7] https://github.com/pgeu/pgeu-system/issues/52


Creating a pgweb testing harness (2021)

Project Description

pgweb [1] is the official repository containing code of the website [http://www.postgresql.org/ ]. The infrastructure is deployed and maintained by a team of developers, however lacks of a testing suite to check for a correct functioning after each modification. The team could benefit of a framework to be integrated in the current CI pipeline, to support both tests for the current code and the addition of new tests for any upcoming new code.

What to be tested specifically is to be defined and will be part of the GSoC project, but there are some desired features:

  • automatic testing of existing code:
    • proper functionality of links, forms, cookies
    • browser compatibility and performance
    • accessibility
    • integration tests
  • possibility to easily implement new tests
  • other things can be suggested by students

The testing suite should be a standalone module, to eventually be integrated in CI pipeline after development. Since the web framework of pgweb is in Django and Python, these would be the preferred tools. Furthermore, a student should know a bit of web development and testing to be able to properly define requirements and implement them.

Skills needed
  • Django
  • Python
  • CI/CD pipelines
  • Web development and testing
  • HTML, CSS, Bootstrap
  • Javascript, jQuery
Difficulty Level

Medium.

Potential Mentors
  • Ilaria Battiston
  • Stephen Frost
  • Mark Rofail, former GSoC student
Expected Outcomes
  • We expect the student to define a set of features and requirements to test, to then implement these tests and be able to run them for the existing codebase, automatically notifying errors on a periodic basis.
References

[1] https://github.com/postgres/pgweb


Improve pgarchives (2021)

Project Description

pgarchives [1] is the official archive of Postgres mailing lists. Users are able to access the interface from the website [2] and browse it, searching according to mailing list, keywords and date. Nevertheless, archives are not quite user friendly and lack some potentially useful functionalities.

Ideally, pgarchives should allow users to interact with mailing lists directly from its webpage, without recurring to external clients or websites. This could be implemented through a form, but it also needs to be supported by backend code, providing the necessary checks of credentials and permissions.

Furthermore, the graphical interface could benefit of some changes, in particular its search results page. There are some threads in the pgweb mailing lists stating desired improvements, which include:

  • better pagination
  • search results focussed on entire threads, rather than single separate emails
  • mobile-friendly results
  • general GUI restyling

Code is written in Python, with a Django API and some HTML/CSS on top.

Skills needed
  • Django
  • Python
  • Web development
Difficulty Level

Easy/medium.

Potential Mentors
  • Ilaria Battiston
  • Stephen Frost
  • Mark Rofail, former GSoC student
Expected Outcomes
  • We expect the student to improve search functionality, user interface usability and integrate mailing list interaction through the website.
References

[1] https://github.com/postgres/pgarchives [2] https://www.postgresql.org/list/


pgBackRest port to Windows (2021)

Project Description

This project would aim to port pgBackRest, which is now fully written in C, over to Windows. This will require some changes in how pgBackRest operates. The necessary changes for running pgBackRest on Windows have been contemplated during the architecture and design of pgBackRest and therefore should be reasonably straight-forward to implement.

Skills needed
  • C
  • Working in a Windows development environment
Difficulty Level

Medium.

Potential Mentors
  • Stephen Frost
  • David Steele
Expected Outcomes
  • pgBackRest able to be built and run on Windows.
References

[1] https://www.pgbackrest.org/

[2] https://github.com/pgbackrest


Add functionality to pg_top and supporting tools (2021)

Project Description

pg_top (https://pg_top.gitlab.io/) is 'top' for PostgreSQL. It is a terminal based monitoring tool specifically for PostgreSQL. There are companion tools such as pg_systat (https://pg_systat.gitlab.io/) and pg_proctab (https://pg_proctab.gitlab.io/) that help monitor database statistics and enable monitoring of remote databases. pg_systat focuses more on database statistics and general system statistics as opposed to individual processes. pg_proctab is a PostgreSQL extension that exposes operating system statistics through user defined functions.

This is a list of some functionality enhancements that could improve these tools (additional suggestions welcome):

  • Add monitoring of pg_stat_statements (https://www.postgresql.org/docs/current/pgstatstatements.html) to pg_systat.
  • Add ability to monitor remote systems with pg_systat, which may need supporting functionality added to pg_proctab.
  • Add additional i/o stats to pg_systat's monitoring of table spaces.
  • Update pg_top to work on current releases of MacOS
  • Port pg_proctab to work on FreeBSD
  • Rewrite pg_top in rust.
  • Rewrite pg_systat in rust.
Skills needed
  • C
  • SQL
  • PostgreSQL
  • Rust (only for porting tasks)
Difficulty Level

The tasks in this project needs familiarity with C programming. Understanding SQL is a plus but can be learned during the project as its use is limited.

Some functionality involves learning about operating systems such as the process table and how system statistics are reported.

Some PostgreSQL administration skills are needed for creating a database in order to tests these tools.

Potential Mentors
  • Mark Wong
  • Gabrielle Roth
Expected Outcomes

There are many tasks and it may not be possible to complete all of them within a single summer. A plan is expected for which tasks to be attempted.

References

[1] pg_top presetnation: https://www.slideshare.net/markwkm/pgtop-is-top-for-postgresql-pgtop-pgproctab-presentation

[2] pg_proctab presentation: https://www.slideshare.net/markwkm/pgproctab-accessing-system-stats-in-postgresql-3573304

[3] pg_systat presentatin: https://postgresql.us/events/pgopen2019/sessions/session/622/slides/48/terminal-tools.pdf

[4] pg_top homepage: https://pg_top.gitlab.io/

[5] pg_proctab homepage:https://pg_proctab.gitlab.io/

[6] pg_systat homepage: https://pg_systat.gitlab.io/


Create procedural language extension for the Julia programming language (2021)

Project Description

pl/julia is the start of an extension to support the Julia programming language in PostgreSQL. While currently functional it has minimal support for writing user defined functions so additional functionality is needed to make the extension more useful.

These are some tasks that need to be completed (additional suggestions welcome):

  • Support triggers
  • Support event triggers
  • Support inline code execution, a.k.a DO statement
  • Support input parameters as arrays
  • Cache procedural language code instead of looking it up every time

Some of the tasks here may overlap and can also be implemented with plsample (see plsample idea on this wiki).

Skills needed
  • C
  • Julia
  • SQL
  • PostgreSQL
Difficulty Level

The tasks in this project requires C programming and knowledge on how to generate Julia code. Understanding SQL is needed in order to write regression tests but can be learned during the project as its use is limited.

Some knowledge of PostgreSQL administration is needed for setting up a database to use the extension and running regression tests.

Potential Mentors
  • Mark Wong
  • Gabrielle Roth
  • Fabrízio de Royes Mello, former GSoC student.
Expected Outcomes

A feature complete extension is likely beyond the scope this year's program but there are individual tasks that can be completed and get us closer to that end.

References

[1] Julia programming language: https://julialang.org/

[2] pl/julia code: https://gitlab.com/pljulia/pljulia

[3] Blog (1 of 5) about pl/julia extension: https://www.2ndquadrant.com/en/blog/creating-a-postgresql-procedural-language-part-1-setup/


Make plsample a more complete procedural language handler example (2021)

Project Description

plsample is a sample procedural language example in the PostgreSQL source code. It is intended to serve at least two purposes: to provide a template for others to build their own procedural language handler, and to help test the PostgreSQL code.

These are some tasks that need to be completed (additional suggestions welcome):

  • Support triggers
  • Support event triggers
  • Support inline code execution, a.k.a DO statement
Skills needed
  • C
  • PostgreSQL
Difficulty Level

The tasks in this project requires C programming. Understanding SQL is needed in order to write regression tests but can be learned during the project as its use is limited.

Some knowledge of PostgreSQL administration is needed for setting up a database to use the extension and running regression tests.

Potential Mentors
  • Mark Wong
  • Gabrielle Roth
  • Fabrízio de Royes Mello, former GSoC student.
Expected Outcomes

A feature complete extension is likely beyond the scope this year's program but there are individual tasks that can be completed and get us closer to that end. There may be some overlap with improving pl/julia.

References

[1] src/test/modules/plsample: See at https://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/test/modules/plsample;h=ca0363ef94d3ff0c305df3340b43014a08b23118;hb=HEAD or https://github.com/postgres/postgres/tree/master/src/test/modules/plsample

WAL-G: optimize full backups using object-storage remote-copy APIs(2021)

Project Description

Certain cloud providers have object-storage remote copy support, where a bucket object can be copied from one bucket-key to another without any bits actually moving around (basically like a clone or hard-link, done within the object store.)

wal-g backup-push could be extended (for these object-storage providers) to create full backups based on previous full or incremental backups, by:

  • finding the set of changed files from the base, as if an incremental backup were being created
  • for any file that exists unchanged in the base, use a remote-copy to clone it from the old bucket-key to the new bucket-key
  • for any file that has changed, upload it as normal
Skills needed
  • Go
  • PostgreSQL
Difficulty Level

The tasks in this project require Go programming. Some knowledge of PostgreSQL administration is needed to setup backups with PITR.

Potential Mentors
  • Daniil Zakhlystov
  • Andrey Borodin
Expected Outcomes

Faster full backups where databases have a lot of static files.


pgagroal: Metrics and monitoring (2021)

Project Description

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

This project aims to enhance the current collected metrics by adding new data points that provides a deeper insight to the traffic flow through the connection pool.

To provide a better out-of-the-box experience for end-users a sample Grafana dashboard should be created that provides all key metrics in an easy to understand way.

These are some tasks that need to be completed (additional suggestions welcome):

  • Investigate and analyze possible data points
  • Create configuration property for gatering these new metrics
  • Expose new metrics in the Prometheus interface
  • Create Grafana dashboards
Skills needed
  • C
  • PostgreSQL
  • Prometheus
  • Grafana
Difficulty Level

The tasks in this project requires C programming skills to enhance metrics gathering and exposure through the Prometheus interface.

Any candidate should also be able to work with a Grafana setup in order to create the dashboards.

This work will require some knowledge of the PostgreSQL protocol as well as basic administration.

Potential Mentors
  • Jesper Pedersen
Expected Outcomes
  • Improve the metrics gathered, if enabled, and expose them through the Prometheus interface.
  • Create a sample Grafana dashboard that exposes these metrics.
References

[1] https://agroal.github.io/pgagroal/

[2] https://github.com/agroal/pgagroal

[3] https://github.com/agroal/pgagroal/blob/master/src/libpgagroal/prometheus.c

[4] https://github.com/agroal/pgagroal/blob/master/src/libpgagroal/pipeline_session.c

[5] https://prometheus.io/

[6] https://grafana.com/oss/


pgagroal: SCRAM-SHA-256-PLUS support (2021)

Project Description

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

This project aims to add SCRAM-SHA-256-PLUS support in order to enhance security when communicating with the connection pool.

These are some tasks that need to be completed (additional suggestions welcome):

  • Study the relevant specifications
  • Implement SCRAM-SHA-256-PLUS support
  • Optional: Implement SCRAM-SHA-256-PLUS between pgagroal and PostgreSQL
Skills needed
  • C
  • PostgreSQL
  • Transport Layer Security
  • OpenSSL
Difficulty Level

The tasks in this project requires C programming skills, knowledge about Transport Layer Security (TLS) and the OpenSSL API.

This work will require some knowledge about basic PostgreSQL administration.

Potential Mentors
  • Jesper Pedersen
Expected Outcomes
  • Implement SCRAM-SHA-256-PLUS support
  • Optional: Implement SCRAM-SHA-256-PLUS between pgagroal and PostgreSQL
References

[1] https://agroal.github.io/pgagroal/

[2] https://github.com/agroal/pgagroal

[3] https://github.com/agroal/pgagroal/blob/master/src/libpgagroal/security.c

[4] https://tools.ietf.org/html/rfc5802

[5] https://tools.ietf.org/html/rfc7677

[6] https://tools.ietf.org/html/rfc5246

[7] https://www.openssl.org/


Database Load Stress Benchmark

Project Description

This project involves developing a new benchmarking stress tool for testing PostgreSQL's durability and resilience. This will help database engineers find the limits of their current server and database configuration.

Scripts need to be developed to:

  • load data to a database
  • create transactional load with read and write queries (INSERTs, UPDATEs, and DELETEs)
  • perform additional maintenance operations such as creating indexes and vacuuming

The Touchstone project provides C and Rust code for generating random data such as strings, numbers and dates. If time allows, it would be useful to also generate random data for other data types PosgreSQL supports, such as arrays and JSON.

Some aspects we may want to control include:

1. The amount of data to generate

2. The number of tables to generate

3. The duration of the test

4. Which additional SQL statements to run (e.g. SELECT, INSERT, UPDATE, DELETE) and how many users to run them

Skills needed
  • C or Rust
  • PostgreSQL
  • Shell scripting
Difficulty Level

The tasks in this project requires C or Rust programming skills, and shell scripting.

This work will require some knowledge about basic PostgreSQL administration.

Potential Mentors
  • Mark Wong
  • Gabrielle Roth
Expected Outcomes
  • Implement a test that will load a PostgreSQL database with random data.
  • Benchmark the data loading for different data sizes and data types.
  • Create a comparable test scenario for I/O performance measurements.
References

[1] https://gitlab.com/touchstone/touchstone