GSoC 2023

From PostgreSQL wiki
Jump to navigationJump to search
This page is for collecting ideas for Google Summer of Code 2023 projects. This page is for IDEAS ONLY.
IF YOU ARE A CONTRIBUTOR: there is a top-level GSoC page for PostgreSQL here: PostgreSQL General GSoC Page - please read this first before proceeding to contact mentors! Contribution guidelines, channels and communication methods are in this page too. PLEASE make sure you have read everything thoroughly.

If you are a mentor and would like to participate, feel free to edit this page and put your name below.

Mentors mailing list for proposals: gsoc2023-mentors@lists.postgresql.org

Mentors

The following individuals have been listed as possible mentors on the below projects, and/or offered to be mentors for student-proposed projects:

  • Stephen Frost
  • Ilaria Battiston
  • Jesper Pedersen
  • Mariam Fahmy
  • Chapman Flack
  • Andreas 'ads' Scherbaum
  • Gurjeet
  • David E. Wheeler
  • Jimmy Angelakos
  • Mark Wong
  • Dave Cramer
  • Federico Campoli

Develop Performance Farm Benchmarks and Website (2023)

Project Description

The PostgreSQL Performance Farm project is a community project to collect and review performance data from tests as code changes are made to PostgreSQL, working on Linux and OSX. The infrastructure uses a Django API and a Python script to run benchmarks and collect analytics data from PostgreSQL instances and hosts. Right now, the infrastructure is fully functional, however it could benefit from bug fixes and some additional features. Specifically, these are some ideas which could be implemented:

  • Generation and collection of EXPLAIN query plans
  • Add custom queries
  • Integrate profiling tools

This list can eventually be subject to changes, and is open to students' suggestions. Current code is at: https://github.com/PGPerfFarm/pgperffarm_server and https://github.com/PGPerfFarm/pgperffarm. Mentors are either past developers or past GSoC mentors for the same project.

As an example, the PostgreSQL Build Farm site [1] is a central repository for the results of testing source code changes for PostgreSQL as they occur, on a wide variety of platforms.

Skills needed
  • Python programming (both scripting and Django)
  • Web development (plain Javascript, HTML, CSS)
  • Understanding of PostgreSQL benchmarks and website infrastructure
Difficulty Level

This project should not be particularly challenging having basic full-stack development and Python skills.

Project Size

Medium sized project (175 hours).

Potential Mentors
  • Stephen Frost
  • Ilaria Battiston

Please do not email us, rather find us on Slack after reading carefully the general GSoC information page higlighted above.

Expected Outcomes
  • An expanded infrastructure where clients can upload test results which can be hosted on PostgreSQL infrastructure and users can search and review uploaded results.
References

[1] https://buildfarm.postgresql.org/


Creating a pgweb testing harness (2023)

Project Description

pgweb [1] is the official repository containing code of the website [http://www.postgresql.org/ ]. The infrastructure is deployed and maintained by a team of developers, however lacks of a testing suite to check for a correct functioning after each modification. The team could benefit of a framework to be integrated in the current CI pipeline, to support both tests for the current code and the addition of new tests for any upcoming new code.

What to be tested specifically is to be defined and will be part of the GSoC project, but there are some desired features:

  • automatic testing of existing code:
    • proper functionality of links, forms, cookies
    • browser compatibility and performance
    • accessibility
    • integration tests
  • possibility to easily implement new tests
  • other things can be suggested by students

The testing suite should be a standalone module, to eventually be integrated in CI pipeline after development. Since the web framework of pgweb is in Django and Python, these would be the preferred tools. Furthermore, a student should know a bit of web development and testing to be able to properly define requirements and implement them.

Skills needed
  • Django
  • Python
  • CI/CD pipelines
  • Web development and testing
  • HTML, CSS, Bootstrap
  • Javascript, jQuery
Difficulty Level

Medium.

Project Size

Medium sized project (175 hours).

Potential Mentors
  • Ilaria Battiston
  • Stephen Frost

Please do not email us, rather find us on Slack after reading carefully the general GSoC information page higlighted above.


Expected Outcomes
  • We expect the student to define a set of features and requirements to test, to then implement these tests and be able to run them for the existing codebase, automatically notifying errors on a periodic basis.
References

[1] https://github.com/postgres/pgweb



pgagroal: Query cache

Project Description

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

This project aims to add a query cache to the project such that data can be reused in future queries.

Skills needed
  • C
  • PostgreSQL
Difficulty Level

Hard.

The tasks in this project requires C programming skills.

This work will require some knowledge about basic PostgreSQL administration.

Project Size

Large (350 hours).

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Have a query cache in either shared or per-connection mode.
References

[1] https://agroal.github.io/pgagroal/

[2] https://github.com/agroal/pgagroal

[3] https://www.postgresql.org/


pgagroal: Simple vault

Project Description

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

This project aims to add a simple dynamic vault implementation to recycle passwords for users.

Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

This work will require some knowledge about basic PostgreSQL administration.

Project Size

Large (350 hours).

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Have a simple dynamic vault
References

[1] https://agroal.github.io/pgagroal/

[2] https://github.com/agroal/pgagroal

[3] https://www.postgresql.org/


pgexporter: Improve version support

Project Description

pgexporter is a Prometheus solution for PostgreSQL.

This project aims to improve the support of different PostgreSQL versions in pgexporter.

Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Medium (175 hours).

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Create infrastructure in pgexporter to support different PostgreSQL versions out of the box.
  • Create default YAML files for PostgreSQL 11 to PostgreSQL 16
References

[1] https://pgexporter.github.io/

[2] https://github.com/pgexporter/pgexporter

[3] https://www.postgresql.org/


pgmoneta: Native backup

Project Description

pgmoneta is a backup / restore solution for PostgreSQL.

This project aims to create a native backup solution and thereby replace the current usage of pg_basebackup.

Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Large (350 hours).

Potential Mentors
  • Jesper Pedersen

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Replace usage of pg_basebackup inside pgmoneta with a native solution
References

[1] https://pgmoneta.github.io/

[2] https://github.com/pgmoneta/pgmoneta

[3] https://www.postgresql.org/


pgmoneta: Monitoring

Project Description

pgmoneta is a backup / restore solution for PostgreSQL.

This project aims to improve monitoring of pgmoneta by creating a Grafana dashboard as well as improving the Prometheus metrics.

Skills needed
  • C
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires C programming skills.

Project Size

Medium (175 hours).

Potential Mentors
  • Jesper Pedersen
  • Mariam Fahmy

Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.

Expected Outcomes
  • Create a Grafana dashboard
  • Improve the project with new Prometheus metrics
References

[1] https://pgmoneta.github.io/

[2] https://github.com/pgmoneta/pgmoneta

[3] https://www.postgresql.org/

[4] https://grafana.com/oss/

[5] https://github.com/grafana/grafana

[6] https://prometheus.io/

[7] https://github.com/prometheus/prometheus


pg_adviser / index_adviser: Recommend Potentially Useful Indexes

Project Description

Index Adviser is an extension for Postgres that processes the workload and recommends indexes that may help improve performance.

  • Low-hanging fruit: Improve Postgres Version Support
  • Exploratory: Support more index types: GiST, GIN, etc.
Skills needed
  • C
  • Postgres
Difficulty Level
  • Low (for Low-hanging fruit: Improve Postgres Version Support)
  • Medium (for Exploratory: Support more index types: GiST, GIN, etc.)

The tasks in this project requires C programming skills.

Project Size
  • Medium (for Low-hanging fruit: Improve Postgres Version Support)
  • Large (for Exploratory: Support more index types: GiST, GIN, etc.)
Potential Mentors
  • Gurjeet

Reach out to gurjeet@{singh.im,amazon.com} for a review of your proposal.

Expected Outcomes
  • Support all active versions of Postgres (for Low-hanging fruit: Improve Postgres Version Support)
  • Support at least one new Index type recommendation (for Exploratory: Support more index types: GiST, GIN, etc.)
References

[1] https://github.com/DrPostgres/pg_adviser/blob/master/index_adviser/README.txt

[2] https://www.postgresql.org/


Add various UI components to PGXN::Manager

Skills needed
  • Perl
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires Perl programming skills.

Project Size

Medium (175 hours) ???

Potential Mentors
  • David E. Wheeler
  • Andreas Scherbaum

Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.


PGXN: Add a hook to trigger other things on release; related features

Skills needed
  • Perl
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires Perl programming skills.

Project Size

Medium (175 hours) ???

Potential Mentors
  • David E. Wheeler
  • Andreas Scherbaum

Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.


PGXN: Various release management and system features

Skills needed
  • Perl
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires Perl programming skills.

Project Size

Medium (175 hours) ???

Potential Mentors
  • David E. Wheeler
  • Andreas Scherbaum

Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.


PGXN: API

PGXN API is where the full text indexing happens and the search APIs live. They can be improved. Full text indexing is built on Apache Lucy, which is super capable and efficient, with no external data store required, but has been retired. The search could use a whole rethink, probably using a single search field for everything and then classifying results as users, extensions, release, etc. in the search results. The drop down menu separating them was a failure.

A useful SoC project replaces the search implementation with something else, perhaps tsearch, since this is all about Postgres, but it doesn’t really matter. This would be a pretty nice, highly visible project. The relevant bits of that infrastructure are:

On the other hand, some folks might prefer to do away with PGXN altogether and build something more like pkg.go.dev, which automatically indexes go packages on GitHub with no need to make a separate bundle and release. docs.rs takes a similar approach.

Skills needed
  • Perl
  • PostgreSQL
Difficulty Level

Medium.

The tasks in this project requires Perl programming skills.

Project Size

Largs (??? hours) ???

Potential Mentors
  • David E. Wheeler
  • Andreas Scherbaum

Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.


PGEU Conference System

The PGEU Conference System has a number of open feature requests. A few are labeled as good first issue.

Students can select features and build a proposal which of the open requests to implement in the available time. The duration of the GSoC project can be discussed with the mentors.

Skills needed
  • Django
  • Python
  • PostgreSQL
Difficulty Level

Adjustable

The tasks in this project require Python and Django programming skills.

Project Size

Adjustable

Potential Mentors
  • Andreas Scherbaum

Reach out to ads (at) pgug (dot) de for a review of your proposals.


PL/Java refactoring for support of other JVM and polyglot languages

Project Description

PL/Java is maintained externally but referred to in PostgreSQL's SQL Conformance statement as the component that provides Java routines and types to meet part 13 of the SQL spec. PostgreSQL is extensible in many ways beyond what that spec provides, such as with new operators and aggregates, and PL/Java can be used to build those too. It is implemented mostly in Java and C.

Having supported only the Java language since 2004, PL/Java is getting a thorough refactoring to make it a common base for building PostgreSQL server-side PLs using any of the current large ecosystem of languages that run on the JVM or with the polyglot features of GraalVM. A contributor on this project will be shaping a PL/Java that looks forward to its next decade or two.

These are some tasks that need to be completed:

  • Implement data type adapters for PostgreSQL types not yet covered.
  • Develop regression tests for data type adapters.
  • Complete the implementation of isSubtype and use to implement an adapter manager as described in the pull request.
  • Finish the Java-to-PostgreSQL side of APIs (Adapter, Datum.Accessor, TupleTableSlot) that currently only implement PostgreSQL-to-Java.

A more complete list of tasks can be found in pull request 399. An interested contributor should become familiar with that pull request and with building PL/Java from the associated branch. A project proposal may include a combination of items from that list, and additional ideas are also welcome.

Skills needed

  • Primarily Java
  • Familiarity with C and acquaintance with the Java Native Interface (JNI)
  • Some familiarity with PostgreSQL internals, such as experience creating some simple C-language PostgreSQL extension.

Historically, hacking on PL/Java has been C- and JNI-heavy, but this refactoring aims to move much more of the interesting stuff to pure Java, so contributors with primarily Java experience can make substantial contributions. Some familiarity with C and JNI may still be helpful for looking underneath to see what is going on.

  • Ideally, a favorite JVM or GraalVM language other than Java, to use in the final "expected outcome" below.

Difficulty level

Moderately easy to moderately hard, depending on the items included.

Project size

Medium (175 hours) or large (350 hours) depending on the proposed deliverables.

Mentors

  • Chapman Flack can mentor. Chap is the current PL/Java maintainer.

Expected outcomes

  • Accepted pull requests in the PL/Java GitHub repository for the items proposed.
  • For your favorite JVM or GraalVM programming language Foo, other than Java, an implementation on GitHub of a simple PostgreSQL PL/Foo (with perhaps limited functionality), implemented with this branch of PL/Java.


pg_statviz: time series analysis & visualization of statistics

Project Description

pg_statviz is a brand new extension and utility pair for snapshotting Postgres cumulative & dynamic statistics, performing time series analysis & creating visualizations from them.

It's a (PL/pg)SQL extension and Python command line utility pair, fully modular. Code is currently at "feature-complete alpha" maturity level.

It needs:

  • Additional modules for stats to monitor (such as WAL replication, I/O, tables)
  • Writing of Python regression tests
  • Packaging for PGDG repositories and Linux distributions

Skills needed

  • SQL
  • PL/pgSQL
  • Python
  • optionally Matplotlib

Difficulty level

Moderately easy to moderately hard, depending on the items included.

Project size

Medium or large depending on the proposed deliverables.

Mentors

  • Jimmy Angelakos (author of pg_statviz)

Expected outcomes

Postgres extension tutorial / quick start

Project Description

How to write an extension for Postgres is a total mystery for the uninitiated. The documentation desperately needs an easy-to-follow tutorial with clear examples because the barrier for entry is prohibitive for new contributors. There are various third-party blog posts focusing on different areas, and sometimes contradictory.

The tutorial should be a quick start guide that assumes only knowledge of Postgres and the target language:

  • Prerequisites for writing an extension
  • How to start writing an extension
  • How to write a Makefile and what it should do
  • Do I need PGXS? How does it work?
  • Do I need PGXN? How does it work?
  • How to use Postgres Procedural Languages for the extension
  • How to use languages external to Postgres for the extension
  • How to write regression tests for the extension
  • Extension release management and upgradability

Skills needed

  • Postgres extension system
  • Postgres documentation system

Difficulty level

Moderately easy

Project size

Large because of the amount of content to be produced

Mentors

  • Jimmy Angelakos
  • ?

Expected outcomes

Improving PostgreSQL EXPLAIN output

Project Description

One of the essential parts of PostgreSQL performance is the query planner and understanding what plans are generated and how. There's a lot which goes into the query planner and there is a lot of information returned by EXPLAIN, but there's still information which could be added that would be informative to users of PostgreSQL. While additional ideas are encouraged, here are two specific ones recently requested by PostgreSQL users:

  • Add TOAST information to EXPLAIN ANALYZE
    • EXPLAIN ANALYZE currently doesn't perform any de-TOASTing and doesn't provide any information about what the cost would be to return TOAST'd values
    • The first part of this sub-project would involve adding just a simple count of how many values would have to be de-TOAST'd
    • A next step would be to include size information about how large the TOAST'd values are that would be returned
  • Report details about how query plan cost components are used to make the final cost value for a node
    • A lot of different variables go into query plan cost. These include seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_operator_cost, and more.
    • Currently, PostgreSQL reports just two 'cost' values from EXPLAIN for each node- a total cost to run the node, and the cost to return the first tuple.
    • Being able to see how PostgreSQL arrived at the query cost for a given node could be very helpful when considering how to adjust those parameters to encourage PostgreSQL to use a different plan
    • This sub-project would involve changing EXPLAIN to keep track of the query components separately to report them out independently in the EXPLAIN output

Each of these would be new parameters to EXPLAIN, similar to 'BUFFERS' and others.

Skills needed

  • C
  • SQL

Difficulty level

Moderately easy

Project size

Small to Medium depending on the resulting set of improvements to EXPLAIN

Mentors

  • Stephen Frost
  • ?

Expected outcomes

  • Patches posted to the PostgreSQL -hackers mailing list with these improvements
  • Iterations with the PostgreSQL community to update the patches with their feedback

DBT-3 Benchmark Kit Development (2023)

Project Description

This project involves updating DBT-3, a fair use benchmarking kit based on the TPC Benchmark(TM) H.

The goal is to make the kit more portable with a combination of create an AppImage of the kit, and rewriting some of the R scripts into language with an engine that can also be distributed with an AppImage. The R scripts do a combination of statistical calculations and plotting.

Skills needed

  • POSIX shell scripting
  • Julia
  • PostgreSQL

Difficulty level

Easy to Moderately hard.

Project size

Medium.

Mentors

  • Mark Wong can mentor. Mark is a major contributor, has experience writing benchmarking kits, and is maintaining this kit.

Expected outcomes

  • Learn to use CMake, Julia, and containers.
  • Scripts and configuration needed to create an AppImage.
  • Additional Julia scripts to calculate statistics and plot chargs.

DBT-5 Stored Procedure Development (2023)

Project Description

This project involves updating DBT-5, a fair use benchmarking kit based on the TPC Benchmark(TM) E. The server side business logic was developed at a time before stored procedures and sub-transaction support was available for PostgreSQL.

The goal is to update the current server side business logic, up to 12 transactions, to follow the benchmark specification to use stored procedures and sub-transaction as described by the specification.

Skills needed

  • C (time permitting)
  • C++
  • pl/PGSQL
  • PostgreSQL

Difficulty level

Moderately hard, depending on proficiency with C coding.

Project size

Large.

Mentors

  • Mark Wong can mentor. Mark is a major contributor, has experience writing benchmarking kits, and mentored the original GSOC project that developed this kit.

Expected outcomes

  • Review of the 12 transaction to determine which should be converted to procedures and updated with appropriate sub-transaction handling
  • Updated pl/PGSQL code for all transactions that need to be updated.
  • The C++ application code will need to be updated in conjunction with updates to the server side business logic.
  • Time permitting, updated C code for all transactions that need to be updated.

PostgreSQL JDBC Struct/Array Support (2023)

Project Description

The JDBC API has:


Struct createStruct(String typeName, Object[] attributes) throws SQLException

Factory method for creating Struct objects. Parameters: typeName - the SQL type name of the SQL structured type that this Struct object maps to. The typeName is the name of a user-defined type that has been defined for this database. It is the value returned by Struct.getSQLTypeName. attributes - the attributes that populate the returned object


Currently pgjdbc does not support this. The goal would be to implement the function

Skills needed

  • JAVA
  • JDBC
  • PostgreSQL

Difficulty level

Moderately hard, depending on proficiency with JAVA coding.

Project size

Medium.

Mentors

  • Dave Cramer can mentor. Dave is a major contributor, has experience writing JAVA and JDBC.

Expected outcomes

  • Working implementation of createStruct

GUI representation of monitoring System Activity with the system_stats Extension in pgAdmin 4 (2023)

Project Description

This project involves design and developing dashboard graphs/charts in pgAdmin 4 which shows the system level statistics for monitoring Postgres activity with the help of the system_stats extension. The system_stats extension is a library of stored procedures that allow users to access system-level statistics for monitoring Postgres activity. These procedures reveal a variety of system metrics to the database server.

Skills Needed

  • Python/Flask
  • Javascript
  • React JS
  • CSS

Difficulty level

  • Medium

Project Size

  • Medium (175 hours).

Potential Mentors

  • Khushboo Vashi

Expected Outcomes

  • To have a GUI representation (charts/graphs) of the system level statistics on Dashboards tab in pgAdmin 4

pg_chameleon sql library

Project Description

pg_chameleon can translate in the PostgreSQL dialect any DDL generated by MySQL. The library was written several years ago and it works generally well.

However a complete rewriting is necessary for cleaning up the weird stuff and improve the maintenance of the library.


github project: pg_chameleon on github

  • Python
  • regexp
  • PostgreSQL
  • MySQL

Difficulty level

Medium to hard, depending on proficiency with Python coding and regexp writing.

Project size

Small to Medium

Mentors

  • Federico Campoli

Reach out to thedoctor (at) pgdba (dot) org for your proposal