GSoC 2023
This page is for collecting ideas for Google Summer of Code 2023 projects. This page is for IDEAS ONLY.
IF YOU ARE A CONTRIBUTOR: there is a top-level GSoC page for PostgreSQL here: PostgreSQL General GSoC Page - please read this first before proceeding to contact mentors! Contribution guidelines, channels and communication methods are in this page too. PLEASE make sure you have read everything thoroughly.
If you are a mentor and would like to participate, feel free to edit this page and put your name below.
Mentors mailing list for proposals: gsoc2023-mentors@lists.postgresql.org
Mentors
The following individuals have been listed as possible mentors on the below projects, and/or offered to be mentors for student-proposed projects:
- Stephen Frost
- Ilaria Battiston
- Jesper Pedersen
- Mariam Fahmy
- Chapman Flack
- Andreas 'ads' Scherbaum
- Gurjeet
- David E. Wheeler
- Jimmy Angelakos
- Mark Wong
- Dave Cramer
- Federico Campoli
Develop Performance Farm Benchmarks and Website (2023)
Project Description
The PostgreSQL Performance Farm project is a community project to collect and review performance data from tests as code changes are made to PostgreSQL, working on Linux and OSX. The infrastructure uses a Django API and a Python script to run benchmarks and collect analytics data from PostgreSQL instances and hosts. Right now, the infrastructure is fully functional, however it could benefit from bug fixes and some additional features. Specifically, these are some ideas which could be implemented:
- Generation and collection of EXPLAIN query plans
- Add custom queries
- Integrate profiling tools
This list can eventually be subject to changes, and is open to students' suggestions. Current code is at: https://github.com/PGPerfFarm/pgperffarm_server and https://github.com/PGPerfFarm/pgperffarm. Mentors are either past developers or past GSoC mentors for the same project.
As an example, the PostgreSQL Build Farm site [1] is a central repository for the results of testing source code changes for PostgreSQL as they occur, on a wide variety of platforms.
Skills needed
- Python programming (both scripting and Django)
- Web development (plain Javascript, HTML, CSS)
- Understanding of PostgreSQL benchmarks and website infrastructure
Difficulty Level
This project should not be particularly challenging having basic full-stack development and Python skills.
Project Size
Medium sized project (175 hours).
Potential Mentors
- Stephen Frost
- Ilaria Battiston
Please do not email us, rather find us on Slack after reading carefully the general GSoC information page higlighted above.
Expected Outcomes
- An expanded infrastructure where clients can upload test results which can be hosted on PostgreSQL infrastructure and users can search and review uploaded results.
References
[1] https://buildfarm.postgresql.org/
Creating a pgweb testing harness (2023)
Project Description
pgweb [1] is the official repository containing code of the website [http://www.postgresql.org/ ]. The infrastructure is deployed and maintained by a team of developers, however lacks of a testing suite to check for a correct functioning after each modification. The team could benefit of a framework to be integrated in the current CI pipeline, to support both tests for the current code and the addition of new tests for any upcoming new code.
What to be tested specifically is to be defined and will be part of the GSoC project, but there are some desired features:
- automatic testing of existing code:
- proper functionality of links, forms, cookies
- browser compatibility and performance
- accessibility
- integration tests
- possibility to easily implement new tests
- other things can be suggested by students
The testing suite should be a standalone module, to eventually be integrated in CI pipeline after development. Since the web framework of pgweb is in Django and Python, these would be the preferred tools. Furthermore, a student should know a bit of web development and testing to be able to properly define requirements and implement them.
Skills needed
- Django
- Python
- CI/CD pipelines
- Web development and testing
- HTML, CSS, Bootstrap
- Javascript, jQuery
Difficulty Level
Medium.
Project Size
Medium sized project (175 hours).
Potential Mentors
- Ilaria Battiston
- Stephen Frost
Please do not email us, rather find us on Slack after reading carefully the general GSoC information page higlighted above.
Expected Outcomes
- We expect the student to define a set of features and requirements to test, to then implement these tests and be able to run them for the existing codebase, automatically notifying errors on a periodic basis.
References
[1] https://github.com/postgres/pgweb
pgagroal: Query cache
Project Description
pgagroal is a high-performance protocol-native connection pool for PostgreSQL.
This project aims to add a query cache to the project such that data can be reused in future queries.
Skills needed
- C
- PostgreSQL
Difficulty Level
Hard.
The tasks in this project requires C programming skills.
This work will require some knowledge about basic PostgreSQL administration.
Project Size
Large (350 hours).
Potential Mentors
- Jesper Pedersen
Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.
Expected Outcomes
- Have a query cache in either shared or per-connection mode.
References
[1] https://agroal.github.io/pgagroal/
[2] https://github.com/agroal/pgagroal
[3] https://www.postgresql.org/
pgagroal: Simple vault
Project Description
pgagroal is a high-performance protocol-native connection pool for PostgreSQL.
This project aims to add a simple dynamic vault implementation to recycle passwords for users.
Skills needed
- C
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires C programming skills.
This work will require some knowledge about basic PostgreSQL administration.
Project Size
Large (350 hours).
Potential Mentors
- Jesper Pedersen
Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.
Expected Outcomes
- Have a simple dynamic vault
References
[1] https://agroal.github.io/pgagroal/
[2] https://github.com/agroal/pgagroal
[3] https://www.postgresql.org/
pgexporter: Improve version support
Project Description
pgexporter is a Prometheus solution for PostgreSQL.
This project aims to improve the support of different PostgreSQL versions in pgexporter.
Skills needed
- C
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires C programming skills.
Project Size
Medium (175 hours).
Potential Mentors
- Jesper Pedersen
Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.
Expected Outcomes
- Create infrastructure in pgexporter to support different PostgreSQL versions out of the box.
- Create default YAML files for PostgreSQL 11 to PostgreSQL 16
References
[1] https://pgexporter.github.io/
[2] https://github.com/pgexporter/pgexporter
[3] https://www.postgresql.org/
pgmoneta: Native backup
Project Description
pgmoneta is a backup / restore solution for PostgreSQL.
This project aims to create a native backup solution and thereby replace the current usage of pg_basebackup.
Skills needed
- C
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires C programming skills.
Project Size
Large (350 hours).
Potential Mentors
- Jesper Pedersen
Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.
Expected Outcomes
- Replace usage of pg_basebackup inside pgmoneta with a native solution
References
[1] https://pgmoneta.github.io/
[2] https://github.com/pgmoneta/pgmoneta
[3] https://www.postgresql.org/
pgmoneta: Monitoring
Project Description
pgmoneta is a backup / restore solution for PostgreSQL.
This project aims to improve monitoring of pgmoneta by creating a Grafana dashboard as well as improving the Prometheus metrics.
Skills needed
- C
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires C programming skills.
Project Size
Medium (175 hours).
Potential Mentors
- Jesper Pedersen
- Mariam Fahmy
Reach out to jesper (dot) pedersen (at) redhat (dot) com for a review of your proposal.
Expected Outcomes
- Create a Grafana dashboard
- Improve the project with new Prometheus metrics
References
[1] https://pgmoneta.github.io/
[2] https://github.com/pgmoneta/pgmoneta
[3] https://www.postgresql.org/
[5] https://github.com/grafana/grafana
[7] https://github.com/prometheus/prometheus
pg_adviser / index_adviser: Recommend Potentially Useful Indexes
Project Description
Index Adviser is an extension for Postgres that processes the workload and recommends indexes that may help improve performance.
- Low-hanging fruit: Improve Postgres Version Support
- Exploratory: Support more index types: GiST, GIN, etc.
Skills needed
- C
- Postgres
Difficulty Level
- Low (for Low-hanging fruit: Improve Postgres Version Support)
- Medium (for Exploratory: Support more index types: GiST, GIN, etc.)
The tasks in this project requires C programming skills.
Project Size
- Medium (for Low-hanging fruit: Improve Postgres Version Support)
- Large (for Exploratory: Support more index types: GiST, GIN, etc.)
Potential Mentors
- Gurjeet
Reach out to gurjeet@{singh.im,amazon.com} for a review of your proposal.
Expected Outcomes
- Support all active versions of Postgres (for Low-hanging fruit: Improve Postgres Version Support)
- Support at least one new Index type recommendation (for Exploratory: Support more index types: GiST, GIN, etc.)
References
[1] https://github.com/DrPostgres/pg_adviser/blob/master/index_adviser/README.txt
[2] https://www.postgresql.org/
Add various UI components to PGXN::Manager
- Add Co-Ownership UI
- Add a User Admin Interface
- Add Ownership Admin Interface
- Add UI For Maintenance Tasks
- Add UI to Reindex Distributions
Skills needed
- Perl
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires Perl programming skills.
Project Size
Medium (175 hours) ???
Potential Mentors
- David E. Wheeler
- Andreas Scherbaum
Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.
- Add Release Callback Interface
- Add “Instant Mirroring” Support
- Add Hook to Update a Pex Repository
- Add Non-Stable Release Status to Tweets
- Add a “Recent Releases” Feed
- Add triggers to post to Mastodon, other social media
Skills needed
- Perl
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires Perl programming skills.
Project Size
Medium (175 hours) ???
Potential Mentors
- David E. Wheeler
- Andreas Scherbaum
Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.
PGXN: Various release management and system features
- Allow Distributions to be Deleted?
- Add Pluggable Auth
- Add Badges to the META.json
- Allow Previous Owner When Reindexing a Distribution
- List only Latest release of each Extension in the Recent Releases pane
Skills needed
- Perl
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires Perl programming skills.
Project Size
Medium (175 hours) ???
Potential Mentors
- David E. Wheeler
- Andreas Scherbaum
Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.
PGXN: API
PGXN API is where the full text indexing happens and the search APIs live. They can be improved. Full text indexing is built on Apache Lucy, which is super capable and efficient, with no external data store required, but has been retired. The search could use a whole rethink, probably using a single search field for everything and then classifying results as users, extensions, release, etc. in the search results. The drop down menu separating them was a failure.
A useful SoC project replaces the search implementation with something else, perhaps tsearch, since this is all about Postgres, but it doesn’t really matter. This would be a pretty nice, highly visible project. The relevant bits of that infrastructure are:
On the other hand, some folks might prefer to do away with PGXN altogether and build something more like pkg.go.dev, which automatically indexes go packages on GitHub with no need to make a separate bundle and release. docs.rs takes a similar approach.
Skills needed
- Perl
- PostgreSQL
Difficulty Level
Medium.
The tasks in this project requires Perl programming skills.
Project Size
Largs (??? hours) ???
Potential Mentors
- David E. Wheeler
- Andreas Scherbaum
Reach out to david (plus) pgxn (at) justatheory (dot) com and ads (at) pgug (dot) de for a review of your proposal.
PGEU Conference System
The PGEU Conference System has a number of open feature requests. A few are labeled as good first issue.
Students can select features and build a proposal which of the open requests to implement in the available time. The duration of the GSoC project can be discussed with the mentors.
Skills needed
- Django
- Python
- PostgreSQL
Difficulty Level
Adjustable
The tasks in this project require Python and Django programming skills.
Project Size
Adjustable
Potential Mentors
- Andreas Scherbaum
Reach out to ads (at) pgug (dot) de for a review of your proposals.
PL/Java refactoring for support of other JVM and polyglot languages
Project Description
PL/Java is maintained externally but referred to in PostgreSQL's SQL Conformance statement as the component that provides Java routines and types to meet part 13 of the SQL spec. PostgreSQL is extensible in many ways beyond what that spec provides, such as with new operators and aggregates, and PL/Java can be used to build those too. It is implemented mostly in Java and C.
Having supported only the Java language since 2004, PL/Java is getting a thorough refactoring to make it a common base for building PostgreSQL server-side PLs using any of the current large ecosystem of languages that run on the JVM or with the polyglot features of GraalVM. A contributor on this project will be shaping a PL/Java that looks forward to its next decade or two.
These are some tasks that need to be completed:
- Implement data type adapters for PostgreSQL types not yet covered.
- Develop regression tests for data type adapters.
- Complete the implementation of
isSubtype
and use to implement an adapter manager as described in the pull request. - Finish the Java-to-PostgreSQL side of APIs (
Adapter
,Datum.Accessor
,TupleTableSlot
) that currently only implement PostgreSQL-to-Java.
A more complete list of tasks can be found in pull request 399. An interested contributor should become familiar with that pull request and with building PL/Java from the associated branch. A project proposal may include a combination of items from that list, and additional ideas are also welcome.
Skills needed
- Primarily Java
- Familiarity with C and acquaintance with the Java Native Interface (JNI)
- Some familiarity with PostgreSQL internals, such as experience creating some simple C-language PostgreSQL extension.
Historically, hacking on PL/Java has been C- and JNI-heavy, but this refactoring aims to move much more of the interesting stuff to pure Java, so contributors with primarily Java experience can make substantial contributions. Some familiarity with C and JNI may still be helpful for looking underneath to see what is going on.
- Ideally, a favorite JVM or GraalVM language other than Java, to use in the final "expected outcome" below.
Difficulty level
Moderately easy to moderately hard, depending on the items included.
Project size
Medium (175 hours) or large (350 hours) depending on the proposed deliverables.
Mentors
- Chapman Flack can mentor. Chap is the current PL/Java maintainer.
Expected outcomes
- Accepted pull requests in the PL/Java GitHub repository for the items proposed.
- For your favorite JVM or GraalVM programming language Foo, other than Java, an implementation on GitHub of a simple PostgreSQL PL/Foo (with perhaps limited functionality), implemented with this branch of PL/Java.
pg_statviz: time series analysis & visualization of statistics
Project Description
pg_statviz is a brand new extension and utility pair for snapshotting Postgres cumulative & dynamic statistics, performing time series analysis & creating visualizations from them.
It's a (PL/pg)SQL extension and Python command line utility pair, fully modular. Code is currently at "feature-complete alpha" maturity level.
It needs:
- Additional modules for stats to monitor (such as WAL replication, I/O, tables)
- Writing of Python regression tests
- Packaging for PGDG repositories and Linux distributions
Skills needed
- SQL
- PL/pgSQL
- Python
- optionally Matplotlib
Difficulty level
Moderately easy to moderately hard, depending on the items included.
Project size
Medium or large depending on the proposed deliverables.
Mentors
- Jimmy Angelakos (author of pg_statviz)
Expected outcomes
- Accepted pull requests in the pg_statviz GitHub repository for the items proposed
Postgres extension tutorial / quick start
Project Description
How to write an extension for Postgres is a total mystery for the uninitiated. The documentation desperately needs an easy-to-follow tutorial with clear examples because the barrier for entry is prohibitive for new contributors. There are various third-party blog posts focusing on different areas, and sometimes contradictory.
The tutorial should be a quick start guide that assumes only knowledge of Postgres and the target language:
- Prerequisites for writing an extension
- How to start writing an extension
- How to write a Makefile and what it should do
- Do I need PGXS? How does it work?
- Do I need PGXN? How does it work?
- How to use Postgres Procedural Languages for the extension
- How to use languages external to Postgres for the extension
- How to write regression tests for the extension
- Extension release management and upgradability
Skills needed
- Postgres extension system
- Postgres documentation system
Difficulty level
Moderately easy
Project size
Large because of the amount of content to be produced
Mentors
- Jimmy Angelakos
- ?
Expected outcomes
- New documentation section for postgresql.org/docs
Improving PostgreSQL EXPLAIN output
Project Description
One of the essential parts of PostgreSQL performance is the query planner and understanding what plans are generated and how. There's a lot which goes into the query planner and there is a lot of information returned by EXPLAIN, but there's still information which could be added that would be informative to users of PostgreSQL. While additional ideas are encouraged, here are two specific ones recently requested by PostgreSQL users:
- Add TOAST information to EXPLAIN ANALYZE
- EXPLAIN ANALYZE currently doesn't perform any de-TOASTing and doesn't provide any information about what the cost would be to return TOAST'd values
- The first part of this sub-project would involve adding just a simple count of how many values would have to be de-TOAST'd
- A next step would be to include size information about how large the TOAST'd values are that would be returned
- Report details about how query plan cost components are used to make the final cost value for a node
- A lot of different variables go into query plan cost. These include seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_operator_cost, and more.
- Currently, PostgreSQL reports just two 'cost' values from EXPLAIN for each node- a total cost to run the node, and the cost to return the first tuple.
- Being able to see how PostgreSQL arrived at the query cost for a given node could be very helpful when considering how to adjust those parameters to encourage PostgreSQL to use a different plan
- This sub-project would involve changing EXPLAIN to keep track of the query components separately to report them out independently in the EXPLAIN output
Each of these would be new parameters to EXPLAIN, similar to 'BUFFERS' and others.
Skills needed
- C
- SQL
Difficulty level
Moderately easy
Project size
Small to Medium depending on the resulting set of improvements to EXPLAIN
Mentors
- Stephen Frost
- ?
Expected outcomes
- Patches posted to the PostgreSQL -hackers mailing list with these improvements
- Iterations with the PostgreSQL community to update the patches with their feedback
DBT-3 Benchmark Kit Development (2023)
Project Description
This project involves updating DBT-3, a fair use benchmarking kit based on the TPC Benchmark(TM) H.
The goal is to make the kit more portable with a combination of create an AppImage of the kit, and rewriting some of the R scripts into language with an engine that can also be distributed with an AppImage. The R scripts do a combination of statistical calculations and plotting.
Skills needed
- POSIX shell scripting
- Julia
- PostgreSQL
Difficulty level
Easy to Moderately hard.
Project size
Medium.
Mentors
- Mark Wong can mentor. Mark is a major contributor, has experience writing benchmarking kits, and is maintaining this kit.
Expected outcomes
- Learn to use CMake, Julia, and containers.
- Scripts and configuration needed to create an AppImage.
- Additional Julia scripts to calculate statistics and plot chargs.
DBT-5 Stored Procedure Development (2023)
Project Description
This project involves updating DBT-5, a fair use benchmarking kit based on the TPC Benchmark(TM) E. The server side business logic was developed at a time before stored procedures and sub-transaction support was available for PostgreSQL.
The goal is to update the current server side business logic, up to 12 transactions, to follow the benchmark specification to use stored procedures and sub-transaction as described by the specification.
Skills needed
- C (time permitting)
- C++
- pl/PGSQL
- PostgreSQL
Difficulty level
Moderately hard, depending on proficiency with C coding.
Project size
Large.
Mentors
- Mark Wong can mentor. Mark is a major contributor, has experience writing benchmarking kits, and mentored the original GSOC project that developed this kit.
Expected outcomes
- Review of the 12 transaction to determine which should be converted to procedures and updated with appropriate sub-transaction handling
- Updated pl/PGSQL code for all transactions that need to be updated.
- The C++ application code will need to be updated in conjunction with updates to the server side business logic.
- Time permitting, updated C code for all transactions that need to be updated.
PostgreSQL JDBC Struct/Array Support (2023)
Project Description
The JDBC API has:
Struct createStruct(String typeName, Object[] attributes) throws SQLException
Factory method for creating Struct objects. Parameters: typeName - the SQL type name of the SQL structured type that this Struct object maps to. The typeName is the name of a user-defined type that has been defined for this database. It is the value returned by Struct.getSQLTypeName. attributes - the attributes that populate the returned object
Currently pgjdbc does not support this. The goal would be to implement the function
Skills needed
- JAVA
- JDBC
- PostgreSQL
Difficulty level
Moderately hard, depending on proficiency with JAVA coding.
Project size
Medium.
Mentors
- Dave Cramer can mentor. Dave is a major contributor, has experience writing JAVA and JDBC.
Expected outcomes
- Working implementation of createStruct
GUI representation of monitoring System Activity with the system_stats Extension in pgAdmin 4 (2023)
Project Description
This project involves design and developing dashboard graphs/charts in pgAdmin 4 which shows the system level statistics for monitoring Postgres activity with the help of the system_stats extension. The system_stats extension is a library of stored procedures that allow users to access system-level statistics for monitoring Postgres activity. These procedures reveal a variety of system metrics to the database server.
Skills Needed
- Python/Flask
- Javascript
- React JS
- CSS
Difficulty level
- Medium
Project Size
- Medium (175 hours).
Potential Mentors
- Khushboo Vashi
Expected Outcomes
- To have a GUI representation (charts/graphs) of the system level statistics on Dashboards tab in pgAdmin 4
pg_chameleon sql library
Project Description
pg_chameleon can translate in the PostgreSQL dialect any DDL generated by MySQL. The library was written several years ago and it works generally well.
However a complete rewriting is necessary for cleaning up the weird stuff and improve the maintenance of the library.
github project: pg_chameleon on github
- Python
- regexp
- PostgreSQL
- MySQL
Difficulty level
Medium to hard, depending on proficiency with Python coding and regexp writing.
Project size
Small to Medium
Mentors
- Federico Campoli
Reach out to thedoctor (at) pgdba (dot) org for your proposal