This page describes the transparent data encryption feature proposed in pgsql-hackers.

Overview

There has been continual discussion about whether and how to implement Transparent Data Encryption (TDE) in Postgres. Many other relational databases support TDE, and some security standards require it. However, it is also debatable how much security value TDE provides.

Fundamentally, TDE must meet three criteria — it must be secure, obviously, but it also must be done in a way that has minimal impact on the rest of the Postgres code. This has value for two reasons — first, only a small number of users will use TDE, so the less code that is added, the less testing is required. Second, the less code that is added, the less likely TDE will break because of future Postgres changes. Finally, TDE should meet regulatory requirements. Also, TDE doesn't need to protect against all potential attack vectors, but it should fully protect against all intended attack vectors.

History

The first patch was proposed in 2016 [1] and implemented cluster-wide encryption with a single key. In 2018 table-level transparent data encryption was proposed [2], together with a method to integrate with key management systems; that first patch was submitted in 2019 [3]. The patch implemented both tablespace-level encryption using a 2-tier key architecture and generic key management API to communicate with external key management systems.

Scope of TDE

The scope of TDE is:

Internal key management system (KMS), storing keys in the database
Cluster-wide encryption
- encrypt everything that is persistent
- not encrypt shared buffers or data in memory

The benefits of cluster-wide encryption:

Simple architecture
Suitable for the requirement of encrypting all data

Cluster-wide encryption meets the compliance requirements and checks the box as far as TDE is concerned. It also meets the criteria of encrypting data at rest i.e., persistent data.

While this approach give protection from attackers with read access, it does not guarantee data integrity, i.e., protection against those with write access. Someday integrity checking might be attempted, but the number of files that would need to be encrypted would be larger (not just files containing user data), and some files would be difficult to encrypt and integrity-check without significant complexity, e.g., pg_xact. One solution might be to give integrity-checking guarantees only to tablespaces if the default data directory can be guaranteed to be safe from write attacks. The vm and fsm files in the tablespace directories would make this approach complex.

When to encrypt/decrypt

Buffer

It encrypts buffer data during file system I/O:

Processes encrypt data when writing to the file system
Decrypt when reading from the file system
Data in the shared buffer is not encrypted

WAL

In cluster encryption:

Processes insert WAL data to WAL buffers in a non-encrypted state
WAL buffers are encrypted when writing to the file system
WAL would use a dedicated encryption key

Temporary Files

Temporary files would use a temporary key that is randomly generated at postmaster start and lives only for the postmaster lifetime. For parallel queries, especially parallel hash joins, since it's possible that multiple parallel workers use the same temporary files, the temporary key should be shared with parallel workers. A unified I/O API needs to be written for temporary files so encryption changes can be centralized. The current API is streaming so a streaming cipher mode might need to be used.

Backups

In cluster-wide encryption, there will be an option in pg_basebackup to change the heap/index key for key rotation purposes. After failover to a standby, the WAL key can be changed too.

How to encrypt

We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits) selected at initdb time with --file-encryption-method.

Initialization Vector(IV)

IV for heap/index encryption

We will use AES-XTS [5] for heap/index encryption. XTS is less sensitive to the reuse of the initialization vector (IV). Even though it is a block cipher, it uses “ciphertext stealing” to allow for encryption of data that is not a multiple of the AES 16-byte block size. XTS requires two secrets --- one for AES encryption, and the other to XOR the AES input and output.

The IV for heap/index (in addition to the XTS requirements) is not yet clear. It will probably be a mixture of some of these:

if dboid is used as part of the IV, CREATE DATABASE must decrypt/reencrypt copied files.

LSN: While it is possible for different data to be written with the same LSN, the LSN does add useful variability to the IV in most cases. LSN reuse can happen if multiple hint bit changes happen during the same checkpoint, and some GiST index pages have a fixed LSN assigned at index build time. Unlogged relations currently use zero LSNs but should be assigned fake LSNs. (We would use the same fake LSN counter we use for GiST indexes that does not generate WAL.) During LSN reuse, attackers would be able to see which 16-byte blocks changed, but not determine which bits changed. Of course, 8k block changes will always be visible.

block number

Zero pages would also need to be processed as non-encrypted. wal_log_hints=on will be required since hint bits will change 16-byte blocks.

IV for WAL encryption

We will use the WAL segment number to create an IV for each WAL segments. The maximum bits that can be processed with a single key/IV(nonce) pair is 68GB [6][7]. We will use a different IV(nonce) 16MB WAL file, so we will be OK there too. What about timelines?

The WAL might use CTR mode since CTR is a streaming cipher and IV reuse is not an issue for WAL.

IV for temporary files

It is unclear how to set the nonce for temporary files. We will probably use a data encryption key generated at postmaster start, and mix that with the time of day, process id, and maybe file path.

Checksum and Encryption

It is unclear if the page CRC should be encrypted. If encrypted, it gives cryptographically-insecure integrity checking but requires command-line tools that check the CRC to have access to the encryption key.

Prepared Transaction Encryption

During the discussion, the point about prepared transaction encryption also came up since they are also persisted. Sawada-san mentioned that we aren’t storing any important data for prepared transactions so we might not need to encrypt it. However we need to have it as part of the todo list.

Prototypes

Buffer encryption proof of concept by Masahiko Sawada [8]
Cybertec version based on Postgres 12.3 [9]
Key management and LSN-as-nonce set of patches [10]b

TODO for Cluster File Encryption

Here is list of ongoing tasks with there assignment and status for cluster-wide encryption:

Front end tools encryption
- Allow pg_rewind and pg_waldump to work, add --cluster-passphrase-command option
- does pg_rewind need to work across WAL key changes?
- offline tool to allow changing the data encryption key of current WAL and PITR WAL files, must be crash-safe
- changing the pass phrase will require --old-passphrase-command and --new-passphrase-command options
- modify pg_basebackup to store relations with a different relation key so standby servers can be used for relation key rotation

Here are some tasks or areas that we need to research, some of these are being worked as part of main to do's listed above. This is a exhaustive list to ensure that we don't skip any todo required for the first phase of TDE.

TDE for replication
- wal sender (especially xlogreader) needs to take TDE-wal key to decrypt wal data.
- Physical replication
  - WAL records are sent in a form of plaintext?
  - Allow primary and the standby servers to use different heap/index keys
- Logical replication (and decoding)
  - We decrypt and decode WAL data and send these changes in a form of plaintext.
  - The subscriber will use different encryption keys or even can disable TDE.
TDE for backup
- During physical backup (pg_basebackup or copying OS files), table/index data are transferred in a form of encrypted text and all three internal keys are replicated
- Maybe we need to have the ability to change some internal keys during basebackup? This can be done by generating new internal keys, re-encrypting database files with the new key during the transfer (we already do checksum verification for every page), generate new control file having new internal keys and sending it.
- For logical backup (pg_dump), all data are dumped in a form of plaintext because pg_dump simply fetches data via SQL.
- How does TDE work with backup manifests?
Regression test cases for TDE
Documentation
- if a standby is promoted to a primary and the old primary continues writing, one must be re-keyed to avoid using the same IV
Comprehensive testing

List of the files that contain user data

If there are any added files, incorrect note or user data, please correct it.

Num	Database cluster	Contains of user data	note
1	PG_VERSION	not contain	Only PostgreSQL version information is written
2	base/NNNNN/NNNNN	contain	Table data
3	base/NNNNN/NNNNN_vm	not contain	VM file
4	base/NNNNN/NNNNN_fsm	not contain	FSM file
5	base/NNNNN/NNNNN_init	may contain	[11]
6	base/NNNNN/PG_VERSION	not contain	Only PostgreSQL version information is written
7	base/NNNNN/pg_filenode.map	not contain
8	base/NNNNN/pg_internal.init	contains	[12]
9	base/pgsql_tmp/pgsql_tmpPID.tempFileCounter	contain	Temporary file that creates user data temporarily when work_mem size is insufficient
10	current_logfiles	not contain
11	global/NNNN	contain	Database name and user name
12	global/NNNN_vm	not contain	vm file
13	global/NNNN_fsm	not contain	fsm file
14	global/pg_control	not contain
15	global/pg_filenode.map	not contain
16	global/pg_internal.init	contains	[13]
17	pg_commit_ts/0000	not contain
18	pg_dynshmem/mmap.NNNNNNN	contains	[14]
19	pg_logical/mappings/	not contain
20	pg_logical/replorigin_checkpoint	not contain
21	pg_logical/snapshots/0-XXXXXXXX.snap	not contain
22	pg_multixact/members/0000	not contain
23	pg_multixact/offsets/0000	not contain
24	pg_notify/0000	not contain
25	pg_replslot/Slotname/state	not contain
26	pg_replslot/Slotname/xid-NNN-lsn-0-NNNNNNNN.snap	contain	Includes user data decoded from WAL files
27	pg_serial/	not contain
28	pg_snapshots/NNNNNNNN-N	not contain
30	pg_stat/pgstat.stat	contain	Statistics collector includes user data
31	pg_stat/pgstat.tmp	contain	Statistics collector includes user data
33	pg_subtrans/0000	not contain
34	pg_tblspc/PG_NN_NNNNNNNN/NNNNN	contain	Symlink files
35	pg_twophase/NNNNNNNN	not contain	Exclude user data
36	pg_wal/NNNNNNNNNNNNNNNNNN	contain	WAL data
37	pg_wal/*.backup	not contain	Exclude user data
38	pg_wal/*.history	not contain	Exclude user data
39	*pg_wal/.partial**	contain	WAL data
40	pg_wal/archive_status/NNNNNNNNNNNNNNNNNN.done	not contain	Exclude user data
41	pg_wal/archive_status/NNNNNNNNNNNNNNNNNN.ready	not contain	Exclude user data
42	pg_xact/0000	not contain
43	postgresql.auto.conf	not contain	setting file
44	postmaster.opts	not contain
45	postmaster.pid	not contain
46	postgresql.conf	not contain	Setting file
47	pg_hba.conf	not contain
48
49

TDE in other systems

MySQL (InnoDB)

MySQL supports per tablespace, data at rest encryption [15]. Please note that in MySQL the tablespace refers to a data file that can hold data for one or more InnoDB tables and associated indexes, while tablespace refers to a directory in PostgreSQL. innodb_file_per_table option allows tables to be created in their own tablespace. As of MySQL 8.0.16 it supports redo log and undo log encryption [16] and system tables encryption [17]. It supports 2 tier key architecture; it has tablespace keys for each tablespace which are located on the header of tablespace file. The master key can be obtained from external systems via a key ring plugin [18]

MySQL encrypts each page of both redo log and undo log with dedicated keys, not with the keys used for table encryption. The encryption key is stored in the header of the first redo/undo log file in encrypted state.https://www.postgresql.org/message-id/CAEze2WgK%3D8fBtY2CcCffqCrux4wKYFEiRVpkoPMMVaRjDq6Cpg%40mail.gmail.com

Oracle DB

Oracle DB supports column-level and tablespace-level TDE, both approaches use a two-tiered key-based architecture [19]. The Master Encryption Key (MEK) is stored in an external key store with both hardware and software key stores supported [20]. The MEK is used to secure the column- and tablespace-level keys. Column-level TDE uses one key per table, tablespace-level TDE uses one key per tablespace. Oracle TDE supports Triple-DES (3DES168) and AES (128, 192, 256 bit). Column-level TDE defaults to AES-192, tablespace-level TDE defaults to AES-128. Both methods add a salt to the plaintext before encryption by default [21]. Column-level TDE supports a NOMAC parameter to improve performance.

MS SQL Server

MS SQL Server supports database-level TDE with a three-tiered architecture using both symmetric and asymmetric key encryption [22]. The Service Master Key (SMK) is generated automatically during installation (e.g. initdb in PostgreSQL). The Database Master Key (DMK) is created in the `master` database (e.g. postgres default database) and is encrypted by the SMK. The DMK is then used to generate the certificates actually used to secure the Database Encryption Key (DEK). The DEK is the per-database symmetricly used to encrypt data and log files.

Filesystem-level encryption (fscrypt)

https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Transparent Data Encryption

Contents