Transparent Data Encryption

From PostgreSQL wiki

Jump to: navigation, search

This page describes transparent data encryption feature proposed in pgsql-hackers.

Contents

Overview

This summaries transparent data at rest encryption. The first patch is proposed in 2016[1] which implements the cluster-wide encryption with single key. In 2018 table-level transparent data encryption was proposed [2] (together with the method of integration with key management systems), that first patch is submitted in 2019[3], although it implements tablespace-level encryption as per the discussion. The patch implements both tablespace-level encryption using 2-tier key architecture and generic key management API to communicate with external key management systems.

As of July 2019, there are two proposals implementing different encryption feature: cluster-level encryption and tablespace-level encryption. But these provide the following functionality in common.

  • Data at rest encryption
    • Encrypt/Decrypt data when reading/writing shared buffer from/to the file system
  • Temporary file encryption
    • tupleslots, reorder buffer files

Here is latest development status of TDE (copying from Bruce's latest blog on TDE)

For the past 16 months, there has been discussion about whether and how to implement Transparent Data Encryption (tde) in Postgres. Many other relational databases support tde, and some security standards require it. However, it is also debatable how much security value tde provides.

The tde 400-email thread became difficult for people to follow, partly because full understanding required knowledge of Postgres internals and security details. A group of people who wanted to move forward began attending a Zoom call, hosted by Ahsan Hadi. The voice format allowed for more rapid exchange of ideas, and the ability to quickly fill knowledge gaps. It was eventually decided that all-cluster encryption was the easiest to implement in the first version. Later releases will build on this.

Fundamentally, tde must meet three criteria — it must be secure, obviously, but it also must be done in a way that has minimal impact on the rest of the Postgres code. This has value for two reasons — first, only a small number of users will use tde, so the less code that is added, the less testing is required. Second, the less code that is added, the less likely tde will break because of future Postgres changes. Finally, tde should meet regulatory requirements.

Here are some points discussed in the voice call on 8/10 :

Bruce shared an update on the recent community discussion. The main point being discussed in community is which files we need to encrypt, the design shared with cluster wide encryption is only encrypting the user data and not encrypting files that don't encrypt user data. Couple of to do's have been added to the list below to figure out which files contain user data and getting a buy-in from the community on which files to encrypt. We already know that we need to encypt pg_temp files and statistical files have some user data.

Bruce mentioned that we need to create targetted threads for each file type that we need to encrypt. This will help in having a targetted discussion for each file type encryption.

One of the main critism about this feature was a very long thread of over 400 emails, it is very difficult to understand the feature direction and key decisions made about the feature. The purpose of having the voice calls for the last several weeks to narrow down the scope of the feature for the first release. Bruce has created a fresh email thread describing the current situation and provding a link to the wiki page. The feature development will be carried forward using the new email thread and other information will be shared in the wiki page.

The goal is to get the intial version of this feature committed for PG 13, we still have sometime on our hand to achieve this target. The narrowed down target for PG-13 is cluster wide encryption, some of design details on which files to encrypt etc will be discussed in the community. Once we get the initial version committed, we can build on top of that foundation.

Threat models

TDE protects data from theft when file system access controls are compromised:

  • Malicious user steals storage devices and reads database files directly.
  • Malicious backup operator takes backup.
  • Protecting data at rest (persistent data)


This does not protect from users who can read system memory, e.g., shared buffers, which root users can do.

Granularity of encryption objects

This topic is under discussion which granularity of encryption object we choose as the first version of TDE

There are choices of granularity of encryption objects.

  • Database cluster
    • Proposed on pgsql-hackers[4]
  • Databases
    • Close to the SQL Server choice[5].
  • Tables
    • Close to the Oracle database and MySQL choices.
  • Tablespace
    • Proposed on pgsql-hackers[6]
    • There is a opinion that binding the keys to tablespaces is the wrong abstraction[7].
  • Columns
    • Oracle database supports
    • Better to do during execution something like using pgcrypto and triggers and views.
  • Define new groups
    • Create a group of tables being encrypted.

Scope for the first release of TDE

Following is in-scope for the first release of TDE targetted for PG 13. The idea behind choosing this scope is to target minimal feature-set that is secure, non-invasive and meets the regulatory requirements. We can build more TDE/KMS functionality on top of this in future releases, that could involve supporting more granular encryption type and also doing an external key management system. The group of postgresql communuty members (including senior members) have worked together in last few week and did several hours long design and planning discussions. They have agreed on the following scope for the first release of TDE.

  • Cluster wide encryption
  • Internal key management system (KMS), storing keys in the database
  • Encrypt everything that is persistent, not encrypting shared buffers or data in memor

Cluster wide encryption

The benefit of cluster wide encryption is following:

  • Simple architecture
  • Suitable for the requirement of encryption everything

The cluster wide encryption meets the compliance requirement and it is also a check in the box as far as TDE is concrend. It also meets the criteria of encrypting the data at rest i.e. persistent data.

More-granular-than-cluster encryption

The benefit of more-granular-than-cluster encryption is following:

  • Suppress performance overhead
  • Reduce the amount of data encrypted with a single key
    • Makes cryptoanalysis more difficult
    • Puts less data at risk if someone gets "lucky" in doing brute force
  • Reencryption without rebuilding whole database cluster
  • Useful for multi-user systems where each user's data-at-rest needs to be cryptographically isolated from the others.

However there also are concerns:

  • Performance overhead by looking up keys for encrypted table.

What to encrypt/decrypt

It depends on the granularity of encryption objects. In cluster encryption, it encrypts whole database cluster including all tables, indexes as well as SLRU data.

Cluster wide encryption is the target for first release, this level of granuality will encrypt all the persistent data as-well as WAL, temporary files etc. It doesn't encrypt data in memory i.e shared buffers.

On the other hand, in more-granular-than-cluster encryption we can encrypt the part of database cluster. It encrypts the following database objects.

  • Tables and indexes
  • WAL
  • System catalogs
  • Temporary files

it doesn't encrypt the followings but these might change in the future.

  • SLRU data (clog, commit timestamp, 2pc file etc)
  • Server logs

When to encrypt/decrypt

Buffer

Regardless of the granularity of encryption objects, it seems to get the consensus about when to encrypt buffer data on the shared buffer. It encrypt buffer data when disk I/O; processes encrypt data when writing it to the disk and decrypt when reading from the disk. That is, the data on the shared buffer is not encrypted state.

WAL

In cluster encryption, processes insert WAL data to WAL buffers in non-encrypted state and encrypts WAL buffers when writing to the file system.

On the other hand, in more-granular-than-cluster encryption there were discussions whether we need to use the same encryption key for WAL encryption as used for the table. The conclusion (at least for first version) is to have the encryption key dedicated for WAL and encrypt each WAL blocks with it. The main reason is threat model; if we see TDE as a data-at-rest solution, then I think it's fine to have a separate keyring with such keys, and only allow access to system processes.

To separate privilege of accessing to WAL key we may need to change how WAL writing works, so that individual backends don't really write into the WAL buffers directly, and instead hand-over the data to a separate process (with access to the key). We already have the walwriter, but it may not be running and we consider that to be OK. Or maybe we could have "encrypter" process that does just that. That's surely non-trivial work, but it seems like much less work compared to reworking the WAL format to allow WAL to be encrypted with different keys.

Backups

In cluster wide encryption, there will be an option in pg_basebackup to change the key if needed. They keys aren’t changed by default but we provide an option to change the key when setting up a new stand-by server

How to encrypt

This topic is under discussion that which block cipher modes of operation we use for buffer encryption and WAL encryption, respectively

We will use Advanced Encryption Standard (AES)[8], which is a block cipher algorithm. Key size can be 128 or 256 bits.

We will use AES-CTR[9] for heap/index and WAL encryption.

Key length

This topic is under discussion which key length we support

We will provide two types of key lengths: 128 and 256, selected at initdb time, e.g. --encrypt-aes128.

Initialization Vector(IV)

An IV is a specific type of nonce. Nonce means "number used once". That is, unique but not necessarily kept secret. The NIST requirements[10] intended to ensure the random but unique property with high probability. To generate random IVs, the NIST recommends two methods.

The first method is to apply the forward cipher function, under the same key that is used for the
encryption of the plaintext, to a nonce. The nonce must be a data block that is unique to each
execution of the encryption operation. For example, the nonce may be a counter, as described in
Appendix B, or a message number. The second method is to generate a random data block using a
FIPSapproved random number generator. 

A question that came up in the last meeting is how to set IV value of temporary files since we spend hugh time in determining initialization vector. How to create IV for temporary files that we write to the file-system? Bruce mentioned that we can get get input from the community on temporary file encryption and how to set IV value for temporary files encryption.

Therefore, we concluded to use the first method to generate IVs. That is, select nonce carefully and through the cipher with the key makes it random enough and use it as an IV. The elements of nonce for both buffer encryption and WAL encryption are described below.

IV for heap/index encryption

We will use (page LSN, page number) to create an IV for each page. Page LSN is 8byte and page number is 4byte. Since the length of IV is 16 byte it's fine.

Using the LSN and page number as part of the nonce has three benefits:

  • We don't need to decrypt/re-encrypt during CREATE DATABASE since the page contents are the same in both places, and once one database changes its pages, it gets a new LSN, and hence a new nonce/IV.
  • For each change of an 8k page, you get a new nonce/IV, so you are not encrypting different data with the same nonce/IV.
  • This avoids requiring pg_upgrade to preserve database oids, tablespace oids, relfilenodes.
  • We get a unique nonce even when different two pages in the same relation have the same LSN (we don't use the same LSN in different relations)
    • This can happen for example when heap update expire the old tuple and add the new tuple to the other page.

However, the LSN must then be visible on the encrypted pages. So we will not encrypt the LSN on the page. We will also not encrypt the CRC so pg_checksums can still check pages offline without access to the keys.

IV for WAL encryption

We will use the WAL segment number to create an IV for each WAL segments.

The maximum bits that can be processed with a single key/IV(nonce) pair is 68GB[11][12]. We will use a different IV(nonce) 16MB WAL file, so we will be OK there too.

Rekey

Rekey is an operation of changing the heap/index or WAL encryption keys, or passphrase. This would require to decrypt it with the current key and encrypt with the new key. Initially only offline reencryption will be supported.

Checksum and Encryption

Encrypt and then CRC, and store the CRC decrypted

Keys

Since we're going to use AES, which is symmetric block cipher this section is all about symmetric keys.

2-tier key architecture

2-tier key architecture is a popular way to make key rotation faster that other databases(MySQL, Oracle database) also employees. We use two types of encryption keys: master key (MK) and data encryption key (DEK). MK is used to encrypt DEK and must be stored at a separate location (e.g. key management server) from database. DEK is used to encrypt actual user data and can be stored in the same location of database after encrypted using MK. If we use 1-tier key architecture, we must decrypt data using the current key and re-encrypt them using the new key, which could take a long time with large data. On the other hand in 2-tier key architecture, what we need to encrypt is only DEKs, decrypt DEKs using the current MK and re-encrypt them using new MK. Since we don't touch user data at all and DEKs are very small (32 bytes for AES-256) the key rotation can complete within a very short time.

The proposed tablespace-level encryption have one MK for database cluster and multiple DEKs for each tablespaces.

Prepared transaction encryption

During the discussion, the point about prepared transaction encryption also came up since they are also persisted. Sawada-san mentioned that we aren’t storing any important data for prepared transactions so we might not need to encrypt it. However we need to have it as part of the todo list.

3-tier key architecture

Proposed by Joe Conway,

1. A master key encryption key (KEK): this is the key supplied by the
   database admin using something akin to ssl_passphrase_command

2. A master data encryption key (MDEK): this is a generated key using a
   cryptographically secure pseudo-random number generator. It is
   encrypted using the KEK, probably with Key Wrap (KW):
   or maybe better Key Wrap with Padding (KWP):

3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
    table specific keys.

3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
    generate new keys when needed for WAL (based on the other info we
    need to change WAL keys every 68 GB unless I read that wrong).

Temporary key

Temporary key is a randomly generated keys that lives only process-lifetime. This is used for temporary file encryption.

For parallel queries especially for parallel hash joins, since it's possible that multiple parallel workers use the same temporary file the the temporary key should be shared with parallel workers.


Passing (master) key to PostgreSQL

This topic is under discussion how we support the master key rotation

Encryption keys must be stored outside database. If we employee either 2-tier or 3-tier key architecture the master key is stored outside database. To pass it to PostgreSQL server we got consensus to have a GUC parameter that user can specify arbitrary command like ssl_passphrase_command, at least for the first version.

On the other hand, the tablespace-level encryption patch add a generic key management APIs to communicate various external key management services.

Passing key to front-end programs

This topic is under discussion how front-end programs get keys

Front-end programs that read database files or WAL segment files directly need to access the KMS to get the key. In addition, they need to get data encryption key(DEK) stored in PostgreSQL database directory. Otherwise, will look at the page and think it is corrupt.

Other requirements

wal_log_hints will be enabled automatically in encryption mode.

TDE requires openssl library.

TODO for Full-Cluster Encryption

Here is list of ongoing tasks with there assignment and status for cluster wide encryption and internal key management system.

  • Key management system (Internal key management system) - Assigned to Sawada-san Status : Updated patch sent to hackers and also committed in Swada github. All regression cases passing wih the patch, documentation update remaining.
    • add encryption indicator to pg_control (integer?)
    • Implement GUC cluster_passphrase_command, something like ssl_passphrase_command, to retrieve the pass phrase on boot
    • Add read-only GUC to report the cluster's encryption status
    • verify the pass phrase against its hash on boot
    • use the pass phrase to decrypt the relation (heap/index) and WAL data encryption keys stored in PGDATA and load them into memory
  • Buffer encryption - Assigned to Sawada Status : Updated patch sent to hackers and also committed in Swada github. All regression cases passing wih the patch, documentation update remaining.
    • use CTR mode
    • don't encrypt LSN
    • write CRC of the encrypted page contents and don't encrypt
    • don't encrypt the first 12 bytes of a page so pd_flags is visible in encrypted and non-encrypted mode? (Might be useful for online checksum and encryption.)
    • make pd_pagesize_version visible on the encrypted page?
    • shared buffers nonce is LSN/page-number (nonce is run through encryption to create the IV?)
    • require wal_log_hints to prevent force bit changes to be WAL logged (generates new page LSN)
  • WAL encryption - Assigned to Moon/Swada Status : Updated patch sent to hackers and also committed in Swada github. All regression cases passing wih the patch, documentation update remaining.
    • for WAL, don't use OpenSSL's EVP interface so the offset can be specified
    • WAL nonce is segment number (no timeline)
    • encrypt page hole?
    • add Asssert() code to check that there are no WAL record types that modify more than one relation (already written)
  • Temporary file encryption - Assigned to Moon Status : Work in progress, will be posted to hackers soon.
    • Encryption key: A hash value is randomly generated for each temporary file, and a temporary key is generated by a combination of the hash value and the master key. (will use HMAC256)
    • IV value: The encryption key will be used separately for each file. So that the IV is simply generated(64bit = hash value, 32bit = counter) as it should not exceed 1GB per file(pgsql_tmp).
  • Front end tools encryption - Assigned to Shawn Status : Submitted patches for front-end tools TDE support for review. Couple of tools remaining for TDE support are in-progress.
    • Add initdb options --aes128, --aes256, and --cluster-passphrase-command
      • update pg_control encryption indicator
      • store SHA256 hash of the pass phrase (KEK) in PGDATA (add salt?)
      • store relation (heap/index and TOAST) and WAL data encryption keys (DEK) encrypted with pass phrase in PGDATA
    • pg_upgrade
      • Add initdb flag to reuse the keys from the old cluster in the new cluster
      • check old/new cluster encryption indicators match
      • check encryption pass phrase and keys match
    • Add command-line tool to allow modification of the pass phrase and individual encryption keys
    • Allow pg_rewind and pg_waldump to work, add --cluster-passphrase-command option
    • does pg_rewind need to work across WAL key changes?
    • offlinle tool to allow changing the encryption key of current WAL and PITR WAL files, must be crash-safe
    • changing the pass phrase will require --old-passphrase-command and --new-passphrase-command options
    • modify pg_basebackup to store relations with a different relation key so standby servers can be used for relation key rotation
  • Which Files do we need to encrypt
    • Need to go through the data directory and get a list of all the files that contain user data - Assigned to Moon-san (Moon-san will check if NTT has done some work in this area and share the result)
    • Need to get buy-in from the community for not encrypting non-user data files i.e. visibility map, transactional data etc (This needs to be a team effort)
  • Enhancing pg_crypto to use KMS (Key manaagement system) - Assigned to Ibrar : As discussed on the call Ibrar will enhance pg_crypto to use internal KMS developed as part of this project. Ibrar will post a patch on hackers for this.

Here are some tasks or areas that we need to research, some of these are being worked as part of main to do's listed above. This is a exhaustive list to ensure that we don't skip any todo required for the first phase of TDE.

  • How logical decoding with work with TDE - Need to investigate how TDE with work with logical decoding, whether we need to decode the data in flight etc?
  • TDE for backup and replication
  • Regression test cases for TDE
  • Documentation
    • if a standby is promoted to a primary and the old primary continues writing, one must be rekeyed to avoid using the same IV
  • Comprehensive testing
  • Requires OpenSSL

List of the contains of user data for PostgreSQL files

If there are any added files, incorrect note or user data, please correct it.

Num Database cluster Contains of user data Single Sequential Write Single Process Write-then-Read note
1 PG_VERSION not contain Only PostgreSQL version information is written
2 base/NNNNN/NNNNN contain Table data
3 base/NNNNN/NNNNN_vm not contain VM file
4 base/NNNNN/NNNNN_fsm not contain FSM file
5 base/NNNNN/NNNNN_init not contain
6 base/NNNNN/PG_VERSION not contain Only PostgreSQL version information is written
7 base/NNNNN/pg_filenode.map not contain
8 base/NNNNN/pg_internal.init not contain
9 base/pgsql_tmp/pgsql_tmpPID.tempFileCounter contain Temporary file that creates user data temporarily when work_mem size is insufficient
10 current_logfiles not contain
11 global/NNNN contain Database name and user name
12 global/NNNN_vm not contain vm file
13 global/NNNN_fsm not contain fsm file
14 global/pg_control not contain
15 global/pg_filenode.map not contain
16 global/pg_internal.init not contain
17 pg_commit_ts/0000 not contain
18 pg_dynshmem/mmap.NNNNNNN not contain
19 pg_logical/mappings/ not contain
20 pg_logical/replorigin_checkpoint not contain
21 pg_logical/snapshots/0-XXXXXXXX.snap not contain
22 pg_multixact/members/0000 not contain
23 pg_multixact/offsets/0000 not contain
24 pg_notify/0000 not contain
25 pg_replslot/Slotname/state not contain
26 pg_replslot/Slotname/xid-NNN-lsn-0-NNNNNNNN.snap contain Includes user data decoded from WAL files
27 pg_serial/ not contain
28 pg_snapshots/NNNNNNNN-N not contain
29 pg_stat/db_NNNNN.stat contain Statistics collector includes user data
30 pg_stat/global.stat contain Statistics collector includes user data
31 pg_stat_tmp/db_NNNNN.stat contain Statistics collector includes user data
32 pg_stat_tmp/global.stat contain Statistics collector includes user data
33 pg_subtrans/0000 not contain
34 pg_tblspc/PG_NN_NNNNNNNN/NNNNN contain Symlink files
35 pg_twophase/NNNNNNNN not contain Exclude user data
36 pg_wal/NNNNNNNNNNNNNNNNNN contain WAL data
37 pg_wal/*.backup not contain Exclude user data
38 pg_wal/*.history not contain Exclude user data
39 pg_wal/*.partial contain WAL data
40 pg_wal/archive_status/NNNNNNNNNNNNNNNNNN.done not contain Exclude user data
41 pg_wal/archive_status/NNNNNNNNNNNNNNNNNN.ready not contain Exclude user data
42 pg_xact/0000 not contain
43 postgresql.auto.conf not contain setting file
44 postmaster.opts not contain
45 postmaster.pid not contain
46 postgresql.conf not contain Setting file
47 pg_hba.conf not contain
48
49

TDE in other systems

MySQL (InnoDB)

MySQL supports per tablespace, data at rest encryption[13]. Please note that in MySQL the tablespace refers to that a data file that can hold data for one or more InnoDB tables and associated indexes, while tablespace refers to a directory in PostgreSQL. innodb_file_per_table option allows tables to be created in their own tablespace. As of MySQL 8.0.16 it supports redo log and undo log encryption[14] and system tables encryption[15]. It supports 2 tier key architecture; it has tablespace keys for each tablespace which are located on the header of tablespace file. The master key can be obtained from external systems via a keyring plugin[16]

MySQL encrypts each page of both redo log and undo log with dedicated keys, not with the keys used for table encryption. The encryption key is stored in the header of the first redo/undo log file in encrypted state.

Oracle DB

Oracle DB supports column-level and tablespace-level TDE, both approaches use a two-tiered key-based architecture [17]. The Master Encryption Key (MEK) is stored in an external keystore with both hardware and software keystores supported [18]. The MEK is used to secure the column- and tablespace-level keys. Column-level TDE uses one key per table, tablespace-level TDE uses one key per tablespace. Oracle TDE supports Triple-DES (3DES168) and AES (128, 192, 256 bit). Column-level TDE defaults to AES-192, tablespace-level TDE defaults to AES-128. Both methods add a salt to the plaintext before encryption by default [19]. Column-level TDE supports a NOMAC parameter to improve performance.

MS SQL Server

MS SQL Server supports database-level TDE with a three-tiered architecture using both symmetric and asymmetric key encryption [20]. The Service Master Key (SMK) is generated automatically during installation (e.g. `initdb` in PostgreSQL). The Database Master Key (DMK) is created in the `master` database (e.g. postgres default database) and is encrypted by the SMK. The DMK is then used to generate the certificates actually used to secure the Database Encryption Key (DEK). The DEK is the per-database symmetric key used to encrypt data and log files.


Filesystem-level encryption (fscrypt)

https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Links

Personal tools