Oracle to Postgres Conversion

From PostgreSQL wiki
Jump to navigationJump to search

Note: This page was originally carried on the OpenACS Website, but seems to have been lost over the ages and so is now reproduced here. by James Shannon, Ben Adida, and Don Baccus

What you should know before you begin

You should know SQL relatively well. Knowing the details of Oracle SQL and Postgres SQL are obviously tremendous advantages, but the hints in this document should quickly bring you up to speed on what the differences are.

If you're porting Oracle SQL to Postgres SQL for the ACS/pg, you should also be quite familiar with AOLserver Tcl, especially the AOLserver database APIs.

In this document, we're talking about:

  • Oracle 10g to 11g (most stuff will work down to 8i)
  • Oracle 12c is in some aspects different (Multitenent DB) but migration may be even easier
  • Postgres 8.4, and sometimes this also works for earlier versions.
  • Enterprise vs. Oracle RAC migrations

Oracle Enterprise and RAC Considerations

Oracle Enterprise has a more direct migration to PostgreSQL than does Oracle Real Application Clusters (RAC) in some cases. With RAC you may have multiple, separate, heavy-hitting, DML applications usually of the OLTP type connected to the same RAC Cluster, where RAC serves as a type of application farm. A common mistake with migrations from Oracle RAC farms is to associate all of the farm applications with one PostgreSQL Instance. The big picture that is missed here is ACTIVE-ACTIVE (Oracle RAC) and ACTIVE-PASSIVE (PG). While Oracle RAC can divvy up the applications and load balance them across the Nodes in the cluster, there is no such thing in PostgreSQL. So, the "right" solution without some re-architecture and/or use of 3rd party tools and extensions is to migrate the applications off of Oracle RAC one at a time to separate PostgreSQL instances, one heavy-hitting application per PostgreSQL Instance. The next two sections illustrate CPU and Memory factors that come into play with PostgreSQL being a host to 2 or more heavy-hitting applications.

CPU Contention

A PostgreSQL instance that has multiple heavy-hitting, DML applications connected to it can have CPU load problems due to the fact that we are stacking up or accumulating the concurrent, active transactions of multiple applications. A general rule in PostgreSQL is that as the number of active, concurrent transactions (pg_stat_activity.state = 'active') exceeds 2 times the number of CPUs, we begin to experience CPU load saturation. This would be expected with multiple, heavy-hitting applications working against the same PostgreSQL Instance.

Memory Contention

With multiple, separate, heavy-hitting, DML applications hitting the same PostgreSQL instance, we start to see inherent problems when unrelated SQL workloads (separate databases, schemas, and tables) compete for the same memory resources. Normally with a single application, one thing good about it is that a lot of the disk to memory activity is with the same heavily used tables. So you usually get your 95% to 99% cache hit ratio. But when multiple, separate SQL Workloads are at work within a single PostgreSQL instance with their own set of tables, you may begin to see contention for memory-resident pages between the separate SQL workloads. In that case you have to make sure you have enough OS memory and shared_buffers memory to handle the surge when multiple SQL workloads compete for the same paging resources. Anticipating different degrees of load activity at any one point in time between the competing SQL workloads makes tuning shared buffers much harder and perhaps impossible to tune for both at the same time without adding significantly more reserved memory to shared_buffers even though it may not need it all most of the time.

Transactions

While the first statement after a COMMIT starts a new multi-statement transaction in Oracle RDBMS, Postgres operates in autocommit mode. Every piece of code doing some DML that is not to be committed immediately must start a transaction with a BEGIN statement. ROLLBACK and COMMIT have the same semantic meaning in both systems; also SAVEPOINTS mean the same. Postgres knows all the isolation levels Oracle knows (and a few more). In most cases the default isolation level of Postgres (Read Committed) will be sufficient.

Grammar Differences

There are a handful of grammar differences in Postgres for functionality that is actually the same. ACS/pg attempts to perform these changes automatically, leaving only the major functionality differences to be ported by hand. This is done by db_sql_prep which performs a number of regular expression substitutions on a piece of SQL.

Functions

Oracle has more than 250 built-in single row functions and more than 50 aggregate functions. See Oracle Functions.

Sysdate

Oracle uses the function sysdate to get the current date and time in server's timezone. Postgres uses 'now'::timestamp for current transaction start date and time in session's timezone, which ACS/pg has conveniently wrapped in a function named sysdate().

ACS/pg also includes a Tcl procedure named db_sysdate which should be used every time the sysdate term appears. Thus:

set now [database_to_tcl_string $db "select sysdate from dual"]

should become

set now [database_to_tcl_string $db "select [db_sysdate] from dual"]

The Dual Table

Oracle uses the table DUAL for selects where actually no table name would be necessary, since the FROM clause in Oracle is mandatory. In PostgreSQL we can omit the FROM clause at all. This table can be created in postgres as a view to ease porting problems. This allows code to remain somewhat compatible with Oracle SQL without annoying the Postgres parser.

In any case when migrating, if possible, just remove the "FROM DUAL" clause from the statement. Joins with dual are really rare - and peculiar.

ROWNUM and ROWID

Oracle's pseudocolumn ROWNUM returns a number which is assigned at the moment when data is read, but before the ORDER BY is executed. You might consider to replace it by using ROW_NUMBER() OVER (ORDER BY ...) which might work in many cases. Emulations with sequences are potentially slower.

Oracle's pseudocolumn ROWID returns the physical address of a table row, encoded in base64. In applications it can be used to (temporarily) cache the address of rows to locate them easier a second time. Postgres has ctid which serves the same purpose.

Sequences

Oracle's sequence grammar is sequence_name.nextval.

Postgres's sequence grammar is nextval('sequence_name').

In Tcl, getting the next sequence value can be abstracted by calling [db_sequence_nextval $db sequence_name]. In case you need to include a sequence's value in a more complex SQL statement, you can use [db_sequence_nextval_sql sequence_name] which will return the appropriate grammar.

Decode

Oracle's handy decode function works as follows:

decode(expr, search, result [, search, result...] [, default])

To evaluate this expression, Oracle compares expr to each search value one by one. If expr is equal to a search, Oracle returns the corresponding result. If no match is found, Oracle returns default, or, if default is omitted, returns null.

Postgres doesn't have the same construct. It can be replicated with:

CASE WHEN expr THEN expr [...] ELSE expr END

which returns the expression corresponding to the first true predicate. For example:

CASE WHEN c1 = 1 THEN 'match' ELSE 'no match' END

There is one tiny discrepancy between DECODE and its emulation with CASE:

DECODE (x,NULL,'null','else')

will return 'null' in case x is NULL, while

CASE x WHEN NULL THEN 'null' ELSE 'else' END

results in 'else'. This also applies within Oracle.

NVL

Oracle has another handy function: NVL. NVL returns its first argument if it is not null, otherwise it returns its second argument.

start_date := NVL(hire_date, SYSDATE);

The above statement will return SYSDATE if hire_date is null. Postgres (and Oracle) has a function that performs the same thing in a more generalized way: coalesce(expr1, expr2, expr3,....) returns the first non-null expression that is passed to it.

Subquery in FROM

PostgreSQL requires a sub-SELECT surrounded by parentheses, and an alias must be provided for it. The alias is not mandatory for Oracle.

a query for Oracle:

 SELECT * FROM (SELECT * FROM table_a)

in PostgreSQL will look like:

 SELECT * FROM (SELECT * FROM table_a) AS foo

Functional Differences

Postgres doesn't have all the functionality of Oracle. ACS/pg is forced to deal with these limitations with specific work-arounds. Almost everything can be done under Postgres, but some features are awaiting new versions of the open-source database.

Outer Joins

Outer Joins in old (pre version 9i) Oracle work as follows:

SELECT a.field1, b.field2
FROM a, b
WHERE a.item_id = b.item_id(+)

where the (+) indicates that, if there is no row in table b that matches the correct item_id, the match should still happen, with an empty row from table b. In this case, for example, for all rows in table a where there is no matching row in b, a row will still be returned where a.field1 is correct, but b.field2 is null.

In Postgresql and Oracle 9i and younger:

SELECT a.field1, b.field2
FROM a
LEFT OUTER JOIN b
ON a.item_id = b.item_id;

In certain other cases where only aggregate values are pulled out of the outer-joined table, it's possible to not use a join at all. If the original query is:

SELECT a.field1, sum (b.field2)
FROM a, b
WHERE a.item_id = b.item_id (+)
GROUP BY a.field1

then the Postgres query can look like:

SELECT a.field1, b_sum_field2_by_item_id (a.item_id)
FROM a

where you've defined the function:

CREATE FUNCTION b_sum_field2_by_item_id (integer)
RETURNS integer
AS '
DECLARE
     v_item_id alias for $1;
BEGIN
     RETURN sum(field2) FROM b WHERE item_id = v_item_id;
END;
' language 'plpgsql';

Starting with 9i Oracle also supports SQL 99 outer join syntax. Unfortunately some programmers still stick to the old syntax (even nearly 20 years after it was superseded). That's the reason why this chapter still is necessary.

CONNECT BY

Postgres doesn't have connect by statements. It does, however, have WITH RECURSIVE. As WITH RECURSIVE is Turing-complete, it is simple to translate CONNECT BY statements into WITH RECURSIVE ones.

Sometimes CONNECT BY is used as a simple iterator:

SELECT ... FROM DUAL CONNECT BY rownum <=10

may be equivalent to the Postgres Query

SELECT ... FROM generate_series(...)

NO_DATA_FOUND and TOO_MANY_ROWS

These exceptions are disabled by default for selects in PLpgSQL. You need to add keyword STRICT after any keyword INTO in all selects, when you need to keep single row checking in stored PLpgSQL code. See documentation.

Data Types

Postgres sticks mostly to SQL standards while Oracle in history chooses its own way - especially concerning data types.

Empty strings and NULL values

In Oracle empty strings ('') and NULL values in string context are the same till the latest versions. You can concatenate NULL result with a string and obtain this string as result. In PostgreSQL you will obtain null in this case. In Oracle you need to use IS NULL operator to check whether string is empty. In PostgreSQL you will obtain FALSE for empty string and TRUE for NULL.

On conversion from Oracle to PostgreSQL you will need to carefully analyze all your string handling code and separate NULLs and empty strings.

Numeric Types

Oracle programs mostly use NUMBER. The appropriate data type in Postgres is decimal or numeric, both are equivalent. The limit for numbers in Postgres (up to 131072 digits before the decimal point; up to 16383 digits after the decimal point) is much higher than in Oracle and is internally stored in a similar way.

Oracle allows you to use INTEGER, REAL, DOUBLE PRECISION and FLOAT, but it maps them to NUMBER.

Later versions of Oracle have BINARY_FLOAT and BINARY_DOUBLE, which correspond to PostgreSQL's real and double precision.

Date and Time

Oracle mostly uses the data type DATE which is a combination of date + time. In most cases it should be sufficient to replace it with Postgres' TIMESTAMP. In some cases it will yield not exactly the correct result, because date consists only of second, minute, hour, day, month and year. No fractions of seconds, no DST, no time zones.

Oracle's TIMEZONE is very much similar to Postgres'.

Oracle's TIMESTAMP WITH TIME ZONE is different from PostgreSQL's in that it stores the time zone along with the value. PostgreSQL stores the values in UTC and converts them to the current session time zone upon display. If you need to retain the original time zone information, you have to store it separately.

Oracle only knows INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND, so this can be used directly in Postgres too.

CLOBs

Postgres has decent CLOB support in the form of TEXT.

BLOBs

(Note that this paragraph is quite outdated: PostgreSQL now does not only have Large Objects as described below, but the type bytea that is a good replacement for BLOBs, except that it cannot be streamed - for that you still use Large Objects.)

Binary large object support in Postgres is very poor and unsuitable for use in a 24/7 environment, because you can't dump them with pg_dump. Backing up a database that makes use of Postgres large objects requires one to knock down the RDBMS and dump the files in the database directory.

Don Baccus put together a hack that extends AOLserver's postgres driver with BLOB-like support, by uuencoding/decoding binary files before stuffing them into or extracting them from the database. The resulting objects can be consistently dumped by "pg_dump" while the RDBMS is up and running. There is no need to interrupt service while making your backup.

To get around the one-block limit on the size of a tuple imposed by Postgres, the driver segments the encoded data into 8K chunks.

Postgres large objects are scheduled for a major overhaul in summer 2000. Because of this, only the BLOB functionality used by the ACS was implemented.

To use the BLOB driver extension, you must first create a column of type "integer" with the name "lob" in the table that will store the BLOB, and a trigger on it that calls "on_lob_ref". You must use the name "lob". Here's an example:

create table my_table (
    my_key integer primary key,
    lob integer references lobs,
    my_other_data some_type -- etc
);

create trigger my_table_lob_trig before insert or delete or update on my_table for each row execute procedure on_lob_ref();

To put a binary file into "my_table":

set lob [database_to_tcl_string $db "select empty_lob()"]

ns_db dml $db "begin"
ns_db dml $db "update my_table set lob = $lob where my_key = $my_key"
ns_pg blob_dml_file $db $lob $tmp_filename
ns_db dml $db "end"

Note that the call to ns_pg to stuff the file into the database MUST be wrapped in a transaction, even if you're not updating any other tables at the same time. The driver will return an error if you don't.

To return a large object stored in "my_table" to the user:

set lob [database_to_tcl_string $db "select lob from my_table
                                     where my_key = $my_key"]
ns_pg blob_write $db $lob


Note that you don't need to wrap the call to blob_write in a transaction, as the database isn't being modified.

The large objects are automatically deleted when no longer used. To replace the large object stored in an existing record, just allocate a new one by calling "empty_lob()" and assign the returned key to the "lob" column in your table.

External Tools

A couple of extension tools are available for Oracle migration.

Ispirer Toolkit
Ispirer Toolkit automatically migrates the entire database schema (tables, views, packages, stored procedures, functions, triggers, etc.) and transfers data from Oracle to PostgreSQL.
Convertum.ru
Convertum is a robust tool for automatic migration of more than 25 databases, including migration from Oracle to PostgreSQL. 30-days trial is available for free.
CYBERTEC Migrator
Enterprise-grade software for migrating from Oracle to PostgreSQL. GUI-driven, high performance, little downtime.
Full Convert
Database conversion between Oracle (and 40+ other database engines) and PostgreSQL. Uses up to 16 parallel workers for throughput of over a million records per second.
DBConvert - data conversion and sync
Seamless data migration and (bidirectional) synchronization between Oracle and PostgreSQL databases.
ESF Database Migration Toolkit
A toolkit migrates Oracle databae to PostgreSQL in wizard. It connects to Oracle and PostgreSQL database directly, and migrate its table structure, data, indexes, primary keys, foreign keys, comments and so on.
orafce
The goal of the project is to implement some functions from Oracle database. Some date functions (next_day, last_day, trunc, round, ...), string functions and some modules (DBMS_ALERT, DBMS_OUTPUT, UTL_FILE, DBMS_PIPE, ...) are implemented now. Functionality was verified on Oracle 10g and module is useful for production work.
ora2pg
Ora2Pg is a Perl module to export an Oracle database schema to a PostgreSQL compatible schema. It connects your Oracle database, extracts its structure, and generates an SQL script that you can load into your PostgreSQL database.
Oracle to Postgres
Program to migrate Oracle databases to PostgreSQL server. It does not use ODBC or any other middleware software. Table structures, data, indexes, primary keys and foreign keys are converted. Command line support allows to script, automate and schedule the conversion process.
ora_migrator
PostgreSQL extension written in PL/pgSQL providing a database migration toolkit that makes use of the Oracle Foreign Data Wrapper.
Splendid Data - Cortex
Cortex is the most advanced product on the market for automated migrations of Oracle databases to native PostgreSQL. This prevents vendor lock-in and delivers maximum freedom of deployment on-premise and/or in the Cloud. Because Cortex establishes all necessary dependencies between Data and Code objects and also migrates an average of 80% of the Code objects automatically, it is unique.