User:Ringerc

From PostgreSQL wiki

Revision as of 18:03, 31 December 2009 by Ringerc (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

Craig Ringer

I'm a PostgreSQL user since around 7.3, though seriously since 8.0 / 8.1 . I spend a bit of time confusing people and blundering around on the mailing lists in the guise of trying to help out, even delving into the black arts of supporting PostgreSQL on Windows.

Most of my work with PostgreSQL is as an application programmer and small scale DBA. In terms of application interfaces to Pg, I work primarily with Hibernate on Java. There's also some basic JDBC, plenty of Python (psycopg2 driver) admin scripts, and huge piles of plain 'ol psql.

Contact me

I can be reached by executing the following operation:

dbname=> SELECT decode('Y3JhaWdAcG9zdG5ld3NwYXBlcnMuY29tLmF1', 'base64');

Articles

( I mostly keep this list for my own records )

The following Wiki articles are my fault, or at least started off that way, and I'm the one to yell at for them:

Plans and notes

Code to post

is transient from dbexceptionhelper

pgnotificationhelper and pgnotificationpoller

FAQ entry on plan caching and param stat availability

> FOR matchRecord IN > same query as above > LOOP > RETURN NEXT matchRecord.evaluationid; > END LOOP; > > And when I execute the function with the same parameters it takes well > over 5 minutes to execute.

It's as if you PREPAREd the query once, and each time you run the function it gets EXECUTEd. The query plan is cached. Unfortunately, when PostgreSQL builds a prepared statement (or query in a function) it doesn't have knowledge of exact parameter values, which limit its use of statistics for query optimisation.

Currently there is no way to ask PostgreSQL to re-plan such queries at each execution. You have to force it by using a query that cannot be cached. In PL/PgSQL the usual method is to use EXECUTE ... USING to provide the query as text that is parsed and executed each time the function gets invoked.

> It seems as though inside a function, the optimizer wants to tablescan > my 8M row table. Is there a way that I can see the query plans that my > functions are using?

Not directly. However, if you PREPARE your query, then

 EXPLAIN ANALYZE EXECUTE

it with the parameters you use, you'll see the same effects.

(Hmm, this needs to be a FAQ)

FAQ entry on NAT and tcp_keepalives

Mention "org.postgresql.util.PSQLException: An I/O error occured while sending to the backend."


> In either case, the problem is how to detect whether a Connection is valid or not.

Don't. Even if the connection is ok when you test it, it might not be when you then try to use it. Instead of trying to ensure that the connection is perfect, make your application tolerant of failure. Batch things into transactions, catch SQLExceptions, and re-try operations that fail.

FAQ entry on detecting whether in transaction

"How do I tell if I'm already in a transaction and start one if not?"

  • From PL/PGSQL
  • From SQL function
  • From top-level SQL statements

Discuss:

  • Everything always in a transaction whether explicit or implicit
  • Subtransactions, relationship to nested transactions
  • Lack of top-level stored procs with transaction control
  • Lack of autonomous transactions, dblink workaround

Article on proper handling of connection issues in app code

> I did work with a pure Connection before, until I realized things go wrong when application is idle for long time. > > In either case, the problem is how to detect whether a Connection is valid or not.

Don't. Even if the connection is ok when you test it, it might not be when you then try to use it. Instead of trying to ensure that the connection is perfect, make your application tolerant of failure. Batch things into transactions, catch SQLExceptions, and re-try operations that fail.

After all, database operations may fail due to serialization errors (in SERIALIZABLE mode), locking deadlocks if you make mistakes with lock ordering, etc. While you want to try to avoid these things, you should also handle them in the app if they come up.

If you try to ensure that your connection is ok and then rely on it staying OK, you *will* have problems. A user might lose wifi/cellular coverage, a network might drop out, etc. You're setting yourself up for a race condition you will inevitably lose some of the time.

> As far as I know there is no native mechanism for checking the state.

Trying something and catching the SQLException if it fails. That reflects the design of the JDBC interface and of transactional databases in general that you should try to do something and cope if it fails, rather than ensuring it cannot fail.

> So one approach may be to implement a DataSource which does check (i.e. "SELECT 1") validity before returning the Connection (and reconnect if its not valid, i.e. throwing a SQLException?).

Nononono....

Have the DataSource _consumer_ deal with it.


int retries = MAX_RETRIES;
do {
  try {
    try {
      Connection conn = myprovider.getConnection();
      // do work
      break;
    } finally {
      try {
        stmt.close();
      } catch (SQLException e) {
        // log to complain about statement close failure
      }
    }
  } catch (SQLException e) {
    myProvider.invalidateConnection(conn, e);
    retries--;
  }
} while (retries > 0);

... where "invalidateConnection(Connection, Throwable)" tells the provider/pool that the connection is broken.

If you're using Connection directly, you'll just close() one you think is broken and set the connection var to null so you know you need to create a new one next time around.

In practice, you'll also generally test e.getSQLSTate() to see if the exception might be a transient once tha doesn't imply the connection is broken, and re-try with the same connection if it is. There's no point dropping and re-creating the connection if it's a deadlock, serialization failure, or the like after all.

Here's some code I use to check for SQLStates that may mean an operation is worth retrying:

   private static final List<String> transientStates = Arrays.asList(
            "08",   // Connection exceptions - refused, broken, etc
            "53",   // Insufficient resources - disk full, etc
            "57P0", // Db server shutdown/restart
            "40001",// Serialization failure
            "40P01"// Deadlock detected
            );

   /**
     * Attempt to figure out whether a given database error could be 
     * transient
     * and might be worth a retry. Detects things like network timeouts,
     * transaction deadlocks, serialization failures, connection drops,
     * etc.
     * 
     * @param e Exception thrown by persistence provider
     * @return True if the error might be transient
     */
    public static boolean isDbErrorTransient(Throwable e) {
        final SQLException se = getSQLException(e);
        if (se == null)
            return false;
        final String sqlState = se.getSQLState();
        for (String s : transientStates) {
            Logger.getLogger(TheNameClass.class).warn(
               "Got possibly transient SQL error " + sqlState, e);
            if (sqlState.startsWith(s))
                return true;
        }
        return false;
    }

Idle in transaction

We really need a page on "idle in transaction", long running transactions, etc. Ideally it needs to cover version differences.

Relevant thread begins msgid <4B21737B.50009@hogranch.com> subject "[JDBC] Connection.setReadOnly()" date 11 dec 2009.

Recent mail from Tom Lane that's relevant (Message-ID: <17892.1260545186@sss.pgh.pa.us>):

Craig Ringer <craig@postnewspapers.com.au> writes:
> > Does Pg have issues with long-running read-only transactions in
> > read-committed mode? Or are they an exception to the usual "avoid
> > long-running mostly-idle transactions" guideline?

> > Pg doesn't have to worry about retaining old rows in vacuum with such a
> > statement, because each new statement creates a new snapshot, so if it's
> > idle it effectively has _no_ snapshot and vacuum is free to clear old
> > rows. It doesn't need to keep track of anything to roll back, since the
> > transaction is read only.

As of 8.4, I think the above statements are true.  In prior releases,
transactions just advertised the age of their first snapshot, so an
open transaction created an issue for VACUUM regardless of whether
it was serializable or not.  8.4 has more bookkeeping that allows a
transaction to report that it currently has no live snapshots.

			regards, tom lane

Setting up privileges

Guide to setting privileges in a db - ie db lockdown.

Explain roles, users, database objects, grants. Explain grants always additive, no negative "deny" privs.

Explain default security structure, how it works in terms of roles. Revoke all from public on database, public schema. Make basic user with connect and temp rights, usage rights on public schema. Make special roles for particular abilities, and admin role with all special roles. Mention functions default to public EVEN IF SECURITY DEFINER. Since funcs default to public execution, triggers are ok. Beware security definer triggers may be added by user to their own tables in attempt to exploit, so write carefully and/or check target table in trigger.

Note that with 8.4 column privs, triggers may set columns that the user themselves do not have the rights to set.

Mention 8.5's ALTER DEFAULT PRIVELEGES ( http://developer.postgresql.org/pgdocs/postgres/sql-alterdefaultprivileges.html )

Make yourdbowner role, and another role like canbecomeyourdbowner that has it and is NOINHERIT. That way you can grant the ability to SET ROLE to the dbowner to users without making them run constantly as dbowner. Kind of like passwordless sudo.

Create all objects as yourdbowner using SET ROLE before all create scripts. Add explicit GRANTs of rights to basic user role, plus special roles, as appropriate for depending on security design. Remember to GRANT USAGE ON SEQUENCE ... wherever you GRANT INSERT to a role on a table that uses that sequence as a defaults source. You can use column privs in 8.4 for more fine-grained access (highly recommended) - but discuss col privs inclusive only, no exclusive form.

Create your users, using IN ROLE ... and INHERIT. Grant extra special rights roles to users as appropriate. Never grant rights on objects directly to users, always use a role.

In your application, use information_schema.enabled_roles and information_schema.applicable_roles to decide user privs. Rely on db enforcement, app should only use info schema for ui decisions and user guidance (hide or disable options they can't perform etc).

Personal tools