8.1.4 et. al. Security Release Technical Info
An attacker able to submit crafted strings to an application that will embed those strings in SQL commands can use invalidly-encoded multibyte characters to bypass standard string-escaping methods, resulting in possible injection of hostile SQL commands into the database. The attacks covered here work in any multibyte encoding (but see also CVE-2006-2314). Affects all PostgreSQL versions before 8.1.4, 8.0.8, 7.4.13, 7.3.15.
The widely-used practice of escaping ASCII single quote "'" by turning it into "\'" is unsafe when operating in multibyte encodings that allow 0x5c (ASCII code for backslash) as the trailing byte of a multibyte character; this includes at least SJIS, BIG5, GBK, GB18030, and UHC. An application that uses this conversion while embedding untrusted strings in SQL commands is vulnerable to SQL-injection attacks if it communicates with the server in one of these encodings. While the standard client libraries used with PostgreSQL have escaped "'" in the safe, SQL-standard way of "" for some time, the older practice remains common. As of PostgreSQL versions 8.1.4, 8.0.8, 7.4.13, 7.3.15, the server has been modified to reject "\'" when the client is using one of these encodings. This does NOT in itself fix all variants of the problem, but it will make it obvious that such a client is broken and in need of repair. A possible workaround for affected clients is to avoid use of the vulnerable character encodings. Details:
The fundamental problem is that client-side and server-side code may have different behaviors when presented invalidly-encoded multibyte data. Given that client-side escaping has often been done by ad hoc code, the solution of "make sure they always act the same" seems unworkable, and we have instead tried to close off the problem at the server side by rejecting potentially-corrupted queries.
The particular issue described as CVE-2006-2313 stems from the fact that client-side escaping code commonly doesn't use any encoding knowledge at all, just treating any byte with the high bit set as a single non-ASCII character. In encodings such as UTF8 there is not anything obviously wrong with this practice (but see below for other encodings). However, the PostgreSQL server behaves differently, and that mismatch opens a hole. As an example, suppose that the client is operating in UTF8 encoding and an attacker submits a data string containing
0xc8 ' some text
(where 0xc8 represents a single byte with that hex value). In UTF8 0xc8 begins a two-byte character, but the second byte is supposed to have a value within 0xA0-0xFF; the above string is therefore not validly encoded. A non-encoding-aware client will ignore this problem and try to escape the quote mark, producing either
0xc8 ' ' some text 0xc8 \ ' some text
Previous versions of the PostgreSQL server would accept 0xc8 followed by any byte value at all as a two-byte character, so that the server would see the above as a two-byte character followed by a string-literal-ending quote mark. The text "some text" would then be successfully injected into the SQL query string supplied to the server. (In some contexts the server would warn about the invalid two-byte character, and then drop it, but a mere warning does not stop the attack.)
Our solution for CVE-2006-2313 is to modify the server to check multibyte encoding more carefully, and to treat invalid input as a query-stopping error rather than merely a warning. It will now reject 0xc8 followed by a byte that is not within 0xA0-0xFF. This closes off the problem without needing any assumptions about whether the client has been patched to understand about multibyte encodings explicitly. In "safe" encodings such as UTF8, we can now be sure that the client and server agree about which characters in the string are quotes and backslashes.
However, there are several Far Eastern character encodings in which 0x5c (the ASCII code for backslash) is a valid second byte for a two-byte character. For example, the two-byte sequence 0x95 0x5c is a valid character in SJIS. In these encodings, there are two additional risks which we have described as CVE-2006-2314.
First, a non-encoding-aware client is likely to try to escape what it thinks is a backslash. For example, the attacker sends the entirely legal SJIS string
0x95 0x5c ' some text
If the client is not encoding-aware it will take the 0x5c as a separate backslash character and will double it. It then also escapes the quote mark, producing either
0x95 0x5c \ ' ' some text 0x95 0x5c \ \ ' some text
Since the server will correctly perceive 0x95 0x5c as a single character, either case succeeds in injecting "some text": the last quote mark will appear unescaped to the server. A more subtle case is where 0x95 0x5c appears at the end of the string: the added backslash will effectively quote the intended string-ending quote mark. If the attacker can also control the next string literal in the query, he wins, eg
WHERE key1 = '0x95 0x5c \' AND key2 = 'injected text here' ...
Second, if the client escapes "'" as "\'", it can produce a valid multibyte character where there was none before. Again considering SJIS, suppose the attacker is able to send the (invalid) string
0x95 ' some text
If the client is not encoding-aware, and sees the quote mark as a separate character to be escaped, it will produce
0x95 \ ' some text
which is seen by the server as
0x95 0x5c ' some text
and thus again the injection attack succeeds.
The only real solution for these problems is to fix the client: escaping has to be done with awareness of the character set encoding (so that "backslashes" that are part of a multibyte character won't be escaped), and quotes need to be escaped using the SQL-standard representation "" rather than "\'". Note that in none of these encodings is 0x27 a valid trailing byte, so that there is no corresponding problem for "" escaping so long as the server checks multibyte encoding validity.
However, waiting around for clients to get fixed isn't a very workable security approach. We have instead modified the server so that (by default) it will reject "\'" as a representation of quote mark when operating in an encoding that allows embedded 0x5c. This change closes the security holes associated with a non-encoding-aware client trying to double an embedded 0x5c, so long as the client escapes "'" as ""; note that in the cases above where the client did not use "\'", the server sees "\'" as a result of the mistaken doubling of 0x5c. Notice however that in the examples where the client did try to use "\'", the server sees a perfectly valid query with no apparent use of "\'". Therefore, rejecting "\'" will NOT by itself prevent injection attacks. Rather, this is a partial solution that will also have the effect of making it obvious in normal use that clients using this representation in unsafe encodings are broken. This should allow them to get fixed before they are attacked.
(Another reason for rejecting "\'" is that it fits into our long-term plan to transition to SQL-standard string literal rules, wherein backslash is not a special character. Clients that are still using "\'" at that point would have a new SQL-injection risk to contend with.)
There are a number of mitigating factors that may keep particular applications from being subject to these security risks:
* If application always sends untrusted strings as out-of-line parameters, instead of embedding them into SQL commands, it is not vulnerable. * If client_encoding is a single-byte encoding (e.g., one of the LATINx family), there is no vulnerability. * If application cannot pass invalidly encoded data to the server, there is no vulnerability (this probably includes all Java applications, for example, because of Java's handling of Unicode strings).
It is clear that CVE-2006-2314 is a fairly nasty problem, since a proper fix may require changes in application code rather than just a quick update of a library or server. This is mitigated by the fact that it occurs only when certain character encodings are in use, and (at least outside the Far East) these are not popular encodings for Web-exposed applications. A workaround for applications that can't be updated promptly is to avoid use of these encodings while processing untrusted data.
The PostgreSQL Global Development Group thanks Akio Ishida and Yasuo Ohgaki for identifying and reporting these issues.