Post-mortem - Postgres "out of shared memory" error

A few days ago one of the IoT platforms we developed at Igloo started receiving several "out of shared memory" errors from PostgreSQL. This sent me down a rabbit hole exploring how Postgres implements transaction isolation internally, to understand what was causing the problem.

A quick Google search turned up the superficial explanation of the error: "out of shared memory" errors indicate that Postgres has reached the maximum number of locks it can keep in memory. That maximum is max_locks_per_transaction * (max_connections + max_prepared_transactions), so you can easily raise it by increasing the max_locks_per_transaction parameter in the Postgres server settings. Indeed this is what the StackOverflow answers suggested, and it did solve the problem. However, I wanted to understand the root cause of the error, to be sure it wouldn't happen again in the future; to fully understand the issue we need to take a step back and start from how Postgres implements transaction isolation.
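With the default settings (max_locks_per_transaction = 64, max_connections = 100, max_prepared_transactions = 0) the lock table holds 64 * (100 + 0) = 6400 entries. As a sketch, you can inspect the current values and raise the limit like this; note that max_locks_per_transaction only takes effect after a server restart:

-- Inspect the current settings
SHOW max_locks_per_transaction;  -- 64 by default
SHOW max_connections;            -- 100 by default

-- Raise the limit (written to postgresql.auto.conf; requires a restart)
ALTER SYSTEM SET max_locks_per_transaction = 128;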

Transaction isolation

PostgreSQL, like all SQL databases, implements transaction isolation, which means that concurrent transactions behave similarly to transactions executed sequentially; the precise definition of "similar behavior" depends on the isolation level required for those transactions.

For example, the Repeatable Read isolation level guarantees that within the same transaction you can read the same data several times and it will not change. Consider the case of two concurrent sessions interacting with the database according to this timeline:

Session A                                            Session B
-------------------------------------------------    -------------------------------------------------
BEGIN;
SELECT balance FROM users WHERE id = 12345; -- 400
                                                     BEGIN;
                                                     UPDATE users SET balance = 500 WHERE id = 12345;
                                                     COMMIT;
SELECT balance FROM users WHERE id = 12345; -- 400
COMMIT;

Thanks to the Repeatable Read guarantee, the fact that the two transactions are running concurrently is hidden from the client: it sees the same result that would have been produced by running session A to completion and then session B.

In some cases PostgreSQL enforces the isolation requirements by returning an error instead of executing a query. In the following timeline, for instance, session B's update fails with the error "could not serialize access due to concurrent update", because allowing it would break the illusion that the transactions are executed sequentially:

  • if session A is run before session B, then session B's read returns balance = 500
  • if session B is run before session A, then session A's read returns balance = 600
Session A                                            Session B
-------------------------------------------------    -------------------------------------------------
BEGIN;
                                                     BEGIN;
SELECT balance FROM users WHERE id = 12345; -- 400
UPDATE users SET balance = 500 WHERE id = 12345;
                                                     SELECT balance FROM users WHERE id = 12345; -- 400
COMMIT;
                                                     UPDATE users SET balance = 600 WHERE id = 12345;
                                                     -- ERROR: could not serialize access
                                                     --        due to concurrent update

Notice that to guarantee transaction isolation by returning an error, Postgres has to remember that transaction A modified user 12345 even after transaction A commits; it can only forget this once every transaction that was running concurrently with transaction A has finished.

SIReadLock and pg_locks

PostgreSQL keeps track of the rows that have been read by a transaction (at the Serializable isolation level) using a special predicate lock type called SIReadLock. You can view all the active locks with the query

SELECT relation::regclass, * from pg_locks;

In our case this query returned a large number of SIReadLock entries, which explained why we were receiving the out of shared memory errors: Postgres was keeping track of the rows read by already-committed transactions to ensure that the isolation guarantees were met.
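To get a quick breakdown by lock type, the same view can be aggregated; this is just a diagnostic sketch using only the standard pg_locks view:

SELECT mode, count(*)
FROM pg_locks
GROUP BY mode
ORDER BY count(*) DESC;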

Still, this seemed odd: we were running a fairly typical OLTP workload to store the data from the sensors, plus some analytical queries to process the collected data, and I expected Postgres to handle this kind of application without requiring extreme tuning or workarounds.

The answer came from PostgreSQL's documentation: we had a long-running analytical query that only performed read operations but was not marked as a READ ONLY transaction. When a transaction is marked READ ONLY, Postgres knows that a conflict like the one in the second example above cannot happen, because that transaction will never run update queries; it can therefore avoid retaining the SIReadLock entries from committed transactions. If the long-running transaction is not marked READ ONLY, Postgres is forced to keep the SIReadLock entries around because the transaction may still run an UPDATE query in the future, which is why the locks were accumulating and filling the shared memory.
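The fix on our side amounts to a one-line change: declare the analytical transaction as READ ONLY. A sketch, assuming the session runs at the Serializable isolation level:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
-- ... long-running analytical queries ...
COMMIT;

For long reports you can additionally append DEFERRABLE: Postgres then waits for a snapshot that is guaranteed free of serialization conflicts, so the transaction needs no predicate locks at all and can never cause or suffer a serialization failure.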

Another possible cause of the same problem is orphaned transactions: if, because of a bug, your application never commits or aborts a transaction, the same SIReadLock accumulation will occur. You can check for transactions that have been idle for more than 10 minutes with the following query

SELECT age(clock_timestamp(), query_start) AS query_age, query, *
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND age(clock_timestamp(), query_start) > '10 min'::INTERVAL
ORDER BY query_age DESC;
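If such a session shows up, you can terminate it right away using the pid column returned by pg_stat_activity; pg_terminate_backend is a standard Postgres function, and 12345 here is a placeholder pid:

SELECT pg_terminate_backend(12345);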

You should of course find the bug that is causing these orphaned transactions, but as a temporary fix, or as a preventive measure, you can set the idle_in_transaction_session_timeout parameter (for example via your connection string); its value is the amount of time, in milliseconds if no unit is given, that a session may sit idle inside a transaction before Postgres terminates it. Setting this parameter ensures that an orphaned transaction cannot degrade your database's availability indefinitely.
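As a sketch, the timeout can also be set per database or per session; the 10-minute value and the iot_platform database name below are just illustrative:

-- For every future connection to this database
ALTER DATABASE iot_platform SET idle_in_transaction_session_timeout = '10min';

-- Or only for the current session
SET idle_in_transaction_session_timeout = '10min';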