Postgres is a powerful and versatile database management system, now ubiquitously used in all types of applications, most notably web services. In this context, its most important property is the ability to serve multitudes of users at the same time - with the help of an application server, of course. To maintain the balance between serving clients concurrently and preserving data consistency, the database has to make trade-offs, and Postgres is no different.

We will explore these trade-offs and examine what can go wrong when transactions from multiple users interweave. To do so, we will build a very rudimentary booking platform where users can reserve seats for arbitrary events. The system will be written in Ruby, and if you are allergic to that choice, fret not - there will be minimal Ruby code and a lot of SQL.

If you want some interactivity, you are welcome to follow along by looking at the actual runnable code that reproduces the anomalies in this GitHub repository. In case the explanations in this article don’t work for you, examining and tweaking real code on your own may help you grok the subject matter more efficiently.

In case you are reading this in the distant future, keep in mind that everything you are about to read applies to Postgres 17. There is a chance that later versions have changed things.

Atomicity

Let’s lay the foundation for the booking system by creating the bookings table:

CREATE TABLE bookings (
  id uuid DEFAULT gen_random_uuid() NOT NULL,
  customer_name text NOT NULL,
  seat_count integer NOT NULL,
  PRIMARY KEY (id)
);

A booking has an ID, the name of the customer that owns it and the number of reserved seats.

Before we proceed to examining the isolation property of transactions, let us quickly remind ourselves of atomicity. This property ensures the all-or-nothing semantics for multiple SQL statements: either all succeed, or none do.

The first two customers of our booking platform are Alice and Bob.

Let’s say Alice decides to create two bookings: one for herself and one for Bob. If she does so without wrapping both INSERT statements in a transaction, she risks creating only one of the bookings:

Booking.create!(customer_name: 'Alice', seat_count: 1)
# Generates:
# INSERT INTO "bookings" ("customer_name", "seat_count")
# VALUES ('Alice', 1)
# RETURNING "id"
call_application_code # Raises an exception
Booking.create!(customer_name: 'Bob', seat_count: 1) # Is never reached

If execution is halted between the calls to create bookings - whether due to an exception, abrupt termination, or any other reason - the database remains in an inconsistent state from the application’s perspective: Alice does not want to go to the event alone, yet she has booked a seat only for herself.

If the table is queried at this point, Alice’s booking will show up:

SELECT "bookings"."customer_name" FROM "bookings";
-- => ["Alice"]

In order to remedy this, Alice can wrap the two INSERT statements in a transaction:

ActiveRecord::Base.transaction do
  Booking.create!(customer_name: 'Alice', seat_count: 1)
  call_application_code # Raises an exception
  Booking.create!(customer_name: 'Bob', seat_count: 1) # Is never reached
end

The following SQL is generated:

BEGIN;
INSERT INTO "bookings" ("customer_name", "seat_count") VALUES ('Alice', 1) RETURNING "id";
-- An exception raised in application code
ROLLBACK;

Once an exception is raised, the ActiveRecord ORM issues a ROLLBACK statement.

If the table is queried after this, it will show that Alice’s booking has not been committed:

SELECT "bookings"."customer_name" FROM "bookings";
-- => []

Isolation

Our booking system will almost certainly be used by many users simultaneously. To support this, the database needs to implement concurrency control.

The idea behind concurrency control is to give users the illusion that their transaction is the only one the database is currently processing, even if other transactions are reading or writing the same data.

There are two extremes the database could resort to in achieving this effect: it could either execute every transaction strictly sequentially - or serially - which would be disastrous for throughput, as it would prevent transactions that could otherwise safely run concurrently from doing so; or it could allow all transactions to execute concurrently without any restrictions, which would likely result in users seeing inconsistent data they would not expect to see.

Thankfully, it’s not a binary choice. As we will see, isolation exists on a spectrum.

Isolation levels are fixed points on this spectrum: the further left you go, the more anomalies occur and the less contention there is; the further right you go, the fewer anomalies occur at the cost of increased contention. This creates a subtle trade-off between performance and isolation.

Proper concurrency control allows for selectivity: while some transactions may execute at a lower isolation level, others that require absolute correctness in the presence of concurrent executions can use higher levels without directly impacting each other.
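
For reference, the isolation level can be chosen per transaction, and the session-wide default can be inspected and changed - a quick sketch of the relevant commands:

-- Choose the level for a single transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... statements ...
COMMIT;

-- Alternatively, set it right after BEGIN, before the first query
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COMMIT;

-- Inspect or change the session-wide default
SHOW default_transaction_isolation;
SET default_transaction_isolation = 'read committed';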

Postgres implements a multi-version flavour of concurrency control. Compared to lock-based concurrency control, its key distinction is that reads do not block writes. Interestingly, this approach completely prevents some anomalies that would otherwise be possible at the corresponding isolation level according to the SQL standard, which assumes lock-based control. However, as our exploration will show, MVCC also permits at least one anomaly that is impossible in databases using lock-based concurrency control.

Read Committed

Read committed is the default isolation level in Postgres unless reconfigured. It is the most relaxed level, allowing the highest number of anomalies to occur.

As we develop the booking system, we will see what exactly can go wrong at this level.

Dirty Read

Let’s test whether a transaction can see changes that have not yet been committed by another transaction executing at the same time. For clarity, the isolation level is explicitly specified in the following example, even though this is unnecessary since it is already the default.

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count") VALUES ('Bob', 1) RETURNING "id"
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count") VALUES ('Alice', 1) RETURNING "id"
SELECT "bookings"."customer_name" FROM "bookings" WHERE "bookings"."customer_name" = 'Bob'
-- => ["Bob"]
SELECT "bookings"."customer_name" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => []
COMMIT
COMMIT

Bob is not able to see Alice’s booking, even though she has already executed an INSERT statement - this is because Alice has not yet executed a COMMIT. However, he is able to see his own booking, which he created at the beginning of the transaction.

This proves that Postgres does not allow dirty reads at its most relaxed isolation level.

Bob might assume that he can always rely on the consistency of the data he sees - if a booking appears, he might believe the transaction that committed it followed all consistency rules. His hypothesis is that if uncommitted changes from other concurrently executing transactions are invisible, then he can safely base decisions on any data he reads. However, this assumption is deceptive and could be fatal.

Lost Update

In order for the booking platform to evolve, some adjustments to the schema must be made. A new events table is created, which is now referenced by each booking through a new event_id column and a corresponding foreign key.

CREATE TABLE events (
  id text NOT NULL,
  available_seats integer NOT NULL,
  PRIMARY KEY (id)
);

ALTER TABLE bookings ADD COLUMN event_id TEXT NOT NULL;
ALTER TABLE bookings ADD CONSTRAINT fk_bookings_event FOREIGN KEY (event_id) REFERENCES events (id);

The purpose of the events table is to store the number of seats still available for booking. An alternative approach would be to store the total capacity of an event, calculate the number of already booked seats based on the bookings table, and determine the number of remaining seats each time it is needed. There are trade-offs between these choices. The latter option would be problematic for large events with many bookings - calculating the sum would not be a cheap operation in this case, and the problem would be exacerbated if the number of available seats had to be displayed frequently. Moreover, there are additional implications, which we will explore soon.
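
For illustration, here is roughly what the rejected alternative could look like, assuming a hypothetical total_seats column on events in place of available_seats:

-- Hypothetical schema variant: events stores only its total capacity,
-- and the remaining capacity is derived from bookings on demand.
SELECT e.total_seats - COALESCE(SUM(b.seat_count), 0) AS remaining_seats
FROM events e
LEFT JOIN bookings b ON b.event_id = e.id
WHERE e.id = 'event_a'
GROUP BY e.id, e.total_seats;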

Let us populate the table with an event that has a grand total of two available seats.

INSERT INTO "events" ("id", "available_seats") VALUES ('event_a', 2) RETURNING "id"
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 2

Since the remaining capacity is tracked in a dedicated column, the responsibility for keeping it consistent and up to date falls on those making bookings - once they take a seat, they must decrement the counter.

Both Alice and Bob diligently follow this expectation. However, they attempt to book one seat each at the same time.

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 2
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 2
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = 1 WHERE "events"."id" = 'event_a'
COMMIT
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = 1 WHERE "events"."id" = 'event_a'
COMMIT

Their algorithm is simple: start a transaction, read the number of available seats for the corresponding event, create a booking with one seat reserved, and then decrement the count of available seats.

They both read the event record before either of them has committed their transaction, seeing its initial state with a capacity of two. Bob then creates his booking, decrements the counter, and commits first.

The trouble arises when Alice commits her transaction. Since she based her decision to set available_seats to one on the initial capacity of two - before Bob had committed - and since Bob decremented it in the meantime, she overwrites Bob's update to the event instead of decrementing from the new capacity of one seat.

Let us query the tables after both transactions commit:

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 1
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 2

The result is inconsistent - there are two bookings made, totaling two booked seats, while the number of available seats is one, as if only one seat was booked.

This anomaly is called a lost update. In this case, Bob’s update was overwritten and lost.

Bob and Alice decide to prevent such a situation by adjusting their statements.

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
COMMIT

Instead of calculating the new capacity based on a SELECT, they directly decrement the value by referencing the current value via the column name in their UPDATE statements.

Let’s see if this works.

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 0
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 2

This time, Bob’s update is not lost, and the event’s capacity is correctly set to zero after both transactions commit.

Using UPDATE SET WHERE statements with additive updates is one way to avoid lost updates in scenarios like this, though there are some caveats, which we will uncover as we proceed.

You might be skeptical - does the new algorithm actually prevent lost updates, or was this outcome just a coincidence, with Bob and Alice simply getting lucky? Let’s reorder the operations so that both Alice and Bob first attempt to update the event before proceeding:

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
-- Wait for ~3 seconds
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a' (3245.9ms)
-- Issued before Bob's INSERT, returns after Bob commits
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
COMMIT
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT

Alice’s update gets blocked until Bob’s transaction commits. Once it does, Alice’s UPDATE statement returns, allowing her to insert the booking and complete her transaction.

Alice and Bob make the following observations:

  • Concurrent updates to the same row are executed sequentially.
  • An update is not considered complete until the wrapping transaction commits.

This has an important implication: the longer a transaction runs and the more rows it updates, the greater its impact on other transactions that modify the same rows.

Of course, in this case, Bob deliberately waited three seconds just to demonstrate the effect. However, in real applications, inefficient application logic can easily lead to long-running transactions. This possibility is one reason why transaction duration should be carefully managed and closely monitored.
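
As a rough sketch, long-running and idle-in-transaction sessions can be spotted through pg_stat_activity, and Postgres can be told to cut the latter off (the timeout value below is purely illustrative):

-- Transactions that have been open the longest
SELECT pid, state, now() - xact_start AS xact_duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_duration DESC;

-- Abort sessions that sit idle inside an open transaction for too long
SET idle_in_transaction_session_timeout = '30s';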

Another key detail that Alice infers is that once Bob’s transaction commits, her UPDATE statement must have fetched the latest version of the event - otherwise, how could it have correctly subtracted one from the new value of available_seats?

Constraints & Validations

The current implementation has a flaw: an event’s capacity can become negative since no checks are in place. One way to add these checks is at the application layer - in the code that implements the booking platform’s logic. The specific approach depends on the language and framework. Since our system uses Ruby’s ActiveRecord ORM, let’s add a model validation:

class Event < ApplicationRecord
  has_many :bookings

  validates :available_seats, numericality: { greater_than_or_equal_to: 0 }
end

This validation works by checking the new value of available_seats, which is generated by the application code when the ORM’s update! method is called on an instance of the Event model. If validation fails, the underlying UPDATE is not executed, and an exception is raised in the application.

Let’s reset the existing event and set it to have only one available seat:

INSERT INTO "events" ("id", "available_seats") VALUES ('event_a', 1) RETURNING "id"
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 1
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a' LIMIT 1
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a' LIMIT 1
UPDATE "events" SET "available_seats" = 0 WHERE "events"."id" IS NULL
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
UPDATE "events" SET "available_seats" = 0 WHERE "events"."id" IS NULL
COMMIT

Final state:

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 1
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 2

We are back to where we started - lost updates are possible again. For this validation to work, a record must first be loaded before it can be updated. In SQL terms, this means an UPDATE must be preceded by a SELECT - which is precisely the situation that caused Bob’s update to be lost in a previous scenario.

Advanced libraries or ORMs could work around this issue, or the application code could be adjusted to implement a more optimistic validation. The validation could modify the UPDATE statement to include the column in its RETURNING clause and execute only after the update finishes. At this point, the new value is guaranteed to be up to date since, as we’ve seen, UPDATEs block other transactions from modifying the same row until they commit. If the updated value violates the constraint, the transaction is aborted with a ROLLBACK.

The optimism of this check lies in the assumption that the vast majority of updates will not need to be rolled back. However, this approach could negatively impact database performance - if frequently triggered, it could lead to many pointless updates followed by rollbacks. Its practicality depends on the application’s expected workload and other factors, but it could be a viable option. In the case of our booking system, every attempt to reserve a seat after the event is fully booked would result in an update and rollback.
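
A minimal sketch of such an 'update first, validate after' flow in raw SQL, with the application code driving the decision to roll back:

BEGIN;
UPDATE events
SET available_seats = available_seats - 1
WHERE id = 'event_a'
RETURNING available_seats;
-- The application inspects the returned value: if it is negative,
-- the invariant is violated and the transaction is aborted...
ROLLBACK;
-- ...otherwise the booking is inserted and the transaction commits.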

Let’s shift responsibility to the database and enforce the invariant that prevents overbooking using a constraint:

ALTER TABLE events
ADD CONSTRAINT events_available_seats_check
CHECK (available_seats >= 0);

Now that the available_seats values for events are constrained by the database and cannot go below zero, let’s see what happens when Alice and Bob try to book seats at the same time again:

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
ROLLBACK
-- -> raised ActiveRecord::StatementInvalid PG::CheckViolation: ERROR: new row for relation "events" violates check constraint "events_available_seats_check" DETAIL: Failing row contains (event_a, -1).

The moment Bob’s update executes, Postgres raises an error, indicating that the new value violates the constraint we introduced. As a result, his transaction is rolled back, and overbooking is prevented.

While the database constraint helps avoid a lost update by ensuring the invariant is upheld on its own, this might not be the optimal solution. What if, in the future, our booking system supports multiple event types? It might be acceptable to overbook certain types of events based on business requirements. Additionally, events of the same type might have more complex overbooking rules that cannot (and arguably should not) be enforced at the database level. While each event type could get its own table and constraints, this approach may not always be the best solution.

Given these concerns, it might be wiser to apply a minimal set of database constraints for fundamental invariant rules that should be upheld for all data shapes within the table, while more complex, contextual constraints could be handled at the application level.

Furthermore, database constraints should not be viewed primarily as tools for controlling concurrency. While we've used them here to avoid an anomaly, their primary role is to enforce data invariants and maintain data correctness.

Write Skew

For some reason, Bob and Alice each no longer want to go to the event if the other one is going.

Since the total capacity of the event is only two seats, and since Bob and Alice are the only users of the booking system, they intend to simply check whether available_seats is 2 in order to understand if the other person is going.

Unknowingly, they both rush to reserve a seat at the same time:

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 2
# Alice
ActiveRecord::Base.transaction do
  if Event.where(id: 'event_a').pluck(:available_seats).first > 1
    Event.decrement_counter(:available_seats, 'event_a', by: 1)

    Booking.create!(customer_name: 'Alice', seat_count: 1, event_id: 'event_a')
  end
end

# Bob
ActiveRecord::Base.transaction do
  if Event.where(id: 'event_a').pluck(:available_seats).first > 1
    Booking.create!(customer_name: 'Bob', seat_count: 1, event_id: 'event_a')

    Event.decrement_counter(:available_seats, 'event_a', by: 1)
  end
end
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
COMMIT

Bob starts his transaction first and checks the number of available seats. He sees that there are two seats, which makes him think that Alice has not booked hers yet. He then creates a booking, gets distracted, and does not finish his transaction.

In the meantime, Alice does the same - she checks the event’s capacity, infers that Bob is not going, books a seat, and commits her transaction.

Bob then comes back and finishes the booking process by decrementing the seat count and committing his transaction.

They both end up booking a seat, despite trying to avoid this exact situation:

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 0
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 2

Bob encounters a write skew anomaly. The data that was read in a SELECT statement, which Bob used to make his decision, became stale by the time of the UPDATE.

In transactions run at the default READ COMMITTED isolation level and without additional synchronization, it is generally not safe to make decisions based on data previously read with SELECT.

Disappointed by the outcome, Alice and Bob decide to change their statements to prevent this from happening in the future:

# Alice
ActiveRecord::Base.transaction do
  updated_event = ActiveRecord::Base.connection.execute(<<~SQL).to_a
    UPDATE events
    SET available_seats = available_seats - 1
    WHERE id = 'event_a' AND available_seats > 1
    RETURNING id
  SQL

  raise ActiveRecord::Rollback if updated_event.first.nil?

  Booking.create!(customer_name: 'Alice', seat_count: 1, event_id: updated_event.first['id'])
end

# Bob
ActiveRecord::Base.transaction do
  updated_event = ActiveRecord::Base.connection.execute(<<~SQL).to_a
    UPDATE events
    SET available_seats = available_seats - 1
    WHERE id = 'event_a' AND available_seats > 1
    RETURNING id
  SQL

  raise ActiveRecord::Rollback if updated_event.first.nil?

  Booking.create!(customer_name: 'Bob', seat_count: 1, event_id: updated_event.first['id'])
end
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE id = 'event_a' AND "available_seats" > 1 RETURNING "id"
-- => [{"id" => "event_a"}]
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE id = 'event_a' AND "available_seats" > 1 RETURNING "id"
-- Blocked until Alice commits
-- => []
COMMIT
ROLLBACK

Instead of doing a SELECT, they again resort to conditional UPDATEs. The invariant is preserved using the WHERE "available_seats" > 1 condition.

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 1
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 1

As we’ve established, whenever UPDATE executes, it re-fetches the latest state of the row that it is modifying. In Bob’s case, the update waits until Alice commits, and once she does, the event no longer satisfies the condition, since available_seats gets set to 1 by Alice in the meantime. This makes Bob roll the transaction back.

Using single statements for conditional updates may help avoid some instances of write skew (and lost updates, as we've seen) in situations where the condition can be expressed directly in the WHERE clause. But be careful: directly is the key word here - read on to learn when it is not safe to rely on single commands.

Explicit Locking

Now let’s imagine that the event’s capacity is not two seats, but four.

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 4

The existing check that Alice and Bob use is no longer correct. If either of them books a seat, the available_seats counter will be equal to 3, which will incorrectly satisfy their conditions. Besides, their check was dubious to begin with, as someone else entirely could have booked a seat. They decide to make it right by querying for the actual booking belonging to the other person.

However, this means that they have to issue SELECT statements again, which could potentially result in a write skew.

In the end, some application logic rules and invariants cannot be adequately expressed at the database level, and even if they can be, arguably, they should not. There are enough cautionary tales of systems whose entire business logic resides in stored procedures, triggers, and other extended database functionality, documenting the aftermath - or rather, the fallout.

So, in some situations, data has to be read first and then modified. One way to make it safe at the read committed isolation level is to use explicit locking.

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events".* FROM "events" WHERE "events"."id" = 'event_a' FOR UPDATE
SELECT "bookings"."id" FROM "bookings" WHERE "bookings"."customer_name" = 'Bob'
-- => []
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
UPDATE "events" SET "available_seats" = 3 WHERE "events"."id" = 'event_a'
-- Wait for a bit
COMMIT
SELECT "events".* FROM "events" WHERE "events"."id" = 'event_a' FOR UPDATE (1600.5ms)
-- Issued before Alice's commit, returns afterwards
SELECT "bookings"."id" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => ["1180b37e-edb3-4ef3-b53b-310d65a27c42"]
ROLLBACK
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 3
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 1

The first statement that both Alice and Bob issue in their transactions is SELECT FOR UPDATE. It exhibits behavior similar to UPDATE: once obtained, this lock prevents other concurrent transactions from modifying, deleting, or obtaining any other kind of lock on the row. In fact, the reason we earlier observed updates executing sequentially is that they obtain the exact same lock.

SELECT FOR UPDATE lets the application serialize the portion of a transaction that runs after the lock is obtained and before the transaction either commits or rolls back.

However, even though it may be obvious, the guarantee that no concurrent modifications will happen applies only to the row being locked. In Alice and Bob’s case, if either of them did not try to lock the event at the beginning of the transaction, they would not have avoided a write skew, even though their decrement would have blocked as before. Even now, somebody else could sneakily insert Bob’s booking on his behalf after Alice’s SELECT FOR UPDATE and before her commit without trying to lock the event.

If explicit locks are needed to protect against other rows being created or updated, including rows in other tables, extra care should be taken when designing the application logic, as they can easily be bypassed.

While explicit locking makes it much easier to reason about concurrent scenarios, it, of course, incurs a cost.

For starters, the event cannot be updated by other transactions while the lock is being held. Other customers trying to book a seat will need to wait in line. In the case of a basic booking system, this might be desirable - after all, we don’t want to lose track of the event’s capacity, and allowing users to overbook, only to revoke their bookings later, would not result in a good user experience.

As with constraints, the appropriateness of explicit locking depends on the application and its workload. Some operations have to be executed serially for invariants to be preserved. The general rule of thumb is to avoid locking if possible, and when it is not, to lock the smallest possible sections of application code.

In the scenario above, if Alice or Bob decided to take their time with the commits for whatever reason, they would prevent other non-slacking customers from booking their own seats. Alternatively, the logic Alice and Bob use to decide whether they want to go might simply be too computationally complex. They might, due to a lack of oversight, run complex database queries one could consider analytical - add a missing index to this equation, and it becomes a recipe for disaster.

It is worth knowing that Postgres offers several other types of explicit locks in addition to FOR UPDATE. The main difference between them is the set of operations they conflict with.
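
At the row level, for instance, the following variants exist; the comments are only a rough summary of their conflict rules:

-- The strongest row lock: blocks concurrent UPDATE, DELETE and any other FOR ... lock
SELECT * FROM events WHERE id = 'event_a' FOR UPDATE;

-- Slightly weaker: the lock a plain UPDATE takes when it does not change key
-- columns; unlike FOR UPDATE, it does not conflict with FOR KEY SHARE
SELECT * FROM events WHERE id = 'event_a' FOR NO KEY UPDATE;

-- Shared locks: several transactions may hold them on the same row.
-- FOR SHARE blocks concurrent updates and deletes of the row, while
-- FOR KEY SHARE blocks only deletes and updates that change key columns.
SELECT * FROM events WHERE id = 'event_a' FOR SHARE;
SELECT * FROM events WHERE id = 'event_a' FOR KEY SHARE;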

Unique Indices

A new requirement came in: each customer should be able to make no more than one booking for any given event.

It might not seem like an issue of concurrency - why would, say, Bob, try to create a booking concurrently? It makes sense to prevent a customer from making more than one booking consecutively, but why bother with race conditions? The answer is that a lot can go wrong - for example, Bob’s request to the application server might time out from his perspective but still execute on the backend. His client is most likely going to retry the request immediately, which opens a significant possibility that both of his requests will execute concurrently. Even outside of transient network issues or accidental duplicates, Bob might try to bypass the restriction with malevolent intent if he really wants to make separate bookings and suspects that the booking platform might not handle concurrent requests correctly.

With the experience we have gained so far, it becomes clear that checking for the presence of a booking via SELECT and deciding whether to INSERT or not leads to a write skew. Although an explicit lock could help with that, Alice and Bob have reconciled and are now ready to visit the event together, so they no longer need to check for each other’s booking. Since reading data is no longer necessary, and since we want to avoid excessive lock contention on the event, we decide to drop the SELECT FOR UPDATE statement in favour of an alternative.

There is an elegant way to enforce the new rule - a unique index.

CREATE UNIQUE INDEX index_bookings_one_per_customer
ON bookings (customer_name, event_id);

Now, let’s simulate two concurrent attempts to book a seat:

Bob Session A
Bob Session B
BEGIN ISOLATION LEVEL READ COMMITTED
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
-- Wait for a bit
COMMIT
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id" (1533.4ms)
-- Issued before session A's commit, returns afterwards
ROLLBACK
-- -> raised ActiveRecord::RecordNotUnique PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_bookings_one_per_customer"
DETAIL: Key (customer_name, event_id)=(Bob, event_a) already exists.
SELECT COUNT(*) FROM "bookings" WHERE "bookings"."customer_name" = 'Bob'
-- => 1

Bob’s INSERT from the second session raises an error, indicating that the unique index was violated. The attempt to create a new row behaves like an UPDATE in this case - session B’s INSERT waits until session A’s transaction commits. Once it does, session B detects that a row with the same combination of customer_name and event_id already exists and raises an error.

There is another way to achieve uniqueness of rows - via unique constraints. Their benefit over unique indices is that constraints, in general, can be deferred, allowing the constraint to be evaluated at the time of COMMIT rather than immediately when the affecting statement is executed. However, while deferring foreign key constraints is nifty for certain schemas, the usefulness of deferring unique constraints is arguably dubious.

The benefit of unique indices over constraints is that indices can be built and dropped concurrently - a property that is extremely important for large tables, where ordinary index manipulations may take minutes, if not hours, exacerbated by the necessity to lock writes to the table when the index is being operated on non-concurrently.
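
For instance, the unique index from above could be built and later dropped without blocking writes, roughly like this (note that CONCURRENTLY cannot run inside a transaction block):

CREATE UNIQUE INDEX CONCURRENTLY index_bookings_one_per_customer
ON bookings (customer_name, event_id);

DROP INDEX CONCURRENTLY index_bookings_one_per_customer;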

INSERT ON CONFLICT

Let’s imagine a situation in which Bob books one seat but then immediately realizes that he wanted to book two. He is not sure if his booking was already created. He wants to avoid accidentally attempting to create a second booking and to avoid having to restart his second session in case his first transaction commits before it.

SELECT "events"."available_seats"
FROM "events"
WHERE "events"."id" = 'event_a';
-- => 4
Bob Session A
Bob Session B
BEGIN ISOLATION LEVEL READ COMMITTED
BEGIN ISOLATION LEVEL READ COMMITTED
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
-- Wait for a bit
COMMIT
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 2, 'event_a')
ON CONFLICT ("customer_name", "event_id")
DO UPDATE SET "seat_count" = "bookings"."seat_count" + 1
RETURNING CASE WHEN xmax != 0 THEN 'updated' ELSE 'inserted' END AS status
(1572.3ms)
-- Issued before session A's commit, returns afterwards
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
COMMIT

Here is the code of Bob’s session B:

ActiveRecord::Base.transaction do
  result = Booking.upsert(
    { customer_name: 'Bob', seat_count: 2, event_id: 'event_a' },
    on_duplicate: Arel.sql('seat_count = bookings.seat_count + 1'),
    unique_by: :index_bookings_one_per_customer,
    returning: Arel.sql("CASE WHEN xmax != 0 THEN 'updated' ELSE 'inserted' END AS status;")
  ).to_a.first

  case result['status']
  in 'inserted'
    Event.decrement_counter(:available_seats, 'event_a', by: 2)
  in 'updated'
    Event.decrement_counter(:available_seats, 'event_a', by: 1)
  end
end
SELECT COUNT(*) FROM "bookings" WHERE "bookings"."customer_name" = 'Bob';
-- => 1

SELECT "events"."available_seats"
FROM "events"
WHERE "events"."id" = 'event_a';
-- => 2

SELECT "bookings"."seat_count"
FROM "bookings"
WHERE "bookings"."customer_name" = 'Bob' ORDER BY "bookings"."id" ASC LIMIT 1;
-- => 2

Bob adds the ON CONFLICT clause to his INSERT. It deduplicates rows based on the given set of columns, for which a unique index is inferred - and must exist for the clause to work in this form. If there is a conflict - that is, if a matching row already exists - the DO UPDATE SET clause is executed, incrementing the number of seats by one.

The ON CONFLICT clause guarantees that either the row will be inserted or the existing row will be updated, no matter how high the contention is.

The insert waits until session A commits, just like in the previous scenario.

Bob uses a trick to determine whether the row was created or updated by the statement, involving the xmax system column. In short, xmax is zero for a freshly inserted row; otherwise it contains the ID of the transaction that deleted it. However, since no deletion occurs here - and even if you are aware of the details of MVCC in Postgres and know that an update causes the previous version of the row to be 'deleted' - it may still not be obvious why this trick works. The RETURNING clause is supposed to return the new version of the row that was just created, not the previous one. The truth is that in this case, xmax is populated because the INSERT ON CONFLICT statement obtains a lock on the new version during its execution. This is an implementation detail and could change in future releases of Postgres, so this method should be used with caution.

Since session B updates the booking that was concurrently created by session A, the event’s available_seats counter is decremented by one. If session B was the first to execute the INSERT, it would have created the booking with two seats and decremented the counter by two.

As a result, Bob’s session B does not end up raising an error, and he is able to efficiently update his booking.

The ON CONFLICT clause is a powerful way to achieve sequential execution of ‘create-or-update’ logic with simple rules while avoiding certain cases of lost updates and write skew, provided that the object of the update can be uniquely identified using a unique index or constraint.
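
When no update is needed on conflict - for example, when a retried request should simply be ignored - the simpler DO NOTHING form can be used; a small sketch:

-- Inserts the booking, or silently skips it if Bob already has one for the event
INSERT INTO bookings (customer_name, seat_count, event_id)
VALUES ('Bob', 1, 'event_a')
ON CONFLICT (customer_name, event_id) DO NOTHING
RETURNING id; -- returns no row when the insert was skipped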

Postgres 15 introduced the MERGE command, which is similar to INSERT ON CONFLICT in functionality, save for one important difference that makes it less useful in scenarios like the one we just examined - MERGE would have raised an error in session B instead of re-fetching the concurrently inserted booking and updating it.
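
For comparison, a rough sketch of Bob's upsert expressed with MERGE might look like this (the join condition mirrors the unique index, although MERGE itself does not require one):

MERGE INTO bookings AS b
USING (VALUES ('Bob', 2, 'event_a')) AS v (customer_name, seat_count, event_id)
ON b.customer_name = v.customer_name AND b.event_id = v.event_id
WHEN MATCHED THEN
  UPDATE SET seat_count = b.seat_count + 1
WHEN NOT MATCHED THEN
  INSERT (customer_name, seat_count, event_id)
  VALUES (v.customer_name, v.seat_count, v.event_id);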

Optimistic Locking

Let’s introduce a new feature: the ability for multiple customers to share bookings. Sharing implies that users should be able to modify bookings that they have access to.

It becomes clear that bookings must be protected from concurrent updates - otherwise, at the very least, lost updates are almost guaranteed.

While using explicit locking, also known as pessimistic locking, may be tempting, there is another option - optimistic locking. To support it, the bookings schema needs to be adjusted.

ALTER TABLE bookings 
ADD COLUMN lock_version INTEGER NOT NULL DEFAULT 0;

The idea of optimistic locking is simple: have a column that tracks the logical version of the row. Each read includes the column's current value in its result set, and each update is made conditional on the lock version the client previously retrieved. If the lock version has changed in the meantime, the update matches no rows, and the client is expected to treat this as an error.

The name of the column can be anything, but we name it lock_version since ActiveRecord supports optimistic locking out of the box and expects this to be the column name by default.

Let’s see it in practice. Alice initially booked a seat for herself, but now she and Bob have decided to attend the event together.

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id", "lock_version")
VALUES ('Alice', 1, 'event_a', 0) RETURNING "id";

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 3

Now, both Alice and Bob concurrently attempt to update Alice’s booking to have two seats using the same application code.

ActiveRecord::Base.transaction do
  booking = Booking
    .select(:id, :seat_count, :lock_version)
    .find_by!(customer_name: 'Alice')

  if booking.seat_count == 1
    Event.decrement_counter(:available_seats, 'event_a', by: 1)

    # ActiveRecord automatically makes necessary changes to the
    # generated SQL statement if it sees that the `lock_version`
    # column exists.
    booking.update!(seat_count: 2)
  end
end
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings"."id", "bookings"."seat_count", "bookings"."lock_version" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice' LIMIT 1
-- => [1, 0]
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings"."id", "bookings"."seat_count", "bookings"."lock_version" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice' LIMIT 1
-- => [1, 0]
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
UPDATE "bookings" SET "seat_count" = 2, "lock_version" = 1 WHERE "bookings"."id" = '5e3758e7-ffbd-4471-b603-c08da0a5987d' AND "bookings"."lock_version" = 0
COMMIT
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
UPDATE "bookings" SET "seat_count" = 2, "lock_version" = 1 WHERE "bookings"."id" = '5e3758e7-ffbd-4471-b603-c08da0a5987d' AND "bookings"."lock_version" = 0
ROLLBACK
-- -> raised ActiveRecord::StaleObjectError Attempted to update a stale object: event_with_optimistic_locking.
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 2
SELECT SUM("bookings"."seat_count") FROM "bookings"
-- => 2

The outcome is correct: the available_seats counter was decremented only once, and the booking was updated to have two seats.

Both Bob and Alice read the lock_version along with the seat count of Alice’s booking, on which they base their decision to modify it. They both receive the same values: 1 seat and lock version 0.

Bob then updates the booking to have a seat count of 2, while at the same time bumping the lock version from 0 to 1. The most important detail is that the UPDATE is conditional: it has a WHERE clause that ensures execution only if the current lock_version is still 0. This condition holds, so Bob successfully commits his transaction.

Alice has also decided to update the booking, but when she issues her UPDATE statement, it re-fetches the latest version of the booking and does not execute - since the lock_version has been bumped to 1 in the meantime.

The application code - or more precisely, the ORM’s framework code in this case - detects that no rows were altered by the update and raises a corresponding exception, rolling the transaction back.

Optimistic locking, like pessimistic locking, has helped Bob and Alice prevent both a lost update and a write skew.

How do these two locking approaches compare?

The biggest advantage of optimistic locking is the ability to span transactions. Let’s extrapolate this situation to a more realistic scenario: customers will most likely read and modify their bookings through a browser, which sends requests to the booking system’s application server. This process consists of three stages: displaying the data, modifying it, and then saving it. Although it is possible to implement, most modern applications do not hold a transaction open from stage one to stage three - i.e., across multiple requests to the server. Doing so would be considered an egregious waste of resources. We have already seen one of the consequences of long-running transactions: other customers would have to wait.

However, our earlier discussion of long-running transactions involved explicitly locking an event - a point of high contention, which is not the case here. Locking a single booking would not have a significant impact on customers other than those sharing the booking. But long-running transactions in general can severely affect the vacuuming process, whose timeliness is essential for maintaining database health.

Due to this, one of the most efficient ways to prevent lost updates across transactions (i.e. across separate requests to the application server) is to use optimistic locking.

Bob
Alice
-- Request 1
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings".* FROM "bookings" WHERE "bookings"."customer_name" = 'Alice' LIMIT 1
COMMIT
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings"."id", "bookings"."seat_count", "bookings"."lock_version" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice' LIMIT 1
UPDATE "events" SET "available_seats" = COALESCE("available_seats", 0) - 1 WHERE "events"."id" = 'event_a'
UPDATE "bookings" SET "seat_count" = 2, "lock_version" = 1 WHERE "bookings"."id" = 'd3502e1a-9d84-4af5-9979-f9e997e72e68' AND "bookings"."lock_version" = 0
COMMIT
-- Request 2
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "events" SET "available_seats" = COALESCE("available_seats", 0) - 1 WHERE "events"."id" = 'event_a'
UPDATE "bookings" SET "customer_name" = 'Bob', "seat_count" = 2, "lock_version" = 1 WHERE "bookings"."id" = 'd3502e1a-9d84-4af5-9979-f9e997e72e68' AND "bookings"."lock_version" = 0
ROLLBACK
-- -> raised ActiveRecord::StaleObjectError Attempted to update a stale object: event_with_optimistic_locking.

Bob reads the data in one transaction and updates it in another - a much more realistic scenario. The outcome remains the same since the lock is baked into the data itself.

You might notice that optimistic locking employs the same technique we used earlier to prevent lost updates: a conditional UPDATE with a WHERE clause. In a way, this is a more generic approach that allows the application code to implement any kind of checks without being restricted to formulating them within the WHERE clause.

However, optimistic locking has a disadvantage compared to pessimistic locking in certain scenarios. Imagine if we used it in the previous case, where the event was locked using SELECT FOR UPDATE. The vast majority of requests from unlucky customers would fail to execute successfully, as each successful update increments the lock version, causing all concurrent transactions to fail and forcing them to retry - often multiple times. Under high contention, optimistic locking performs significantly worse than pessimistic locking, since the number of rollbacks outweighs the benefits of forming an explicit blocking queue of transactions.

As always, the applicability of optimistic locking depends on the nature of the application and its workload.

Non-Repeatable Read

Before we proceed, let’s examine some of the most fundamental anomalies. We have already encountered them indirectly, but now we’ll look at minimal reproducible examples.

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_a')
RETURNING "id";
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => [1]
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice'
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => [2]
COMMIT

Bob encounters a non-repeatable read anomaly. He first reads the initial state of Alice’s booking. However, after repeating the SELECT, he sees the updated version that Alice has committed concurrently. Both reads occur within the same transaction.

This demonstrates that the state of data can change between commands at the read committed isolation level. Moreover, multiple versions of the same row can be seen throughout a single transaction. If you think about it, you might realize that the UPDATE SET WHERE condition was able to prevent lost updates for the same reason non-repeatable reads are possible - once the update executes, it re-fetches the latest version of the row, even if it was not visible at the start of the transaction. While this behavior was desirable in the earlier case, it can lead to inconsistencies in other situations.

Phantom Read

The other anomaly is quite similar to non-repeatable reads from the technical standpoint, but it is perceived in a slightly different manner.

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_a')
RETURNING "id"
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => [1]
BEGIN
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => [1, 1]
COMMIT

Now, Bob experiences a phantom read anomaly. He observes completely new rows that were inserted concurrently by another transaction.

If you are aware that each row’s data is stored as multiple versions in PostgreSQL, you might consider phantom and non-repeatable reads to be the same. After all, both occur because concurrently created versions of rows become visible after they are committed. The only difference is that non-repeatable reads typically occur due to concurrent updates, whereas phantom reads occur due to concurrent inserts.

From an application perspective, phantom reads are arguably more problematic. Applications may not frequently re-read the same rows within a single transaction, making non-repeatable reads less common. However, phantom reads are more likely to affect certain read patterns, where new rows appearing concurrently could cause unexpected behavior, as we will explore in the next anomaly.

Read Skew

Let’s now simulate a scenario where Alice updates two bookings that she shares with Bob, while Bob attempts to read them at the same time.

INSERT INTO "events" ("id", "available_seats")
VALUES ('event_a', 3)
RETURNING "id";

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_a')
RETURNING "id";

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 2, 'event_a')
RETURNING "id";
Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
-- => [1]
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice'
UPDATE "bookings" SET "seat_count" = 1 WHERE "bookings"."customer_name" = 'Bob'
COMMIT
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Bob'
-- => [1]
COMMIT

Bob witnesses a situation which should be impossible: in his view, Alice’s booking remains in its initial state, with a seat count of 1, while his own booking’s seat count has already been updated from 2 to 1. The result is inconsistent, as it neither reflects the initial state of both bookings nor their final state after Alice’s updates. This anomaly is known as read skew.

Bob realizes that what truly matters to him is the total number of seats he and Alice booked together. Instead of selecting each booking individually, he decides to use the SUM operator:

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice'
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob')
-- => 3
UPDATE "bookings" SET "seat_count" = 1 WHERE "bookings"."customer_name" = 'Bob'
COMMIT
COMMIT

Even though this time the result appears correct, Bob still questions whether this is merely a coincidence or if using a single statement with an aggregate function is a reliable way to avoid read skew. After all, Alice’s booking could have been committed while his SELECT query was iterating over the table.

The answer is that a single statement, no matter how long it executes, will only see data as it existed at the start of its execution.

However, there is an exception - queries that use VOLATILE functions. If such a function reads data independently and is executed within a transaction using the read committed isolation level, read skew can occur again. Fortunately, SUM is not volatile.
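
To make that concrete, here is a sketch of a hypothetical VOLATILE helper that performs its own read; each of its invocations under read committed uses a fresh snapshot, so a single outer query calling it may observe data committed mid-execution:

-- Hypothetical helper; VOLATILE is the default for user-defined functions
CREATE FUNCTION booked_seats(p_event_id text) RETURNS bigint AS $$
  SELECT COALESCE(SUM(seat_count), 0)
  FROM bookings
  WHERE event_id = p_event_id
$$ LANGUAGE sql VOLATILE;

-- Each call re-reads bookings with a fresh snapshot
SELECT id, booked_seats(id) FROM events;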

But what happens if the query is too complex to be expressed as a single function?

Bob and Alice now face a more intricate issue. They planned to attend two events together, but without coordination, they made separate bookings. Bob, assuming they would book together, reserved two seats for each event. Alice, thinking they would book separately, made one-seat bookings for herself.

INSERT INTO "events" ("id", "available_seats")
VALUES ('event_a', 3) RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_a') RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 2, 'event_a') RETURNING "id";

INSERT INTO "events" ("id", "available_seats")
VALUES ('event_b', 4) RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_b') RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 2, 'event_b') RETURNING "id";

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_b'
-- => 4

Later, they realize the discrepancy and rush to fix it - again, without coordinating.

Since their bookings are shared, Alice decides to assist Bob by adjusting his booking for event B, reducing his reserved seats to just one.

Meanwhile, Bob takes a more holistic approach: he decides to return one seat from all of his bookings for both events A and B.

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
UPDATE "bookings" SET seat_count = seat_count - 1 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_b'
UPDATE "events" SET "available_seats" = "available_seats" + 1 WHERE "events"."id" = 'event_b'
BEGIN ISOLATION LEVEL READ COMMITTED
COMMIT
UPDATE "bookings" SET "seat_count" = "seat_count" - 1 WHERE "bookings"."id" IN ( SELECT "bookings"."id" FROM "bookings" WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" IN ( SELECT "bookings"."event_id" FROM "bookings" WHERE "bookings"."customer_name" IN ('Bob', 'Alice') GROUP BY "bookings"."event_id" HAVING (SUM("seat_count") > 2) ) ) RETURNING "id", "event_id" (1891.5ms)
-- Issued before Alice's commit, returns afterwards
=> [{"id" => "b1c61073-b916-47be-ba81-f5d049e44b9b", "event_id" => "event_b"}, {"id" => "c8b630c2-ab93-4da6-8201-ec4571d9e584", "event_id" => "event_a"}]
UPDATE "events" SET "available_seats" = "available_seats" + 1 WHERE "events"."id" = 'event_b'
UPDATE "events" SET "available_seats" = "available_seats" + 1 WHERE "events"."id" = 'event_a'
COMMIT

In the end, Bob’s booking for event B ends up with zero seats - an unintended outcome.

SELECT "events"."available_seats"
FROM "events"
WHERE "events"."id" = 'event_b';
-- => 6

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."event_id" = 'event_b';
-- => 1

Let’s break it down. Alice starts her transaction and decrements the seat_count of Bob’s booking. Shortly after, Bob begins his transaction and executes a hefty UPDATE statement with nested SELECTs. Around the same time, Alice commits her transaction.

We have already seen that UPDATEs on the same row are queued. Once Alice commits, Bob’s update executes, affecting two rows - both of his bookings for events A and B. He then returns the seats by incrementing the counters of both events.

But why did the update affect Bob's booking for event B, even though it was concurrently modified by Alice and, theoretically, should have been excluded from Bob's selection?

For clarity, let’s examine Bob’s application code and the query itself:

ActiveRecord::Base.transaction do
  updated_bookings = ActiveRecord::Base.connection.execute(<<~SQL).to_a
    UPDATE bookings SET seat_count = seat_count - 1
    WHERE bookings.id IN (
      SELECT bookings.id
      FROM bookings
      WHERE bookings.customer_name = 'Bob'
      AND bookings.event_id IN (
        SELECT bookings.event_id
        FROM bookings
        WHERE bookings.customer_name IN ('Bob', 'Alice')
        GROUP BY bookings.event_id
        HAVING (SUM(seat_count) > 2)
      )
    )
    RETURNING id, event_id
  SQL

  updated_bookings.each do |booking|
    Event.increment_counter(:available_seats, booking['event_id'], by: 1)
  end
end

The query consists of three parts:

  • The innermost SELECT reads the bookings belonging to Bob and Alice, groups them by event_id, and selects the event IDs where Bob and Alice together reserved more than two seats in total. This query is meant to identify events affected by their initial predicament.
  • The outer SELECT selects only Bob’s bookings for the affected events.
  • The UPDATE then applies a WHERE clause that affects only the bookings whose IDs were returned by that outer SELECT - Bob's bookings.

Bob intended for the query to ignore his bookings that had already been updated from two seats to one. Given that Alice updated Bob's booking for event B first, the outcome might seem surprising.

This result becomes less surprising when you realize that a single SQL command is not guaranteed to execute with a consistent, static view of the database.

Both nested SELECT statements were executed shortly before Alice's commit - the innermost query returned the event IDs, and the outer query returned both of Bob's bookings. The UPDATE then began working through these rows: it updated Bob's booking for event A, which was not affected by Alice's change. However, when it reached Bob's booking for event B, it was blocked by the concurrent update from Alice's still-uncommitted transaction.

Once Bob’s UPDATE was unblocked and continued execution, the row representing Bob’s booking for event B was re-read. The IN condition in the WHERE clause simply checked whether the booking’s ID was in the result set from the previous SELECT. Once the row was re-read after Alice’s update, the condition was satisfied, since the booking’s ID was still part of the previously materialized result set, and the update was applied to the booking for event B as well.

While no update was lost here - the decrement must be applied to the latest version of the booking - the decision to perform it was based on an outdated version of the data. When first read by the query, Bob's and Alice's bookings for event B made it appear that Bob needed to decrement the seat count, so the update was applied. However, by the time the update was actually performed, Bob's booking was in a different state, resulting in a sneaky instance of read skew.

Another important observation is that during the UPDATE, event A was seen at its original state, while event B’s state was re-read due to Alice’s update. This means event B’s data was “from the future” relative to event A’s. While this wasn’t the biggest issue in this example, it could lead to inconsistencies in more complex scenarios. It’s important to be cautious with multi-row updates that may have concurrent counterparts, which also applies to explicit locking mechanisms like FOR UPDATE.

This situation highlights a key issue: even when multiple queries are bundled into a single SQL command, significant gaps may exist between their execution phases - gaps during which concurrent transactions can modify data that was previously read.

At this point, you might realize that the key difference between this scenario and the one where we prevented write skews earlier (using the WHERE "available_seats" > 1 clause) lies in how the condition is expressed. The previous example worked because once the update unblocked and the row was re-read, the condition could be fully re-evaluated, as there were no nested queries and it was fully expressed using the > operator. But when the condition is expressed as WHERE IN (SELECT ...), the inner query is not re-run after the update is unblocked. The original result set, which contains outdated data, is used instead.

If the logic must be implemented using separate queries, the transaction should employ appropriate locking mechanisms, which we’ve already discussed. Another alternative is to elevate the isolation level.
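
If the logic stays as separate queries at read committed, one workable shape is to lock the relevant rows up front and only then decide what to decrement. Here is a rough sketch of that approach - the exact queries are illustrative, not a prescription:

BEGIN;

-- Lock the pair's bookings first. If a concurrent transaction has modified any
-- of them, this statement waits for it to finish and then returns the latest
-- committed versions of the rows.
SELECT id FROM bookings
WHERE customer_name IN ('Bob', 'Alice')
FOR UPDATE;

-- The rows are now locked, so the decision below is based on current data and
-- cannot be invalidated by a concurrent update before COMMIT.
UPDATE bookings SET seat_count = seat_count - 1
WHERE customer_name = 'Bob'
  AND event_id IN (
    SELECT event_id FROM bookings
    WHERE customer_name IN ('Bob', 'Alice')
    GROUP BY event_id
    HAVING SUM(seat_count) > 2
  )
RETURNING id, event_id;

COMMIT;

Note that row locks only cover rows that already exist - bookings inserted concurrently would still slip past this scheme.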

Repeatable Read

We have observed several anomalies that can occur at PostgreSQL’s lowest isolation level. Now, let’s explore the next isolation level - repeatable read.

Before diving into practical examples, let’s clarify some terminology we previously glossed over.

Our observations show that the anomalies observed at the read committed isolation level arise because each SQL query operates with its own view of the database, which is called a snapshot. A snapshot is essentially a set of special numbers that dictate which row versions should be visible. We have already encountered one of these special numbers: the hidden xmax system column, which Bob used in his INSERT ON CONFLICT statement to determine whether a row was created or updated. These numbers track the IDs of transactions that created, deleted, or otherwise modified a particular version of a row. These row versions are also referred to as tuples.
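
If you are curious, this bookkeeping can be peeked at directly - the following queries are purely for illustration:

-- xmin and xmax are hidden system columns carried by every row version (tuple)
SELECT xmin, xmax, customer_name, seat_count FROM bookings;

-- The snapshot currently used by the transaction (available since Postgres 13)
SELECT pg_current_snapshot();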

To understand the difference between read committed and repeatable read isolation levels, it is enough to realize that repeatable read pins the snapshot at the moment the first query is executed in a transaction. All subsequent queries within the transaction will use this same snapshot instead of capturing their own. In contrast, read committed transactions capture a new snapshot for every query, leading to potential inconsistencies between queries within the same transaction.

Let’s revisit previous examples and see how they change when the isolation level is upgraded.

Lost Update

Lost updates are not possible at the repeatable read level:

Bob
Alice
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
=> 2
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
=> 2
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = 1 WHERE "events"."id" = 'event_a'
COMMIT
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = 1 WHERE "events"."id" = 'event_a'
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to concurrent update

What previously resulted in an overwrite on Alice’s part now triggers a serialization failure.

Once the update detects that the row has changed in the meantime, the transaction gets aborted.

If Bob’s transaction were at the repeatable read level, here is what would have happened:

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
=> 2
BEGIN ISOLATION LEVEL READ COMMITTED
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
=> 2
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = 1 WHERE "events"."id" = 'event_a'
COMMIT
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
UPDATE "events" SET "available_seats" = 1 WHERE "events"."id" = 'event_a'
COMMIT

As we can see, the serialization failure is not raised for Bob. This makes sense: Bob’s update did not have to contend with another concurrent update. It was Alice’s update that was supposed to detect contention, but since she reverted to the READ COMMITTED isolation level, her update proceeded without any issues.

Even though it might seem obvious, every transaction that has the potential to encounter such race conditions should be started at the repeatable read isolation level.
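
Rather than relying on every caller to remember this, the default isolation level can also be raised once - per session or per database. A sketch, assuming a database named bookings_app:

-- For the current session only
SET default_transaction_isolation = 'repeatable read';

-- For every new connection to the (hypothetical) bookings_app database
ALTER DATABASE bookings_app SET default_transaction_isolation = 'repeatable read';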

Write Skew

Write skew is impossible at the repeatable read isolation level if updates happen to the same rows.

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Bob', 1, 'event_a') RETURNING "id"
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
UPDATE "events" SET "available_seats" = "available_seats" - 1 WHERE "events"."id" = 'event_a'
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to concurrent update
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
COMMIT

Bob encounters a serialization failure because the event he attempts to update has been concurrently modified by Alice. Serialization failures, despite being errors, are meant to be retried. The failure simply indicates that the operation failed due to a conflict - one of the subsequent retries of the same operation will eventually succeed.

begin
  ActiveRecord::Base.transaction(isolation: :repeatable_read) do
    if Event.where(id: 'event_a').pluck(:available_seats).first > 1
      Booking.create!(customer_name: 'Bob', seat_count: 1, event_id: 'event_a')

      Event.decrement_counter(:available_seats, 'event_a', by: 1)
    end
  end
rescue ActiveRecord::SerializationFailure
  # The failed transaction has already been rolled back; retrying starts a fresh one.
  retry
end

Once Bob retries, his SELECT check works correctly, and he avoids creating the booking and decrementing the counter.

Non-Repeatable Read

Non-repeatable reads are impossible at the repeatable read isolation level:

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
=> [1]
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice'
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
=> [1]
COMMIT

Phantom Read

Phantom reads are impossible at the repeatable read isolation level:

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
=> [1]
BEGIN
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id") VALUES ('Alice', 1, 'event_a') RETURNING "id"
COMMIT
SELECT "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" = 'Alice'
=> [1]
COMMIT

Read Skew

Simple read skews, such as the one we examined in the read committed isolation level, are impossible at the repeatable read level.

Let’s see what happens to the seemingly more complicated scenario, the one with a big UPDATE statement:

INSERT INTO "events" ("id", "available_seats")
VALUES ('event_a', 3) RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_a') RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 2, 'event_a') RETURNING "id";

INSERT INTO "events" ("id", "available_seats")
VALUES ('event_b', 4) RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_b') RETURNING "id";
INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 2, 'event_b') RETURNING "id";

SELECT "events"."available_seats" FROM "events" WHERE "events"."id" = 'event_a'
-- => 4
Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
UPDATE "bookings" SET seat_count = seat_count - 1 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_b'
UPDATE "events" SET "available_seats" = "available_seats" + 1 WHERE "events"."id" = 'event_b'
BEGIN ISOLATION LEVEL REPEATABLE READ
UPDATE "bookings" SET "seat_count" = "seat_count" - 1 WHERE "bookings"."id" IN (SELECT "bookings"."id" FROM "bookings" WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" IN (SELECT "bookings"."event_id" FROM "bookings" WHERE "bookings"."customer_name" IN ('Bob', 'Alice') GROUP BY "bookings"."event_id" HAVING (SUM("seat_count") > 2))) RETURNING "id", "event_id"
-- Issued before Alice's commit
COMMIT
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to concurrent update
BEGIN ISOLATION LEVEL REPEATABLE READ
UPDATE "bookings" SET "seat_count" = "seat_count" - 1 WHERE "bookings"."id" IN (SELECT "bookings"."id" FROM "bookings" WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" IN (SELECT "bookings"."event_id" FROM "bookings" WHERE "bookings"."customer_name" IN ('Bob', 'Alice') GROUP BY "bookings"."event_id" HAVING (SUM("seat_count") > 2))) RETURNING "id", "event_id"
=> [{"id" => "3499dad3-124a-49df-b387-5e5836a87a89", "event_id" => "event_a"}]
UPDATE "events" SET "available_seats" = "available_seats" + 1 WHERE "events"."id" = 'event_a'
COMMIT
SELECT "events"."available_seats"
FROM "events"
WHERE "events"."id" = 'event_b';
-- => 5

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."event_id" = 'event_b';
-- => 2

This time, the outcome is correct. Despite their lack of coordination, Bob and Alice safely return exactly one seat each from Bob’s bookings, as intended.

Bob’s initial update triggers a serialization failure, similar to a lost update or write skew at the repeatable read level. When he retries his transaction, only his booking for event A is updated - exactly as he had originally intended.

Write Skew With Disjoint Write Sets

However, the repeatable read isolation level does not eliminate all anomalies. It is known to allow at least two types of anomalies, and potentially more that are yet to be discovered.

As a small token of appreciation for their unintentional testing of the booking system, Bob and Alice receive one extra seat. Instead of adjusting the event’s available_seats counter, they simply add the seat to one of their bookings by increasing seat_count.

Excited by the unexpected gift, they decide to invite a friend. In their enthusiasm, they both attempt to add the extra seat - once again, without coordinating.

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Alice', 1, 'event_a') RETURNING "id";

INSERT INTO "bookings" ("customer_name", "seat_count", "event_id")
VALUES ('Bob', 1, 'event_a') RETURNING "id";

# Alice
ActiveRecord::Base.transaction(isolation: :repeatable_read) do
  seat_count = Booking
    .where(customer_name: %w[Alice Bob], event_id: 'event_a')
    .sum(:seat_count)

  if seat_count == 2
    Booking
      .where(customer_name: 'Alice', event_id: 'event_a')
      .update_all(seat_count: 2)
  end
end

# Bob
ActiveRecord::Base.transaction(isolation: :repeatable_read) do
  seat_count = Booking
    .where(customer_name: %w[Alice Bob], event_id: 'event_a')
    .sum(:seat_count)

  if seat_count == 2
    Booking
      .where(customer_name: 'Bob', event_id: 'event_a')
      .update_all(seat_count: 2)
  end
end

This time, they assume their transactions are safe since both are running at the repeatable read level. After all, they previously saw that basing write decisions on data read earlier in the same transaction was now reliable. However, the result does not meet their expectations:

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
COMMIT
SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a';
-- => 2

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a';
-- => 2

They are both allowed to claim the free seat, as neither of them encounters a serialization failure - which, from their perspective, is unexpected. Their checks, which would have worked had they tried to update the same booking, do not catch on to the fact that they are about to book two extra seats when they were only given one.

This happens because they do not update any shared rows - Alice updates her booking, while Bob updates his. They encounter a write skew with disjoint write sets, since there is no overlap in data that they modify.

Advisory Locking

Alice and Bob remember that locking has saved them from similar situations before.

They wonder whether optimistic locking could help here but quickly realize that it cannot, as there is no clear shared row whose version they can track and update.

They arrive at the conclusion that explicit locking of a specific row also seems dubious - they could have locked the event, but they decide against it since they do not actually need to update it in the first place, and they want to minimize the impact on other users.

Instead, they opt for advisory locks. These locks function similarly to explicit row-level locks, except that they are not tied to a particular row. Advisory locks are obtained using a bigint key defined by the application.

With advisory locks, their application code looks like this:

# Alice
ActiveRecord::Base.transaction(isolation: :repeatable_read) do
  lock_key = Zlib.crc32('alice:bob')
  execute("SELECT pg_advisory_xact_lock('#{lock_key}')")

  seat_count = Booking
    .where(customer_name: %w[Alice Bob], event_id: 'event_a')
    .sum(:seat_count)

  if seat_count == 2
    Booking
      .where(customer_name: 'Alice', event_id: 'event_a')
      .update_all(seat_count: 2)
  end
end

# Bob
ActiveRecord::Base.transaction(isolation: :repeatable_read) do
  lock_key = Zlib.crc32('alice:bob')
  execute("SELECT pg_advisory_xact_lock('#{lock_key}')")

  seat_count = Booking
    .where(customer_name: %w[Alice Bob], event_id: 'event_a')
    .sum(:seat_count)

  if seat_count == 2
    Booking
      .where(customer_name: 'Bob', event_id: 'event_a')
      .update_all(seat_count: 2)
  end
end

Even though the lock’s key must be an integer, it is possible to map arbitrary data (such as strings) onto a number using hashing or checksumming - which Alice and Bob do. Instead of picking an arbitrary number with no semantic meaning, they agree to use their concatenated, lowercased names as input for the key.
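
The same mapping can also be done on the database side. A small sketch - note that hashtext() is undocumented (though long-standing), and that its output will differ from a CRC32-based key:

-- Derive the key from the same string inside Postgres
SELECT pg_advisory_xact_lock(hashtext('alice:bob'));

-- The advisory lock functions also accept a pair of 32-bit keys, which can be
-- handy for namespacing locks by purpose; 42 here is an arbitrary namespace
SELECT pg_advisory_xact_lock(42, hashtext('alice:bob'));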

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT pg_advisory_xact_lock('2127939797')
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
SELECT pg_advisory_xact_lock('2127939797') (1493.7ms)
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
COMMIT

However, even with the advisory lock in place, they still encounter the same problem. This is because they use transaction-level advisory locks. These locks can only be obtained within a transaction and are released automatically after it commits or rolls back.

The problem is that when combined with repeatable read (or a higher isolation level), obtaining a transaction-level advisory lock - which involves executing a SELECT statement with a special function - pins the transaction’s snapshot. Since this snapshot can be taken before the other transaction commits, Bob, after acquiring the lock and continuing his execution, will still be using the old snapshot - meaning he does not see Alice’s updated booking.

This issue does not occur at the read committed isolation level, because at that level, each SELECT statement gets a new snapshot. If they had used a lower isolation level, Bob’s SELECT SUM query would have correctly retrieved a fresh snapshot.
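
A minimal sketch of that variant, reusing the key from the log above and relying on the default read committed level:

BEGIN;  -- read committed by default

-- May block here until the concurrent lock holder commits or rolls back
SELECT pg_advisory_xact_lock('2127939797');

-- At read committed, this statement takes a fresh snapshot, so it sees whatever
-- the previous lock holder committed before releasing the lock
SELECT SUM(seat_count) FROM bookings
WHERE customer_name IN ('Alice', 'Bob') AND event_id = 'event_a';

-- ...decide and update accordingly...

COMMIT;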

There is a way to remedy this at the repeatable read isolation level by using session-level advisory locks.

# Bob's code, equivalent for Alice
lock_key = Zlib.crc32('alice:bob')
ActiveRecord::Base.connection.execute("SELECT pg_advisory_lock('#{lock_key}')")

begin
  ActiveRecord::Base.transaction(isolation: :repeatable_read) do
    seat_count = Booking
      .where(customer_name: %w[Alice Bob], event_id: 'event_a')
      .sum(:seat_count)

    if seat_count == 2
      Booking
        .where(customer_name: 'Bob', event_id: 'event_a')
        .update_all(seat_count: 2)
    end
  end
ensure
  # Session-level locks are not released automatically, so make sure the lock
  # is freed even if the transaction raises.
  ActiveRecord::Base.connection.execute("SELECT pg_advisory_unlock('#{lock_key}')")
end

Session-level locks are obtained using a slightly different function and, naturally, can be acquired outside of a transaction.

Bob
Alice
SELECT pg_advisory_lock('2127939797')
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
SELECT pg_advisory_lock('2127939797')
SELECT pg_advisory_unlock('2127939797')
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 3
COMMIT
SELECT pg_advisory_unlock('2127939797')

This time, Alice and Bob acquire the locks before starting their transactions. As a result, Bob sees Alice's updated booking before proceeding. Since acquiring the lock no longer pins a stale snapshot for his transaction, he correctly avoids claiming the extra seat again.

Advisory locks are useful when there is no concrete row to lock. However, they require careful handling, as any update not using the same lock will bypass it completely. This differs from explicit row-level locking, where a locked row cannot be updated by any concurrent transaction - even if that transaction does not explicitly execute SELECT FOR UPDATE first.

Read-Only Transaction Anomaly

In order to explore the final anomaly, let’s make a small adjustment to the bookings schema: its id is no longer a uuid but rather an integer:

DROP TABLE IF EXISTS bookings;
CREATE TABLE bookings (
  id integer PRIMARY KEY,
  customer_name text NOT NULL,
  seat_count integer NOT NULL,
  event_id text NOT NULL,
  FOREIGN KEY (event_id) REFERENCES events (id)
);

Now, let’s reimagine the previous scenario where Bob and Alice are gifted a free seat. This time, they decide to book it differently. The initial state of bookings is as follows:

INSERT INTO "bookings" ("id", "customer_name", "seat_count", "event_id")
VALUES (1, 'Alice', 1, 'event_a') RETURNING "id";

INSERT INTO "bookings" ("id", "customer_name", "seat_count", "event_id")
VALUES (2, 'Bob', 1, 'event_a') RETURNING "id";

At first, Alice and Bob agree that Alice will update her own booking to include two additional seats, while Bob will return his seat.

However, time passes, and Alice forgets about their arrangement. Instead of adding two extra seats to her booking, she only adds one.

Bob, wanting to play it safe, decides to back Alice up. If she forgets to update her booking, he will take the extra seat himself. On the other hand, if he detects that the total number of seats reserved between them is no longer two, he assumes Alice has already added two extra seats as agreed and returns his own seat by setting his booking’s seat_count to zero.

Alas, they again proceed without coordination.

Bob
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
BEGIN ISOLATION LEVEL REPEATABLE READ
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
-- Alice's booking has ID 1
COMMIT
...

At this point, Alice has already committed her transaction, but Bob's transaction started before that - he has not committed yet. If Bob commits now, he is doomed to overbook, because his decision to claim the extra seat is based on data he read before Alice's transaction occurred. A serialization failure is not raised, just as in the write skew with disjoint write sets scenario - because they do not update any shared rows.

Conceptually, Bob’s transaction runs entirely before Alice’s. The outcome is unfortunate but expected, as Alice does not implement any checks on her end.

However, if a third transaction starts now before Bob commits and reads the bookings table, it will see an interesting result:

Bob
Observer
Alice
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
BEGIN ISOLATION LEVEL REPEATABLE READ
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
-- Alice's booking has ID 1
COMMIT
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT "bookings"."customer_name", "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob')
=> [["Bob", 1], ["Alice", 2]]
COMMIT
COMMIT
SELECT "bookings"."customer_name", "bookings"."seat_count"
FROM "bookings"
WHERE "bookings"."customer_name" IN ('Alice', 'Bob');
-- => [["Bob", 2], ["Alice", 2]]

The observer transaction sees Bob’s booking with one seat and Alice’s booking with two seats.

But such an outcome should be impossible when Alice’s and Bob’s transactions occur concurrently. The two possible valid sequences of execution are:

  • Alice updates her booking first, and Bob then returns his seat.
  • Bob takes the extra seat first and then Alice, unaware of this, also claims the seat, making both bookings end up with two seats.

If the query is re-run by the observer in a new transaction, it sees the consistent state of four seats booked in total, even though it is not what Alice and Bob would have wanted.

The observer’s transaction initially sees an impossible state, one that should never occur from the application’s perspective - this is a read-only transaction anomaly.

You might question whether this should truly be considered an anomaly. After all, the state where Alice’s booking has two seats and Bob’s has one is possible if:

  • Alice’s transaction happens first;
  • The observer’s transaction happens after Alice’s;
  • Bob’s transaction happens after the observer’s.

To understand why this is a problem, we first need to explore Postgres’ highest possible isolation level.

Serializable

Serializable is the most restrictive isolation level in Postgres. At this level, no anomalies are possible because it makes concurrent transactions behave as if they are executed sequentially.

The anomalies that remain possible under repeatable read are prevented here as well.

Let’s take a look at two anomalies which we have seen occurring at the previous level.

Write Skew With Disjoint Write Sets

When two concurrent transactions update different rows based on shared data they read at the serializable isolation level, the following happens:

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 3
COMMIT
SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a';
-- => 2

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a';
-- => 1

Bob’s update raises a serialization failure, the same way his other update did in the write skew scenario with a shared row. The end result is correct - after retrying the transaction, Bob does not book an extra seat that Alice has already taken concurrently.

But how does PostgreSQL detect that these transactions lead to an inconsistent state from the application’s perspective, even though no shared rows are modified?

The main difference between repeatable read and serializable is that the latter obtains predicate locks. These locks help detect dependencies between reads and writes, preventing cases where a concurrent write could have altered the result of a read that has already occurred - this is exactly what happens in this scenario.
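
These predicate locks show up in pg_locks with the SIReadLock mode, so they can be observed while a serializable transaction is in flight:

-- Run from another session while a serializable transaction has performed reads
SELECT locktype, relation::regclass, page, tuple, pid
FROM pg_locks
WHERE mode = 'SIReadLock';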

One might assume that PostgreSQL detects this dependency only because both reads and conditional updates reference bookings directly by customer_name - after all, to a human, this clearly shows they refer to the same rows.

Let’s try to fool Postgres by making it less obvious that these are the same bookings they read and base their writes on:

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."id" IN (1, 2)
-- => 2
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
COMMIT
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

Alice now references both bookings by their IDs, rather than by customer_name. The outcome is the same - PostgreSQL is able to understand that even though the rows are fetched using different queries with seemingly no evident overlap, both transactions still read the same physical rows.

There is an important rule that all applications which use serializable transactions must follow:

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
BEGIN ISOLATION LEVEL REPEATABLE READ
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
COMMIT
SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a';
-- => 2

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a';
-- => 2

As we have previously seen, under repeatable read, even if only one transaction uses the appropriate level, it will still raise a serialization failure if it commits last.

This is not the case with serializable transactions: if only one of the two transactions runs at the serializable level, the anomaly is not prevented - no matter which transaction uses the appropriate level or which one commits first.

When serializable is used, all participating transactions must be at this level, otherwise they will not obtain predicate locks and the dependency between reads and writes will not be detected.

Possible Serial Executions

Now let’s adjust Bob’s logic and reshuffle the order of commits:

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 2
COMMIT
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT

Alice starts her transaction first once more and reads the sum. Bob starts his transaction afterwards.

However, this time, Bob does not read the sum and brazenly takes the seat without checking whether Alice has already claimed it. He is able to commit the transaction, after which Alice commits hers - no serialization failures are raised.

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a';
-- => 2

SELECT SUM("bookings"."seat_count")
FROM "bookings"
WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a';
-- => 2

In the end, they both claim the seat again. Why did serializable transactions allow this to happen?

Serializable isolation level makes a specific guarantee: the final outcome of concurrent transactions will either match one of the possible serial executions of those transactions, or a serialization failure will be raised.

Alice does not encounter a serialization failure because there is no anomaly. In this case, the observed outcome of their concurrent execution matches the serial execution in which Bob’s transaction happens entirely before Alice’s:

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 2
COMMIT

Of course, this unfortunate result would not have been possible had Bob checked the state of their bookings beforehand, as he did earlier.

When a serializable transaction produces what the application developer perceives as an inconsistent or undesired outcome, it indicates a flaw in the application logic. The inconsistency exists not because of an anomaly but because one of the possible serial orderings allows for that outcome.

An important thing to keep in mind when working with serializable transactions is that some serialization failures can be false positives.

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
=> 2
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
COMMIT
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

Bob slightly adjusts his code: this time, he references his booking by customer_name instead of by ID. Because of this, Alice’s update now fails with a serialization failure. However, note that nothing else has changed - the order of reads, writes, and commits remains the same.

This minor change in Bob’s query causes Postgres to falsely detect a serialization anomaly. While this is beneficial in this particular case - accidentally preventing Bob and Alice from double-booking the same seat - it does not technically prevent an anomaly. We’ve already established that the outcome of this concurrent execution matches one of the valid serial orderings.

This highlights an important nuance: predicate locks obtained by serializable transactions may sometimes trigger false alarms. In this case, it just so happens that the false positive worked out in the application’s favour.

Read-Only Transaction Anomaly

Before we analyze how serializable transactions handle read-only anomalies - and now that we have a clearer understanding of when serialization failures occur - let’s take another look at Alice’s and Bob’s transactions without the read-only observer transaction.

Bob
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
-- Alice's booking has ID 1
COMMIT
COMMIT

Despite raising the isolation level to serializable, no serialization failure occurs - this matches the nearly identical case we examined earlier, and we’ve already established why.

There are two possible serial executions of these transactions:

  • If Bob’s transaction happens before Alice’s, the result is B2A2.
  • If Alice’s transaction happens before Bob’s, the result is B0A2.

From now on, I will refer to the state of bookings using the notation above for brevity, where the letter signifies who the booking belongs to, and the number signifies its seat_count.

In this concurrent execution, the undesired B2A2 outcome occurs. While this result is unintended from the application’s perspective - and Alice is at fault for not checking beforehand - it remains perfectly valid from the database’s perspective.

Now, let’s introduce the observer’s read-only transaction and attempt to reproduce the anomaly.

Bob
Observer
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
COMMIT
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT "bookings"."customer_name", "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob')
-- => [["Bob", 1], ["Alice", 2]]
COMMIT
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 3
UPDATE "bookings" SET "seat_count" = 0 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
COMMIT

Alice’s and Bob’s transactions remain unchanged, yet this time, Bob encounters a serialization failure. Meanwhile, the observer still sees the B1A2 state, just as under repeatable read isolation - a state that, at first glance, should not be visible.

To understand what happened, let’s try to order the transactions logically. To keep the explanation terse, I will refer to each transaction by the first letter of its initiator: A for Alice’s, B for Bob’s and O for the observer’s.

Alice’s transaction starts after Bob’s. Given the outcome their two writes were heading towards - B2A2, just as in the repeatable read run - we can conclude that A happens after B; otherwise, the result would have been B0A2 instead of B2A2.

Since the observer sees the outcome of Alice’s transaction, A must occur before O. However, the observer does not see the result of Bob’s transaction, which implies that O happens before B.

The possible serial executions, along with their intermediate and final states, are:

  • A happens first, resulting in B1A2; B happens afterward, resulting in B0A2;
  • B happens first, resulting in B2A1; A happens afterward, resulting in B2A2.

Since the observer sees the B1A2 state, it implies that the order of transactions is A first, B second. But it doesn’t match the reality, as we have already established that the concurrent execution we are looking at matches the serial execution of B first, A second. This state should be impossible to see - it violates the serial order which the current concurrent execution emulates.

The dependencies between transactions can be visualized using a simple graph, where edges represent execution order:

A->O O->B B->A

As evident, there is a cycle - this indicates that the transactions cannot be serialized. B must happen before A for the end result to be B2A2, yet because of O, A seemingly happens before B, which is impossible.

At the serializable level, the read-only observer’s transaction reads a state it shouldn’t be able to read. Once Bob tries to commit, his transaction realizes this and breaks the dependency cycle by raising a serialization failure. Here’s what the dependency graph looks like after Bob’s failure and a subsequent retry:

A->O O->B A->B

Since Bob’s update fails, he is forced to retry, and thus his retry happens after Alice’s transaction. The state that the observer has seen now becomes consistent with the serial execution of A first, B second. After Bob retries, the end result becomes B0A2 instead of B2A2.

Because the observer was able to see the incorrect state, the end result changes, as transactions get re-ordered due to a serialization failure. This makes the incorrect intermediate result correct in the end.

A passive read-only observer alters the course of Alice and Bob’s booking process. The anomaly is prevented - but at Bob’s expense: even though he was the first to start, he does not finish his first transaction. In this case, this works in his favour - he can now detect that Alice has claimed the seat in the meantime. However, had the observer not caught a pivotal glimpse at the right moment2, Bob would have incorrectly claimed the seat as before.

Despite the accidental usefulness of the observer’s transaction here, its ability to affect execution may not always be desirable. To circumvent a serialization failure in situations where a read-only transaction anomaly would otherwise occur, the transaction can be started in READ ONLY DEFERRABLE mode.
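
In plain SQL, such a transaction is started like this; ActiveRecord's transaction helper does not expose these modes directly (at the time of writing), so it typically has to be issued as a raw statement:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;

-- The first query may wait until a snapshot is available that is guaranteed to
-- be free of serialization anomalies; after that, the transaction cannot fail
-- with a serialization error and takes no predicate locks
SELECT customer_name, seat_count FROM bookings
WHERE customer_name IN ('Alice', 'Bob');

COMMIT;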

Bob
Observer
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
=> 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
COMMIT
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE
COMMIT
SELECT "bookings"."customer_name", "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') (2118.4ms)
-- Issued before Bob's commit, returns afterwards
-- => [["Bob", 2], ["Alice", 2]]
COMMIT

Now, the observer’s transaction waits until it is safe to read the state of bookings. Instead of reading an incorrect intermediate state and altering the course of Alice’s and Bob’s transactions, Bob’s update is allowed to commit first. Only then does the observer read the final state, preventing the dependency cycle that O could have introduced.

This is the only situation in Postgres where a SELECT query can be blocked by a concurrent update.

Now, let’s reverse the order of operations and revert the observer to a non-read-only transaction.

Bob
Observer
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."id" = 1
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
-- => 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
COMMIT
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT "bookings"."customer_name", "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob')
-- => [["Alice", 1], ["Bob", 2]]
COMMIT
COMMIT

This time, Alice starts her transaction first but does not commit until Bob’s transaction is fully executed. The observer sees Bob’s update but not Alice’s.

Why doesn’t Alice’s update cause a serialization failure? Let’s order the transactions logically once again.

The final outcome of this concurrent execution matches the serial execution where B happens before A - the state transitions from the initial state to B2A1, then to B2A2.

SELECT "bookings"."customer_name", "bookings"."seat_count"
FROM "bookings"
WHERE "bookings"."customer_name" IN ('Alice', 'Bob');
-- => [["Alice", 2], ["Bob", 2]]

O sees the result of B, but not A. The dependency graph looks like this:

B->A B->O O->A

Since there are no cycles or anomalies, what O sees is a valid intermediate state, consistent with a serial execution of B before A. This is similar to the previous example where a read-only transaction caused an anomaly, except here, the edges are inverted. The same result is achieved without needing to reorder transactions after an initial failure.

As a final exercise, let’s re-run the example where a read-only transaction anomaly was avoided by the observer’s transaction being deferred, but with a slight tweak:

Bob
Observer
Alice
BEGIN ISOLATION LEVEL SERIALIZABLE
SELECT SUM("bookings"."seat_count") FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') AND "bookings"."event_id" = 'event_a'
=> 2
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Bob' AND "bookings"."event_id" = 'event_a'
BEGIN ISOLATION LEVEL SERIALIZABLE
UPDATE "bookings" SET "seat_count" = 2 WHERE "bookings"."customer_name" = 'Alice' AND "bookings"."event_id" = 'event_a'
COMMIT
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE
SELECT "bookings"."customer_name", "bookings"."seat_count" FROM "bookings" WHERE "bookings"."customer_name" IN ('Alice', 'Bob') (2.2ms)
-- => [["Alice", 2], ["Bob", 1]]
COMMIT
ROLLBACK
-> raised ActiveRecord::SerializationFailure PG::TRSerializationFailure: ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.

Alice now references the booking she wants to update by customer_name instead of an ID.

Although the observer’s transaction is READ ONLY DEFERRABLE, its SELECT does not wait for Bob’s transaction to commit this time. It sees the B1A2 state.

The sequential execution in this scenario is B before A. Let’s quickly see the states again, including the alternative ones:

  • B before A: from initial to B2A1 to B2A2;
  • A before B: from initial to B1A2 to B0A2.

Ultimately, Bob must retry, leading to the final outcome of B0A2:

SELECT "bookings"."customer_name", "bookings"."seat_count"
FROM "bookings"
WHERE "bookings"."customer_name" IN ('Alice', 'Bob');
-- => [["Alice", 2], ["Bob", 0]]

The observer’s read does not wait and Bob encounters a serialization failure - what causes this?

The answer is that due to Alice’s change, Postgres obtains a predicate lock and gets baited into detecting a false positive again. Bob’s transaction is doomed to fail from the start, even if the observer were absent. This is why the read-only transaction does not wait - it knows that what it will read will not constitute an anomaly, since the reordering of transactions is inevitable.

A->O O->B B->A

This is the exact same graph as in the initial read-only transaction anomaly case - except that this time, there is no read-only transaction anomaly, because the cycle is detected early.

While B1A2 remains incorrect relative to the execution order, the cycle is broken by Bob’s and Alice’s transactions themselves via a serialization failure. Once Bob retries, the ordering flips to A before B, making the previously incorrect B1A2 state - which the observer read without waiting - retroactively consistent.

Postgres’ Vast Toolkit

Our exploration of transaction anomalies in Postgres has come to an end. While we haven’t examined every possible anomaly - simply because nobody knows how many there are - we have covered the most common ones.

The default READ COMMITTED isolation level allows for the greatest number of anomalies. Proper use of constraints, SQL operators, unique indices, and explicit pessimistic, optimistic, and advisory locking can help mitigate these, even those possible at the higher repeatable read level.

It’s important to keep in mind that without the appropriate application of the features listed above, it is generally unsafe to base write decisions on data previously read within a transaction at read committed, since every query gets its own snapshot, and there is no automatic conflict detection.

The REPEATABLE READ level pins the snapshot at the moment the first query within a transaction runs, helping to prevent a lot of anomalies. However, it does not detect write skews when concurrent transactions update different rows based on shared data.

SERIALIZABLE, the strictest isolation level, addresses such cases by tracing dependencies between reads and writes. This ensures that the only possible outcomes of concurrent executions are those that match possible valid serial executions.

Which isolation level should you use? Should read committed with explicit locks be preferred over higher levels? The answer depends on the application and its workload.

While repeatable read and serializable transactions make it necessary for the application to be able to retry any transaction due to the potential of a serialization failure3, this might not be the best approach in high-contention scenarios, where conflicting transactions are almost guaranteed to happen regularly and in great volumes. In such situations, explicit row locking using SELECT FOR UPDATE or advisory locks may perform better.
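
In such code paths it can also help to bound how long a transaction is willing to queue behind other writers, instead of waiting indefinitely. A sketch - the timeout value is arbitrary:

BEGIN;

-- Fail fast instead of queuing behind other writers for too long
SET LOCAL lock_timeout = '100ms';

SELECT available_seats FROM events WHERE id = 'event_a' FOR UPDATE;
-- ...check the counter and insert the booking as before...

COMMIT;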

It has to be understood that higher isolation levels and coarse locking defeat the advantages of Postgres’ multi-version concurrency control, as in these cases conflicting transactions effectively execute one after another. In many situations, this is absolutely necessary to maintain consistency. However, if the application logic offers opportunities to avoid the need for serialization altogether, then it is worth exploring these alternatives.

While some of these examples may have seemed contrived, they are far from unrealistic. Even in systems without an allegorical Bob or Alice, mistakes happen all the time, and your application code may not be exempt. Just because you have not encountered the effects of these anomalies does not mean they are absent. Even if they occur without apparent impact today, they could still wreak havoc on your business, revenue, or another critical metric on some unlucky day in the future.

Finally, it is important to note that our little booking system does not make use of read-only replicas or multiple primary servers. All the anomalies we explored occur within a single database server handling both reads and writes. Introducing replicas brings a whole new layer of distributed computing complexity, along with a plethora of even more intricate anomalies.

  1. Or not, depending on the language, framework and practices; ORMs may proliferate careless re-instantiations of model instances, which trigger such re-reads. 

  2. Or the wrong moment, depending on whose side you are on - application’s or database’s. 

  3. Which is not dissimilar to how optimistic locking forces the application to behave.