The Hidden Cost of Distributed Transactions

Two-phase commit feels free in the happy path. Then a coordinator dies mid-prepare and you learn exactly what "blocking protocol" means.

Two-phase commit is one of those ideas that looks trivial on a whiteboard. The coordinator asks every participant to prepare, everyone votes yes, the coordinator says commit, done. The happy path is so clean that it's easy to ship it and move on.
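For reference, the happy path really does fit in a few lines. This is a minimal in-process sketch, not a real implementation. `State`, `Participant`, and `two_phase_commit` are hypothetical names, there is no network, and a real coordinator would durably log its decision before starting phase 2.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Participant:
    name: str
    will_vote_yes: bool = True
    state: State = State.INIT
    locks_held: bool = False

    def prepare(self) -> bool:
        # Phase 1: take locks, make the pending change durable, then vote.
        if not self.will_vote_yes:
            self.state = State.ABORTED
            return False
        self.locks_held = True
        self.state = State.PREPARED
        return True

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's decision and release locks.
        self.state = State.COMMITTED if commit else State.ABORTED
        self.locks_held = False


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: collect every vote; a single "no" aborts the transaction.
    decision = all([p.prepare() for p in participants])
    # Phase 2: broadcast the decision to everyone (logging omitted here).
    for p in participants:
        p.finish(decision)
    return decision
```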

Then production happens. A participant votes yes, moves into the prepared state (holding its locks), and waits for the decision. The coordinator picks that exact moment to crash. Now the participant is stuck: it can't commit without the coordinator's permission, and it can't abort on its own, because the coordinator may already have told other participants to commit. So it holds those locks until the coordinator comes back. That's what "blocking protocol" means, and it's not in the diagram.
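The asymmetry is easiest to see from the participant's side when its timeout fires. The sketch below uses hypothetical names (`State`, `resolve_after_timeout`) and adds one refinement the post doesn't discuss: asking reachable peers for the outcome, in the style of a cooperative termination protocol. Even with that help, a prepared participant whose reachable peers are all prepared too still has to wait.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    INIT = "init"          # has not voted yet
    PREPARED = "prepared"  # voted yes, holding locks, waiting for a decision
    COMMITTED = "committed"
    ABORTED = "aborted"


def resolve_after_timeout(me: State, peers: list[State]) -> Optional[State]:
    """What can a participant safely do when the coordinator goes silent?

    Returns the outcome it may apply, or None if it must keep waiting
    (and keep holding its locks). `peers` lists the reachable peers' states.
    """
    if me is not State.PREPARED:
        # Before voting yes, a unilateral abort is always safe: the
        # coordinator cannot have decided commit without this vote.
        return State.ABORTED if me is State.INIT else me
    # Prepared: the decision is out of our hands, so ask peers.
    if State.COMMITTED in peers:
        return State.COMMITTED
    if State.ABORTED in peers or State.INIT in peers:
        # A peer that aborted, or never voted yes (and can now abort
        # unilaterally), means the coordinator could not have chosen commit.
        return State.ABORTED
    # Every reachable peer is also prepared: nobody knows the decision.
    return None
```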

# prepared is the dangerous state

The prepared window is where all the pain lives. Every millisecond a participant spends prepared is a millisecond of held locks, and every transaction that needs those rows waits behind it. Shrinking that window (faster coordinators, tighter timeouts, fewer participants) usually buys you more than any clever optimization elsewhere.
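A little arithmetic shows why participant count matters. Under simplifying assumptions (each participant enters the prepared state when it votes, votes reach the coordinator instantly, and the decision takes a fixed time to log and deliver), a participant's lock-hold time is set by the slowest voter, not by its own speed. `prepared_windows` and the millisecond figures are hypothetical.

```python
def prepared_windows(vote_ms: dict[str, float], decision_log_ms: float,
                     delivery_ms: float) -> dict[str, float]:
    """Milliseconds each participant spends prepared (holding locks).

    vote_ms: when each participant enters the prepared state, measured
    from the start of phase 1. Assumes votes reach the coordinator instantly.
    """
    # The coordinator can't decide until the slowest vote is in,
    # and must make the decision durable before announcing it.
    decision_sent = max(vote_ms.values()) + decision_log_ms
    decision_arrives = decision_sent + delivery_ms
    return {name: decision_arrives - t for name, t in vote_ms.items()}


with_billing = prepared_windows({"orders": 5, "inventory": 8, "billing": 40}, 2, 3)
without_billing = prepared_windows({"orders": 5, "inventory": 8}, 2, 3)
```

With a slow `billing` participant, `orders` stays prepared for 40 ms even though it voted at 5 ms; dropping `billing` cuts that to 8 ms.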

2PC doesn't remove failure. It concentrates it into the coordinator and dares you to keep that node alive.

# when to just say no

Often the right answer is to not need a distributed transaction at all: co-locate the data that changes together so a single local transaction covers it, or restructure the operation as a saga of idempotent steps you can safely retry. A protocol that blocks on a single point of failure should be a last resort, not a default.
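A saga replaces one atomic commit with a sequence of local commits, each paired with a compensating action that undoes it if a later step fails. Compensation is the common formulation of the pattern rather than something spelled out above, and `Step`, `run_saga`, and the in-memory services are hypothetical, so treat this as a sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[str], None]      # must be idempotent for a given saga_id
    compensate: Callable[[str], None]  # semantically undoes action; also idempotent


def run_saga(saga_id: str, steps: list[Step]) -> bool:
    """Run steps in order; if one fails, compensate the completed ones in reverse.

    Each step is assumed to commit locally and atomically, so a failed step
    has nothing to undo and no locks are held between steps.
    """
    completed: list[Step] = []
    for step in steps:
        try:
            step.action(saga_id)
        except Exception:
            for done in reversed(completed):
                done.compensate(saga_id)
            return False
        completed.append(step)
    return True


# Hypothetical in-memory "services" for illustration only.
reservations: dict[str, str] = {}
charges: set[str] = set()


def reserve(saga_id: str) -> None:
    reservations[saga_id] = "sku-1"  # keyed write: repeating it is harmless


def release(saga_id: str) -> None:
    reservations.pop(saga_id, None)  # no-op if already released


def charge_ok(saga_id: str) -> None:
    charges.add(saga_id)  # set insert: repeating it is harmless


def charge_declined(saga_id: str) -> None:
    raise RuntimeError("card declined")


def refund(saga_id: str) -> None:
    charges.discard(saga_id)
```

Running `run_saga("order-2", [Step("reserve", reserve, release), Step("charge", charge_declined, refund)])` returns `False` and leaves no reservation behind, while re-running a saga that already succeeded changes nothing because every write is keyed by `saga_id`. The trade-off is that other readers can observe intermediate states, which is the price of never holding locks across services.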