Database High severity

State Machine Race Conditions

Status transitions implemented as read-modify-write sequences, with no optimistic locking, pessimistic locking, or idempotency guard, let two concurrent requests observe the same starting state. Each request then performs a supposedly one-time transition, for example charging a customer twice for the same order.

Before / After

Problematic Pattern

# Two concurrent webhooks both see status='pending'
# Both charge the customer. Double billing.

def process_payment(order)
  return if order.status == 'paid'

  charge_stripe(order)
  order.update!(status: 'paid')
end

Target Architecture

# Step 1: atomic state transition wins the race.
# Step 2: idempotency_key on the external call.
# Step 3: reconciliation job for stranded state.

def process_payment(order)
  updated = Order
    .where(id: order.id, status: 'pending')
    .update_all(status: 'processing')
  return unless updated == 1

  begin
    charge_stripe(
      order,
      idempotency_key: "order-#{order.id}-v1"
    )
    Order.where(id: order.id).update_all(status: 'paid')
  rescue Stripe::APIError => e
    # Keep status='processing'. A separate
    # ReconcileStripeChargesJob polls Stripe for
    # charges matching the idempotency_key and
    # finalizes the status without risking
    # duplicate capture.
    raise
  end
end

Why this hurts

The classic read-modify-write race works like this: two concurrent requests both evaluate order.status == 'paid', both get false because neither has committed its UPDATE yet, and both proceed to charge Stripe. PostgreSQL’s default isolation level (READ COMMITTED) does not prevent this. It guarantees that each statement sees only committed data, but the conflict here is not on a single row update; it is on the sequence of read-then-write across two transactions, which READ COMMITTED does not detect. Payment processors see duplicate charges, accounting reconciliation becomes manual, and customer support handles chargebacks for weeks.
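The interleaving described above can be reproduced deterministically without a database. The following is a minimal sketch in plain Ruby, not Rails code: FakeOrder and simulate_unguarded_race are hypothetical names, the charges array stands in for calls to the payment provider, and two queues force both workers to finish the status check before either one writes.

```ruby
# Hypothetical in-memory stand-in for an orders row (not ActiveRecord).
FakeOrder = Struct.new(:id, :status)

# Runs two workers through the unguarded check-then-charge sequence and
# returns how many charges were made.
def simulate_unguarded_race
  order = FakeOrder.new(42, "pending")
  charges = []              # stand-in for calls to the payment provider
  charges_lock = Mutex.new
  checked = Queue.new       # each worker signals once it has read the status
  proceed = Queue.new       # main thread releases workers after both reads

  workers = 2.times.map do |n|
    Thread.new do
      already_paid = (order.status == "paid")  # read
      checked << n
      proceed.pop                              # wait until both have read
      next if already_paid

      charges_lock.synchronize { charges << "charge-#{n}" }  # side effect
      order.status = "paid"                                  # write
    end
  end

  2.times { checked.pop }     # both workers have now seen "pending"
  2.times { proceed << :go }  # only now may either of them write
  workers.each(&:join)
  charges.size
end

puts simulate_unguarded_race  # prints 2
```

Both workers pass the guard because the guard reads state that neither has changed yet, which is the same situation two webhook deliveries create against a real table.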

A conditional UPDATE (WHERE status = 'pending') closes the database side of the race. Under READ COMMITTED the second UPDATE blocks on the row lock held by the first, re-checks the predicate once the first commits, finds that the status is no longer 'pending', and reports zero affected rows. But this opens a second, subtler failure window: if the worker process dies between the state transition and the completion of the external HTTP call (OOM kill, container eviction, lost connection), the database records status = 'processing' indefinitely. Either no one charged the customer, or the charge went through and the local row never learned of it. Recovery requires operational knowledge of Stripe’s API, manual status flips, and careful handling of partial state.
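The effect of the conditional UPDATE can be sketched with an in-memory compare-and-set. This is an illustration under stated assumptions rather than database code: FakeOrdersTable and simulate_guarded_race are hypothetical, and the Mutex plays the role of the row lock that serializes competing UPDATE statements.

```ruby
# Hypothetical in-memory table. The Mutex stands in for the row lock.
class FakeOrdersTable
  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  def insert(id, status)
    @lock.synchronize { @rows[id] = status }
  end

  # Mimics: UPDATE orders SET status = to WHERE id = id AND status = from
  # Returns the number of rows changed (0 or 1).
  def transition(id, from:, to:)
    @lock.synchronize do
      return 0 unless @rows[id] == from

      @rows[id] = to
      1
    end
  end

  def status(id)
    @lock.synchronize { @rows[id] }
  end
end

# Lets several workers compete for the same order and returns
# [number of charges, final status].
def simulate_guarded_race(workers: 8)
  table = FakeOrdersTable.new
  table.insert(42, "pending")
  charges = Queue.new  # thread-safe stand-in for provider calls

  threads = workers.times.map do |n|
    Thread.new do
      # Only the worker whose transition changes a row may charge.
      next unless table.transition(42, from: "pending", to: "processing") == 1

      charges << "charge-by-worker-#{n}"
      table.transition(42, from: "processing", to: "paid")
    end
  end
  threads.each(&:join)
  [charges.size, table.status(42)]
end

p simulate_guarded_race  # prints [1, "paid"]
```

If the winning worker stopped right after its first transition, the row would stay at "processing" and no other worker could claim it, which is the stranded state described above.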

The correct model treats payment processing as three coordinated concerns. An atomic, conditional state transition wins the race at the database layer. An idempotency key passed to the external API means that a call retried after a crash returns the original result rather than charging again, for as long as the provider retains the key. A reconciliation job running on a schedule queries the payment provider for transactions matching the idempotency key and advances or rolls back the local status based on authoritative remote state. Missing any of the three produces a distinct failure mode: without the first, concurrent requests double-bill; without the second, a retry double-bills; without the third, orders are stranded in processing indefinitely.
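The second and third concerns can be sketched together in plain Ruby. Everything here is hypothetical and in-memory: FakeGateway is not the Stripe API, and idempotency_key_for and reconcile are illustrative names. The gateway returns the stored charge when it sees a key again, and the reconciliation pass resolves orders stuck in processing from the gateway's records.

```ruby
# Hypothetical in-memory payment provider (not the Stripe API).
class FakeGateway
  def initialize
    @charges = {}  # idempotency_key => charge record
  end

  # A repeated key returns the stored charge instead of creating a new one.
  def charge(amount_cents, idempotency_key:)
    @charges[idempotency_key] ||= {
      id: "ch_#{@charges.size + 1}",
      amount_cents: amount_cents
    }
  end

  def find_by_key(idempotency_key)
    @charges[idempotency_key]
  end

  def charge_count
    @charges.size
  end
end

# Same key format as the example code: one stable key per order.
def idempotency_key_for(order_id)
  "order-#{order_id}-v1"
end

# orders is a Hash of order id => status. For each order stuck in
# "processing", trust the gateway: a recorded charge means "paid",
# no recorded charge means the order returns to "pending".
def reconcile(orders, gateway)
  orders.keys.each do |id|
    next unless orders[id] == "processing"

    charged = gateway.find_by_key(idempotency_key_for(id))
    orders[id] = charged ? "paid" : "pending"
  end
  orders
end

gateway = FakeGateway.new
orders = { 1 => "processing", 2 => "processing", 3 => "paid" }

# Order 1: the charge succeeded, then the worker died before recording it.
gateway.charge(500, idempotency_key: idempotency_key_for(1))
# A retry with the same key does not create a second charge.
gateway.charge(500, idempotency_key: idempotency_key_for(1))
# Order 2: the worker died before the charge reached the provider.

reconcile(orders, gateway)
p orders.values_at(1, 2, 3)  # prints ["paid", "pending", "paid"]
p gateway.charge_count       # prints 1
```

A production job would presumably also skip orders that entered processing only moments ago, since a charge may still be in flight; that age threshold is left out of this sketch.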
