Synchronous API Calls
Puma or Unicorn web workers block on HTTP calls to external services inside the request-response cycle, making the Rails app's tail latency and availability directly dependent on every third-party API in the dependency chain.
Before / After
class SignupsController < ApplicationController
  def create
    @user = User.create!(user_params)
    # Each of these blocks a Puma worker.
    # If any is slow or down, the request hangs.
    MailchimpClient.subscribe(@user.email)
    SlackWebhook.post("New signup: #{@user.email}")
    CrmClient.create_contact(@user)
    SegmentClient.track(@user.id, 'signup')
    render json: @user
  end
end

class SignupsController < ApplicationController
  def create
    @user = User.create!(user_params)
    # All side effects are enqueued.
    # Response returns in ~30 ms.
    PostSignupJob.perform_later(@user.id)
    render json: @user
  end
end

class PostSignupJob < ApplicationJob
  retry_on Net::OpenTimeout, wait: :exponentially_longer
  retry_on Faraday::TimeoutError, wait: :exponentially_longer

  def perform(user_id)
    user = User.find(user_id)
    MailchimpClient.subscribe(user.email)
    SlackWebhook.post("New signup: #{user.email}")
    CrmClient.create_contact(user)
    SegmentClient.track(user.id, 'signup')
  end
end

Why this hurts
Each synchronous external call adds its latency directly to request time, and latency in any single upstream degrades throughput for every endpoint served by the same process. Four calls at 200 ms each impose a guaranteed 800 ms lower bound per signup even if everything else executes in zero time. When one upstream slows to multi-second response times, the serving thread is unavailable for any other request for the duration, so a 10-second timeout from Stripe effectively removes one worker thread from the pool for 10 seconds.
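The additive lower bound is easy to demonstrate with a toy simulation, where sleeps stand in for the four upstream calls (the 200 ms figure is illustrative, matching the example above):

```ruby
# Toy simulation: each fake upstream call sleeps 0.2 s, standing in for
# Mailchimp, Slack, the CRM, and Segment at 200 ms apiece.
CALL_LATENCY = 0.2

def call_upstream
  sleep(CALL_LATENCY) # network round trip
end

t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
4.times { call_upstream } # sequential, as in the Before controller
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

puts format('request-time floor: %.0f ms', elapsed * 1000)
```

The sequential calls cannot finish in less than the sum of their latencies; enqueueing a job collapses that floor to the cost of a single Redis or database write.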
The head-of-line blocking is the operational risk. Puma maintains a fixed pool of threads per worker process, sized via RAILS_MAX_THREADS. When external dependencies degrade, slow requests accumulate in the thread pool and the server runs out of capacity to serve fast requests. The load balancer observes elevated queue time and routes traffic to other instances, shifting the problem rather than solving it. Autoscaling reacts to increased latency by adding pods, which flood the unhealthy upstream with more concurrent requests and accelerate its failure.
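Even before the work moves into a job, aggressive client-side deadlines bound how long any one call can pin a thread. A sketch with stdlib Net::HTTP (the host and the specific limits are illustrative, not prescriptive):

```ruby
require 'net/http'

# Cap how long a slow upstream can hold a serving thread. Without these,
# Net::HTTP defaults to 60-second timeouts.
http = Net::HTTP.new('crm.example.com', 443)
http.use_ssl       = true
http.open_timeout  = 1 # max seconds to establish the connection
http.read_timeout  = 2 # max seconds to wait on any single read
http.write_timeout = 2 # max seconds per write (Ruby >= 2.6)
# http.request(...) now raises Net::OpenTimeout / Net::ReadTimeout
# instead of hanging for the duration of the upstream's outage.
```

Tight deadlines convert a hung thread into a fast, catchable exception; they do not remove the failure from the request path, which is why the job-based version above is still the structural fix.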
Connection pool semantics interact badly with blocking calls. The request holds a database connection through the whole HTTP call even though no queries run during it, so database connections starve under upstream slowness. Sidekiq workers on the same host compete for the same PgBouncer slots, and queue lag grows even for jobs unrelated to the failing upstream.
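The starvation dynamic is reproducible with a toy pool, where a SizedQueue stands in for the ActiveRecord connection pool and a sleep stands in for the blocking HTTP call (all names and timings here are illustrative):

```ruby
# Toy pool holding a single "database connection".
pool = SizedQueue.new(1)
pool << :conn

# Request A checks out the connection, then blocks on a fake 300 ms
# HTTP call while still holding it -- no query runs during the sleep.
a = Thread.new do
  conn = pool.pop
  sleep(0.3) # e.g. MailchimpClient.subscribe(...)
  pool << conn
end

sleep(0.05) # let A win the checkout race

# Request B only needs a 1 ms query, but must wait out A's HTTP call.
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
conn = pool.pop
wait = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
pool << conn
a.join

puts format('B waited %.0f ms for a connection', wait * 1000)
```

Request B pays for Request A's HTTP call despite never touching the network, which is exactly how upstream slowness surfaces as database connection timeouts in unrelated endpoints.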
Circuit breakers (semian, circuitbox) shrink the blast radius but do not change the fundamental shape: they fail fast instead of failing slow, which is better for throughput but still surfaces the upstream failure as user-visible errors. The correct model separates request-path work (must complete synchronously) from secondary effects (can retry, can fail). The signup response depends on user creation and session establishment, not on whether Mailchimp acknowledged the subscription.
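The fail-fast shape can be illustrated with a minimal breaker. This is a toy for exposition, not the semian or circuitbox API, and the threshold and cooldown values are arbitrary:

```ruby
# Toy circuit breaker: after `threshold` consecutive failures the circuit
# opens and calls raise immediately for `cooldown` seconds, instead of
# every caller waiting out the upstream's timeout.
class Breaker
  class OpenError < StandardError; end

  def initialize(threshold: 3, cooldown: 30)
    @threshold, @cooldown = threshold, cooldown
    @failures, @opened_at = 0, nil
  end

  def run
    if @opened_at &&
       (Process.clock_gettime(Process::CLOCK_MONOTONIC) - @opened_at) < @cooldown
      raise OpenError, 'circuit open: failing fast'
    end
    result = yield
    @failures, @opened_at = 0, nil # success closes the circuit
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) if @failures >= @threshold
    raise
  end
end
```

Note that an open circuit still raises to the caller: the breaker protects throughput, but only moving the call into a background job removes the failure from the user-visible response entirely.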
Get Expert Help
Inheriting a legacy Rails codebase with this problem? Request a Technical Debt Audit.