Infrastructure · High severity

Connection Pool Exhaustion

PostgreSQL runs out of available connections: every new query raises ActiveRecord::ConnectionNotEstablished or times out with "could not open a new connection in 5.000 seconds." The root cause is Puma and Sidekiq together requesting more connections than max_connections permits.

Before / After

Problematic Pattern
# config/database.yml
production:
  pool: 25

# Puma: 4 workers x 16 threads = 64 connections per pod
# Sidekiq: concurrency 25 = 25 connections per pod
# 6 pods: 6 * (64 + 25) = 534 connections needed
# Postgres max_connections = 100
# Result: random "connection timeout" errors under load.
Target Architecture
# Put PgBouncer between Rails and Postgres.
# Transaction pooling mode for Rails 7+.

# config/database.yml
production:
  host: pgbouncer.internal
  port: 6432
  pool: <%= ENV.fetch('RAILS_MAX_THREADS', 5) %>
  prepared_statements: false
  advisory_locks: false

# PgBouncer
# pool_mode = transaction
# default_pool_size = 25
# max_client_conn = 2000
# 2000 client connections multiplexed
# over 25 real Postgres sessions.
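The PgBouncer comments above expand to a minimal pgbouncer.ini along these lines; hostnames, the database name, and the auth file path are placeholders for your environment:

```ini
[databases]
; Real Postgres behind the bouncer; Rails never connects to it directly.
app_production = host=postgres.internal port=5432 dbname=app_production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; Release the server connection at transaction end, not session end.
pool_mode = transaction
; Real Postgres sessions per database/user pair.
default_pool_size = 25
; Client-side (Rails) connections PgBouncer will accept.
max_client_conn = 2000
```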

Why this hurts

Every Puma thread, Sidekiq worker thread, and background process holds one PostgreSQL connection for as long as it is checked out — in Rails that typically means the full request or job, not just the query. ActiveRecord’s connection pool is per-process, so each Ruby worker maintains its own full pool. At scale this demands far more connections than max_connections permits. Requests that cannot check out a connection within the pool’s checkout_timeout (5 seconds by default) raise ActiveRecord::ConnectionTimeoutError, which clients experience as intermittent 500s.
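The arithmetic from the problematic pattern generalizes into a quick capacity check. The helper below is illustrative, not a Rails API; plug in your own deployment numbers:

```ruby
# Worst-case connection demand: every pod can hold
# (Puma workers * threads per worker) + Sidekiq concurrency
# connections simultaneously, and pools are per-process.
def connections_needed(pods:, puma_workers:, puma_threads:, sidekiq_concurrency:)
  per_pod = puma_workers * puma_threads + sidekiq_concurrency
  pods * per_pod
end

needed = connections_needed(pods: 6, puma_workers: 4,
                            puma_threads: 16, sidekiq_concurrency: 25)
# => 534, more than 5x a max_connections of 100
```

Comparing this number against max_connections (minus superuser_reserved_connections and any migrations, consoles, or cron jobs that also connect) tells you how much headroom you actually have.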

PostgreSQL backends are heavyweight. Each connection forks a dedicated backend process consuming 5-10 MB of RAM for the process itself plus additional buffers for work_mem, temp_buffers, and per-query caches. Raising max_connections from 100 to 500 costs several gigabytes of memory and reduces query throughput because the OS scheduler spends more time context-switching between backends. The relationship between connection count and throughput is not linear: performance degrades past a few hundred connections regardless of hardware capacity.
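To put a number on "just raise max_connections": assuming ~8 MB of baseline RSS per idle backend (the midpoint of the 5-10 MB range above; work_mem and temp_buffers come on top, per active query), the cost of the jump from 100 to 500 is:

```ruby
# Rough memory cost of raising max_connections from 100 to 500.
# 8 MB per backend is an assumption for baseline process overhead,
# before any query allocates work_mem or temp_buffers.
PER_BACKEND_MB = 8
extra_backends = 500 - 100
extra_mb = extra_backends * PER_BACKEND_MB
# => 3200 MB of RAM reserved before a single query runs
```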

Autoscaling amplifies the problem. Every new pod brings a full pool’s worth of connections, which it establishes as soon as traffic hits it — spiking connection count at exactly the moment traffic surges. Horizontal scaling under load can tip the database from healthy to saturated in seconds. Connection-pool errors cascade: a request that cannot acquire a connection returns an error, the client retries, the retry takes a connection that another request needed, and the failure spreads through unrelated endpoints.

PgBouncer in transaction-pooling mode is the conventional solution. Rails connects to PgBouncer instead of PostgreSQL; PgBouncer maintains a small pool of real connections to PostgreSQL and multiplexes them across a much larger number of Rails client connections. Transaction pooling requires disabling prepared_statements and advisory_locks because those features rely on session state that transaction pooling does not preserve across multiplexed transactions.
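Because a PgBouncer setup with prepared_statements or advisory_locks left on fails in confusing, intermittent ways, it can be worth failing fast at boot. The guard below is a hypothetical sketch (not a Rails API), assuming PgBouncer listens on its default port 6432:

```ruby
# Hypothetical boot-time guard: refuse to start if the app points at
# PgBouncer (assumed port 6432, transaction pooling) while features
# that depend on session state are still enabled.
def check_pgbouncer_config!(db_config)
  return unless db_config[:port] == 6432 # direct Postgres: nothing to check

  if db_config.fetch(:prepared_statements, true)
    raise "prepared_statements must be false behind transaction pooling"
  end
  if db_config.fetch(:advisory_locks, true)
    raise "advisory_locks must be false behind transaction pooling"
  end
end

# Matches the target database.yml above: passes silently.
check_pgbouncer_config!(port: 6432, prepared_statements: false,
                        advisory_locks: false)
```

In a real app this would run from an initializer against the resolved database configuration, so a misconfigured pod dies at deploy time instead of throwing sporadic errors under load.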

See also: PgBouncer Rails Connection Pooling Setup.

Get Expert Help

Inheriting a legacy Rails codebase with this problem? Request a Technical Debt Audit.