ActiveRecord · Medium severity

In-Memory Sorting

Loading a large ActiveRecord collection into Ruby and sorting it in application memory with sort_by or sort, instead of delegating the ordering to PostgreSQL with an ORDER BY clause.

Before / After

Problematic Pattern
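```ruby
# Loads all 250,000 rows into Ruby, instantiates a model object for each,
# then sorts the whole array in Ruby. The request hangs for 8+ seconds.
Invoice.all.sort_by(&:due_date).first(20)

# Or with a computed value, which adds an N+1: one SUM query per user.
User.all.sort_by { |u| -u.orders.sum(:total) }.first(20)
```

Target Architecture

```ruby
# PostgreSQL sorts (ideally via an index on due_date) and returns 20 rows.
Invoice.order(:due_date).limit(20)

# Aggregate in SQL with a joined SUM (a subquery also works).
# The inner join skips users without orders, which is fine for a top-20 list.
User
  .joins(:orders)
  .group('users.id')
  .order(Arel.sql('SUM(orders.total) DESC'))
  .limit(20)
```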

Why this hurts

The Rails process instantiates one Invoice object per row: 250,000 model instances for a 250,000-row table. Each instance carries its attribute set, type-cast values (strings, Date and Time objects), association caches, and ActiveModel dirty-tracking state, averaging 2-5 KB per row, so materializing the table costs roughly 500 MB to 1.25 GB (250,000 × 2-5 KB). A spike that size triggers major GC cycles, stalling the process for 200-500 ms of “stop the world” collection. The heap often does not shrink back afterwards: the underlying memory allocator (glibc malloc or jemalloc) keeps fragmented pages, so worker RSS stays inflated and later allocations can incur page faults and TLB misses.
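
To make the per-row cost concrete, the following sketch counts Ruby object allocations with GC.stat(:total_allocated_objects) when materializing 250,000 rows versus 20. It assumes only the standard library; InvoiceRow is a hypothetical, much lighter Struct standing in for an ActiveRecord model, so real Invoice instances would allocate considerably more per row.

```ruby
# InvoiceRow is a hypothetical, much lighter stand-in for an ActiveRecord
# model; real Invoice instances allocate far more objects per row.
InvoiceRow = Struct.new(:id, :due_date, :total)

# Counts Ruby objects allocated while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Simulates materializing `count` rows: one Struct plus at least one String
# per row. due_date is an Integer day offset, which allocates nothing.
def load_rows(count)
  Array.new(count) { |i| InvoiceRow.new(i, (i * 7919) % 365, "#{i}.00") }
end

# "Before": materialize every row, sort in Ruby, keep 20.
in_memory = allocations_during { load_rows(250_000).sort_by(&:due_date).first(20) }

# "After": the database would hand back only 20 rows.
limited = allocations_during { load_rows(20) }

puts "in-memory sort: #{in_memory} objects allocated"
puts "LIMIT 20:       #{limited} objects allocated"
```

Even this stripped-down row type allocates at least two objects per row; an ActiveRecord instance with its attribute set and type-cast values multiplies that several times over.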

ActiveRecord typically keeps the thread's database connection checked out until the request finishes, so a Puma thread (or a single-threaded Unicorn worker) holds its connection pool slot for the entire Ruby-side sort, even though the database is no longer doing any work. Concurrent requests stall on pool checkout, and the queue time propagates up to the load balancer. Rack::Timeout aborts the request after its configured limit (commonly 15-30 seconds); if the worker also stops answering health checks, the orchestrator (Kubernetes, systemd) kills and restarts it. Requests that were queued behind the slow sort are all dropped.
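
The pool starvation can be reproduced with a toy pool. This is a minimal sketch, not ActiveRecord's ConnectionPool: TinyPool, CheckoutTimeout, and simulate_contention are hypothetical names, CheckoutTimeout plays the role of ActiveRecord::ConnectionTimeoutError, and the `hold` duration stands in for the time spent sorting in Ruby. With a one-slot pool, a second request times out while the first holds the connection, and succeeds once the hold is short.

```ruby
# TinyPool is a hypothetical toy, not ActiveRecord's ConnectionPool.
class TinyPool
  # Plays the role of ActiveRecord::ConnectionTimeoutError.
  class CheckoutTimeout < StandardError; end

  def initialize(size)
    @slots = Array.new(size) { |i| "conn-#{i}" }
    @mutex = Mutex.new
    @available = ConditionVariable.new
  end

  # Checks out a slot (waiting at most `wait` seconds) and always returns it.
  def with_connection(wait: 0.1)
    conn = checkout(wait)
    begin
      yield conn
    ensure
      checkin(conn)
    end
  end

  private

  def checkout(wait)
    deadline = now + wait
    @mutex.synchronize do
      while @slots.empty?
        remaining = deadline - now
        raise CheckoutTimeout, "no connection available within #{wait}s" if remaining <= 0

        @available.wait(@mutex, remaining) # releases the mutex while waiting
      end
      @slots.pop
    end
  end

  def checkin(conn)
    @mutex.synchronize do
      @slots.push(conn)
      @available.signal
    end
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# One request holds the only connection for `hold` seconds (the in-memory
# sort); a second request then tries to check out within `wait` seconds.
def simulate_contention(hold:, wait:)
  pool = TinyPool.new(1)
  ready = Queue.new
  holder = Thread.new do
    pool.with_connection do
      ready << true # signal that the only slot is now checked out
      sleep hold
    end
  end
  ready.pop
  pool.with_connection(wait: wait) { :ran }
rescue TinyPool::CheckoutTimeout
  :timed_out
ensure
  holder&.join
end

p simulate_contention(hold: 0.5, wait: 0.1)
p simulate_contention(hold: 0.05, wait: 1.0)
```

The waiting request never touches the database; it is blocked purely because the slot is held during CPU-bound Ruby work.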

The database sits idle while Ruby struggles, which is the costly part: given an index on due_date, PostgreSQL could have produced the top 20 rows in single-digit milliseconds using a B-tree index scan, streaming 20 rows across the network rather than 250,000. Memory profilers typically show ActiveRecord::Relation#to_a as the top allocator, but the root cause is easiest to find in request traces: a single .all call buried inside a report, dashboard, or export path. Caching layers offer no help because the sort operates on live data and materializes a different result set per request.
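
Even when no index covers the sort key, PostgreSQL does not need to sort the whole table for ORDER BY ... LIMIT: it can keep only the best N rows as it scans, which EXPLAIN ANALYZE reports as "top-N heapsort". The sketch below illustrates the same bounded-selection idea in plain Ruby with a sorted buffer rather than a real heap; top_n_by is a hypothetical helper, not part of Rails or the Ruby standard library, and it only shows why memory can stay proportional to N instead of to the row count.

```ruby
# Hypothetical helper: returns the n rows with the smallest keys, in ascending
# key order, while keeping at most n rows in memory (a bounded sorted buffer,
# not a real heap).
def top_n_by(rows, n)
  return [] if n <= 0

  best = []
  rows.each do |row|
    key = yield(row)
    # Skip rows that cannot beat the worst of the n rows kept so far.
    next if best.size == n && key >= yield(best.last)

    # Insert after any equal keys so earlier rows win ties.
    idx = best.bsearch_index { |kept| yield(kept) > key } || best.size
    best.insert(idx, row)
    best.pop if best.size > n
  end
  best
end

due_dates = (1..250_000).to_a.shuffle(random: Random.new(42))
p top_n_by(due_dates, 5) { |d| d } # => [1, 2, 3, 4, 5]
```

Every row still has to be read, so this bounds memory but not scan time; an index on due_date lets PostgreSQL stop after 20 rows, and doing the work in the database avoids instantiating the rows in Ruby at all.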
