Your Rails app worked fine with 1,000 users. At 50,000, pages take 3 seconds to load. At 200,000, background jobs queue for minutes and the database connection pool is exhausted. Scaling is not a future problem. It is the problem you hit the moment your product succeeds.
This guide covers the full scaling path: from finding bottlenecks to database optimization, caching, background jobs, and production deployment.
Where are the bottlenecks hiding?
Before optimizing anything, you need data. Guessing where performance problems live wastes time and often makes things worse.
Development tools
- Rack MiniProfiler: Real-time profiling in your browser showing SQL times, Ruby execution, and memory usage
- Bullet: Detects N+1 queries, unused eager loading, and missing counter caches
- StackProf: Sampling-based profiler for method-level CPU analysis
- Memory Profiler: Tracks object allocations to find memory bloat
Production monitoring
- Skylight or AppSignal: Low-overhead request profiling with endpoint breakdowns and database query analysis
- Derailed Benchmarks: Integrate into CI to catch performance regressions before they ship
What to look for in logs and metrics
| Metric | Warning sign | Action |
|---|---|---|
| Response time | P95 above 500ms | Profile the endpoint, check for N+1 queries |
| Database query count | More than 10 queries per request | Add eager loading or restructure queries |
| Memory usage | Steady upward trend | Look for object allocation leaks |
| Background job queue depth | Growing faster than processing | Add workers or optimize job logic |
| Error rate | Spike in 5xx responses | Check logs for timeouts or connection failures |
Prioritizing fixes
Focus on user-facing endpoints during peak traffic. A slow admin dashboard matters less than a slow checkout page. Use benchmark-ips for comparing implementations and flamegraphs for visualizing where time is spent.
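benchmark-ips (a gem) reports iterations per second for competing implementations. As a dependency-free sketch of the same comparison workflow, the standard library's Benchmark module measures wall-clock time for two approaches side by side:

```ruby
require "benchmark"

# Compare two ways of building a comma-separated ID list.
# benchmark-ips would report iterations/second; this stdlib sketch
# reports elapsed seconds for a fixed number of iterations.
ids = (1..10_000).to_a

map_join_time = Benchmark.realtime do
  100.times { ids.map(&:to_s).join(",") }
end

reduce_concat_time = Benchmark.realtime do
  100.times { ids.reduce("") { |acc, id| acc + "#{id}," } }
end

puts format("map + join:    %.4fs", map_join_time)
puts format("reduce concat: %.4fs", reduce_concat_time)
```

Run comparisons like this on realistic data sizes; micro-benchmarks on tiny inputs often reverse at production scale.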
How do you fix database performance at scale?
Database queries cause the majority of performance issues in Rails apps. Active Record makes it easy to write clean code that generates terrible SQL.
Fix N+1 queries first
Loading 100 users without eager loading generates 101 queries. One fix changes everything:
# Bad: 101 queries
users = User.all
users.each { |u| puts u.company.name }
# Good: 2 queries
users = User.includes(:company)
users.each { |u| puts u.company.name }
Add targeted indexes
Use EXPLAIN ANALYZE to find missing indexes:
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com' AND created_at > '2024-01-01';
If you see a sequential scan on a large table, add a composite index:
add_index :users, [:email, :created_at]
This can reduce query time from 200ms to under 5ms.
Use batch processing for large datasets
Never load millions of records into memory:
# Bad: loads all records at once
User.all.each { |u| u.recalculate_stats }
# Good: processes in batches of 1,000
User.find_each(batch_size: 1_000) { |u| u.recalculate_stats }
Add counter caches
Replace repeated COUNT queries with automatic counters:
# Migration
add_column :users, :posts_count, :integer, default: 0
# Model
class Post < ApplicationRecord
  belongs_to :user, counter_cache: true
end
This turns an O(n) query into an O(1) column read.
Database scaling strategies
| Strategy | When to use | Complexity |
|---|---|---|
| Vertical scaling (more CPU/RAM) | Quick fix, single-digit thousands of users | Low |
| Read replicas | Read-heavy apps (80%+ SELECT queries) | Medium |
| Connection pooling (PgBouncer) | High concurrency, connection exhaustion | Medium |
| Table partitioning | Time-series data, tables with millions of rows | High |
| Database sharding | Massive datasets, multi-tenant architectures | Very high |
Rails 6+ supports multiple databases natively. Route reads to replicas:
# config/database.yml
production:
  primary:
    adapter: postgresql
    database: myapp_production
  primary_replica:
    adapter: postgresql
    database: myapp_production
    replica: true
Managing large datasets
- Cursor-based pagination instead of OFFSET (which gets slower as page numbers increase)
- Archive old data to separate tables (moving 2-year-old orders can reduce table size by 80%)
- Partition by date for time-series tables so queries skip irrelevant ranges
- PgBouncer for connection pooling when PostgreSQL’s default 100 max connections is not enough
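The cursor-based pagination mentioned above can be sketched in plain Ruby. With ActiveRecord the equivalent page would be roughly `Order.where("id > ?", cursor).order(:id).limit(per_page)` (a hypothetical `Order` model); the key point is that each page filters by the last-seen ID instead of skipping rows with OFFSET:

```ruby
# Keyset (cursor) pagination sketch: fetch the next page by remembering
# the last ID served, so the database never scans and discards skipped rows.
Record = Struct.new(:id, :name)

ALL_RECORDS = (1..95).map { |i| Record.new(i, "order-#{i}") }

# Returns [page, next_cursor]; next_cursor is nil on the final page.
def page_after(cursor, per_page: 20)
  page = ALL_RECORDS.select { |r| r.id > cursor }.first(per_page)
  next_cursor = page.size == per_page ? page.last.id : nil
  [page, next_cursor]
end

page1, cursor1 = page_after(0)
page2, _cursor2 = page_after(cursor1)
puts page1.first.id  # 1
puts page2.first.id  # 21
```

Unlike OFFSET, page 10,000 costs the same as page 1, because the index seek starts directly at the cursor.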
How does caching reduce database load?
Caching stores computed results in memory so you skip expensive database queries and Ruby processing on repeated requests.
Layer your caching strategy
Fragment caching for partial views:
<% cache product do %>
  <div class="product-card">
    <%= product.description %>
    <%= number_to_currency product.price %>
  </div>
<% end %>
Low-level caching for expensive computations:
Rails.cache.fetch("user_#{user.id}_dashboard_stats", expires_in: 15.minutes) do
  {
    total_orders: user.orders.count,
    revenue: user.orders.sum(:total),
    last_login: user.last_sign_in_at
  }
end
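Under the hood, `Rails.cache.fetch` is read-through: it returns the stored value if one is fresh, otherwise it runs the block once and stores the result until the TTL passes. A minimal in-memory sketch of that behavior (real apps back this with Redis or Memcached, not a Hash):

```ruby
# Minimal read-through cache with TTL, mimicking Rails.cache.fetch semantics.
# Illustration only; production uses Rails.cache with a shared store.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @store = {}
  end

  # Return the cached value if still fresh; otherwise compute and store it.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry.value if entry && Time.now < entry.expires_at

    value = yield
    @store[key] = Entry.new(value, Time.now + expires_in)
    value
  end
end

cache = TinyCache.new
computations = 0
result = nil
2.times { result = cache.fetch("dashboard_stats", expires_in: 900) { computations += 1; { orders: 42 } } }
puts computations  # the block ran only once; the second call hit the cache
```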
HTTP caching with proper headers:
class ProductsController < ApplicationController
  def show
    @product = Product.find(params[:id])
    fresh_when(@product) # Sets ETag and Last-Modified
  end
end
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, lists, sets, hashes | Strings only |
| Persistence | Optional disk persistence | In-memory only |
| Pub/sub | Yes | No |
| Best for | Cache + sessions + Sidekiq | Pure key-value caching |
Redis is the standard choice for Rails apps since it doubles as the cache store, session store, and Sidekiq backend.
Cache invalidation
Use versioned cache keys so stale data expires automatically:
# ActiveRecord models include updated_at in cache keys by default
cache_key = "product/#{product.id}-#{product.updated_at.to_i}"
For collection caching, use cache_key_with_version to invalidate when any item in the collection changes.
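A small sketch of why versioned keys make invalidation automatic: because `updated_at` is part of the key, an update changes the key itself, so the next read misses and recomputes while the stale entry simply ages out via TTL. The `Struct` stands in for an ActiveRecord model:

```ruby
require "time"

# Versioned cache keys: updating the record changes the key, so stale
# entries are never read again and expire on their own.
product = Struct.new(:id, :updated_at).new(7, Time.parse("2024-01-01 00:00:00 UTC"))

key_for = ->(p) { "product/#{p.id}-#{p.updated_at.to_i}" }

old_key = key_for.call(product)
product.updated_at = Time.parse("2024-06-01 00:00:00 UTC") # record was updated
new_key = key_for.call(product)

puts old_key == new_key  # false: the next cache read misses and recomputes
```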
How do you scale background job processing?
Background jobs handle work that does not need to happen during a web request: emails, file processing, API calls, report generation.
Sidekiq configuration for scale
# config/sidekiq.yml
:concurrency: 25
:queues:
  - [critical, 6]
  - [default, 4]
  - [low, 2]
Queue priority rules:
- Critical: Payment confirmations, security alerts
- Default: Emails, webhook deliveries
- Low: Reports, data exports, analytics
Key monitoring metrics
- Queue depth: If it grows faster than workers process, add concurrency or workers
- Processing time per job: Sudden increases indicate upstream issues (slow APIs, database locks)
- Failure rate: Set alerts for spikes; use Sentry for detailed error context
- Memory per worker: Watch for leaks in long-running processes
Job design best practices
- Keep jobs idempotent: Safe to retry without side effects
- Use exponential backoff for retries (built into Sidekiq)
- Pass IDs, not objects: Serialize minimal data, load fresh from the database
- Set timeouts for jobs that call external APIs
- Use dead job queues to review persistent failures without blocking active queues
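Sidekiq's built-in exponential backoff grows roughly as the fourth power of the retry count plus a constant, with random jitter added on top (the exact formula varies by version). A sketch of the deterministic part shows why early retries are quick while later ones back off for minutes:

```ruby
# Exponential backoff sketch. Sidekiq's default retry delay is approximately
# (retry_count ** 4) + 15 seconds plus random jitter; jitter is omitted here.
def retry_delay(retry_count)
  (retry_count**4) + 15
end

delays = (0..5).map { |n| retry_delay(n) }
puts delays.inspect  # [15, 16, 31, 96, 271, 640]
```

The rapid growth is the point: transient failures (a flaky API) resolve within the first cheap retries, while persistent failures stop hammering the downstream service.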
Resource allocation
CPU-heavy jobs (image processing, PDF generation) benefit from a worker count matching CPU cores. I/O-heavy jobs (API calls, email sending) can handle higher concurrency since they spend most time waiting.
During peak traffic, consider running background workers on dedicated servers so they do not compete with web request processing for CPU and memory.
How do you deploy for horizontal scaling?
Docker for consistent environments
A well-defined Dockerfile ensures your app runs identically in development, staging, and production:
FROM ruby:3.3-slim
RUN apt-get update && apt-get install -y build-essential postgresql-client
WORKDIR /app
COPY Gemfile* ./
RUN bundle config set --local without "development test" && bundle install
COPY . .
CMD ["bundle", "exec", "puma", "-C", "config/puma.rb"]
Kubernetes for auto-scaling
Kubernetes automatically scales container instances based on traffic:
- Horizontal Pod Autoscaler: Adds pods when CPU or memory thresholds are exceeded
- Health checks: Replaces crashed containers automatically
- Resource limits: Prevent background workers from starving web processes
- Rolling deploys: Zero-downtime deployments by replacing pods incrementally
Load balancing with Nginx or HAProxy
Nginx is the standard choice for Rails:
- Serves static assets directly (CSS, JS, images)
- Handles SSL termination, reducing backend workload
- Supports session persistence via IP hashing or cookies
HAProxy excels at:
- Advanced health checking (detecting slow backends, not just dead ones)
- Detailed traffic statistics and monitoring
- Sophisticated routing algorithms
Platform considerations
For teams that want managed infrastructure, platforms like Heroku, Render, or Fly.io handle scaling, SSL, and database management. For full control, self-managed Kubernetes on AWS/GCP provides maximum flexibility at the cost of operational complexity.
Practical Implementation: The USEO Approach
Scaling advice is easy to write and hard to execute. Here is what we learned from scaling real production systems over 15 years of Rails development.
Yousty: Scaling a 13-year-old Rails monolith
The Yousty apprenticeship platform serves hundreds of thousands of users across Switzerland. Over our 13-year partnership, we scaled it through multiple growth phases without a full rewrite.
Database optimization came first. The biggest wins were mundane: adding missing indexes on polymorphic associations, replacing N+1 queries in listing pages, and adding counter caches for frequently displayed counts. These changes alone reduced average response times by 40%.
Caching was layered incrementally. We started with fragment caching on the heaviest pages (apprenticeship listings with complex filtering), then added low-level caching for computed statistics, and finally HTTP caching for public pages. Each layer was added when monitoring showed it was needed, not preemptively.
Background jobs required strict boundaries. Search index updates, email delivery, and PDF generation all moved to Sidekiq queues with clear priority levels. The critical insight: jobs that touch the database during peak hours need their own connection pool configuration to avoid starving web requests.
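A hedged sketch of that connection pool configuration, in the same database.yml style shown earlier: each process needs an Active Record pool at least as large as its thread count, or threads queue waiting for connections. The `RAILS_MAX_THREADS` convention (5 for a typical Puma web process, 25 for a Sidekiq worker matching the concurrency above) is an assumption here, not a value from the Yousty codebase:

```yaml
# config/database.yml (sketch)
# Size the pool to the process's thread count: RAILS_MAX_THREADS would be
# set to 5 on web processes and 25 on Sidekiq workers.
production:
  adapter: postgresql
  database: myapp_production
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
```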
Triptrade: Scaling decisions for an MVP
When building the Triptrade MVP, we made deliberate choices to avoid premature optimization:
- Single database, no replicas until read traffic justified the complexity
- In-process caching with Rails.cache before introducing Redis
- Inline jobs during early development, migrated to Sidekiq when response times required it
The lesson: scaling infrastructure should follow proven demand, not anticipated demand. Every scaling layer adds operational complexity. The Triptrade MVP launched in weeks because we deferred scaling decisions until real usage data guided them.
What we consistently see in client codebases
The most common scaling mistake is optimizing the wrong layer. Teams add Redis, read replicas, and Kubernetes before fixing the N+1 queries that cause 80% of their latency. Start with EXPLAIN ANALYZE on your slowest queries. The fix is almost always an index or eager loading, not more infrastructure.
FAQs
What are the best ways to optimize database performance in a Rails app?
Start with the highest-impact, lowest-effort fixes: add indexes on frequently queried columns, fix N+1 queries with includes, and add counter caches for displayed counts. Use EXPLAIN ANALYZE to verify your changes. Layer in caching with Redis for computed values that do not change on every request.
How do you manage background jobs during high traffic?
Organize jobs into priority queues (critical, default, low). Use Sidekiq with concurrency tuned to your workload type: match CPU cores for compute-heavy jobs, go higher for I/O-bound work. Monitor queue depth and processing times. During traffic peaks, dedicate separate servers for background processing to protect web request latency.
When should you add read replicas to your Rails database?
When your app is read-heavy (80%+ SELECT queries) and single-database optimization (indexes, caching, query fixes) is no longer sufficient. Rails 6+ has built-in multi-database support for automatic read/write routing. Add read replicas before considering sharding, which introduces significantly more complexity.