Tests & Security · Medium severity

Flaky System Tests

Capybara or RSpec system tests that pass locally but fail intermittently in CI, typically because the test waits for JavaScript-driven DOM updates with sleep or other fixed timings instead of Capybara’s built-in asynchronous matchers.

Before / After

Problematic Pattern
it 'submits the form and shows success' do
  visit new_contact_path
  fill_in 'Email', with: 'user@example.com'
  click_button 'Send'

  sleep 2 # hope the AJAX finishes
  expect(page).to have_content('Thanks!')
  # Fails ~5% of the time in CI.
  # Adds 2 seconds to every run.
end
Target Architecture
it 'submits the form and shows success' do
  visit new_contact_path
  fill_in 'Email', with: 'user@example.com'
  click_button 'Send'

  # Capybara retries until default_max_wait_time (5s)
  # elapses or the matcher passes. Zero flakiness,
  # zero wasted time.
  expect(page).to have_content('Thanks!')
  expect(page).to have_no_css('.loading-spinner')
end

# For element absence, use have_no_* matchers rather than
# negating with ! — negation does not wait for the element
# to disappear. For a longer wait on one matcher, pass wait:
expect(page).to have_selector('.toast-success', wait: 10)
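The wait budget is configurable globally and per block. A minimal configuration sketch, assuming a conventional spec/support file layout (the 15-second value is an illustrative choice, not a recommendation):

```ruby
# spec/support/capybara.rb — hypothetical location
Capybara.default_max_wait_time = 5 # seconds; applies to all retrying matchers

# Temporarily raise the budget around one known-slow interaction:
it 'renders the report' do
  visit report_path
  using_wait_time(15) do
    expect(page).to have_content('Report ready')
  end
end
```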

Why this hurts

System tests run in a multi-process dance: RSpec drives Capybara, Capybara drives Selenium or a ChromeDriver subprocess, the browser loads the Rails application running on a separate Puma process, and JavaScript on the page makes AJAX calls back to Rails. Every step adds timing variability, and the test runner has no synchronization primitive with the browser’s event loop. sleep 2 is a fixed-time bet that the slowest step in the chain finishes within the chosen window, which holds on a fast developer machine and fails on a shared CI runner under load.
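The fixed-time bet can be demonstrated without a browser at all. The sketch below is plain Ruby with hypothetical timings: a background thread stands in for the AJAX request, and a polling loop stands in for a retrying matcher.

```ruby
# A background "AJAX request" that completes after `latency` seconds.
def start_async_work(latency)
  done = { flag: false }
  Thread.new { sleep latency; done[:flag] = true }
  done
end

# Fixed-time bet: sleep exactly `budget` seconds, then check once.
def fixed_wait_passes?(latency, budget)
  done = start_async_work(latency)
  sleep budget
  done[:flag]
end

# Retry loop: poll every 10 ms until the flag flips or the timeout expires.
def polling_passes?(latency, timeout)
  done = start_async_work(latency)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until done[:flag]
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep 0.01
  end
  true
end

# With 0.3 s of simulated latency, a 0.2 s fixed wait loses the bet,
# while a 1 s polling budget succeeds — and returns early.
puts fixed_wait_passes?(0.3, 0.2) # false
puts polling_passes?(0.3, 1.0)    # true
```

The polling loop is exactly the shape of Capybara's retrying matchers: it never fails before the timeout and never waits longer than necessary.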

The failure mode compounds the cost. Flaky tests erode trust in the suite; developers start retrying failed runs out of habit instead of investigating, which means real regressions slip through when they are marked as “probably flaky again”. CI retry logic masks the problem further: jobs pass after 3 attempts, the green checkmark appears, and the data about actual failure rate is lost. Over time, deployment confidence degrades even when the codebase is healthy, because no one can distinguish signal from noise in the test output.

Fixed sleeps also waste CI compute at scale. A suite of 200 system specs with a 2-second sleep each wastes 400 seconds per run. If CI runs 50 times per day across parallel jobs, that is 5.5 hours of paid compute daily doing nothing except waiting. Capybara’s asynchronous matchers (have_content, have_selector, have_no_content) retry until the condition holds or the timeout expires, which takes as long as necessary and returns as soon as possible. The same suite with retry-matchers completes faster on average and never flakes on ordinary timing.
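The waste figure follows directly from the numbers above:

```ruby
specs          = 200
sleep_per_spec = 2  # seconds of fixed sleep in each spec
runs_per_day   = 50

wasted_per_run = specs * sleep_per_spec          # 400 seconds per run
wasted_per_day = wasted_per_run * runs_per_day   # 20,000 seconds per day
puts wasted_per_day / 3600.0                     # ≈ 5.5 hours of idle compute
```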

Parallelization makes fixed sleeps strictly worse. When 8 spec workers share one CPU core, each worker gets 1/8 the CPU and sleep 2 measures 2 seconds of wall time during which the JavaScript has had maybe 250 ms of CPU to do its work. Retry-matchers scale gracefully: if the CPU is busy, the matcher waits longer automatically.
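When shared CI runners are known to be starved, one mitigation is to raise the retry budget only there. A hedged sketch — the CI environment variable is set by most CI providers, but verify yours, and the 15-second value is an assumption:

```ruby
# spec/support/capybara.rb — hypothetical; widen the retry budget on CI,
# where contended runners stretch wall-clock time for the same CPU work.
# This costs nothing when tests pass quickly: matchers return as soon as
# the condition holds, so the larger budget is only consumed under load.
Capybara.default_max_wait_time = ENV['CI'] ? 15 : 5
```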

See also: Capybara System Tests Flakiness in Legacy Rails.

Get Expert Help

Inheriting a legacy Rails codebase with this problem? Request a Technical Debt Audit.