
Investigate Intermittent Bugs

It fails once in twenty runs: the occurrence pattern IS the evidence. Hunt the difference between failing and passing runs — forensically.

Overview

Intermittent bugs gaslight engineers: every look at the code says it's fine, and one run in twenty says otherwise. This setup investigates a flaky CI test in forensic mode. It treats the occurrence pattern as primary evidence, because when the test fails and when it doesn't is the strongest signal available. It hunts the difference between failing and passing runs (timing, load, data, environment). And it suspects concurrency first, working through the checklist: races on shared state, timing dependencies, retry behavior, stale state, cache inconsistencies. Forensic mode matters most here: with evidence this thin, guesses are expensive.
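To make "races on shared state" concrete, here is a minimal sketch of a lost update, the first item on the checklist. The Counter class, the pause hook, and run_two_threads are hypothetical names invented for illustration; they are not part of this setup. The barrier forces both threads to read before either writes, which turns a sometimes-failure into an every-time failure.

```python
import threading


class Counter:
    """Shared state with an unguarded and a guarded read-modify-write."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self, pause):
        current = self.value       # read
        pause()                    # timing window another thread can enter
        self.value = current + 1   # write back a possibly stale value

    def safe_increment(self, pause):
        with self._lock:           # read and write now happen as one unit
            current = self.value
            pause()
            self.value = current + 1


def run_two_threads(method, pause):
    """Call `method(pause)` from two threads and wait for both to finish."""
    threads = [threading.Thread(target=method, args=(pause,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Force the bad interleaving: both threads finish reading before either writes.
barrier = threading.Barrier(2)
racy = Counter()
run_two_threads(racy.unsafe_increment, barrier.wait)
lost_update_total = racy.value   # 1, not 2: one increment was lost

# Same workload with the lock held across read and write.
guarded = Counter()
run_two_threads(guarded.safe_increment, lambda: None)
guarded_total = guarded.value    # 2
```

Without the barrier, the unguarded version would usually produce 2 and only occasionally 1, which is exactly the pattern a flaky test shows. Widening the window on purpose is one way to confirm a suspected race before changing any production code.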

Workflow

  1. Log the pattern before the code

    Failure rate, time of day, parallelism, data — the occurrence pattern is the dataset; collect it first.

  2. Diff the runs

    What is different about run 17? CI parallelism, shared state, timing — the difference IS the lead.

  3. Demand a longer verification window

    The contract's post-fix rule: intermittent problems need extended observation — one green run proves nothing.
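How long the verification window in step 3 should be can be estimated with a little arithmetic. The sketch below assumes runs are independent and that the pre-fix failure rate is constant, which real CI runs only approximate. The function names and the 5% threshold are illustrative choices, not part of the contract.

```python
import math


def chance_of_false_green(failure_rate, runs):
    """Probability that an unfixed bug still yields `runs` consecutive passes."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, max_false_green=0.05):
    """Smallest number of consecutive green runs that an unfixed bug
    would produce with probability at most `max_false_green`."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_green < 1.0:
        raise ValueError("max_false_green must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_green) / math.log(1.0 - failure_rate))


# A bug that fails once in twenty runs (5%):
one_green = chance_of_false_green(0.05, 1)   # about 0.95
window = runs_needed(0.05)                   # 59
```

Under these assumptions, a single green run after the fix would have appeared about 95% of the time even with the bug untouched. Roughly 59 consecutive green runs are needed before an unfixed bug becomes the unlikely explanation.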

Why This Works

  • Pattern-as-evidence framing extracts signal from a bug that hides from direct observation
  • The concurrency-first checklist starts where sometimes-bugs usually live
  • Forensic rules prevent expensive guesses when evidence is thinnest

Best for

  • Flaky tests that pass locally and fail in CI
  • Bugs that vanish when observed (and return after the standup)
  • Race conditions suspected but never caught

Not for

  • Deterministic failures — the standard strategies diagnose them faster
  • Fixing flaky tests by rewriting them — pin the cause first; the Test Case Prompt Generator's discipline prevents the next one

Use cases

  • Hunting the test that fails 5% of CI runs
  • Diffing failing runs against passing ones systematically
  • Catching the race that only fires under CI parallelism
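Diffing failing runs against passing ones can start as a simple tally. The sketch below assumes each CI run has been logged as a record with a "passed" flag plus whatever factors were captured. The field names "parallelism" and "runner" and the sample data are hypothetical. It computes the failure rate per factor value and ranks factors by how far apart those rates are.

```python
from collections import defaultdict


def failure_rate_by_factor(runs, factors):
    """For each factor, map each observed value to (failures, total, rate)."""
    table = {}
    for factor in factors:
        counts = defaultdict(lambda: [0, 0])  # value -> [failures, total]
        for run in runs:
            value = run.get(factor)
            counts[value][1] += 1
            if not run["passed"]:
                counts[value][0] += 1
        table[factor] = {
            value: (fails, total, fails / total)
            for value, (fails, total) in counts.items()
        }
    return table


def rank_factors(table):
    """Order factors by the gap between their highest and lowest failure rate."""
    def spread(factor):
        rates = [rate for _, _, rate in table[factor].values()]
        return max(rates) - min(rates)
    return sorted(table, key=spread, reverse=True)


# Hypothetical run log, invented for illustration only.
runs = [
    {"passed": True, "parallelism": 1, "runner": "a"},
    {"passed": True, "parallelism": 1, "runner": "b"},
    {"passed": True, "parallelism": 4, "runner": "a"},
    {"passed": False, "parallelism": 4, "runner": "b"},
    {"passed": False, "parallelism": 4, "runner": "a"},
    {"passed": True, "parallelism": 1, "runner": "a"},
]

table = failure_rate_by_factor(runs, ["parallelism", "runner"])
ranking = rank_factors(table)   # ["parallelism", "runner"]
```

In this invented log, every failure happens at parallelism 4, so parallelism ranks first. A large gap is a lead to investigate, not proof of cause, especially with this few runs.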
