ordeal: Chaos testing for Python
Your tests pass. Your code still breaks in production. Ordeal finds what you missed.
Why ordeal
Your code works until it doesn't. Ordeal finds the failures you didn't think to test — the crash when two things happen at once, the silent wrong answer on a weird input, the timeout that only happens under load. It tries thousands of combinations no human would write by hand, and when it finds a problem, it hands you the exact steps to reproduce it. One command, real bugs, no test code required.
What ordeal does
You give it your Python code. It gives you back:
- What your functions actually do: not what you think they do, but what they were observed to do across hundreds of random inputs
- What your tests miss — gaps in coverage, mutations your tests don't catch, edge cases you haven't considered
- Exactly what to fix — line numbers, specific inputs, concrete suggestions
No test code to write. No configuration. Just point and run.
Try it right now
Open a terminal and paste this (uvx runs Python tools without installing them):
uvx ordeal mine ordeal.demo
This analyzes ordeal's built-in demo module. You'll see output like:
mine(score): 500 examples
ALWAYS output in [0, 1] (500/500) ← score() always returns a value between 0 and 1
ALWAYS monotonically non-decreasing ← bigger input = bigger output, always
mine(normalize): 500 examples
ALWAYS len(output) == len(xs) (500/500) ← output is always the same length as input
97% idempotent (29/30) ← normalizing twice SHOULD give the same result
...but ordeal found 1 case where it doesn't
Ordeal called each function hundreds of times with random inputs and told you what's always true — and what isn't. That 97% idempotent is a real finding: there's an edge case where normalize(normalize(x)) gives a different result than normalize(x).
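To make the idempotence finding concrete, here is a minimal sketch of that kind of check in plain Python. It is not ordeal's implementation: `check_idempotent`, `random_ints`, and `shift_up` are hypothetical names chosen for illustration, and the sketch only covers the "call it once, call it twice, compare" idea.

```python
import random


def check_idempotent(fn, make_input, trials=200, seed=0):
    """Count how often fn(fn(x)) == fn(x) over randomly generated inputs.

    Returns (number of inputs where the property held, list of counterexamples).
    """
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    counterexamples = []
    for _ in range(trials):
        x = make_input(rng)
        once = fn(x)
        twice = fn(once)
        if once != twice:
            counterexamples.append(x)
    return trials - len(counterexamples), counterexamples


def random_ints(rng):
    """Hypothetical input generator: a non-empty list of small integers."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]


def shift_up(xs):
    """Deliberately non-idempotent: every application changes the result."""
    return [x + 1 for x in xs]


held, bad = check_idempotent(sorted, random_ints)
print(f"sorted: idempotent on {held}/200 inputs")

held, bad = check_idempotent(shift_up, random_ints)
print(f"shift_up: idempotent on {held}/200 inputs, first counterexample: {bad[0]}")
```

A result below 100% means at least one concrete input exists where applying the function twice differs from applying it once, which is what the 29/30 line in the demo output reports.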
Point it at your code
If your project has a file like myapp/scoring.py, the module path is myapp.scoring — the file path with slashes replaced by dots, without the .py:
uvx ordeal scan myapp.scoring --save-artifacts # find a bug, save report + regressions
uvx ordeal verify fnd_123456789abc # re-run one saved finding later
uvx ordeal init myapp # bootstrap starter tests for an existing package
uvx ordeal mine myapp.scoring # what do my functions actually do?
uvx ordeal audit myapp.scoring # what are my tests missing?
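The file-path-to-module-path rule described above (slashes become dots, the .py suffix is dropped) can be sketched as a small helper. `module_path` is a hypothetical function written for illustration, not part of ordeal, and it assumes a relative POSIX-style path rooted at your project directory.

```python
from pathlib import PurePosixPath


def module_path(file_path: str) -> str:
    """Convert a relative file path such as myapp/scoring.py to myapp.scoring."""
    path = PurePosixPath(file_path).with_suffix("")  # drop the .py suffix
    return ".".join(path.parts)  # join the path components with dots


print(module_path("myapp/scoring.py"))  # myapp.scoring
```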
audit goes further — it generates tests for you, measures coverage, and mutation-tests the result:
myapp.scoring
migrated: 12 tests | 130 lines | 96% coverage [verified]
mutation: 14/18 (78%) ← ordeal flipped operators in your code;
4 changes went undetected by your tests
suggest:
- L42 in compute(): test when x < 0
- L67 in normalize(): test that ValueError is raised
Those suggest lines are real. Line 42 of compute() behaves differently with negative inputs, and your tests never check that.
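The mutation score in that report comes from changing your code and checking whether any test notices. The following is a minimal sketch of the idea using only the standard library; it is not how ordeal implements it. The `price` function, the `FlipNth` transformer, and both test lists are hypothetical, and the only mutation applied is turning a strict comparison into a non-strict one.

```python
import ast

# Hypothetical function under test, kept as source text so it can be mutated.
SOURCE = '''
def price(qty):
    if qty < 0:
        raise ValueError("negative quantity")
    if qty > 10:
        return qty * 9   # bulk discount
    return qty * 10
'''

# Boundary mutations: turn a strict comparison into a non-strict one.
SWAPS = {ast.Lt: ast.LtE, ast.Gt: ast.GtE}


class FlipNth(ast.NodeTransformer):
    """Replace the operator of the n-th swappable comparison."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        op_type = type(node.ops[0])
        if op_type in SWAPS:
            if self.seen == self.target:
                node.ops[0] = SWAPS[op_type]()
            self.seen += 1
        return node


def count_sites(source):
    """Number of comparisons that can be mutated."""
    return sum(
        isinstance(n, ast.Compare) and type(n.ops[0]) in SWAPS
        for n in ast.walk(ast.parse(source))
    )


def passes(tests, fn):
    """True if every check passes; an exception counts as a failure."""
    try:
        return all(check(fn) for check in tests)
    except Exception:
        return False


def mutation_score(tests):
    """Return (mutants killed, mutants generated) for a list of checks."""
    total = count_sites(SOURCE)
    killed = 0
    for n in range(total):
        tree = FlipNth(n).visit(ast.parse(SOURCE))
        ast.fix_missing_locations(tree)
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        if not passes(tests, namespace["price"]):
            killed += 1
    return killed, total


# Tests that never touch the boundaries let both mutants survive.
weak = [lambda f: f(1) == 10, lambda f: f(20) == 180]
# Adding the boundary values 0 and 10 kills both.
strong = weak + [lambda f: f(0) == 0, lambda f: f(10) == 100]

print(mutation_score(weak))    # (0, 2)
print(mutation_score(strong))  # (2, 2)
```

A surviving mutant points at an input your tests never exercise, which is why the suggestions in the report name specific lines and conditions.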
Let your AI assistant do it
You don't need to learn ordeal's API. Open Claude Code, Cursor, Copilot, or any AI coding assistant and paste:
"Run uv tool install ordeal to install ordeal. Then run ordeal mine on each module in my project and ordeal audit on the ones with existing tests. Read the output, explain what it found, and fix the issues it suggests."
Or without installing anything:
"Run uvx ordeal mine on my main modules. Show me the output and explain what the findings mean."
ordeal ships with an AGENTS.md — your AI assistant reads it automatically and knows every command, every option, and how to interpret every result.
Install
When you're ready to make ordeal part of your workflow:
uv tool install ordeal
Then ordeal mine, ordeal audit, and ordeal explore are available directly from your terminal.
Find what you need
Every goal maps to a starting point — a command to run, a module to import, and a page to read. Nothing is hidden.
| I want to... | Start here | Learn more |
|---|---|---|
| Capture a bug and lock it in | ordeal scan mymodule --save-artifacts | Bug Bundle |
| Understand why scan promoted or demoted a crash | Read the scan finding rules | Scan Finding Rules |
| Re-run one saved finding | ordeal verify fnd_123456789abc | Bug Bundle |
| Bootstrap tests for an existing package | ordeal init mymodule | CLI |
| Find bugs without writing tests | ordeal mine mymodule | Auto Testing |
| Check if my tests are good enough | ordeal audit mymodule | Mutations |
| Write a chaos test | from ordeal import ChaosTest | Getting Started |
| Inject specific failures (timeout, NaN, ...) | from ordeal.faults import timing | Fault Injection |
| Explore all failure combinations | ordeal explore | Explorer |
| Reproduce and shrink a failure | ordeal replay trace.json | Shrinking |
| Add fail-safe gates to production code | from ordeal.buggify import buggify | Fault Injection |
| Make assertions across all runs | from ordeal import always, sometimes | Assertions |
| Control time / filesystem in tests | from ordeal.simulate import Clock | Simulation |
| Compare two implementations | ordeal mine-pair mod.fn1 mod.fn2 | Auto Testing |
| Test API endpoints for faults | from ordeal.integrations.openapi import chaos_api_test | Integrations |
| Extend ordeal with a new fault | Follow the pattern in ordeal/faults/*.py | Fault Injection |
| Configure reproducible runs | Create ordeal.toml | Configuration |
| See the next functionality-coverage priorities | Read the roadmap | Roadmap |
| Inspect every capability before choosing a tool | ordeal catalog --detail | API Reference |
| Discover all available faults, assertions, strategies | from ordeal import catalog; catalog() | API Reference |
Pick your starting point
Every path leads somewhere useful. Pick whichever matches what you need right now.
- "I just want to see what ordeal does" → Run uvx ordeal mine ordeal.demo in your terminal, then read Getting Started
- "I have code and want to find bugs" → Run ordeal mine mymodule, then see Auto Testing
- "I want to write chaos tests for my service" → Start with Getting Started, then Writing Tests
- "I want to understand the ideas behind ordeal" → Read Philosophy, then the Concepts
- "I need to check if my tests are any good" → Run ordeal audit, then see Mutations
- "I want to run ordeal in CI" → See the Explorer guide and Configuration
- "I want to explore the source code" → See the Architecture section in the README for a full code map
Start here
- Why ordeal exists. What problem it solves. Why it matters for the future of code quality.
- Write your first chaos test in 5 minutes. From install to your first failure.
Understand
- What is chaos testing? Faults, nemesis, swarm mode, explained from the ground up.
- How the explorer finds bugs: edge hashing, checkpoints, energy scheduling.
- always, sometimes, reachable, unreachable: the Antithesis assertion model.
- External faults, inline buggify, the FoundationDB model, and when to use each.
- How ordeal minimizes failures: delta debugging, step elimination, fault simplification.
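As a rough illustration of the step-elimination idea mentioned in the last item, the sketch below greedily removes steps from a failing trace as long as the failure still reproduces. It is a simplified stand-in, not ordeal's shrinker: the `shrink` function, the step names, and the `fails` predicate are all hypothetical.

```python
def shrink(steps, fails):
    """Greedily drop one step at a time while the failure still reproduces."""
    current = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate  # the step was irrelevant, keep it removed
                changed = True
                break
    return current


def fails(trace):
    """Hypothetical bug: a write that happens after the disk fills up."""
    if "disk_full" not in trace:
        return False
    return "write" in trace[trace.index("disk_full") + 1:]


trace = ["open", "read", "disk_full", "retry", "write", "close"]
print(shrink(trace, fails))  # ['disk_full', 'write']
```

The result is a trace in which every remaining step is needed to trigger the failure, which is usually much easier to debug than the original run.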
Use
- Explorer — Run and configure coverage-guided exploration
- Writing Tests — Patterns for effective chaos tests
- Auto Testing — Zero-boilerplate: scan_module, fuzz, mine, diff, chaos_for
- Simulation — Deterministic Clock and FileSystem
- Mutations — Validate that your tests catch real bugs
- Integrations — Atheris fuzzing, built-in API chaos testing
Reference
- CLI: ordeal scan, verify, init, explore, replay, pytest --chaos
- Configuration — ordeal.toml schema and tuning
- API Reference — Every function, every parameter, every type
- Troubleshooting — Common issues and how to fix them
What ordeal brings together
| Capability | Idea | Origin |
|---|---|---|
| Stateful chaos testing | Nemesis toggles faults while Hypothesis explores interleavings | Jepsen + Hypothesis |
| Coverage-guided exploration | Checkpoint interesting states, branch from productive ones | Antithesis |
| Property assertions | always, sometimes, reachable, unreachable | Antithesis |
| Inline fault injection | buggify(): no-op in production, fault in testing | FoundationDB |
| Boundary-biased generation | Test at 0, -1, empty, max — where bugs cluster | Jane Street |
| Mutation testing | Verify tests catch real code changes | Meta ACH |
| Differential testing | Compare two implementations on random inputs | Regression testing |
| Property mining | Discover invariants from execution traces | Specification mining |
| Metamorphic testing | Check output relationships across transformed inputs | Metamorphic relations |
| Network faults | HTTP errors, rate limiting, DNS failure, connection reset | Real-world API failures |
| Concurrency faults | Lock contention, thread boundaries, stale state | Thread-safety testing |
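The inline fault injection row describes a gate that does nothing in production and fires only under test. A minimal sketch of that pattern follows. It does not show ordeal's buggify API: `FaultGate`, `gate`, and `fetch_profile` are hypothetical names, and the activation switch is an assumption made for the example.

```python
import random


class FaultGate:
    """A fault gate that is inert unless a test harness activates it."""

    def __init__(self, seed=0):
        self.active = False  # production default: never inject
        self._rng = random.Random(seed)  # seeded so test runs are repeatable

    def __call__(self, probability=0.25):
        # Short-circuits to False whenever the gate is inactive.
        return self.active and self._rng.random() < probability


gate = FaultGate()


def fetch_profile(user_id):
    """Hypothetical production function with an inline fault point."""
    if gate():
        raise TimeoutError("injected fault")
    return {"id": user_id}


# In production the gate never fires.
print(fetch_profile(1))

# A test harness flips the switch and the same code path can now fail.
gate.active = True
failures = 0
for user_id in range(100):
    try:
        fetch_profile(user_id)
    except TimeoutError:
        failures += 1
print(f"{failures} injected failures out of 100 calls")
```

Keeping the fault point inside the production code means the test exercises the real error-handling path instead of a mock of it.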