
CLI

Install

uv tool install ordeal     # global, `ordeal` on PATH
uvx ordeal explore         # ephemeral, no install
uv run ordeal explore      # inside project venv

Commands

ordeal audit

Measure your existing tests against what ordeal's auto-scan achieves — verified numbers, not estimates:

ordeal audit myapp.scoring --test-dir tests/
ordeal audit myapp.scoring myapp.pipeline -t tests/ --max-examples 50

Output:

ordeal audit

  myapp.scoring
    current:   33 tests |   343 lines | 98% coverage [verified]
    migrated:  12 tests |   130 lines | 96% coverage [verified]
    saving:   64% fewer tests | 62% less code | same coverage
    mined:    deterministic(compute, normalize), output in [0, 1](compute)
    mutation: 14/18 (78%)
    suggest:
      - L42 in compute(): test when x < 0
      - L67 in normalize(): test that ValueError is raised

Every number is [verified] (measured via coverage.py JSON, cross-checked for consistency) or FAILED: reason. Mined properties are grouped by kind. The mutation score shows how many code mutations the mined properties catch — if it's below 100%, the surviving mutants reveal property gaps.
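The mutation score can be illustrated with a small sketch. This is not ordeal's mutation engine — `compute`, the hand-made mutants, and the bound check are hypothetical stand-ins showing what "mutants caught by a mined property" means:

```python
import random

# Hypothetical illustration of a mutation score: run a mined property
# ("output in [0, 1]") against hand-made mutants of a function; mutants
# the property rejects are "caught", the rest "survive".

def compute(x: float) -> float:
    return 1.0 / (1.0 + abs(x))          # original: always in (0, 1]

mutants = [
    lambda x: 1.0 / (1.0 + abs(x)) + 1,  # shifts output above 1: caught
    lambda x: 1.0 / (2.0 + abs(x)),      # still within [0, 1]: survives
]

def property_holds(fn, samples):
    return all(0.0 <= fn(s) <= 1.0 for s in samples)

samples = [random.uniform(-100, 100) for _ in range(200)]
caught = sum(not property_holds(m, samples) for m in mutants)
print(f"mutation score: {caught}/{len(mutants)}")  # 1/2: one mutant survives
```

A surviving mutant like the second one is exactly the "property gap" the audit points at: the bound holds, but nothing pins down the actual values.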

The "migrated" column shows what a real ordeal test file looks like: fuzz() for crash safety plus explicitly mined properties (bounds, determinism, type checks). It generates the test file a developer would write after adopting ordeal.
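A sketch of the kind of test file the "migrated" column describes: crash safety plus mined properties (bounds, determinism). `normalize` is a hypothetical stand-in, and ordeal's own fuzz()/mine() helpers are replaced here by a plain random-sampling loop:

```python
import random

def normalize(x: float) -> float:
    return max(0.0, min(1.0, x))   # clamp into [0, 1]

def test_normalize_properties():
    rng = random.Random(0)
    for _ in range(100):
        x = rng.uniform(-1e6, 1e6)
        y = normalize(x)           # crash safety: must not raise
        assert 0.0 <= y <= 1.0     # mined bound: output in [0, 1]
        assert normalize(x) == y   # mined property: deterministic

test_normalize_properties()
print("100 examples passed")
```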

Use --show-generated to inspect the generated test, or --save-generated to save it and use it directly:

ordeal audit myapp.scoring --show-generated          # print generated test
ordeal audit myapp.scoring --save-generated test_migrated.py  # save to file
Flag              Default   Description
modules           required  Module paths to audit (positional, one or more)
--test-dir, -t    tests     Directory containing existing tests
--max-examples    20        Hypothesis examples per function
--show-generated  off       Print the generated test file
--save-generated            Save generated test to this path

ordeal mine

Discover properties of a function or all public functions in a module. Prints what mine() finds — type invariants, algebraic laws, bounds, monotonicity, length relationships — with confidence levels.

ordeal mine myapp.scoring.compute           # single function
ordeal mine myapp.scoring                   # all public functions
ordeal mine myapp.scoring.compute -n 1000   # more examples = tighter confidence

Output:

mine(compute): 500 examples
  ALWAYS  output type is float (500/500)
  ALWAYS  deterministic (50/50)
  ALWAYS  output in [0, 1] (500/500)
  ALWAYS  observed range [0.0, 0.9987] (500/500)
  ALWAYS  monotonically non-decreasing (499/499)
    n/a: commutative, associative

Use this to understand a function before writing tests. The ALWAYS properties are candidates for assertions; the n/a list shows what doesn't apply. result.not_checked (visible in the Python API) lists what mine() structurally cannot verify — those are the tests you write manually.
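What the "monotonically non-decreasing" line boils down to can be sketched directly: sort the sampled inputs and check that outputs never decrease across adjacent pairs. `compute` here is a stand-in, not the real myapp.scoring.compute:

```python
# Hedged sketch of a monotonicity check over sampled inputs.
def compute(x: float) -> float:
    return x / (1.0 + abs(x)) * 0.5 + 0.5   # monotone, output in (0, 1)

xs = sorted(i / 10 for i in range(-500, 500))
ys = [compute(x) for x in xs]

# 999 adjacent pairs from 1000 samples, mirroring the (n-1)/(n-1) count
non_decreasing = all(a <= b for a, b in zip(ys, ys[1:]))
print("monotonically non-decreasing:", non_decreasing)  # True
```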

Flag                Default   Description
target              required  Dotted path: mymod.func or mymod (positional)
--max-examples, -n  500       Examples to sample

ordeal mine-pair

Discover relational properties between two functions: roundtrip (g(f(x)) == x), reverse roundtrip (f(g(x)) == x), and commutative composition (f(g(x)) == g(f(x))).

ordeal mine-pair myapp.encode myapp.decode           # roundtrip?
ordeal mine-pair myapp.serialize myapp.parse -n 500  # more examples

Output:

mine(encode <-> decode): 200 examples
  ALWAYS  roundtrip decode(encode(x)) == x (48/48)
  ALWAYS  roundtrip encode(decode(x)) == x (45/45)
     52%  commutative composition (26/50)

Use this when you have function pairs that should be inverses (encode/decode, serialize/parse, compress/decompress) or that should commute.
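The roundtrip check reduces to a simple predicate over sampled inputs. Sketched here with a real inverse pair from the standard library (json.dumps/json.loads) rather than ordeal's own machinery:

```python
import json

# decode(encode(x)) == x over a handful of sample values
samples = [{"a": 1}, [1, 2, 3], "text", 3.5, None, True]
roundtrip_ok = all(json.loads(json.dumps(x)) == x for x in samples)
print(f"roundtrip json.loads(json.dumps(x)) == x: "
      f"{roundtrip_ok} ({len(samples)}/{len(samples)})")
```

Note that the reverse direction (encode(decode(x)) == x) needs samples from the *other* domain — strings of valid JSON — which is why mine-pair reports the two directions separately.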

Flag                Default   Description
f                   required  First function (positional)
g                   required  Second function (positional)
--max-examples, -n  200       Examples to sample

ordeal benchmark

Measure how parallel exploration scales on your machine and test class. Runs the Explorer at N=1, 2, 4, 8... workers, measures throughput, and fits the Universal Scaling Law (USL):

ordeal benchmark                   # uses ordeal.toml, first [[tests]] entry
ordeal benchmark -c ci.toml        # custom config
ordeal benchmark --max-workers 16  # test up to 16 workers
ordeal benchmark --time 30         # 30s per trial (default: 10s)
ordeal benchmark --metric edges    # fit on edges/sec instead of runs/sec

Output:

Scaling Analysis (Universal Scaling Law)
  sigma (contention):  0.080755
  kappa (coherence):   0.005578
  Regime:              usl
  Optimal workers:     13.4
  Peak throughput:     7.64x

  Diagnosis:
    Contention (sigma): 8.1% serialized fraction.
    Coherence (kappa):  0.005578 cross-worker sync cost.
Flag           Default      Description
--config, -c   ordeal.toml  Config file
--max-workers  CPU count    Maximum workers to test
--time         10           Seconds per trial
--metric       runs         "runs" (runs/sec) or "edges" (edges/sec)
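The fitted model is Gunther's Universal Scaling Law: speedup(N) = N / (1 + σ(N−1) + κN(N−1)), where σ is the serialized (contention) fraction and κ the cross-worker coherence cost. A sketch with illustrative parameters — these are not values fitted on any real machine:

```python
import math

def usl_speedup(n, sigma, kappa):
    # Gunther's USL: contention term grows with N, coherence with N^2
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

sigma, kappa = 0.05, 0.002          # illustrative, not fitted values
n_opt = math.sqrt((1 - sigma) / kappa)   # worker count maximizing speedup
print(f"optimal workers ~ {n_opt:.1f}")
print(f"speedup at 8 workers: {usl_speedup(8, sigma, kappa):.2f}x")
```

The closed form for the optimum, N* = sqrt((1−σ)/κ), is why even tiny coherence costs cap how far adding workers helps.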

ordeal explore

Your main command for deep exploration. Reads ordeal.toml, loads each ChaosTest class, and runs coverage-guided exploration with fault injection, energy scheduling, and swarm mode.

Use for: pre-commit validation, pre-release exploration runs, CI pipelines, and finding deep bugs that unit tests miss.

ordeal explore                          # reads ordeal.toml
ordeal explore -c ci.toml              # custom config
ordeal explore -v                       # live progress
ordeal explore --max-time 300          # override time
ordeal explore --seed 99               # override seed
ordeal explore --no-shrink             # skip failure minimization
ordeal explore -w 4                    # 4 parallel workers

The --workers / -w flag runs exploration across multiple processes. Each worker gets a unique seed for independent state-space exploration. Results are aggregated: runs/steps are summed, edges are unioned for true unique count. Use --workers $(nproc) for full CPU utilization.

ordeal replay

Reproduce a failure from a saved trace. The trace file contains the exact sequence of rules and fault toggles that triggered the failure, so replaying it re-executes the same steps.

Use for: triaging a CI failure, sharing a reproducible bug with a colleague, verifying that a fix actually resolves the issue.

ordeal replay .ordeal/traces/fail-run-42.json      # reproduce
ordeal replay --shrink trace.json                  # minimize
ordeal replay --shrink trace.json -o minimal.json  # save minimized

The --shrink flag runs delta-debugging to remove unnecessary steps from the trace. Use it when the trace is too long to understand, or when you want the minimal sequence of operations that reproduces the failure. The shrunk trace is often 5-10x shorter than the original.
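The delta-debugging idea behind shrinking can be sketched in a few lines: repeatedly try dropping steps and keep any shorter trace that still reproduces the failure. Real ddmin partitions the trace more cleverly; this is the simple one-at-a-time variant, with a toy failure condition standing in for a real replay:

```python
def shrink(trace, still_fails):
    changed = True
    while changed:
        changed = False
        for i in range(len(trace)):
            candidate = trace[:i] + trace[i + 1:]  # drop step i
            if still_fails(candidate):             # still reproduces? keep it
                trace, changed = candidate, True
                break
    return trace

# Toy failure: crashes whenever "write" and "crash_disk" both appear.
trace = ["read", "write", "flush", "crash_disk", "read"]
fails = lambda t: "write" in t and "crash_disk" in t
minimal = shrink(trace, fails)
print(minimal)  # ['write', 'crash_disk']
```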

Workflows

Local development

Quick exploration with live progress. Run this before committing to catch obvious issues:

ordeal explore -v --max-time 30

The -v flag prints a progress line showing runs, steps, edges discovered, and failures found. Thirty seconds is enough to catch most shallow bugs.

CI pipeline

Longer exploration with a dedicated config, JSON report, and a nonzero exit code on failure:

ordeal explore -c ci.toml

Where ci.toml might set max_time = 120, report.format = "json", and report.output = "ordeal-report.json". The exit code is 1 if any failure is found, so your CI script can gate on it directly.

Bug triage

When a CI run or colleague reports a failure trace:

ordeal replay trace.json                          # confirm it reproduces
ordeal replay --shrink trace.json -o minimal.json # minimize it

The shrunk trace gives you the shortest sequence of operations that triggers the bug. Read through the steps: which rules ran, which faults were active, and where the exception occurred.

Reproducibility

Fix the seed for deterministic exploration. The same seed produces the same sequence of rule interleavings and fault schedules:

ordeal explore --seed 42

Useful for: bisecting changes (did this commit introduce the failure?), comparing exploration runs across branches, and ensuring consistent CI behavior.

pytest integration

ordeal also works as a pytest plugin, auto-registered when ordeal is installed. No configuration is needed; pytest picks it up automatically via the pytest11 entry point.

How --chaos works

pytest --chaos                    # enable chaos mode
pytest --chaos --chaos-seed 42    # reproducible seed
pytest --chaos --buggify-prob 0.2 # higher fault probability

When you pass --chaos, three things happen:

  1. PropertyTracker activates: all always(), sometimes(), reachable(), and unreachable() calls start recording hits and results instead of being no-ops.
  2. buggify() activates: every buggify() call in your code has a chance of returning True (default 10%, controlled by --buggify-prob).
  3. Chaos-only tests run: tests marked with @pytest.mark.chaos are collected instead of skipped.

Without --chaos, your test suite runs normally. buggify() returns False, assertions are no-ops, and chaos-marked tests are skipped.
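The on/off behavior can be sketched with a buggify-style helper. This is a hedged illustration, not ordeal's actual implementation — a global switch plus a probability, so the same call sites are inert in normal runs and fire randomly under chaos:

```python
import random

_chaos_on = False
_rng = random.Random(42)

def buggify(prob: float = 0.1) -> bool:
    # Inert when chaos is off; fires with probability `prob` when on.
    return _chaos_on and _rng.random() < prob

def flaky_save(data):
    if buggify():                 # injected fault point
        raise IOError("injected disk failure")
    return f"saved {len(data)} bytes"

assert flaky_save(b"abc") == "saved 3 bytes"   # chaos off: never fires
_chaos_on = True
fired = sum(buggify(0.1) for _ in range(10_000))
print(f"fault rate under chaos: {fired / 10_000:.1%}")  # ~10%
```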

@pytest.mark.chaos

Mark tests that should only run under chaos mode. These are skipped without the --chaos flag, so your normal CI runs are not affected:

import pytest

@pytest.mark.chaos
def test_under_chaos():
    ...

This is useful for tests that are slow (because they explore fault interleavings), flaky by design (because faults cause nondeterminism), or only meaningful under fault injection.

The property report

When --chaos is active, ordeal prints a property report at the end of the test run. It shows every tracked property, its type, hit count, and pass/fail status:

--- Ordeal Property Results ---
  PASS  cache hit (sometimes: 47 hits)
  PASS  no data loss (always: 312 hits)
  FAIL  stale read (sometimes: never true in 200 hits)

  1/3 properties FAILED

always properties pass if they held every time they were evaluated. sometimes properties pass if they held at least once. reachable properties pass if the code path was reached. unreachable properties pass if it was never reached.
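Those pass rules can be expressed as a small evaluator. A hedged sketch, not ordeal's PropertyTracker — each property kind gets the list of boolean results recorded for it during the run:

```python
def property_passes(kind: str, results: list) -> bool:
    if kind == "always":
        return all(results)        # held every time it was evaluated
    if kind == "sometimes":
        return any(results)        # held at least once
    if kind == "reachable":
        return len(results) > 0    # the call site was hit at all
    if kind == "unreachable":
        return len(results) == 0   # the call site was never hit
    raise ValueError(f"unknown property kind: {kind}")

assert property_passes("always", [True] * 312)           # "no data loss"
assert property_passes("sometimes", [False, True])       # "cache hit"
assert not property_passes("sometimes", [False] * 200)   # "never true in 200 hits"
print("property semantics verified")
```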

chaos_enabled fixture

For tests that need chaos in a specific scope without requiring the global --chaos flag:

def test_something(chaos_enabled):
    # buggify() is active, PropertyTracker is recording
    result = my_function()
    assert result is not None

The fixture activates buggify and the PropertyTracker for the duration of the test, then restores the previous state.

Pytest patterns

Pattern 1: Separate chaos tests from unit tests. Keep chaos tests in their own directory so you can run them independently:

tests/
├── unit/              # fast, deterministic — always run
│   └── test_scoring.py
├── chaos/             # slower, exploratory — run with --chaos
│   └── test_scoring_chaos.py
└── conftest.py

pytest tests/unit/                          # fast CI gate
pytest tests/chaos/ --chaos --chaos-seed 0  # thorough validation

Pattern 2: Use chaos_enabled for targeted chaos in unit tests. You don't need --chaos for everything. Use the fixture when a specific test needs fault injection:

def test_retry_logic(chaos_enabled):
    """This test specifically checks retry behavior under buggify."""
    from ordeal.buggify import buggify
    # buggify() is now active — it will sometimes return True
    result = service_with_retries.call()
    assert result is not None  # should succeed despite faults

Pattern 3: Combine @pytest.mark.chaos with ChaosTest.TestCase. ChaosTest classes work with or without --chaos, but marking them ensures they're skipped in fast CI runs:

import pytest
from ordeal import ChaosTest, rule, always

@pytest.mark.chaos
class ScoreServiceChaos(ChaosTest):
    faults = [...]
    @rule()
    def score(self): ...

TestScoreServiceChaos = ScoreServiceChaos.TestCase

Pattern 4: Auto-scan via ordeal.toml. When you add [[scan]] entries to ordeal.toml, pytest auto-discovers and runs them. No test files needed:

# ordeal.toml
[[scan]]
module = "myapp.scoring"
max_examples = 100

pytest ordeal.toml --chaos  # auto-scans myapp.scoring

Each public function in the module becomes a test item. Functions without type hints are skipped unless fixtures are provided in the TOML.
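The role of type hints can be sketched with a toy scanner: hints let each parameter be mapped to a value generator, and any unhinted parameter makes the function unscannable. This is an illustration, not ordeal's actual input strategy:

```python
import inspect
import random

# Hypothetical hint-to-generator mapping.
_generators = {
    int:   lambda rng: rng.randint(-1000, 1000),
    float: lambda rng: rng.uniform(-1e3, 1e3),
    str:   lambda rng: "".join(rng.choices("abc", k=5)),
}

def sample_args(fn, rng):
    params = inspect.signature(fn).parameters
    args = []
    for p in params.values():
        if p.annotation not in _generators:
            return None            # unhinted parameter: skip this function
        args.append(_generators[p.annotation](rng))
    return args

def clamp(x: float, lo: int, hi: int):  # fully hinted: scannable
    return max(lo, min(hi, x))

rng = random.Random(0)
args = sample_args(clamp, rng)
print(clamp(*args))
```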

Pattern 5: Different buggify probabilities for different environments.

pytest --chaos --buggify-prob 0.05   # gentle: 5% fault rate (local dev)
pytest --chaos --buggify-prob 0.1    # moderate: 10% (default, CI)
pytest --chaos --buggify-prob 0.3    # aggressive: 30% (pre-release stress)

Higher probability = more faults per run = finds more bugs but also more noise. Start gentle, increase as your error handling matures.
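The trade-off follows directly from the math: with k buggify sites evaluated per run, the chance that at least one fault fires is 1 − (1 − p)^k. Illustrative numbers, assuming a hypothetical k = 20 sites per run:

```python
k = 20  # assumed buggify sites evaluated per run (illustrative)
for p in (0.05, 0.1, 0.3):
    at_least_one = 1 - (1 - p) ** k
    print(f"p={p:.2f}: P(>=1 fault per run) = {at_least_one:.0%}")
```

Even the "gentle" 5% rate fires on most runs once enough sites are reached, which is why the knob mainly controls how many faults *overlap* within a run, not whether faults happen at all.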

Exit codes

ordeal explore returns 0 on success (no failures found) and 1 if any failure is found or if there is a configuration error. Use this directly in CI scripts:

ordeal explore -c ci.toml || exit 1

ordeal replay returns 0 if the failure did not reproduce (which can happen if the code has changed) and 1 if the failure reproduced.