API Reference¶

Complete public API with signatures, parameters, and usage.

Core¶

ChaosTest¶

from ordeal import ChaosTest

Base class for stateful chaos tests. Extends Hypothesis's RuleBasedStateMachine.

Class attributes:

Attribute	Type	Default	Description
`faults`	`list[Fault]`	`[]`	Faults to inject during testing
`swarm`	`bool`	`False`	Random fault subsets per run

Methods:

Method	Returns	Description
`active_faults`	`list[Fault]`	Property: currently active faults
`teardown()`	`None`	Deactivate all faults, clean up

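from ordeal import ChaosTest, rule
from ordeal.faults import timing

class MyServiceChaos(ChaosTest):
    faults = [timing.timeout("myapp.api.call")]
    swarm = True

    @rule()
    def do_something(self):
        ...

# Expose the state machine as a unittest-style test case for the test runner
TestMyServiceChaos = MyServiceChaos.TestCase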

Hypothesis re-exports¶

These are re-exported from hypothesis.stateful for convenience:

from ordeal import rule, invariant, initialize, precondition, Bundle

Import	Description
`rule(**kwargs)`	Declare a test rule (decorator)
`invariant()`	Declare an invariant check (decorator)
`initialize(**kwargs)`	Declare an initialization rule (decorator)
`precondition(condition)`	Gate a rule on current state (decorator)
`Bundle(name)`	Named collection for data flow between rules

auto_configure¶

auto_configure(
    buggify_probability: float = 0.1,
    seed: int | None = None,
) -> None

Enable chaos testing programmatically, as an alternative to the --chaos flag.

from ordeal import auto_configure
auto_configure(buggify_probability=0.2, seed=42)

Assertions¶

from ordeal import always, sometimes, reachable, unreachable

Thread safety: The PropertyTracker is fully lock-guarded — safe for free-threaded Python 3.13+/3.14. All access to active and _properties is synchronized.

always¶

always(
    condition: bool,
    name: str,
    *,
    mute: bool = False,
    **details: Any,
) -> None

Assert condition is True every time. Raises AssertionError immediately on violation — whether or not --chaos is active. Violations are never silent by default.

Pass mute=True to record the violation without raising. The violation still shows in the property report — tracked, not hidden. Use when a known issue is too loud and you need to focus on something else.

always(result >= 0, "result is non-negative")
always(not math.isnan(score), "score is never NaN", value=score)
always(response.ok, "API healthy", mute=True)  # known flaky, tracked not fatal

sometimes¶

sometimes(
    condition: bool | Callable[[], bool],
    name: str,
    *,
    attempts: int | None = None,
    **details: Any,
) -> None

Assert condition is True at least once across the session. Deferred — checked at session end via PropertyTracker.

If condition is a callable and attempts is set, the callable is polled up to attempts times, which makes the check usable standalone.

sometimes(cache_hit, "cache is exercised")
sometimes(lambda: service.ready(), "service starts", attempts=10)
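
The difference between an immediate check (always) and a deferred one (sometimes) can be illustrated with a toy tracker. This is a minimal standalone sketch under stated assumptions, not ordeal's PropertyTracker: MiniTracker and its method names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class MiniTracker:
    """Toy property tracker (hypothetical; not ordeal's PropertyTracker)."""

    def __init__(self) -> None:
        self.hits = defaultdict(int)
        self.passes = defaultdict(int)

    def always(self, condition: bool, name: str) -> None:
        # Immediate: a single False raises on the spot.
        self.hits[name] += 1
        if not condition:
            raise AssertionError(f"always violated: {name}")
        self.passes[name] += 1

    def sometimes(self, condition: bool, name: str) -> None:
        # Deferred: only record the observation; judge at session end.
        self.hits[name] += 1
        if condition:
            self.passes[name] += 1

    def unmet_sometimes(self, names: list[str]) -> list[str]:
        # A "sometimes" property fails only if it never passed.
        return [n for n in names if self.passes[n] == 0]


tracker = MiniTracker()
for cache_hit in (False, False, True):
    tracker.sometimes(cache_hit, "cache is exercised")
tracker.sometimes(False, "retry path taken")

# Session end: only the property that never held is reported.
unmet = tracker.unmet_sometimes(["cache is exercised", "retry path taken"])
```

Two False observations do not fail "cache is exercised" because one later observation was True; "retry path taken" is reported because it never held.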

reachable¶

reachable(
    name: str,
    **details: Any,
) -> None

Record that a code path executed. Deferred — must be hit at least once by session end.

try:
    response = client.fetch()  # hypothetical call that may time out
except TimeoutError:
    reachable("timeout-handling-path")
    handle_timeout()

unreachable¶

unreachable(
    name: str,
    *,
    mute: bool = False,
    **details: Any,
) -> None

Assert code path never executes. Raises AssertionError immediately — whether or not --chaos is active. Violations are never silent by default. Pass mute=True to record without raising.

if data is None and not error_occurred:
    unreachable("data-lost-silently")

PropertyTracker¶

from ordeal.assertions import tracker

Global singleton. Accumulates property results across runs.

Method	Returns	Description
`reset()`	`None`	Clear all tracked properties
`record(name, prop_type, condition, details)`	`None`	Record a property result
`record_hit(name, prop_type)`	`None`	Record a hit without condition
`results`	`list[Property]`	All tracked properties
`failures`	`list[Property]`	Only failed properties

Property¶

from ordeal.assertions import Property

Attribute	Type	Description
`name`	`str`	Property name
`type`	`str`	`"always"`, `"sometimes"`, `"reachable"`, `"unreachable"`
`hits`	`int`	Times evaluated
`passes`	`int`	Times condition was True
`failures`	`int`	Times condition was False
`first_failure_details`	`dict \| None`	Details from first failure
`passed`	`bool`	Whether property passed (per type semantics)
`summary`	`str`	One-line `"PASS ..."` or `"FAIL ..."`

Buggify¶

from ordeal.buggify import buggify, buggify_value, activate, deactivate, set_seed, is_active

buggify¶

buggify(probability: float | None = None) -> bool

Returns True with configurable probability while chaos testing is active. Returns False when inactive, as in production (zero cost).

import random
import time

if buggify():
    raise ConnectionError("simulated failure")

if buggify(0.5):  # 50% chance when active
    time.sleep(random.random())

buggify_value¶

buggify_value(normal: _T, faulty: _T, probability: float | None = None) -> _T

Returns faulty with the given probability during chaos testing, normal otherwise.

return buggify_value(computed_result, float('nan'))
return buggify_value(response, TimeoutError("simulated"), 0.3)

activate / deactivate / set_seed / is_active¶

activate(probability: float = 0.1) -> None     # enable for current thread
deactivate() -> None                             # disable for current thread
set_seed(seed: int) -> None                      # seed RNG for reproducibility
is_active() -> bool                              # check if enabled

Faults¶

Base classes¶

from ordeal.faults import Fault, PatchFault, LambdaFault

Thread safety: The active flag and activate/deactivate transitions are lock-guarded. intermittent_crash and jitter call counters are also lock-protected. Deep-copying faults creates fresh locks (for checkpoint serialization). Safe for free-threaded Python 3.13+.

Fault (ABC):

Member	Description
`activate()`	Enable fault injection
`deactivate()`	Disable fault injection
`reset()`	Deactivate and clear state
`name: str`	Human-readable name
`active: bool`	Whether currently active

PatchFault:

PatchFault(
    target: str,                                    # dotted path: "myapp.api.call"
    wrapper_fn: Callable[[Callable], Callable],     # receives original, returns replacement
    name: str | None = None,
)

Resolves target to a function, replaces it with wrapper_fn(original) while active, and restores the original on deactivation. Resolution is lazy: the target is looked up on first activation.
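
The patch-and-restore mechanism can be sketched with the standard library alone. MiniPatchFault and raise_io_error below are hypothetical names, not ordeal's classes, and this sketch only handles targets of the form "module.function" (no class methods); the stdlib function colorsys.rgb_to_hsv is used purely as a harmless patch target.

```python
import colorsys
import importlib
from typing import Any, Callable


class MiniPatchFault:
    """Toy patch-based fault (hypothetical; not ordeal's PatchFault)."""

    def __init__(self, target: str, wrapper_fn: Callable[[Callable], Callable]) -> None:
        self.target = target
        self.wrapper_fn = wrapper_fn
        self.active = False
        self._owner = None
        self._attr = ""
        self._original = None

    def _resolve(self) -> None:
        # Lazy resolution: import the module only on first activation.
        module_path, _, self._attr = self.target.rpartition(".")
        self._owner = importlib.import_module(module_path)
        self._original = getattr(self._owner, self._attr)

    def activate(self) -> None:
        if self.active:
            return
        if self._original is None:
            self._resolve()
        setattr(self._owner, self._attr, self.wrapper_fn(self._original))
        self.active = True

    def deactivate(self) -> None:
        if self.active:
            setattr(self._owner, self._attr, self._original)
            self.active = False


def raise_io_error(original: Callable) -> Callable:
    # Receives the original function, returns its faulty replacement.
    def replacement(*args: Any, **kwargs: Any) -> Any:
        raise IOError("Simulated I/O error")
    return replacement


fault = MiniPatchFault("colorsys.rgb_to_hsv", raise_io_error)
fault.activate()
try:
    colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
    raised = False
except IOError:
    raised = True
finally:
    fault.deactivate()  # always restore the original
```

Code that imported the function by name before activation (from colorsys import rgb_to_hsv) would keep the unpatched reference, which is a general limitation of attribute patching.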

LambdaFault:

LambdaFault(
    name: str,
    on_activate: Callable[[], None],
    on_deactivate: Callable[[], None],
)

I/O faults¶

from ordeal.faults import io

Function	Signature	Description
`error_on_call`	`(target: str, error: type = IOError, message: str = "Simulated I/O error") -> PatchFault`	Target raises error on every call
`return_empty`	`(target: str) -> PatchFault`	Target returns `None`
`corrupt_output`	`(target: str) -> PatchFault`	Target returns random bytes (same length)
`truncate_output`	`(target: str, fraction: float = 0.5) -> PatchFault`	Target output truncated to fraction
`disk_full`	`() -> Fault`	Global: writes fail with `OSError(ENOSPC)`
`permission_denied`	`() -> Fault`	Global: opens fail with `PermissionError`

faults = [
    io.error_on_call("myapp.storage.save", IOError, "disk unreachable"),
    io.corrupt_output("myapp.cache.read"),
    io.disk_full(),
]

Numerical faults¶

from ordeal.faults import numerical

Function	Signature	Description
`nan_injection`	`(target: str) -> PatchFault`	Numeric output becomes NaN
`inf_injection`	`(target: str) -> PatchFault`	Numeric output becomes Inf
`wrong_shape`	`(target: str, expected: tuple, actual: tuple) -> PatchFault`	Returns array with wrong shape
`corrupted_floats`	`(corrupt_type: str = "nan") -> Fault`	Standalone corrupt float source; use `fault.value()`

faults = [
    numerical.nan_injection("myapp.model.predict"),
    numerical.wrong_shape("myapp.embed", (1, 512), (1, 256)),
]

Timing faults¶

from ordeal.faults import timing

Function	Signature	Description
`timeout`	`(target: str, delay: float = 30.0, error: type = TimeoutError) -> PatchFault`	Target raises instantly (no real sleep)
`slow`	`(target: str, delay: float = 1.0, mode: str = "simulate") -> PatchFault`	Add delay; `"simulate"` = instant, `"real"` = actual sleep
`intermittent_crash`	`(target: str, every_n: int = 3, error: type = RuntimeError) -> Fault`	Crash every Nth call; resets on `reset()`
`jitter`	`(target: str, magnitude: float = 0.01) -> Fault`	Add deterministic numeric jitter to return value

faults = [
    timing.timeout("myapp.api.call"),
    timing.intermittent_crash("myapp.worker.process", every_n=5),
    timing.jitter("myapp.sensor.read", magnitude=0.001),
]
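
The "crash every Nth call" behavior with a lock-protected counter might look roughly like the following. crash_every_n is a hypothetical helper that wraps a plain callable; it is a sketch of the idea, not the code behind timing.intermittent_crash.

```python
import threading
from typing import Any, Callable


def crash_every_n(fn: Callable, every_n: int = 3, error: type = RuntimeError) -> Callable:
    """Wrap fn so that every Nth call raises (hypothetical helper)."""
    lock = threading.Lock()
    calls = 0

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        nonlocal calls
        with lock:  # keep the counter consistent across threads
            calls += 1
            call_number = calls
        if call_number % every_n == 0:
            raise error(f"simulated crash on call {call_number}")
        return fn(*args, **kwargs)

    return wrapper


process = crash_every_n(lambda x: x * 2, every_n=3)
outcomes = []
for i in range(6):
    try:
        outcomes.append(process(i))
    except RuntimeError:
        outcomes.append("crash")
```

Calls 3 and 6 raise; all other calls pass through to the wrapped function unchanged.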

Network faults¶

from ordeal.faults import network

For any code making HTTP/API calls. Simulates real-world network failures without requiring network access.

Function	Signature	Description
`http_error`	`(target: str, status_code: int = 500, message: str = "Internal Server Error") -> PatchFault`	Raise `HTTPFaultError` with status code and fake response
`connection_reset`	`(target: str) -> PatchFault`	Raise `ConnectionError`
`rate_limited`	`(target: str, retry_after: float = 30.0) -> PatchFault`	Raise HTTP 429 with `Retry-After` header
`auth_failure`	`(target: str, status_code: int = 401) -> PatchFault`	Raise HTTP 401/403
`dns_failure`	`(target: str) -> PatchFault`	Raise `OSError` (simulated DNS resolution failure)
`partial_response`	`(target: str, fraction: float = 0.5) -> PatchFault`	Truncate response to fraction of content
`intermittent_http_error`	`(target: str, every_n: int = 3, status_code: int = 503, message: str = "Service Unavailable") -> Fault`	HTTP error every Nth call; resets on `reset()`

faults = [
    network.http_error("myapp.client.post", status_code=503),
    network.rate_limited("myapp.client.get", retry_after=60),
    network.connection_reset("myapp.client.post"),
    network.dns_failure("myapp.client.resolve"),
]

HTTPFaultError carries .status_code and a duck-typed .response object compatible with requests/httpx patterns.

Concurrency faults¶

from ordeal.faults import concurrency

For testing thread-safety, resource contention, and concurrent access patterns.

Function	Signature	Description
`contended_call`	`(target: str, contention: float = 0.05, mode: str = "simulate") -> PatchFault`	Wrap target with a shared lock; simulates resource contention
`delayed_release`	`(target: str, delay: float = 0.5, mode: str = "simulate") -> PatchFault`	Add delay after target returns (simulates slow cleanup)
`thread_boundary`	`(target: str, timeout: float = 5.0) -> Fault`	Execute target on a background thread (finds thread-local state bugs)
`stale_state`	`(obj: Any, attr: str, stale_value: Any) -> Fault`	When active, set `obj.attr = stale_value`; restore on deactivation

faults = [
    concurrency.contended_call("myapp.pool.acquire", contention=0.1),
    concurrency.thread_boundary("myapp.cache.get"),
    concurrency.stale_state(my_service, "config", old_config),
]
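
To see why running a call on a different thread exposes thread-local state bugs, consider this standalone sketch. The helper on_background_thread and the functions set_user and current_user are hypothetical and only illustrate the class of bug that thread_boundary is described as finding.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def set_user(name: str) -> None:
    _local.user = name


def current_user() -> str:
    # Falls back to "anonymous" when this thread never called set_user().
    return getattr(_local, "user", "anonymous")


def on_background_thread(fn, timeout: float = 5.0):
    """Run fn on a fresh worker thread and wait for its result (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn).result(timeout=timeout)


set_user("alice")
same_thread = current_user()                        # sees "alice"
other_thread = on_background_thread(current_user)   # thread-local state is gone
```

The same function returns different answers depending on which thread runs it, which is invisible in single-threaded tests.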

Explorer¶

from ordeal.explore import Explorer, ExplorationResult, Failure, ProgressSnapshot, CoverageCollector, Checkpoint

Explorer¶

Explorer(
    test_class: type,                           # ChaosTest subclass
    *,
    target_modules: list[str] | None = None,    # modules to track for coverage
    seed: int = 42,
    max_checkpoints: int = 256,
    checkpoint_prob: float = 0.4,               # probability of starting from checkpoint
    checkpoint_strategy: str = "energy",        # "energy", "uniform", "recent"
    fault_toggle_prob: float = 0.3,
    record_traces: bool = False,
    workers: int = 1,                           # 0 = auto (os.cpu_count())
    share_edges: bool = True,                   # shared-memory edge bitmap for workers
)

explorer.run(
    *,
    max_time: float = 60.0,
    max_runs: int | None = None,
    steps_per_run: int = 50,
    shrink: bool = True,
    max_shrink_time: float = 30.0,
    progress: Callable[[ProgressSnapshot], None] | None = None,
) -> ExplorationResult

explorer = Explorer(
    MyServiceChaos,
    target_modules=["myapp"],
    checkpoint_strategy="energy",
)
result = explorer.run(max_time=120, steps_per_run=100)
print(result.summary())

ExplorationResult¶

Attribute	Type	Description
`total_runs`	`int`	Runs completed
`total_steps`	`int`	Total steps across all runs
`unique_edges`	`int`	Unique control-flow edges discovered
`checkpoints_saved`	`int`	Checkpoints in corpus
`failures`	`list[Failure]`	Failures found
`duration_seconds`	`float`	Wall-clock time
`edge_log`	`list[tuple[int, int]]`	`(run_id, cumulative_edges)`
`traces`	`list[Trace]`	Recorded traces (if `record_traces=True`)
`summary()`	`str`	Human-readable report

Failure¶

Attribute	Type	Description
`error`	`Exception`	The exception raised
`step`	`int`	Step number when failure occurred
`run_id`	`int`	Run that found this failure
`active_faults`	`list[str]`	Faults active at failure time
`rule_log`	`list[str]`	Sequence of rules/faults leading to failure
`trace`	`Trace \| None`	Full trace for replay

ProgressSnapshot¶

Attribute	Type	Description
`elapsed`	`float`	Seconds since start
`total_runs`	`int`	Runs completed
`total_steps`	`int`	Steps completed
`unique_edges`	`int`	Edges discovered
`checkpoints`	`int`	Checkpoints saved
`failures`	`int`	Failures found
`runs_per_second`	`float`	Throughput

CoverageCollector¶

CoverageCollector(target_paths: list[str])

Method	Returns	Description
`start()`	`None`	Begin collecting edge coverage via `sys.settrace`
`stop()`	`frozenset[int]`	Stop and return observed edges
`snapshot()`	`frozenset[int]`	Current edges without stopping

Trace¶

from ordeal.trace import Trace, TraceStep, TraceFailure, replay, shrink

Trace¶

Attribute	Type	Description
`run_id`	`int`	Run identifier
`seed`	`int`	RNG seed
`test_class`	`str`	`"module.path:ClassName"`
`from_checkpoint`	`int \| None`	Checkpoint run_id, or `None` if fresh
`steps`	`list[TraceStep]`	Ordered steps
`failure`	`TraceFailure \| None`	Failure info if applicable
`edges_discovered`	`int`	New edges found
`duration`	`float`	Run duration

Method	Returns	Description
`to_dict()`	`dict`	JSON-serializable dict
`save(path)`	`None`	Write to JSON file
`Trace.from_dict(data)`	`Trace`	Reconstruct from dict
`Trace.load(path)`	`Trace`	Load from JSON file

TraceStep¶

Attribute	Type	Description
`kind`	`str`	`"rule"` or `"fault_toggle"`
`name`	`str`	Rule name or `"+fault"` / `"-fault"`
`params`	`dict`	Parameters drawn for this step
`active_faults`	`list[str]`	Faults active at this step
`edge_count`	`int`	Cumulative edges at this step
`timestamp_offset`	`float`	Time since run start

replay¶

replay(
    trace: Trace,
    test_class: type | None = None,     # auto-resolved from trace.test_class if None
) -> Exception | None

Replay a trace step-by-step. Returns the exception if it reproduces, None otherwise.

shrink¶

shrink(
    trace: Trace,
    test_class: type | None = None,
    *,
    max_time: float = 30.0,
) -> Trace

Shrink a failing trace to the minimal reproducing sequence. Three phases: delta debugging, step elimination, fault simplification.
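
The idea behind the first two phases can be sketched as greedy chunk removal over a list of step names. This is a simplified illustration in the spirit of delta debugging, not ordeal's shrinker: shrink_steps and reproduces are hypothetical, and a real trace carries more than step names.

```python
from typing import Callable, Sequence


def shrink_steps(steps: Sequence[str], still_fails: Callable[[list[str]], bool]) -> list[str]:
    """Remove chunks of steps while the failure still reproduces (hypothetical helper)."""
    current = list(steps)
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if candidate and still_fails(candidate):
                current = candidate  # chunk was irrelevant; keep it removed
            else:
                i += chunk  # chunk is needed; move past it
        chunk //= 2  # retry with smaller chunks, down to single steps
    return current


def reproduces(trace: list[str]) -> bool:
    # The bug needs the timeout fault switched on before "checkout" runs.
    return (
        "+timeout" in trace
        and "checkout" in trace
        and trace.index("+timeout") < trace.index("checkout")
    )


failing = ["login", "+timeout", "browse", "add_item",
           "-timeout", "+timeout", "checkout", "logout"]
minimal = shrink_steps(failing, reproduces)
```

Here the eight-step trace reduces to the two steps that matter: enabling the fault, then running the rule that breaks under it.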

QuickCheck¶

from ordeal.quickcheck import quickcheck, strategy_for_type, biased

quickcheck¶

from hypothesis import strategies as st

@quickcheck
def test_fn(x: int, y: str) -> None:
    ...

@quickcheck(max_examples=500)
def test_fn(x: float) -> None:
    ...

@quickcheck(x=st.integers(min_value=0))  # override specific parameter
def test_fn(x: int, y: str) -> None:
    ...

Decorator. Infers strategies from type hints and runs the function as a property test (max_examples=100 by default).

strategy_for_type¶

strategy_for_type(tp: type, *, _depth: int = 0) -> st.SearchStrategy

Derive a boundary-biased strategy from a type hint. Handles: int, float, str, bool, bytes, None, list[T], dict[K, V], tuple, set, Union, Optional, dataclass, and Pydantic BaseModel (v2+ — derives strategies from model_fields with constraint support: ge/le/gt/lt, min_length/max_length). Recursion depth limited to 5.

biased¶

Namespace of boundary-biased strategies:

biased.integers(min_value=None, max_value=None) -> SearchStrategy[int]
biased.floats(min_value=None, max_value=None, *, allow_nan=False, allow_infinity=False) -> SearchStrategy[float]
biased.strings(min_size=0, max_size=100) -> SearchStrategy[str]
biased.bytes_(min_size=0, max_size=100) -> SearchStrategy[bytes]
biased.lists(elements, min_size=0, max_size=50) -> SearchStrategy[list]

Biased toward boundary values: 0, -1, +1, empty, max-length, powers of 2, range endpoints.

Invariants¶

from ordeal.invariants import (
    Invariant, no_nan, no_inf, finite, bounded, monotonic,
    unique, non_empty, unit_normalized, orthonormal, symmetric,
    positive_semi_definite, rank_bounded, mean_bounded, variance_bounded,
)

Invariant¶

Invariant(name: str, check_fn: Callable[..., None])

Method	Description
`__call__(value, *, name=None)`	Run check, raise `AssertionError` on violation
`__and__(other)`	Compose: `(a & b)(x)` checks both
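
Composition with & can be pictured as building a new check that runs both operands in order. The class below is a toy stand-in (MiniInvariant is a hypothetical name, and it omits the name= keyword of the real __call__); it is meant only to show the composition pattern.

```python
import math
from typing import Any, Callable


class MiniInvariant:
    """Toy composable check (hypothetical; not ordeal's Invariant)."""

    def __init__(self, name: str, check_fn: Callable[[Any], None]) -> None:
        self.name = name
        self.check_fn = check_fn

    def __call__(self, value: Any) -> None:
        self.check_fn(value)  # check_fn raises AssertionError on violation

    def __and__(self, other: "MiniInvariant") -> "MiniInvariant":
        def both(value: Any) -> None:
            self(value)
            other(value)
        return MiniInvariant(f"{self.name} & {other.name}", both)


def _check_finite(value: float) -> None:
    if not math.isfinite(value):
        raise AssertionError(f"{value!r} is not finite")


def bounded(lo: float, hi: float) -> MiniInvariant:
    def check(value: float) -> None:
        if not lo <= value <= hi:
            raise AssertionError(f"{value!r} not in [{lo}, {hi}]")
    return MiniInvariant(f"bounded({lo}, {hi})", check)


finite = MiniInvariant("finite", _check_finite)
valid_score = finite & bounded(0, 1)

valid_score(0.5)  # passes silently


def violates(value: float) -> bool:
    try:
        valid_score(value)
        return False
    except AssertionError:
        return True
```

Because the composed object is itself an invariant, further & chains work without special cases.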

Built-in invariants¶

Invariant	Signature	Description
`no_nan`	singleton	Reject NaN in scalars, sequences, numpy arrays
`no_inf`	singleton	Reject Inf/-Inf
`finite`	singleton	`no_nan & no_inf`
`bounded`	`(lo: float, hi: float)`	All values in `[lo, hi]`
`monotonic`	`(*, strict: bool = False)`	Non-decreasing (or strictly increasing)
`unique`	`(*, key: Callable \| None = None)`	No duplicates (optionally by key)
`non_empty`	`()`	Not empty/falsy
`unit_normalized`	`(*, tol: float = 1e-6)`	Row vectors have L2 norm ~1.0
`orthonormal`	`(*, tol: float = 1e-6)`	Rows form orthonormal set
`symmetric`	`(*, tol: float = 1e-6)`	Matrix equals its transpose
`positive_semi_definite`	`(*, tol: float = 1e-6)`	All eigenvalues >= -tol
`rank_bounded`	`(min_rank=0, max_rank=None)`	Matrix rank in range
`mean_bounded`	`(lo: float, hi: float)`	Mean in `[lo, hi]`
`variance_bounded`	`(lo: float, hi: float)`	Variance in `[lo, hi]`

valid_score = finite & bounded(0, 1)
valid_score(model_output)

valid_embedding = unit_normalized() & bounded(-1, 1)
valid_embedding(embedding_matrix)

Simulate¶

from ordeal.simulate import Clock, FileSystem

Clock¶

Clock(start: float = 0.0)

Method	Signature	Description
`time()`	`-> float`	Current simulated time
`sleep(seconds)`	`-> None`	Advance by seconds (instant)
`advance(seconds)`	`-> None`	Advance, firing timers whose deadline passed
`set_timer(delay, callback)`	`-> int`	Schedule callback; returns timer ID
`pending_timers`	`-> int`	Property: unfired timer count
`patch()`	context manager	Patch `time.time()` and `time.sleep()`

clock = Clock()
clock.set_timer(10.0, lambda: print("fired"))
clock.advance(15.0)  # timer fires at t=10

with clock.patch():
    import time
    time.sleep(3600)  # instant
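
A simulated clock of this kind can be built on a priority queue of deadlines. The sketch below is an assumption about how such a clock might work, not ordeal's Clock: MiniClock is a hypothetical name, and the choice to set the current time to each timer's deadline while its callback runs is a design decision of this sketch.

```python
import heapq
import itertools
from typing import Callable


class MiniClock:
    """Toy simulated clock (hypothetical; not ordeal's Clock)."""

    def __init__(self, start: float = 0.0) -> None:
        self._now = start
        self._timers = []  # heap of (deadline, timer_id, callback)
        self._ids = itertools.count()

    def time(self) -> float:
        return self._now

    def set_timer(self, delay: float, callback: Callable[[], None]) -> int:
        timer_id = next(self._ids)
        # The id breaks ties so callbacks are never compared.
        heapq.heappush(self._timers, (self._now + delay, timer_id, callback))
        return timer_id

    @property
    def pending_timers(self) -> int:
        return len(self._timers)

    def advance(self, seconds: float) -> None:
        deadline = self._now + seconds
        # Fire due timers in deadline order, each at its own deadline.
        while self._timers and self._timers[0][0] <= deadline:
            fire_at, _, callback = heapq.heappop(self._timers)
            self._now = fire_at
            callback()
        self._now = deadline


clock = MiniClock()
fired_at = []
clock.set_timer(10.0, lambda: fired_at.append(clock.time()))
clock.set_timer(20.0, lambda: fired_at.append(clock.time()))
clock.advance(15.0)  # only the 10-second timer is due
```

No real time passes: advancing by 15 simulated seconds fires the first timer and leaves the second pending.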

FileSystem¶

FileSystem()

Method	Signature	Description
`write(path, data)`	`(str, str \| bytes) -> None`	Write data, respecting faults
`read(path)`	`(str) -> bytes`	Read raw bytes, respecting faults
`read_text(path, encoding="utf-8")`	`(str, str) -> str`	Read decoded string
`exists(path)`	`(str) -> bool`	True if path exists (no "missing" fault)
`delete(path)`	`(str) -> None`	Remove path
`list_dir(prefix="/")`	`(str) -> list[str]`	Paths starting with prefix
`inject_fault(path, fault)`	`(str, str) -> None`	Inject: `"corrupt"`, `"missing"`, `"readonly"`, `"full"`
`clear_fault(path)`	`(str) -> None`	Remove fault on path
`clear_all_faults()`	`-> None`	Remove all faults
`reset()`	`-> None`	Remove all files and faults

Mutations¶

from ordeal.mutations import mutate_function_and_test, mutate_and_test, validate_mined_properties, generate_mutants, MutationResult, Mutant

validate_mined_properties¶

validate_mined_properties(
    target: str,                                # dotted path: "myapp.scoring.compute"
    max_examples: int = 100,                    # examples for mine()
    operators: list[str] | None = None,         # None = all operators
) -> MutationResult

Mine properties of target, then mutate it and check the properties catch the mutations. Bridges mine() and mutation testing. Surviving mutants reveal properties too weak to detect real bugs. Used automatically by ordeal audit.

mutate_function_and_test¶

mutate_function_and_test(
    target: str,                                # dotted path: "myapp.scoring.compute"
    test_fn: Callable[[], None],                # test to run against each mutant
    operators: list[str] | None = None,         # None = all operators
) -> MutationResult

Mutate a single function via PatchFault. Safer than module-level mutation (mutate_and_test) and recommended for most uses.

mutate_and_test¶

mutate_and_test(
    target: str,                                # module path: "myapp.scoring"
    test_fn: Callable[[], None],
    operators: list[str] | None = None,
) -> MutationResult

Mutate an entire module and swap the mutant into sys.modules. Only works if the tests import the module itself rather than individual functions from it.

generate_mutants¶

generate_mutants(
    source: str,                                # source code string
    operators: list[str] | None = None,
) -> list[tuple[Mutant, ast.Module]]

Generate all possible mutants from source. Returns list of (Mutant, modified_ast).
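
For intuition, a single arithmetic mutation and a kill check can be written with the standard ast module. This is a minimal sketch of the technique, not ordeal's mutation engine: SwapAddToSub, load_total, and suite_passes are hypothetical names, and only one operator (+ to -) is covered.

```python
import ast

SOURCE = """
def total(price, tax):
    return price + tax
"""


class SwapAddToSub(ast.NodeTransformer):
    """Arithmetic mutation: rewrite every `a + b` as `a - b`."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load_total(tree: ast.Module):
    # Compile the (possibly mutated) tree and pull out the function.
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["total"]


def suite_passes(fn) -> bool:
    # Stand-in for a real test suite: one concrete expectation.
    return fn(100, 8) == 108


original = load_total(ast.parse(SOURCE))
mutant_tree = ast.fix_missing_locations(SwapAddToSub().visit(ast.parse(SOURCE)))
mutant = load_total(mutant_tree)

# A mutant is "killed" when the suite passes on the original but fails on it.
killed = suite_passes(original) and not suite_passes(mutant)
```

A suite that only checked total(100, 0) would let this mutant survive, which is exactly the weakness mutation testing is meant to surface.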

MutationResult¶

Attribute	Type	Description
`target`	`str`	What was mutated
`mutants`	`list[Mutant]`	All generated mutants
`total`	`int`	Total mutants
`killed`	`int`	Mutants caught by tests
`survived`	`list[Mutant]`	Mutants tests missed
`score`	`float`	Kill ratio (1.0 = all caught)
`summary()`	`str`	Human-readable report

Mutant¶

Attribute	Type	Description
`operator`	`str`	`"arithmetic"`, `"comparison"`, `"negate"`, `"return_none"`, `"boundary"`, `"constant"`, `"delete"`
`description`	`str`	What changed: `"+ -> -"`
`line`	`int`	Source line
`col`	`int`	Source column
`killed`	`bool`	Whether test caught it
`error`	`str \| None`	Compilation error if mutant was invalid
`location`	`str`	`"L42:8"`

Available operators: arithmetic, comparison, negate, return_none, boundary, constant, delete

Auto¶

from ordeal.auto import scan_module, fuzz, chaos_for, register_fixture

scan_module¶

scan_module(
    module: str | ModuleType,
    *,
    max_examples: int = 50,
    check_return_type: bool = True,
    fixtures: dict[str, SearchStrategy] | None = None,
) -> ScanResult

Smoke-test every public function. Generates random inputs from type hints and checks that each function does not crash and that its return value matches the annotated return type.

result = scan_module("myapp.scoring")
assert result.passed
print(result.summary())

fuzz¶

fuzz(
    fn: Any,
    *,
    max_examples: int = 1000,
    check_return_type: bool = False,
    **fixtures: SearchStrategy | Any,
) -> FuzzResult

Deep-fuzz a single function.

result = fuzz(myapp.scoring.compute, model=model_strategy)
assert result.passed

chaos_for¶

chaos_for(
    module: str | ModuleType,
    *,
    fixtures: dict[str, SearchStrategy] | None = None,
    invariants: list[Invariant] | None = None,
    faults: list[Fault] | None = None,
    max_examples: int = 50,
    stateful_step_count: int = 30,
) -> type

Auto-generate a ChaosTest from a module's public API. Each function becomes a @rule.

TestScoring = chaos_for(
    "myapp.scoring",
    invariants=[finite, bounded(0, 1)],
    faults=[timing.timeout("myapp.scoring.predict")],
)

register_fixture¶

register_fixture(name: str, strategy: SearchStrategy) -> None

Register a named fixture for auto-scan. Registered fixtures take the highest priority after explicitly passed fixtures.

ScanResult¶

Attribute	Type	Description
`module`	`str`	Module tested
`functions`	`list[FunctionResult]`	Per-function results
`skipped`	`list[tuple[str, str]]`	`(name, reason)` for skipped functions
`passed`	`bool`	All functions passed
`total`	`int`	Functions tested
`failed`	`int`	Failures
`summary()`	`str`	Human-readable report

FuzzResult¶

Attribute	Type	Description
`function`	`str`	Function tested
`examples`	`int`	Examples run
`failures`	`list[Exception]`	Exceptions found
`passed`	`bool`	No failures
`summary()`	`str`	Human-readable report

Strategies¶

from ordeal.strategies import corrupted_bytes, adversarial_strings, nan_floats, edge_integers, mixed_types

Strategy	Signature	Description
`corrupted_bytes`	`(min_size=0, max_size=1024)`	Edge-case bytes: empty, all-zero, all-0xFF
`adversarial_strings`	`(min_size=0, max_size=256)`	SQL injection, XSS, path traversal, null bytes
`nan_floats`	`()`	NaN, Inf, -Inf, subnormals, boundaries
`edge_integers`	`(bits=64)`	0, +/-1, min/max for N bits
`mixed_types`	`()`	None, bool, int, float, str, bytes, lists, dicts

from hypothesis import given
from ordeal.strategies import adversarial_strings

@given(s=adversarial_strings())
def test_parser_doesnt_crash(s):
    parse(s)  # should never raise unhandled exception

Audit¶

from ordeal.audit import audit, audit_report, ModuleAudit

audit¶

audit(
    module: str,                    # dotted path: "myapp.scoring"
    *,
    test_dir: str = "tests",       # directory containing existing tests
    max_examples: int = 20,        # Hypothesis examples per function
) -> ModuleAudit

Audit a single module: measure existing test coverage vs ordeal-migrated tests. Every number in the result is either [verified] or FAILED: reason — the audit never silently returns 0%.

Coverage is measured via coverage.py JSON reports (stable schema), not terminal parsing. Results are cross-checked for consistency. Generated test files are saved to .ordeal/test_<module>_migrated.py.

audit_report¶

audit_report(
    modules: list[str],
    *,
    test_dir: str = "tests",
    max_examples: int = 20,
) -> str

Audit multiple modules and produce a formatted summary report. Every number is labeled [verified] or FAILED.

ModuleAudit¶

Attribute	Type	Description
`module`	`str`	Module path
`current_test_count`	`int`	Existing test count
`current_test_lines`	`int`	Lines of existing test code
`current_coverage`	`CoverageMeasurement`	Coverage from existing tests (with status)
`migrated_test_count`	`int`	Tests in generated migrated file
`migrated_lines`	`int`	Lines in generated migrated file
`migrated_coverage`	`CoverageMeasurement`	Coverage from migrated tests (with status)
`mined_properties`	`list[str]`	Properties with Wilson CI bounds
`gap_functions`	`list[str]`	Functions needing fixtures
`suggestions`	`list[str]`	Actionable suggestions for uncovered lines
`mutation_score`	`str`	e.g. `"8/10 (80%)"` — how many mutations mined properties catch
`not_checked`	`list[str]`	Known unknowns — what ordeal structurally cannot verify
`warnings`	`list[str]`	Every problem encountered during the audit is listed here
`generated_test`	`str`	Full generated test file content
`coverage_preserved`	`bool`	True if migrated coverage >= current coverage - 2% (False if either measurement failed)
`summary()`	`str`	Human-readable report with `[verified]`/`FAILED` labels

CoverageMeasurement¶

Every coverage number carries its epistemic status.

from ordeal.audit import CoverageMeasurement, Status

Attribute	Type	Description
`status`	`Status`	`VERIFIED` or `FAILED`
`result`	`CoverageResult \| None`	Structured data if verified
`error`	`str \| None`	Explanation if failed
`percent`	`float`	Coverage %, or 0.0 if failed
`missing_lines`	`frozenset[int]`	Uncovered lines, or empty if failed

CoverageResult¶

from ordeal.audit import CoverageResult

Attribute	Type	Description
`percent`	`float`	Coverage percentage
`total_statements`	`int`	Total source statements
`missing_count`	`int`	Number of uncovered statements
`missing_lines`	`frozenset[int]`	Uncovered line numbers
`source`	`str`	How measured (e.g. `"coverage.py JSON"`)

wilson_lower¶

wilson_lower(successes: int, total: int, z: float = 1.96) -> float

Lower bound of the Wilson score confidence interval. For mined properties: 500/500 at 95% CI (z=1.96) gives a lower bound of about 0.992, meaning the evidence supports a true pass rate of at least roughly 99.2%, not "always holds."
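
The bound can be reproduced with the textbook Wilson score formula. The function below is a standalone sketch with a hypothetical name (wilson_lower_bound); it is assumed, not confirmed, that ordeal's wilson_lower uses this same formula without a continuity correction.

```python
import math


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval (standard textbook formula)."""
    if total == 0:
        return 0.0
    p_hat = successes / total
    z2 = z * z
    center = p_hat + z2 / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return (center - margin) / (1 + z2 / total)


# An all-pass record is much weaker evidence with few examples.
few = wilson_lower_bound(5, 5)
many = wilson_lower_bound(500, 500)
```

When every example passes, the formula reduces to total / (total + z**2), so the bound approaches 1 as the number of examples grows but never reaches it.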

Diff¶

from ordeal.diff import diff, DiffResult, Mismatch

Differential testing — compare two implementations on the same random inputs.

diff¶

diff(
    fn_a: Callable,                             # reference function
    fn_b: Callable,                             # function to compare
    *,
    max_examples: int = 100,
    rtol: float | None = None,                  # relative tolerance
    atol: float | None = None,                  # absolute tolerance
    compare: Callable[[Any, Any], bool] | None = None,  # custom comparator
    **fixtures: SearchStrategy | Any,
) -> DiffResult

Compare two functions for equivalence. Infers strategies from fn_a's type hints. Both functions must accept the same parameters.

# Exact comparison
result = diff(score_v1, score_v2)
assert result.equivalent

# Floating-point tolerance
result = diff(compute_old, compute_new, rtol=1e-6)

# Custom comparator
result = diff(fn_a, fn_b, compare=lambda a, b: a.status == b.status)
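
The core loop of differential testing is small enough to sketch with the standard library. find_mismatches, mean_v1, and mean_v2 below are hypothetical names; unlike diff, this sketch hard-codes its input generator instead of inferring strategies from type hints.

```python
import math
import random
from typing import Callable


def mean_v1(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def mean_v2(xs: list[float]) -> float:
    # Incremental (running) mean: same value up to floating-point rounding.
    m = 0.0
    for i, x in enumerate(xs, start=1):
        m += (x - m) / i
    return m


def find_mismatches(fn_a: Callable, fn_b: Callable, *, max_examples: int = 100,
                    rtol: float = 0.0, atol: float = 0.0, seed: int = 0) -> list[dict]:
    """Run both functions on the same random inputs (hypothetical helper)."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(max_examples):
        xs = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(1, 20))]
        a, b = fn_a(xs), fn_b(xs)
        if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol):
            mismatches.append({"args": {"xs": xs}, "output_a": a, "output_b": b})
    return mismatches


# Two correct implementations agree once rounding noise is tolerated.
within_tolerance = find_mismatches(mean_v1, mean_v2, rtol=1e-9, atol=1e-9)

# An off-by-one denominator is caught immediately.
against_buggy = find_mismatches(mean_v1, lambda xs: sum(xs) / (len(xs) + 1))
```

Tolerances matter for numeric code: two algebraically equivalent implementations can differ in the last few bits, and an exact comparison would report those as mismatches.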

DiffResult¶

Attribute	Type	Description
`function_a`	`str`	Name of reference function
`function_b`	`str`	Name of compared function
`total`	`int`	Examples tested
`mismatches`	`list[Mismatch]`	Inputs where outputs differed
`equivalent`	`bool`	True if no mismatches
`summary()`	`str`	Human-readable report

Mismatch¶

Attribute	Type	Description
`args`	`dict`	Input arguments that caused divergence
`output_a`	`Any`	Output from `fn_a`
`output_b`	`Any`	Output from `fn_b`

Scaling¶

from ordeal.scaling import usl, amdahl, optimal_n, peak_throughput, fit_usl, analyze, benchmark

Universal Scaling Law (USL) and Amdahl's Law for predicting parallel exploration performance.

usl¶

usl(n: float, sigma: float, kappa: float) -> float

C(N) = N / [1 + sigma*(N-1) + kappa*N*(N-1)]. Returns relative throughput (C(1) = 1).

sigma: contention coefficient — fraction of serialized work
kappa: coherence coefficient — cross-worker sync cost (grows quadratically)

amdahl / optimal_n / peak_throughput¶

amdahl(n: float, sigma: float) -> float          # USL with kappa=0
optimal_n(sigma: float, kappa: float) -> float    # worker count at peak throughput
peak_throughput(sigma: float, kappa: float) -> float
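
These quantities follow directly from the USL formula. The sketch below uses hypothetical function names and the standard closed form for the peak, N* = sqrt((1 - sigma) / kappa), obtained by setting dC/dN = 0; how ordeal handles edge cases such as kappa = 0 is not covered here.

```python
import math


def usl_throughput(n: float, sigma: float, kappa: float) -> float:
    # C(N) = N / [1 + sigma*(N-1) + kappa*N*(N-1)]
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def amdahl_throughput(n: float, sigma: float) -> float:
    # Amdahl's Law is the USL with no coherence cost (kappa = 0).
    return usl_throughput(n, sigma, 0.0)


def usl_optimal_n(sigma: float, kappa: float) -> float:
    # Setting dC/dN = 0 gives N* = sqrt((1 - sigma) / kappa); requires kappa > 0.
    return math.sqrt((1 - sigma) / kappa)


sigma, kappa = 0.05, 0.001
n_star = usl_optimal_n(sigma, kappa)          # worker count at the peak
peak = usl_throughput(n_star, sigma, kappa)   # relative throughput at the peak
```

With kappa > 0, throughput falls again beyond N*, so adding workers past that point makes exploration slower; with kappa = 0 it only saturates toward 1 / sigma.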

fit_usl¶

fit_usl(measurements: list[tuple[int | float, float]]) -> tuple[float, float]

Fit sigma and kappa from (N, throughput) pairs via least squares. Requires >= 3 data points.
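
One common way to fit the two coefficients is to linearize the model: N / C(N) - 1 = sigma*(N-1) + kappa*N*(N-1), which is linear in sigma and kappa. The sketch below solves the resulting 2x2 normal equations directly. It is an illustration under assumptions, not necessarily ordeal's method: fit_usl_sketch is a hypothetical name, an N=1 measurement is assumed to be present for normalization, and the fit minimizes error in the linearized space.

```python
def usl_curve(n: float, sigma: float, kappa: float) -> float:
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def fit_usl_sketch(measurements: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of (sigma, kappa); assumes an N=1 measurement exists."""
    base = next(x for n, x in measurements if n == 1)
    s11 = s12 = s22 = b1 = b2 = 0.0
    for n, x in measurements:
        c = x / base              # relative throughput C(N)
        y = n / c - 1             # equals sigma*(N-1) + kappa*N*(N-1)
        u, v = n - 1, n * (n - 1)
        s11 += u * u
        s12 += u * v
        s22 += v * v
        b1 += u * y
        b2 += v * y
    # Solve [[s11, s12], [s12, s22]] @ [sigma, kappa] = [b1, b2] by Cramer's rule.
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det


# Synthetic measurements generated from known coefficients (120 runs/s at N=1).
synthetic = [(n, 120.0 * usl_curve(n, 0.05, 0.001)) for n in (1, 2, 4, 8, 16)]
sigma_hat, kappa_hat = fit_usl_sketch(synthetic)
```

On noise-free synthetic data the fit recovers the generating coefficients; with real measurements, more worker counts give a more stable estimate.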

analyze¶

analyze(measurements: list[tuple[int | float, float]]) -> ScalingAnalysis

Fit USL and return full analysis with diagnosis.

benchmark¶

benchmark(
    test_class: type,
    *,
    target_modules: list[str] | None = None,
    max_workers: int | None = None,       # default: CPU count
    time_per_trial: float = 10.0,
    seed: int = 42,
    steps_per_run: int = 50,
    metric: str = "runs",                 # "runs" or "edges"
) -> ScalingAnalysis

Benchmark exploration at N=1, 2, 4, ... workers, measure throughput, fit USL parameters automatically.

from ordeal.scaling import benchmark
analysis = benchmark(MyServiceChaos, target_modules=["myapp"])
print(analysis.summary())

ScalingAnalysis¶

Attribute	Type	Description
`sigma`	`float`	Contention coefficient
`kappa`	`float`	Coherence coefficient
`n_optimal`	`float`	Worker count at peak throughput
`peak`	`float`	Maximum achievable throughput multiplier
`regime`	`str`	`"linear"`, `"amdahl"`, or `"usl"`
`efficiency(n)`	`float`	Parallel efficiency C(N)/N at N workers
`throughput(n)`	`float`	Predicted relative throughput at N workers
`summary()`	`str`	Human-readable report with diagnosis

Mine¶

from ordeal.mine import mine, mine_pair, MineResult, MinedProperty

mine¶

mine(
    fn: Callable,
    *,
    max_examples: int = 500,
    **fixtures: SearchStrategy | Any,
) -> MineResult

Discover likely properties of a function by running it many times with random inputs and observing patterns in outputs.

Properties checked: type consistency, never None, no NaN, non-negative, bounded [0,1], never empty, deterministic, idempotent, involution (f(f(x)) == x), commutative (f(a,b) == f(b,a)), associative (f(f(a,b),c) == f(a,f(b,c))), observed range, monotonicity (per numeric input parameter), and length relationships (len(output) == len(input)). Float comparisons use math.isclose (rel_tol=1e-9, abs_tol=1e-12) so rounding noise doesn't cause false negatives.

result = mine(myapp.scoring.compute, max_examples=500)
for p in result.universal:
    print(p)
# ALWAYS  output type is float (500/500)
# ALWAYS  deterministic (50/50)
# ALWAYS  output in [0, 1] (500/500)
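
The underlying approach, counting how often each candidate property holds over random inputs, can be sketched as follows. mine_binary is a hypothetical helper restricted to two-argument integer functions and three properties; it does not reproduce mine()'s strategy inference or its full property list.

```python
import random
from typing import Callable


def mine_binary(fn: Callable[[int, int], int], max_examples: int = 200,
                seed: int = 0) -> dict[str, tuple[int, int]]:
    """Count how often candidate properties hold: name -> (holds, total)."""
    rng = random.Random(seed)
    counts = {"deterministic": [0, 0], "commutative": [0, 0], "non-negative": [0, 0]}
    for _ in range(max_examples):
        a, b = rng.randint(-50, 50), rng.randint(-50, 50)
        out = fn(a, b)
        observations = (
            ("deterministic", fn(a, b) == out),  # same inputs, same output
            ("commutative", fn(b, a) == out),    # f(a, b) == f(b, a)
            ("non-negative", out >= 0),
        )
        for name, held in observations:
            counts[name][0] += int(held)
            counts[name][1] += 1
    return {name: (c[0], c[1]) for name, c in counts.items()}


report = mine_binary(lambda a, b: abs(a - b))
universal = [name for name, (holds, total) in report.items() if holds == total]

subtract_report = mine_binary(lambda a, b: a - b)
```

A property that held on every sampled input is only "likely": random sampling can miss the one input that breaks it, which is why results carry counts rather than proofs.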

mine_pair¶

mine_pair(
    f: Callable,
    g: Callable,
    *,
    max_examples: int = 200,
    **fixtures: SearchStrategy | Any,
) -> MineResult

Discover relational properties between two functions. Checks roundtrip (g(f(x)) == x), reverse roundtrip (f(g(x)) == x), and commutative composition (f(g(x)) == g(f(x))). Strategies are inferred from f's signature.

result = mine_pair(encode, decode)
# roundtrip decode(encode(x)) == x: ALWAYS
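
The roundtrip check can be sketched without ordeal. The example below is a standalone illustration using base64 from the standard library; `roundtrip_holds` is a hypothetical helper, not part of ordeal's API.

```python
import base64
import random


def roundtrip_holds(f, g, inputs) -> tuple[int, int]:
    """Count how often g(f(x)) == x over the given inputs."""
    holds = sum(1 for x in inputs if g(f(x)) == x)
    return holds, len(inputs)


rng = random.Random(0)
# 200 random byte strings of length 0..32.
samples = [
    bytes(rng.getrandbits(8) for _ in range(rng.randint(0, 32)))
    for _ in range(200)
]

# Lossless pair: decoding an encoded value returns the original bytes.
lossless = roundtrip_holds(base64.b64encode, base64.b64decode, samples)

# Lossy pair: lowercasing discards case information, so upper() cannot undo it.
lossy = roundtrip_holds(bytes.lower, bytes.upper, samples)

print("b64 roundtrip:", lossless)
print("lower/upper roundtrip:", lossy)
```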

MineResult¶

Results are separated into three categories: checked and applicable, checked but not relevant, and structurally impossible to check.

Attribute	Type	Description
`function`	`str`	Function name
`examples`	`int`	Examples run
`properties`	`list[MinedProperty]`	Checked and applicable (total > 0)
`not_applicable`	`list[str]`	Checked but not relevant (e.g. "bounded [0,1]" for string output)
`not_checked`	`list[str]`	Structural limitations — things mine() cannot verify
`universal`	`list[MinedProperty]`	Properties that held on every example
`likely`	`list[MinedProperty]`	Properties with >= 95% confidence
`summary()`	`str`	Human-readable report

STRUCTURAL_LIMITATIONS¶

from ordeal.mine import STRUCTURAL_LIMITATIONS

Things mine() fundamentally cannot discover from random sampling — these require domain knowledge:

Output value correctness (fuzz checks crash safety, not behavior)
Cross-function consistency (e.g., batch == map of individual)
Domain-specific invariants (e.g., weighted sum, refusal detection)
Error handling for intentionally invalid inputs
Performance and resource usage
Concurrency and thread safety
State mutation and side effects

MinedProperty¶

Attribute	Type	Description
`name`	`str`	Property description
`holds`	`int`	Times property held
`total`	`int`	Times property was checked
`counterexample`	`dict | None`	First counterexample if not universal
`confidence`	`float`	`holds / total`
`universal`	`bool`	True if held on every example
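
The derived attributes follow directly from the two counters. A minimal sketch, assuming a hypothetical stand-in class `PropertyTally` that mirrors the attributes above; the zero-total guard and the made-up tallies are assumptions of the sketch, and the 95% threshold is the one documented for MineResult.likely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyTally:
    """Hypothetical stand-in mirroring the MinedProperty attributes."""

    name: str
    holds: int
    total: int
    counterexample: Optional[dict] = None

    @property
    def confidence(self) -> float:
        # holds / total; returning 0.0 for an unchecked property is an assumption.
        return self.holds / self.total if self.total else 0.0

    @property
    def universal(self) -> bool:
        return self.total > 0 and self.holds == self.total


# Made-up tallies for illustration.
tallies = [
    PropertyTally("output type is float", 500, 500),
    PropertyTally("non-negative", 490, 500, {"x": -0.5}),
    PropertyTally("idempotent", 120, 500, {"x": 3.0}),
]

universal = [p.name for p in tallies if p.universal]
likely = [p.name for p in tallies if p.confidence >= 0.95]
```

In this sketch, likely also contains the universal properties; whether ordeal's likely list does the same is not stated here.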

validate_mined_properties¶

from ordeal.mutations import validate_mined_properties

validate_mined_properties(
    target: str,                    # dotted path: "myapp.scoring.compute"
    max_examples: int = 100,
    operators: list[str] | None = None,
) -> MutationResult

Mine properties of target, then mutate the code and check whether the mined properties catch the mutations. Surviving mutants reveal properties that are too weak. Used by ordeal audit to report mutation scores.
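
To see why a surviving mutant signals a weak property, consider the hand-rolled sketch below. It does not use ordeal's mutation operators; `clamp`, `clamp_mutant`, and `property_catches` are hypothetical names and the mutant is written by hand.

```python
import random


def clamp(x: float) -> float:
    """Original function: clamp into [0, 1]."""
    return min(1.0, max(0.0, x))


def clamp_mutant(x: float) -> float:
    """Hand-written mutant: the upper bound 1.0 was changed to 2.0."""
    return min(2.0, max(0.0, x))


def property_catches(prop, original, mutant, trials: int = 200, seed: int = 0) -> bool:
    """True if prop holds for the original but fails on some input for the mutant."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(trials)]
    holds_on_original = all(prop(original, x) for x in xs)
    fails_on_mutant = any(not prop(mutant, x) for x in xs)
    return holds_on_original and fails_on_mutant


def non_negative(fn, x: float) -> bool:
    # Weak property: says nothing about the upper bound.
    return fn(x) >= 0.0


def bounded_unit(fn, x: float) -> bool:
    # Stronger property: pins the upper bound as well.
    return 0.0 <= fn(x) <= 1.0


weak_kills = property_catches(non_negative, clamp, clamp_mutant)
strong_kills = property_catches(bounded_unit, clamp, clamp_mutant)
print(f"non_negative kills mutant: {weak_kills}")
print(f"bounded_unit kills mutant: {strong_kills}")
```

The mutant survives the non-negative property but is caught by the bounded one, which is the kind of gap a mutation score is meant to expose.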

Metamorphic¶

from ordeal.metamorphic import Relation, RelationSet, metamorphic

Metamorphic testing checks relationships between outputs rather than exact values. Define a relation that transforms the input and checks how the two outputs relate, then apply it with the @metamorphic decorator.

Relation¶

Relation(
    name: str,                                              # human-readable label
    transform: Callable[[tuple], tuple],                    # transform input args
    check: Callable[[Any, Any], bool],                      # (original_out, transformed_out) -> bool
)

Compose with +: (relation_a + relation_b) checks both.

metamorphic¶

@metamorphic(*relations: Relation | RelationSet, max_examples: int = 100)
def test_fn(x: int, y: int):
    return x + y

Decorator. For each Hypothesis-generated input, runs the function on the original and transformed inputs, then asserts that the relation's check holds. Strategies are inferred from the function's type hints.

commutative = Relation(
    "commutative",
    transform=lambda args: (args[1], args[0]),
    check=lambda a, b: a == b,
)

# The check below verifies odd symmetry, f(-x) == -f(x): the two outputs must cancel.
negate_involution = Relation(
    "negate is involution",
    transform=lambda args: (-args[0],),
    check=lambda a, b: abs(a + b) < 1e-6,
)

@metamorphic(commutative)
def test_add(x: int, y: int):
    return x + y

@metamorphic(negate_involution)
def test_negate(x: float):
    return -x
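
Conceptually, the decorator runs the loop sketched below. This is a standalone approximation that uses random inputs from the standard library instead of Hypothesis; `SimpleRelation` and `check_relation` are hypothetical stand-ins, not ordeal's classes.

```python
import random
from typing import Any, Callable, NamedTuple


class SimpleRelation(NamedTuple):
    """Hypothetical stand-in with the same three fields as Relation."""

    name: str
    transform: Callable[[tuple], tuple]
    check: Callable[[Any, Any], bool]


def check_relation(fn: Callable, relation: SimpleRelation, inputs: list) -> list:
    """Return the inputs for which the relation's check fails."""
    failures = []
    for args in inputs:
        original_out = fn(*args)
        transformed_out = fn(*relation.transform(args))
        if not relation.check(original_out, transformed_out):
            failures.append(args)
    return failures


swap_args = SimpleRelation(
    "commutative",
    transform=lambda args: (args[1], args[0]),
    check=lambda a, b: a == b,
)

rng = random.Random(0)
pairs = [(rng.randint(-100, 100), rng.randint(-100, 100)) for _ in range(100)]

add_failures = check_relation(lambda x, y: x + y, swap_args, pairs)
sub_failures = check_relation(lambda x, y: x - y, swap_args, pairs)
print(f"add: {len(add_failures)} failures, sub: {len(sub_failures)} failures")
```

Addition passes for every pair, while subtraction fails whenever the two arguments differ, so the relation separates commutative functions from non-commutative ones without knowing any expected output value.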

Config¶

from ordeal.config import load_config, OrdealConfig, ExplorerConfig, TestConfig, ReportConfig, ScanConfig

load_config¶

load_config(path: str | Path = "ordeal.toml") -> OrdealConfig

Load and validate an ordeal.toml file. Raises FileNotFoundError if the file is missing and ConfigError on invalid keys or types.

OrdealConfig¶

Attribute	Type	Default
`explorer`	`ExplorerConfig`	see below
`tests`	`list[TestConfig]`	`[]`
`scan`	`list[ScanConfig]`	`[]`
`report`	`ReportConfig`	see below

ExplorerConfig¶

Attribute	Type	Default
`target_modules`	`list[str]`	`[]`
`max_time`	`float`	`60.0`
`max_runs`	`int | None`	`None`
`seed`	`int`	`42`
`max_checkpoints`	`int`	`256`
`checkpoint_prob`	`float`	`0.4`
`checkpoint_strategy`	`str`	`"energy"`
`steps_per_run`	`int`	`50`
`fault_toggle_prob`	`float`	`0.3`
`workers`	`int`	`1`
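
The key validation described for load_config can be pictured as follows. This is a sketch under assumptions: `ExplorerSettings` and `explorer_from_mapping` are hypothetical, only a subset of the fields above is mirrored, and ValueError stands in for ConfigError so the example needs no ordeal import.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ExplorerSettings:
    """Hypothetical mirror of a few ExplorerConfig fields and their defaults."""

    target_modules: list = field(default_factory=list)
    max_time: float = 60.0
    max_runs: Optional[int] = None
    seed: int = 42
    steps_per_run: int = 50
    workers: int = 1


def explorer_from_mapping(raw: dict) -> ExplorerSettings:
    """Build settings from an already-parsed table, rejecting unknown keys."""
    known = {f.name for f in fields(ExplorerSettings)}
    unknown = sorted(set(raw) - known)
    if unknown:
        # load_config is documented to raise ConfigError; ValueError is a stand-in.
        raise ValueError(f"unknown explorer keys: {unknown}")
    return ExplorerSettings(**raw)


# Keys that are omitted fall back to the defaults from the table.
settings = explorer_from_mapping({"max_time": 120.0, "workers": 4})
print(settings)
```

Rejecting unknown keys up front turns a typo such as max_tme into an immediate error rather than a silently ignored setting.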

TestConfig¶

Attribute	Type	Required
`class_path`	`str`	Yes
`steps_per_run`	`int | None`	No
`swarm`	`bool | None`	No

resolve() -> type — import and return the ChaosTest class.
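
Resolving a class from a path string is typically done with importlib. The sketch below assumes class_path has the form "package.module.ClassName"; that format, and the helper name `resolve_dotted`, are assumptions rather than ordeal's documented behavior.

```python
import importlib


def resolve_dotted(class_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {class_path!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{class_path!r} is not a class")
    return obj


# A standard-library class is used so the example runs without ordeal installed.
cls = resolve_dotted("collections.OrderedDict")
print(cls)
```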

ReportConfig¶

Attribute	Type	Default
`format`	`str`	`"text"`
`output`	`str`	`"ordeal-report.json"`
`traces`	`bool`	`False`
`traces_dir`	`str`	`".ordeal/traces"`
`verbose`	`bool`	`False`

ScanConfig¶

Attribute	Type	Default
`module`	`str`	required
`max_examples`	`int`	`50`
`fixtures`	`dict[str, str]`	`{}`