Auto: Zero-Boilerplate Testing
Point ordeal at your code. Get tests. No scaffolding.
scan_module
Smoke-test every public function in a module:
from ordeal.auto import scan_module
result = scan_module("myapp.scoring")
print(result.summary())
# scan_module('myapp.scoring'): 8 functions, 1 failed
# PASS compute
# PASS normalize
# FAIL transform: ZeroDivisionError: division by zero
# PASS clamp
# ...
assert result.passed
Each function gets two checks: it must not crash on random valid inputs, and its return value must match the return type annotation.
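The two checks can be illustrated with a small standard-library sketch. It is not ordeal's implementation: `smoke_test`, `draw`, and the `EDGE`/`RANDOM` tables are hypothetical names, and trying a few edge values before random ones is an assumption made so the example is reproducible.

```python
import random
from typing import get_type_hints

# Hypothetical value sources for two simple annotations.
EDGE = {int: (0, 1, -1), float: (0.0, 1.0, -1.0)}
RANDOM = {
    int: lambda rng: rng.randint(-100, 100),
    float: lambda rng: rng.uniform(-100.0, 100.0),
}


def draw(tp, rng, i):
    """Edge values first, then random values."""
    edges = EDGE[tp]
    return edges[i] if i < len(edges) else RANDOM[tp](rng)


def smoke_test(fn, examples=50, seed=0):
    """Return None on success, or a short failure message."""
    rng = random.Random(seed)
    hints = get_type_hints(fn)
    expected = hints.pop("return", None)
    for i in range(examples):
        kwargs = {name: draw(tp, rng, i) for name, tp in hints.items()}
        try:
            out = fn(**kwargs)
        except Exception as exc:  # check 1: no crash
            return f"{type(exc).__name__}: {exc}"
        if expected is not None and not isinstance(out, expected):  # check 2
            return f"returned {type(out).__name__}, expected {expected.__name__}"
    return None


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(x, hi))


def transform(x: int) -> float:
    return 1 / x  # crashes when x == 0


for fn in (clamp, transform):
    failure = smoke_test(fn)
    print("PASS" if failure is None else "FAIL", fn.__name__, failure or "")
```

The sketch only handles `int` and `float` parameters; anything else would need a fixture or a richer strategy table.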
For parameters that can't be inferred from types, pass explicit fixtures, for example fixtures={"model": model_strategy}.
fuzz
Deep-fuzz a single function (1000 examples by default):
import myapp.scoring
from ordeal.auto import fuzz
result = fuzz(myapp.scoring.compute)
assert result.passed
# Override a parameter
result = fuzz(myapp.scoring.compute, model=model_strategy)
chaos_for
Auto-generate a ChaosTest from a module's public API. Each function becomes a @rule. The nemesis toggles faults. Invariants are checked on every return value.
from ordeal.auto import chaos_for
from ordeal.invariants import finite, bounded
from ordeal.faults import timing
TestScoring = chaos_for("myapp.scoring")
# With depth:
TestScoring = chaos_for(
    "myapp.scoring",
    fixtures={"model": model_strategy},
    invariants=[finite, bounded(0, 1)],
    faults=[timing.timeout("myapp.scoring.predict")],
)
Returns a pytest-discoverable TestCase. Run with pytest.
mine: discover properties automatically
mine() runs a function many times with random inputs and observes patterns in outputs. It checks: type consistency, never None, no NaN, non-negative, bounded [0,1], never empty, deterministic, idempotent, involution (f(f(x)) == x), commutative (f(a,b) == f(b,a)), associative (f(f(a,b),c) == f(a,f(b,c))), observed range, monotonicity, and length relationships. Float comparisons use math.isclose so rounding noise doesn't cause false negatives.
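As a rough illustration of how such properties can be observed, the sketch below counts how often a few of them hold for a one-argument function. `mine_unary` and `same` are hypothetical helpers, the tolerances are arbitrary, and the `SOMETIMES` label is invented for the demo; none of this is ordeal's code.

```python
import math
import random


def same(a, b):
    """Equality that tolerates float rounding noise."""
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    return a == b


def mine_unary(fn, gen, examples=200, seed=0):
    """Map each candidate property to (times it held, times it was tried)."""
    rng = random.Random(seed)
    counts = {"never None": 0, "deterministic": 0, "idempotent": 0, "involution": 0}
    for _ in range(examples):
        x = gen(rng)
        y = fn(x)
        counts["never None"] += y is not None
        counts["deterministic"] += same(fn(x), y)
        counts["idempotent"] += same(fn(y), y)  # f(f(x)) == f(x)
        counts["involution"] += same(fn(y), x)  # f(f(x)) == x
    return {name: (hits, examples) for name, hits in counts.items()}


def negate(x):
    return -x


def floats(rng):
    return rng.uniform(-10.0, 10.0)


for fn in (abs, negate):
    for name, (hits, total) in mine_unary(fn, floats).items():
        label = "ALWAYS" if hits == total else "SOMETIMES"
        print(f"{fn.__name__}: {label} {name} ({hits}/{total})")
```

A property that holds on every sample is only a candidate; whether it is a real invariant of the function is still a judgment call.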
You confirm which properties are real and turn them into tested invariants:
import myapp.scoring
from ordeal.mine import mine
result = mine(myapp.scoring.compute, max_examples=500)
for p in result.universal:
    print(p)
# ALWAYS output type is float (500/500)
# ALWAYS deterministic (50/50)
# ALWAYS output in [0, 1] (500/500)
Properties are probabilistic: the confidence is stated, not assumed. 500/500 doesn't mean "always holds"; it means the lower bound of the Wilson score interval puts the true hold rate at >= 99.4% with 95% confidence.
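The bound can be reproduced with the standard Wilson score formula. The helper name `wilson_lower_bound` is illustrative, and which z value ordeal uses internally is an assumption: a one-sided 95% bound (z ≈ 1.645) gives roughly 0.9946 for 500/500, consistent with the 99.4% quoted here, while a two-sided interval (z ≈ 1.96) would give roughly 0.9924.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


one_sided = wilson_lower_bound(500, 500, z=1.645)  # one-sided 95% (assumed)
two_sided = wilson_lower_bound(500, 500, z=1.960)  # two-sided 95%
print(f"one-sided: {one_sided:.4f}  two-sided: {two_sided:.4f}")
```

When every trial succeeds the bound simplifies to n / (n + z²), so more examples tighten it, which is why a larger sample yields a higher stated confidence.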
mine() also tells you what it cannot check — see result.not_checked for structural limitations (correctness, concurrency, domain-specific invariants). These are the tests you need to write manually.
CLI
Mine from the terminal without writing Python:
ordeal mine myapp.scoring.compute # single function
ordeal mine myapp.scoring # all public functions in module
ordeal mine myapp.scoring.compute -n 1000 # more examples for tighter CI
Generated assertions
ordeal audit uses mine() to generate real @quickcheck assertion tests (not just comments) when your functions have type hints:
# Auto-generated by ordeal audit
@quickcheck
def test_compute_properties(x: float):
    """Mined properties for myapp.scoring.compute."""
    result = myapp.scoring.compute(x)
    assert result is not None  # >=93.0% CI
    assert 0 <= result <= 1  # >=93.0% CI
    assert myapp.scoring.compute(x) == result  # >=93.0% CI
Functions without type hints fall back to informational comments with confidence bounds.
mine_pair: discover cross-function properties
Check if two functions are inverses, roundtrip-safe, or commutative under composition:
from ordeal.mine import mine_pair
result = mine_pair(encode, decode, max_examples=200)
for p in result.universal:
    print(p)
# ALWAYS roundtrip decode(encode(x)) == x (48/48)
# ALWAYS roundtrip encode(decode(x)) == x (45/45)
Properties checked:
- Roundtrip: g(f(x)) == x, the composition is the identity
- Reverse roundtrip: f(g(x)) == x, the other direction
- Commutative composition: f(g(x)) == g(f(x)), order doesn't matter
Strategies are inferred from f's type hints. Both functions must accept each other's output as input for roundtrip checks to apply.
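A roundtrip check can be sketched with the standard library alone. `check_roundtrip` and `random_bytes` are hypothetical helpers, not ordeal functions. Skipping inputs that the composition rejects is an assumption, made to mirror the requirement that each function accept the other's output.

```python
import base64
import random


def check_roundtrip(f, g, gen, examples=100, seed=0):
    """Return (holds, applicable) for g(f(x)) == x over random inputs."""
    rng = random.Random(seed)
    holds = applicable = 0
    for _ in range(examples):
        x = gen(rng)
        try:
            y = g(f(x))
        except Exception:
            continue  # composition does not apply to this input
        applicable += 1
        holds += y == x
    return holds, applicable


def random_bytes(rng):
    return bytes(rng.randrange(256) for _ in range(rng.randrange(0, 32)))


forward = check_roundtrip(base64.b64encode, base64.b64decode, random_bytes)
print("roundtrip decode(encode(x)) == x", forward)
```

Running the same helper with the arguments swapped probes the reverse direction; for arbitrary bytes many inputs are not valid Base64, so fewer examples are applicable.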
diff: compare two implementations
Differential testing: run two functions on the same random inputs and check their outputs match. Catches regressions, validates refactors, and verifies backend ports:
from ordeal.diff import diff
# Exact comparison — are v1 and v2 identical?
result = diff(score_v1, score_v2, max_examples=200)
assert result.equivalent, result.summary()
# Floating-point tolerance — are old and new close enough?
result = diff(compute_old, compute_new, rtol=1e-6)
# Custom comparator — only care about specific fields
result = diff(api_v1, api_v2, compare=lambda a, b: a.status == b.status)
When outputs differ, result.mismatches contains the exact inputs and both outputs so you can debug the divergence. Strategies are inferred from fn_a's type hints — both functions must accept the same parameters.
Use cases:
- Refactoring: verify the new implementation matches the old
- Porting: compare a Python prototype against a Rust/C extension
- Regression testing: ensure a bugfix doesn't change other outputs
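The core loop of differential testing is short enough to sketch without any dependencies. `diff_sketch` is a hypothetical stand-in that only handles single-argument functions and a relative tolerance; it illustrates the idea and is not how ordeal's diff is implemented.

```python
import math
import random


def diff_sketch(fn_a, fn_b, gen, examples=200, rtol=None, seed=0):
    """Return a list of (input, out_a, out_b) where the outputs disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(examples):
        x = gen(rng)
        a, b = fn_a(x), fn_b(x)
        equal = a == b if rtol is None else math.isclose(a, b, rel_tol=rtol)
        if not equal:
            mismatches.append((x, a, b))
    return mismatches


def mean_v1(xs):
    return sum(xs) / len(xs)


def mean_v2(xs):
    return math.fsum(xs) / len(xs)  # refactor: exactly rounded summation


def mean_buggy(xs):
    return sum(xs) / (len(xs) + 1)  # off-by-one in the denominator


def random_floats(rng):
    return [rng.uniform(1.0, 1e6) for _ in range(rng.randint(1, 50))]


refactor = diff_sketch(mean_v1, mean_v2, random_floats, rtol=1e-6)
regression = diff_sketch(mean_v1, mean_buggy, random_floats, rtol=1e-6)
print(len(refactor), "mismatches after refactor")
print(len(regression), "mismatches against buggy version")
```

Each recorded mismatch keeps the input alongside both outputs, which is what makes a divergence reproducible.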
register_fixture: teach ordeal your types
When your codebase has domain-specific types that ordeal can't infer from hints, register a fixture once and every auto tool picks it up:
from ordeal.auto import register_fixture
import hypothesis.strategies as st
# Register once at import time (e.g., in conftest.py)
register_fixture("model", st.sampled_from(["gpt-4", "claude-3", "llama-70b"]))
register_fixture("api_key", st.just("sk-test-key-12345"))
register_fixture("config", st.fixed_dictionaries({
    "temperature": st.floats(0.0, 2.0),
    "max_tokens": st.integers(1, 4096),
}))
Now scan_module, fuzz, mine, and diff all know how to generate these parameters without explicit fixtures= overrides:
# These "just work" because the fixtures are registered
result = scan_module("myapp.llm") # uses registered "model" and "api_key"
result = fuzz(myapp.llm.generate) # same
result = mine(myapp.llm.generate) # same
Priority order when resolving a parameter:
- Explicit fixtures={"model": ...} passed to the function (highest)
- register_fixture("model", ...) global registry
- Type hint inference via strategy_for_type
- Parameter name heuristics (e.g., "seed" → integers, "probability" → floats 0-1)
Register fixtures for: API clients, database connections, model objects, configuration dicts, authentication tokens — anything that can't be generated from a type hint alone.
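The priority order can be pictured as a chain of lookups. In the sketch below, plain callables stand in for Hypothesis strategies, and `REGISTRY`, `BY_TYPE`, `BY_NAME`, and `resolve` are all hypothetical names chosen for illustration; ordeal's real resolution logic may differ in detail.

```python
import random
from typing import get_type_hints

REGISTRY = {}  # stand-in for the global register_fixture registry

BY_TYPE = {
    int: lambda rng: rng.randint(0, 100),
    float: lambda rng: rng.uniform(-1.0, 1.0),
    str: lambda rng: "text",
}
BY_NAME = {
    "seed": lambda rng: rng.randint(0, 2**32 - 1),
    "probability": lambda rng: rng.random(),
}


def resolve(fn, param, fixtures=None):
    """Return (source, generator) for one parameter, or None if unresolvable."""
    fixtures = fixtures or {}
    if param in fixtures:  # 1. explicit override
        return "explicit", fixtures[param]
    if param in REGISTRY:  # 2. global registry
        return "registry", REGISTRY[param]
    hint = get_type_hints(fn).get(param)
    if hint in BY_TYPE:  # 3. type hint inference
        return "type hint", BY_TYPE[hint]
    if param in BY_NAME:  # 4. parameter name heuristics
        return "name heuristic", BY_NAME[param]
    return None


def generate(prompt: str, model, probability, retries: int = 3):
    return f"{model}:{prompt}"


REGISTRY["model"] = lambda rng: rng.choice(["gpt-4", "claude-3", "llama-70b"])

rng = random.Random(0)
for name in ("prompt", "model", "probability", "retries"):
    source, gen = resolve(generate, name)
    print(f"{name}: {source} -> {gen(rng)!r}")
```

Passing `fixtures={"model": ...}` to `resolve` takes precedence over the registry entry, which corresponds to the first item in the list.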
How it works
All auto primitives (scan_module, fuzz, chaos_for, mine, diff):
- Scan the module for public, non-class callables
- Infer strategies from type hints (via ordeal.quickcheck.strategy_for_type)
- Check registered fixtures for unresolved parameters
- Fall back to parameter name heuristics ("threshold" → floats, "count" → integers)
- Accept explicit fixtures overrides (highest priority)
- Skip functions that can't be tested (no hints, no fixtures, no heuristics)
Functions starting with _ are skipped. Functions with defaults for all params work even without type hints.