Functional Coverage Roadmap
Direction
Ordeal should get better at covering behavior, not drift into generic offensive security tooling. The target is simple: cover more real service behavior, explain what was and was not exercised, and turn failures into small readable regressions.
What exists now
- ChaosTest already has shrinkable fault-subset swarm mode in ordeal/chaos.py.
- The Explorer already has joint rule+fault swarm, coverage-directed configs, and energy updates in ordeal/explore.py.
- Native boundary testing exists at the subprocess layer: subprocess_timeout(), subprocess_delay(), and corrupt_stdout() in ordeal/faults/io.py.
- The gap is not "no swarm" or "no boundary testing." The gap is proving efficacy, exposing coverage, and turning child-process failures into first-class findings.
P1: Next
1. Swarm observability and efficacy
- Outcome: know whether swarm improved coverage or findings on a target, not just that a subset was selected.
- Modules: ordeal/explore.py, ordeal/chaos.py, ordeal/cli.py, tests/test_explore.py.
- Ship when:
  - exploration results report the top swarm configs, dead configs, and edges/failures per config
  - the CLI can print a compact swarm summary
  - tests include a swarm-vs-no-swarm ablation on a benchmark target
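The per-config report described above could look roughly like the sketch below. This is not Ordeal's API: `SwarmRun`, `summarize_swarm`, `format_summary`, and the field names are hypothetical, and a swarm config is modeled simply as a frozenset of enabled fault names. A "dead" config is assumed here to mean one that produced no new edges and no failures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SwarmRun:
    """One exploration run under a single swarm config (hypothetical record)."""
    config: frozenset  # names of the faults enabled for this run
    new_edges: int     # coverage edges first seen during this run
    failures: int      # property failures found during this run


def summarize_swarm(runs, top_n=3):
    """Aggregate runs per config and split them into top and dead configs."""
    totals = defaultdict(lambda: {"runs": 0, "edges": 0, "failures": 0})
    for run in runs:
        entry = totals[run.config]
        entry["runs"] += 1
        entry["edges"] += run.new_edges
        entry["failures"] += run.failures

    # Rank by failures, then by new edges; fault names break ties for stable output.
    ranked = sorted(
        totals.items(),
        key=lambda kv: (-kv[1]["failures"], -kv[1]["edges"], sorted(kv[0])),
    )
    dead = [cfg for cfg, t in ranked if t["edges"] == 0 and t["failures"] == 0]
    top = [(cfg, t) for cfg, t in ranked if cfg not in dead][:top_n]
    return {"top": top, "dead": dead}


def format_summary(summary):
    """Render a compact, CLI-style text summary."""
    lines = []
    for cfg, t in summary["top"]:
        name = "+".join(sorted(cfg)) or "(no faults)"
        lines.append(f"{name}: runs={t['runs']} edges={t['edges']} failures={t['failures']}")
    for cfg in summary["dead"]:
        lines.append(f"dead: {'+'.join(sorted(cfg)) or '(no faults)'}")
    return "\n".join(lines)


runs = [
    SwarmRun(frozenset({"timeout"}), new_edges=12, failures=0),
    SwarmRun(frozenset({"timeout", "corrupt_stdout"}), new_edges=5, failures=2),
    SwarmRun(frozenset({"delay"}), new_edges=0, failures=0),
    SwarmRun(frozenset(), new_edges=9, failures=0),  # no-swarm baseline for an ablation
]
summary = summarize_swarm(runs)
print(format_summary(summary))
```

Including an empty config as a baseline row is one simple way to make the swarm-vs-no-swarm comparison visible in the same table.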
2. Behavior coverage reporting
- Outcome: report covered behaviors, not only lines and raw edges.
- Modules: ordeal/explore.py, ordeal/assertions.py, ordeal/state.py, ordeal/cli.py.
- Ship when:
  - results include a rule x fault x property coverage view
  - reports call out unexercised retry, fallback, and recovery paths when detectable
  - traces show which properties were under stress when a failure happened
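A rule x fault x property view can be thought of as a set of triples compared against the full cross product. The sketch below illustrates that idea only; `coverage_view`, the rule and fault names, and the shape of `observations` are assumptions, not Ordeal's actual data model.

```python
from itertools import product


def coverage_view(rules, faults, properties, observations):
    """Return (covered, uncovered) sets of (rule, fault, property) triples.

    `observations` is an iterable of triples recorded during exploration:
    a rule ran while a fault was active and a property was checked.
    """
    universe = set(product(rules, faults, properties))
    covered = universe & set(observations)
    return covered, universe - covered


rules = ["put", "get"]
faults = ["timeout", "corrupt_stdout"]
properties = ["no_data_loss"]
observations = [
    ("put", "timeout", "no_data_loss"),
    ("get", "timeout", "no_data_loss"),
    ("put", "corrupt_stdout", "no_data_loss"),
]

covered, uncovered = coverage_view(rules, faults, properties, observations)
print(f"covered {len(covered)}/{len(covered) + len(uncovered)} triples")
for triple in sorted(uncovered):
    print("unexercised:", " x ".join(triple))
```

Listing the uncovered triples, rather than only a percentage, is what lets a report name the specific paths that were never put under stress.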
3. Native-boundary crash findings
- Outcome: treat "the worker died" as a first-class finding with a readable cause.
- Modules: ordeal/faults/io.py, ordeal/supervisor.py, ordeal/trace.py, ordeal/cli.py.
- Ship when:
  - Ordeal records child exit mode: nonzero code, signal death, timeout, truncated output
  - failures shrink to the smallest request or operation sequence that still kills the child
  - a crash in a model worker or helper process does not crash the main exploration run
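The four exit modes listed above can be told apart with the standard library alone. The sketch below uses `subprocess.run`; the function name `classify_child`, the label strings, and the "output must end with a newline" truncation check are illustrative assumptions, not how ordeal/supervisor.py does or will do it.

```python
import subprocess
import sys


def classify_child(argv, timeout=5.0, expect_suffix="\n"):
    """Run a child process and return a label describing how it ended."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return "timeout"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"signal:{-proc.returncode}"
    if proc.returncode != 0:
        return f"exit:{proc.returncode}"
    if not proc.stdout.endswith(expect_suffix):
        # Clean exit, but the output lacks the expected terminator.
        return "truncated-output"
    return "ok"


py = sys.executable
results = [
    classify_child([py, "-c", "print('done')"]),
    classify_child([py, "-c", "import sys; sys.exit(3)"]),
    classify_child([py, "-c", "import sys; sys.stdout.write('partial')"]),
    classify_child([py, "-c", "import time; time.sleep(10)"], timeout=0.5),
]
print(results)
```

Because every outcome comes back as a label instead of an exception, the calling loop keeps running when a child dies, which is the property the last ship criterion asks for.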
4. Pairwise and t-wise swarm coverage
- Outcome: cover important fault combinations early instead of relying only on coin-flip subsets.
- Modules: ordeal/explore.py, ordeal/chaos.py, ordeal/config.py.
- Ship when:
  - swarm can target pairwise coverage of fault sets
  - configs can be budgeted by time or run count
  - results report uncovered fault pairs
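As a rough illustration of pairwise targeting under a run-count budget, the sketch below greedily picks fault subsets until every pair of faults has been enabled together at least once. It assumes that "covering a pair" means both faults are active in the same config, which is a simplification; `pairwise_configs` and its parameters are hypothetical names.

```python
from itertools import combinations


def pairs_of(config):
    """All unordered fault pairs enabled together in one config."""
    return set(combinations(sorted(config), 2))


def pairwise_configs(faults, subset_size, max_configs):
    """Greedily pick fault subsets until every pair is covered or the budget runs out.

    Returns (chosen_configs, uncovered_pairs).
    """
    uncovered = pairs_of(faults)
    candidates = [frozenset(c) for c in combinations(sorted(faults), subset_size)]
    chosen = []
    while uncovered and len(chosen) < max_configs:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(uncovered & pairs_of(c)))
        gained = uncovered & pairs_of(best)
        if not gained:
            break  # no candidate can make further progress
        chosen.append(best)
        uncovered -= gained
    return chosen, sorted(uncovered)


faults = ["timeout", "delay", "corrupt_stdout", "nan_burst"]

full_chosen, full_uncovered = pairwise_configs(faults, subset_size=3, max_configs=10)
print(len(full_chosen), "configs cover all pairs:", full_uncovered == [])

tight_chosen, tight_uncovered = pairwise_configs(faults, subset_size=3, max_configs=2)
print("with a budget of 2, uncovered pairs:", tight_uncovered)
```

The second call shows the reporting side: when the budget is too small, the leftover pairs are returned explicitly instead of being silently dropped.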
P2: After that
5. Long-lived worker harness
- Outcome: test persistent model workers, not only one-shot subprocess calls.
- Modules: ordeal/supervisor.py, ordeal/faults/io.py, ordeal/integrations/http.py.
- Ship when:
  - Ordeal can start a child once, send multiple requests, inject failures between requests, and restart on crash
  - traces preserve session state clearly enough to debug
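The start-once, send-many, restart-on-crash loop can be sketched with `subprocess.Popen` and a line-based protocol. Everything here is hypothetical: `WorkerSession`, the stand-in child script, and the "crash" request that simulates an injected failure. The sketch has no read timeout, so a child that hangs instead of dying would block it.

```python
import subprocess
import sys

# Stand-in child: uppercases each request line, exits with code 3 on "crash".
CHILD_SOURCE = """
import sys
for line in sys.stdin:
    request = line.strip()
    if request == "crash":
        sys.exit(3)
    print(request.upper(), flush=True)
"""


class WorkerSession:
    """Keep one child alive across requests and restart it when it dies."""

    def __init__(self):
        self.restarts = 0
        self.trace = []  # (request, reply, exit_code) per request, for debugging
        self._start()

    def _start(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def request(self, message):
        reply, exit_code = None, None
        try:
            self.proc.stdin.write(message + "\n")
            self.proc.stdin.flush()
            line = self.proc.stdout.readline()
        except OSError:
            line = ""  # broken pipe: treat like EOF
        if line == "":
            # EOF means the child is gone. Record how it exited, then restart.
            exit_code = self.proc.wait()
            self.restarts += 1
            self._start()
        else:
            reply = line.strip()
        self.trace.append((message, reply, exit_code))
        return reply, exit_code

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


session = WorkerSession()
replies = [session.request(m) for m in ["ping", "crash", "ping"]]
session.close()
print(replies)
print("restarts:", session.restarts)
```

Keeping the per-request trace on the session object is one way to preserve enough state to see which request preceded a restart.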
6. ML and data seam fault packs
- Outcome: make the common ML failure modes one import away.
- Modules: ordeal/faults/numerical.py, ordeal/faults/io.py, ordeal/integrations/http.py, ordeal/integrations/openapi.py.
- Candidate packs:
  - shape drift
  - dtype drift
  - NaN and Inf bursts
  - partial batch results
  - stale model artifact or version skew
  - corrupt weights or feature order drift
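To make three of the candidate packs concrete, the sketch below expresses them as pure functions over plain Python lists. The function names and signatures are invented for illustration and say nothing about what ordeal/faults/numerical.py contains; a real pack would more likely operate on arrays or tensors.

```python
import math
import random


def nan_inf_burst(values, start, length, rng):
    """Return a copy of `values` with a contiguous burst of NaN/Inf entries."""
    out = list(values)
    for i in range(start, min(start + length, len(out))):
        out[i] = rng.choice([math.nan, math.inf, -math.inf])
    return out


def shape_drift(batch, drop_last=1):
    """Return a copy of a 2-D batch where every row lost its last `drop_last` columns."""
    return [row[: len(row) - drop_last] for row in batch]


def feature_order_drift(batch, i, j):
    """Return a copy of a 2-D batch with columns i and j swapped in every row."""
    out = [list(row) for row in batch]
    for row in out:
        row[i], row[j] = row[j], row[i]
    return out


rng = random.Random(0)  # seeded so a failing input can be replayed
burst = nan_inf_burst([1.0, 1.0, 1.0, 1.0, 1.0], start=1, length=2, rng=rng)
drifted = shape_drift([[1, 2, 3], [4, 5, 6]])
swapped = feature_order_drift([[1, 2, 3], [4, 5, 6]], 0, 2)
print(burst, drifted, swapped)
```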
7. Auto seam discovery
- Outcome: scan and init should detect likely service seams and propose relevant fault packs.
- Modules: ordeal/auto.py, ordeal/cli.py, ordeal/state.py.
- Ship when:
  - subprocess, HTTP, cache, file, and model-load seams are detected automatically
  - generated starter configs include likely faults and starter invariants
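One plausible starting point for seam detection is a static pass over imports and `open()` calls using the standard `ast` module. The sketch below is a guess at the shape of such a pass, not a description of ordeal/auto.py; the `SEAM_MODULES` mapping and `detect_seams` are hypothetical, and import-based detection will miss seams reached through wrappers.

```python
import ast

# Hypothetical mapping from an imported top-level module to a seam kind.
SEAM_MODULES = {
    "subprocess": "subprocess",
    "requests": "http",
    "httpx": "http",
    "urllib": "http",
    "redis": "cache",
    "pickle": "model-load",
    "joblib": "model-load",
}


def detect_seams(source):
    """Return the sorted seam kinds suggested by imports and open() calls."""
    seams = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            seams.add("file")
            continue
        else:
            continue
        for module in modules:
            kind = SEAM_MODULES.get(module.split(".")[0])
            if kind:
                seams.add(kind)
    return sorted(seams)


example = '''
import subprocess
from urllib.request import urlopen
import pickle

def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)
'''
seams = detect_seams(example)
print(seams)
```

Each detected seam kind could then be mapped to a suggested fault pack in a generated starter config.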
P3: Later
- Coverage-guided swarm tuning that adapts pairwise budgets from observed dead zones
- Differential behavior coverage across service versions, model versions, or fallback implementations
- Corpus sharing for productive swarm configs and native-boundary traces across CI runs
Non-goals
- Ordeal should not try to become a kernel, browser, or heap-exploitation framework.
- It does not need to replace ASAN, UBSan, or low-level native fuzzers.
- Its role is to catch service-level evidence: "this sequence of realistic operations kills or corrupts the worker," then shrink that to a readable regression.
First milestone
Ship P1 in this order:
1. Swarm observability and CLI reporting
2. Native-boundary crash findings
3. Behavior coverage reporting
4. Pairwise swarm coverage
That sequence gives Ordeal better proof, better debugging value, and better seam coverage before it grows the surface area further.