Rust · adaptive · proof-carrying · domain-neutral
The decision engine that learns where to spend
Adaptive routing with Thompson Sampling. Outcome-calibrated quality floors.
SHA-256 sealed proof chain. 22 models across 6 providers.
1M real conversations: 0 errors, $572K saved (84%).
1.1M across 4 industries: 0 runtime failures.
316 tests. Integer kernel. Zero unsafe code.
01 — Integer Kernel
No floating-point in the hot path. The kernel uses bounded integer arithmetic throughout: every value is an i64 or u64 denominated in microcents.
No heap allocation per decision. No GC pause. No NaN propagation.
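As a rough illustration of what bounded integer arithmetic in microcents can look like, here is a minimal sketch. The `Microcents` newtype, the `token_cost` helper, and the per-1,000-token pricing model are assumptions made for this example, not the engine's actual types.

```rust
/// Hypothetical newtype: an amount in microcents (1 cent = 1_000_000 microcents).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Microcents(pub i64);

impl Microcents {
    /// Cost of `tokens` tokens at `price_per_1k` microcents per 1,000 tokens.
    /// Assumes a non-negative price. Returns None on overflow instead of
    /// wrapping, panicking, or producing a NaN-like poison value.
    pub fn token_cost(tokens: u64, price_per_1k: Microcents) -> Option<Microcents> {
        let tokens = i64::try_from(tokens).ok()?;
        let total = tokens.checked_mul(price_per_1k.0)?;
        // Round up so the estimate never undercharges.
        let rounded = total.checked_add(999)? / 1000;
        Some(Microcents(rounded))
    }

    /// Overflow-checked addition of two amounts.
    pub fn checked_add(self, other: Microcents) -> Option<Microcents> {
        self.0.checked_add(other.0).map(Microcents)
    }
}
```

Every operation either yields an exact integer result or an explicit `None`, so there is no silent rounding drift and nothing to allocate.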
02 — Adaptive Intelligence
Three layers — Welford statistics, Thompson Sampling, outcome calibration — all in allocation-free Rust with lock-free atomics. No manual quality floor tuning needed.
Tracks rolling mean and variance per input regime. Detects anomalous requests automatically — unusual patterns get higher quality floors.
Bayesian explore/exploit for tier selection. Beta(alpha, beta) per regime per tier. Converges to optimal policy without manual tuning.
Failures raise the quality floor 2x faster than successes lower it. The engine is naturally conservative — it protects quality by default.
Pre-computed from public benchmarks (WildChat 500K, KDD 494K, SP500 14K). Day-one defaults: support 0.55, coding 0.72, compliance 0.92, security 0.92.
causal_exploration_bps=0 and use shadow mode instead.
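The asymmetric calibration rule described above (failures raise the quality floor twice as fast as successes lower it) can be sketched with integers alone. The `QualityFloor` type, the basis-point scale, and the fixed step size are assumptions for illustration; the engine's real update rule may differ.

```rust
/// 10_000 basis points = a quality floor of 1.00.
const MAX_BPS: u32 = 10_000;

/// Hypothetical per-regime quality floor, stored in basis points.
pub struct QualityFloor {
    floor_bps: u32,
    min_bps: u32,
    step_bps: u32,
}

impl QualityFloor {
    /// `initial_bps` is clamped into [min_bps, MAX_BPS].
    pub fn new(initial_bps: u32, min_bps: u32, step_bps: u32) -> Self {
        let min_bps = min_bps.min(MAX_BPS);
        Self {
            floor_bps: initial_bps.clamp(min_bps, MAX_BPS),
            min_bps,
            step_bps,
        }
    }

    /// A success lowers the floor by one step; a failure raises it by two.
    pub fn record(&mut self, success: bool) {
        self.floor_bps = if success {
            self.floor_bps.saturating_sub(self.step_bps).max(self.min_bps)
        } else {
            self.floor_bps
                .saturating_add(self.step_bps.saturating_mul(2))
                .min(MAX_BPS)
        };
    }

    pub fn floor_bps(&self) -> u32 {
        self.floor_bps
    }
}
```

Starting from the support default of 0.55 (5,500 bps) with a 10 bps step, one failure moves the floor to 5,520, and it takes two successes to bring it back to 5,500.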
03 — Decision Proof Chain
Each decision gets a SHA-256 fingerprint that includes the previous decision's hash, forming an immutable chain. Tampering with any decision invalidates every subsequent entry.
Every decision is durably written before acknowledgment. The WAL uses hash chaining — each entry includes the previous entry's SHA-256 digest.
Crash-tail truncation detects and discards incomplete entries on restart. Deterministic replay re-derives state from the surviving chain.
Decisions within a 200µs window are batched into a single fsync, amortizing I/O cost without sacrificing durability guarantees.
Optional ed25519 signatures on periodic checkpoints. KMS integration available. The checkpoint proves the chain was intact at a known point.
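The chaining idea can be sketched in a few lines. To stay within the standard library, this example swaps SHA-256 for `DefaultHasher`, which is not cryptographic and is used here only as a stand-in; a real chain would use a SHA-256 implementation. The `Entry` layout and the function names are likewise assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical chain entry: payload plus the hash link to its predecessor.
pub struct Entry {
    pub payload: Vec<u8>,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in for SHA-256(prev_hash || payload). NOT cryptographically secure.
pub fn link_hash(prev_hash: u64, payload: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(prev_hash);
    h.write(payload);
    h.finish()
}

/// Appends a payload, binding it to the hash of the previous entry.
pub fn append(chain: &mut Vec<Entry>, payload: &[u8]) {
    let prev_hash = chain.last().map_or(0, |e| e.hash);
    let hash = link_hash(prev_hash, payload);
    chain.push(Entry { payload: payload.to_vec(), prev_hash, hash });
}

/// Returns the index of the first entry that fails verification,
/// or None if the whole chain is intact.
pub fn first_broken(chain: &[Entry]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (i, e) in chain.iter().enumerate() {
        if e.prev_hash != expected_prev || e.hash != link_hash(e.prev_hash, &e.payload) {
            return Some(i);
        }
        expected_prev = e.hash;
    }
    None
}
```

Editing a payload breaks that entry's own hash. Recomputing the hash to hide the edit breaks the link stored in the next entry instead, which is why tampering propagates forward through the chain.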
04 — Budget Enforcement
Balances are denominated in microcents and stored as i64 atomics.
Thread-safe reservation uses a CAS loop — no mutex, no lock contention.
The budget journal is hash-chained and durably synced.
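A CAS-based reservation along the lines described above might look like the following sketch. The `Budget` type, its method names, and the memory orderings chosen are assumptions for illustration, and the journal write is omitted.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical budget: remaining balance in microcents.
pub struct Budget {
    balance: AtomicI64,
}

impl Budget {
    pub fn new(initial: i64) -> Self {
        Self { balance: AtomicI64::new(initial) }
    }

    /// Tries to reserve `amount` microcents without taking a lock.
    /// Returns false if the amount is negative or exceeds the balance.
    pub fn try_reserve(&self, amount: i64) -> bool {
        if amount < 0 {
            return false;
        }
        let mut current = self.balance.load(Ordering::Acquire);
        loop {
            if current < amount {
                return false; // insufficient funds: never go negative
            }
            // Succeeds only if no other thread changed the balance meanwhile.
            match self.balance.compare_exchange_weak(
                current,
                current - amount,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(observed) => current = observed, // retry with fresh value
            }
        }
    }

    /// Returns a previously reserved amount to the balance.
    pub fn release(&self, amount: i64) {
        self.balance.fetch_add(amount, Ordering::AcqRel);
    }

    pub fn balance(&self) -> i64 {
        self.balance.load(Ordering::Acquire)
    }
}
```

Because the sufficiency check and the subtraction are committed together by a single compare-and-swap, two threads cannot both spend the same funds, and a losing thread simply retries against the value it observed.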
AtomicI64: no mutex, no lock convoy
05 — Causal Regret Measurement
The engine measures whether past decisions were optimal by comparing actual outcomes against counterfactual alternatives. This is causal inference, not correlation — and it uses importance weighting to correct for the logging policy's selection bias.
Outcomes are bound to their originating decision and execution fingerprints. No outcome can be attributed without matching the proof chain.
Weight = 1 / logging_propensity, capped at 20x to prevent variance explosion from rare selections. Standard doubly-robust correction.
20% of traffic (by default) is held out from policy intervention. The holdout provides an unbiased baseline for regret comparison.
The engine cross-checks its reward model predictions against observed outcomes. Systematic bias triggers a calibration warning, not a silent drift.
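The capped weight, 1 / logging_propensity limited to 20x, can be computed in fixed-point integers. The sketch below shows only the clipped inverse-propensity part; it does not include the reward-model term of a doubly-robust estimator. The parts-per-million propensity scale, the thousandths weight scale, and the function names are assumptions.

```rust
const PPM: u64 = 1_000_000; // propensity scale: 1_000_000 = probability 1.0
const WEIGHT_SCALE: u64 = 1_000; // weights are expressed in thousandths
const MAX_WEIGHT: u64 = 20 * WEIGHT_SCALE; // the 20x cap

/// Importance weight (in thousandths) for an action that the logging policy
/// chose with probability `propensity_ppm / 1_000_000`.
/// Returns None for an invalid propensity (0 or above 1.0).
pub fn capped_weight_milli(propensity_ppm: u32) -> Option<u64> {
    let p = u64::from(propensity_ppm);
    if p == 0 || p > PPM {
        return None;
    }
    Some((PPM * WEIGHT_SCALE / p).min(MAX_WEIGHT))
}

/// Clipped inverse-propensity estimate of the mean reward.
/// Each sample is (reward in microcents, logging propensity in ppm).
pub fn ips_estimate(samples: &[(i64, u32)]) -> Option<i64> {
    if samples.is_empty() {
        return None;
    }
    let mut acc: i128 = 0;
    for &(reward, propensity_ppm) in samples {
        let w = capped_weight_milli(propensity_ppm)?;
        acc += i128::from(reward) * i128::from(w);
    }
    let denom = samples.len() as i128 * i128::from(WEIGHT_SCALE);
    i64::try_from(acc / denom).ok()
}
```

A tier selected 50% of the time gets weight 2.0, while one selected 1% of the time would get 100.0 uncapped and is held to 20.0, trading a small bias for bounded variance.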
06 — Staged Policy Rollout
New policies are never deployed directly. A candidate policy snapshot is staged and runs in shadow mode alongside production, evaluating every decision in parallel without affecting live traffic.
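Shadow evaluation can be pictured as running both policies on the same input and returning only the production result. The `Policy` trait, the `Threshold` policy, and the `ShadowReport` counters below are hypothetical names invented for this sketch, not the engine's API.

```rust
/// Hypothetical policy interface: maps a request score (bps) to a tier.
pub trait Policy {
    fn choose_tier(&self, score_bps: u32) -> u8;
}

/// Toy policy for illustration: tier 1 at or above the threshold, else tier 0.
pub struct Threshold(pub u32);

impl Policy for Threshold {
    fn choose_tier(&self, score_bps: u32) -> u8 {
        if score_bps >= self.0 { 1 } else { 0 }
    }
}

/// Counters collected while the candidate runs in shadow mode.
#[derive(Default)]
pub struct ShadowReport {
    pub evaluated: u64,
    pub disagreements: u64,
}

/// Evaluates both policies on the same input, records whether they differ,
/// and returns only the production choice so live traffic is unaffected.
pub fn decide_with_shadow<P: Policy, C: Policy>(
    production: &P,
    candidate: &C,
    score_bps: u32,
    report: &mut ShadowReport,
) -> u8 {
    let live = production.choose_tier(score_bps);
    let shadow = candidate.choose_tier(score_bps);
    report.evaluated += 1;
    if shadow != live {
        report.disagreements += 1;
    }
    live
}
```

The disagreement count gives a first signal of how far the candidate diverges from production before any promotion decision is made.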
07 — Governance Hard Limits
These constraints are evaluated before every decision. They cannot be overridden by client metadata, configuration, or policy rules. If any gate fails, the decision is blocked.
08 — Domain-Neutral Proof
Same binary. Zero domain-specific code. Every dataset is real — Yahoo Finance, KDD Cup 99, UCI Covertype, California Housing.
09 — Test Infrastructure
The test suite is designed to break the engine, not confirm that it works. Fault injection, property-based testing, exhaustive concurrency analysis, and adversarial inputs.
Truncation mid-write, bit corruption in committed entries, partial writes with power-loss simulation, and 8-thread concurrent contention on a single WAL file.
Random operation sequences — reserve, commit, release, fail — with NaN and Inf poison values injected at random positions. Balance invariant checked after every sequence.
3-way reservation race tested under ALL interleavings using loom. Not sampling — every possible thread schedule is explored to prove the absence of data races.
u32::MAX tokens, empty messages, malformed JSON, and rate limiter 429 responses. The handler must never panic, never produce NaN, and never leak budget.
10 — Products Built on Calybris
Each product plugs its own domain vocabulary into the Calybris policy gate. The proof machinery, WAL, budget control, and regret measurement are shared infrastructure.