Rust · adaptive · proof-carrying · domain-neutral
The decision engine that learns where to spend
Every decision — AI model call, trading signal, medical triage, factory inspection —
gets an adaptive quality floor, Thompson Sampling exploration,
outcome-calibrated routing, and a SHA-256 sealed proof.
1M real conversations: 0 errors, $572K saved, 84% cost reduction.
1.1M across 4 industries: 0 runtime failures, 0 domain-specific code.
01 — Performance
All numbers from a single-node run. Windows, NTFS, no cluster. Linux + NVMe would improve HTTP and WAL numbers. Full methodology: splits, hyperparameters, known limitations →
02 — Adaptive Intelligence
Three layers — Welford statistics, Thompson Sampling, outcome calibration — all in allocation-free Rust with lock-free atomics.
Tracks rolling mean and variance per input regime. No heap allocation. Detects anomalous requests automatically — unusual patterns get higher quality floors, common patterns get cheaper routing.
integer arithmetic · 8 regimesBayesian explore/exploit for tier selection. Maintains Beta(alpha, beta) per regime per tier. Explores cheaper tiers when uncertain, exploits when confident. Converges to optimal policy without manual tuning.
Beta-Bernoulli · deterministic seedWhen an outcome is reported, quality floors adjust automatically. Failures raise the floor 2x faster than successes lower it. The engine is naturally conservative — it protects quality by default.
asymmetric learning · fail-safePre-computed quality floors from public dataset benchmarks (WildChat 500K, KDD 494K, SP500 14K). Day-one defaults for 8 use case patterns: support (0.55), coding (0.72), compliance (0.92), security (0.92).
zero cold-start guessingEmpirical success rates per use-case per tier. After 50+ observations, recommends calibrated floors with confidence levels. Catches cross-provider quality gaps before they reach production.
lock-free · concurrent-safe200 GBM trees trained on 2,090 labeled prompts, exported as pure Rust source code. No ONNX runtime, no Python, no GPU. The model predicts quality_floor, risk_score, and confidence_score from a 384-dim sentence embedding — compiled directly into the binary.
40K lines Rust · 0 dependencies · <1ms inference03 — Proof & Durability
Not a log. A cryptographic evidence chain. Tamper with any record and the chain breaks. The engine refuses to start on a broken chain.
Every decision binds the request payload, policy version, candidate frontier, and final selection into a single SHA-256 hash. Independently verifiable months later.
Decision N includes the hash of decision N-1. Delete, modify, or reorder any record and the chain breaks. Validated on startup before accepting new decisions. 69M rows tested.
remaining + reserved + committed = initial. Proven with proptest across thousands of random operation sequences. Loom exhaustive testing verifies every thread interleaving.
The kernel never selects a model below the quality floor. No override, no exception. If no model qualifies, the request is blocked — not silently degraded. Proptest-proven.
04 — Domain-Neutral Proof
Every dataset is real — Yahoo Finance, KDD Cup 99, UCI Covertype, California Housing. No synthetic data. No domain-specific code changes.
| Domain | Records | Data Source | Errors |
|---|---|---|---|
| SP500 Finance | 14,400 | Yahoo Finance, 30 tickers, 2yr daily | 0 |
| Cybersecurity | 494,021 | KDD Cup 99 intrusion detection | 0 |
| Forestry | 581,012 | UCI Covertype classification | 0 |
| Real Estate | 20,640 | California Housing valuation | 0 |
| Total | 1,110,073 | 4 real datasets | 0 |
05 — Architecture
The adaptive router recommends quality floors when the client doesn't provide one. Client-provided floors always override — the engine enhances human judgment, never replaces it.
06 — Test Infrastructure
Every subsystem: kernel, WAL, budget, adaptive, quality tracker, handlers, security, telemetry, config. Each test names and exercises a concrete invariant.
Full HTTP stack with mock providers. Shadow mode, proxy mode, auth planes, rate limiting, reconciliation. Axum + Tower end-to-end.
Every thread interleaving for concurrent budget operations. Three threads racing to reserve budget — total never exceeds limit under any schedule.
Truncated WAL writes, corrupted records, poisoned mutexes, disk-full scenarios, NaN/Infinity inputs. The engine recovers or rejects — never panics.
07 — Tech Stack
08 — Shadow Replay
Simulated 4-week shadow replay pilot (10K decisions). Adaptive routing improves as Thompson Sampling calibrates quality floors from outcome feedback.
| Week | Calls | Savings | Miss Rate | Precision |
|---|---|---|---|---|
| Week 1 | 2,500 | 79.0% | 19.2% | 80.8% |
| Week 2 | 2,500 | 88.2% | 22.9% | 77.1% |
| Week 3 | 2,500 | 92.5% | 18.5% | 81.5% |
| Week 4 | 2,500 | 97.1% | 20.7% | 79.3% |
09 — Quickstart
docker compose up -d
Single container. Read-only filesystem. No external dependencies.
POST /api/v1/route
{"model":"gpt-4o",
"input_tokens":1200,
"metadata":{
"tenant_id":"team-a",
"use_case":"support"}}
GET /api/v1/audit/report
→ JSON + Markdown
spend analysis
Board-readable report after 7 days of observation.
10 — Powered by Calybris
Each product plugs its own domain vocabulary. The proof machinery, adaptive routing, and durability layer are shared.