candor · capability maps & gates for AI-written code
candor builds the effect map of a codebase: which functions reach the network, the filesystem, a database, a subprocess — traced through every call, across package boundaries. An agent gets the whole answer in seconds instead of grepping for minutes, and a declared boundary becomes a gate that fails the PR crossing it.
Pick your stack and give your coding agent one line — it installs candor, maps the repo, and explains how to read the result:
The JVM engine reads bytecode and knows Spring — Java is the most battle-tested, Kotlin is validated on real bytecode, and Scala and Groovy compile to the same class files it analyses. Something else? candor-spec is written to be implemented: a TypeScript engine derived from the documents alone scores 20/20 on the shared conformance suite.
3–5×
Fewer output tokens to answer “what reaches the network?” — one call and 12 seconds, against 40 tool calls and nine minutes from source.
✓ EVAL.md · four A/B batches · 4–47× faster · adjudicated against source
80→0%
Shipped boundary violations when the easiest edit crossed a pure layer. With the gate on, every agent put the I/O where it belonged.
✓ eval/bet2 · experiment 3 · Sonnet 4.6 · K=10 per arm
7→100%
Share of an edit’s knock-on effects the agent reported once candor handed it the delta — for about 5% more tokens.
✓ EVAL.md · de-leaked re-run · consistent across three tasks
An agent adds a feature in pricing.rs, and the simplest way to get the data calls
something that, three hops and one package away, opens a socket. Nothing in the diff shows it, so
nothing in review catches it. candor traces effects through every call, so the gate sees the socket
and fails the PR.
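The scenario above can be sketched as a reachability walk over a call graph. This is a minimal illustration, not candor's implementation: the `Graph` type, the `Effect` enum, and every function name in `example_graph` are hypothetical stand-ins for the three-hop case described here.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Effect kinds named on this page; the enum itself is illustrative.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum Effect {
    Net,
    Fs,
    Db,
    Proc,
}

/// Hypothetical call graph: function name -> direct effects and callees.
struct Graph {
    direct: HashMap<&'static str, Vec<Effect>>,
    calls: HashMap<&'static str, Vec<&'static str>>,
}

/// Every effect reachable from `root`, following calls transitively.
fn reachable_effects(g: &Graph, root: &'static str) -> BTreeSet<Effect> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    let mut out = BTreeSet::new();
    while let Some(f) = stack.pop() {
        if !seen.insert(f) {
            continue; // already visited; this also terminates on recursion
        }
        if let Some(effects) = g.direct.get(f) {
            out.extend(effects.iter().copied());
        }
        if let Some(callees) = g.calls.get(f) {
            stack.extend(callees.iter().copied());
        }
    }
    out
}

/// The case above with made-up names: pricing calls a helper that,
/// three hops and one package away, opens a socket.
fn example_graph() -> Graph {
    let mut direct = HashMap::new();
    direct.insert("net_client::connect", vec![Effect::Net]);
    let mut calls = HashMap::new();
    calls.insert("pricing::quote", vec!["rates::latest"]);
    calls.insert("rates::latest", vec!["rates::fetch"]);
    calls.insert("rates::fetch", vec!["net_client::connect"]);
    Graph { direct, calls }
}
```

Calling `reachable_effects(&example_graph(), "pricing::quote")` yields a set containing `Effect::Net`, even though nothing in `pricing::quote` itself touches the network. That is the property a diff-only review misses.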
The policy file is the architecture, written down:
# .candor/policy
pure pricing
allow Net in billing api.stripe.com hooks.stripe.com
Enforced by cargo candor policy (Rust) and CANDOR_POLICY (JVM) — the same pure/allow/deny/forbid rules. An edit that breaks one fails the build, whichever package it came from.
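To make the rule semantics concrete, here is a rough sketch of how a check over two of those rule kinds might work. It assumes `pure` means "no effects at all" and that `allow Net in <package> <hosts>` restricts that package's network access to the listed hosts; both readings are assumptions, and the `Rule`, `Effect`, and `violations` names are invented for illustration rather than taken from candor.

```rust
/// Effects a package was found to reach; `Net` carries the host. Illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum Effect {
    Net(String),
    Fs,
}

/// Two of the rule kinds named above, in a hypothetical in-memory form.
enum Rule {
    Pure { package: String },
    AllowNet { package: String, hosts: Vec<String> },
}

/// The example policy from this page, built by hand (no parser is sketched here).
fn example_policy() -> Vec<Rule> {
    vec![
        Rule::Pure { package: "pricing".to_string() },
        Rule::AllowNet {
            package: "billing".to_string(),
            hosts: vec!["api.stripe.com".to_string(), "hooks.stripe.com".to_string()],
        },
    ]
}

/// One message per broken rule; an empty result means the gate passes.
fn violations(rules: &[Rule], package: &str, effects: &[Effect]) -> Vec<String> {
    let mut out = Vec::new();
    for rule in rules {
        match rule {
            // A pure package may not reach any effect.
            Rule::Pure { package: p } if p.as_str() == package => {
                for e in effects {
                    out.push(format!("{} is pure but reaches {:?}", package, e));
                }
            }
            // Network access is limited to the allowlisted hosts.
            Rule::AllowNet { package: p, hosts } if p.as_str() == package => {
                for e in effects {
                    if let Effect::Net(host) = e {
                        if !hosts.contains(host) {
                            out.push(format!("{} reaches {} outside its allowlist", package, host));
                        }
                    }
                }
            }
            _ => {}
        }
    }
    out
}
```

Under these assumptions, `pricing` reaching any host is a violation, while `billing` reaching `api.stripe.com` is not. A build step would fail whenever `violations` returns a non-empty list.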
candor never reports a function pure when it reaches an effect. Anything it can’t resolve
comes back Unknown — a sound over-approximation rather than a guess. A clean
certificate is one you can act on.
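One way to picture that guarantee is a three-valued verdict in which Pure is only reported when every call was resolved and none reached an effect. The sketch below is an assumption about the general shape of such a rule, not candor's code; `Verdict`, `Call`, and `verdict` are hypothetical names.

```rust
/// What an analysis could report for one function. Illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pure,
    Effectful,
    Unknown,
}

/// A call site as a hypothetical analysis might see it.
#[allow(dead_code)]
enum Call {
    /// The callee was found and already has a verdict.
    Resolved(Verdict),
    /// The callee could not be determined (for example, dynamic dispatch).
    Unresolved,
}

/// Combine a function's own behaviour with its call sites.
/// Pure is returned only when nothing effectful or unresolved is reachable.
fn verdict(has_direct_effect: bool, calls: &[Call]) -> Verdict {
    if has_direct_effect {
        return Verdict::Effectful;
    }
    let mut result = Verdict::Pure;
    for call in calls {
        match call {
            // A known effect anywhere below settles the question.
            Call::Resolved(Verdict::Effectful) => return Verdict::Effectful,
            // Anything unresolved downgrades Pure to Unknown, never the reverse.
            Call::Resolved(Verdict::Unknown) | Call::Unresolved => result = Verdict::Unknown,
            Call::Resolved(Verdict::Pure) => {}
        }
    }
    result
}
```

The design choice is that uncertainty can only move a verdict away from Pure. A function with one unresolved call comes back Unknown even if every other callee is pure.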
Each engine ships an adversarial fuzzer in CI that holds it to this. The fuzzer threads a known effect through each of the ways its language can hide a call, and the build fails if any reachable function comes back pure:
In the same pre-registered series, when the rule was spelled out in ARCHITECTURE.md,
the model complied in 10 of 10 runs without candor. If your agents reliably read the doc,
you don’t need a gate for that case.
candor matters when the consequence of an edit lands somewhere the diff doesn’t show. As agents write more of the code, that case comes up more, not less.
eval/bet2 · experiment 1 · 0/10 vs 0/10 · reported as designed
The receipt a Claude Code transcript gets whenever the code changes — deterministic, stamped with the engine commit, so it can’t quietly go stale.
A Stop hook refreshes the receipt on every turn that touches the code. Opt in to
self-review and candor hands the agent the knock-on effects of its own edit: “your change
gave foo a new Net — intended?”
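The self-review prompt amounts to a set difference between the effect map before and after an edit. The following is a small sketch under that assumption; `EffectMap` and `gained_effects` are hypothetical names, and the real receipt format is not shown on this page.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical effect map: function name -> names of the effects it reaches.
type EffectMap = BTreeMap<String, BTreeSet<String>>;

/// Pairs of (function, effect) present after the edit but not before it.
fn gained_effects(before: &EffectMap, after: &EffectMap) -> Vec<(String, String)> {
    let empty = BTreeSet::new();
    let mut out = Vec::new();
    for (func, effects) in after {
        // A function absent from `before` is treated as having had no effects.
        let old = before.get(func).unwrap_or(&empty);
        for effect in effects.difference(old) {
            out.push((func.clone(), effect.clone()));
        }
    }
    out
}
```

If `foo` had no effects before the edit and reaches `Net` afterwards, the result is the single pair `("foo", "Net")`, which is the raw material for a question like the one quoted above.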
candor’s queries as native tools, served from a cached report:
candor_effects · candor_where
candor_callers · candor_diff
The gate itself — cargo candor policy on Rust, CANDOR_POLICY on
the JVM: forbidden effects, network host allowlists, and layer rules, checked on every push.