candor · capability maps & gates for AI-written code

From Polymorphism — we build and run large systems for private enterprise and government.

Stop your agent tracing call graphs by hand.
One call, not dozens.

candor maps which functions reach the network, the filesystem, a database or a subprocess, traced through every call. Declare a boundary and the map becomes a gate: the edit that crosses it fails the build. [1]

Java · Rust · Kotlin · Scala · Groovy · TypeScript · Node.js · Swift · Claude Code · MCP · CLI agents

Install and map your repo by pasting one line

Paste this to your coding agent. It installs candor, maps your repo, and tells you what every function reaches — network, filesystem, database, a subprocess — traced through every call, including what a diff wouldn’t show:

Read https://github.com/tombaldwin/candor/blob/main/AGENTS.md and follow it to map this repo's effects.

It writes the map to .candor/report.json and shows you the headline — you’ll see something like:

candor — 312 functions reach effects, across 84 classes (pure functions omitted)
  Net 14 · Fs 9 · Db 47 · Clock 6   ·   Unknown 11 (disclosed)
  full map in .candor/report.json — then ask “who reaches the DB?”

Rather use the terminal? One command installs the candor CLI — it fetches and manages every language engine for you (the JVM engine as a native binary, no JVM needed):

brew install tombaldwin/tap/candor

candor update   # fetch the engines
candor scan .    # analyse this repo
candor tour      # the most surprising reaches
candor init      # turn the map into a gate

candor init proposes an architecture policy from what your code already does (every rule it writes passes today) and records a regression baseline — so the next edit that crosses a boundary, or a dependency that gains a capability, fails the build. Everything runs on your machine; the analysis makes no network calls. macOS & Linux; one CLI across the JVM, Rust, TypeScript and Swift.

Full changelogs & release history →

Already set up? Upgrade or check your version

Read https://github.com/tombaldwin/candor/blob/main/AGENTS.md and upgrade candor in this repo to the latest release.

Read https://github.com/tombaldwin/candor/blob/main/AGENTS.md and tell me which candor version this repo is on and whether it's the latest.

Going further: three ways in

You don’t need any of this to start; the one line above already works. These are the three dedicated setups for when you want the full integration: into your coding-agent loop, into CI as an architecture gate, or into dependency review.

Running coding agents. Stop paying agents to re-derive your call graph. candor hands the agent the complete blast radius in one query, so a cheap model gets it right and gets it whole.

For coding agents →

Shipping code. Your architecture, enforced on every push: the domain does no I/O, the web layer never touches the DB, proven in CI. Deepest on the JVM (it knows Spring), with the same gate on Rust, TypeScript and Swift.

For shipping code →

Watching dependencies. A bump that turns trusted code into a phone-home is the supply-chain event nothing cheap catches. candor knows what every function reaches, so it flags the release that gains a capability — function by function, across the JVM and npm.

Diff a dependency →

“A cheap model gets it right and gets it whole” — the measured evidence, across four model tiers →

01 Measured, not promised [2]

40×

Fewer tool calls to map an edit’s blast radius on a real 2,272-class app: one candor query (~19 s) against ~42 tool calls and ~9 minutes tracing the call graph by hand (~3× fewer tokens too), on the flagship JVM engine.

✓ candor-java · in-production app, 2,272 classes · N=6/arm · matched an 8-min hand-trace exactly

100→0%

Shipped boundary violations when the easiest edit crossed a pure layer. With the gate on, every agent put the I/O where it belonged, measured on the flagship JVM engine.

✓ candor-java · eval/gate · Sonnet 4.6 · K=10 per arm · p=1.1e-5
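
The quoted p-value is consistent with a two-sided Fisher exact test on a 10-of-10 versus 0-of-10 split. The page does not name the test, so the choice of test here is an assumption; the sketch below only works through the arithmetic.

```python
from math import comb

# Assumption: 10 of 10 ungated runs shipped a violation, 0 of 10 gated runs did,
# and the test is a two-sided Fisher exact test (the source does not name it).
K = 10

# Under the null, every way of placing the 10 violations among the 20 runs is
# equally likely, and exactly one placement puts all of them in the ungated arm.
one_sided = 1 / comb(2 * K, K)

# The mirror-image split (all violations in the gated arm) is equally extreme.
two_sided = 2 * one_sided

print(f"{two_sided:.1e}")  # 1.1e-05
```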

11→100%

Share of an edit’s knock-on effects the agent reported once candor handed it the delta. Measured on the flagship JVM engine, across service, controller and report layers.

✓ candor-java · eval/scaled · pre-registered · 3 tasks · N=3 per arm

And the answer doesn’t depend on the model: one query comes back complete at every tier, while an unaided Sonnet-class agent drops callers (3 of 8 runs). The cheaper the model, the more the map is worth. [3]

02 The edit that looks fine and isn’t

An agent adds a feature to the domain layer, a method on PricingService, and the simplest way to get the data calls something that, three hops and one package away, opens a socket. Nothing in the diff shows it, so nothing in review catches it. candor traces effects through every call, so the gate sees the socket and fails the build.
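
To make the three-hop case concrete, here is a minimal sketch of effect propagation over a call graph. It is not candor's implementation: the graph, the function names and the fixpoint loop are all illustrative assumptions.

```python
# Hypothetical call graph: caller -> callees. Every name here is illustrative.
CALLS = {
    "pricing.PricingService.quote": ["pricing.RateTable.lookup"],
    "pricing.RateTable.lookup": ["common.Cache.get"],
    "common.Cache.get": ["infra.RemoteStore.fetch"],
    "infra.RemoteStore.fetch": [],
    "pricing.PricingService.round": [],
}

# Effects a function performs directly (here, one function opens a socket).
DIRECT = {"infra.RemoteStore.fetch": {"Net"}}


def propagate(calls, direct):
    """Return every function's transitive effects, iterating to a fixpoint."""
    effects = {fn: set(direct.get(fn, ())) for fn in calls}
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            for callee in callees:
                new = effects.get(callee, set()) - effects[caller]
                if new:
                    effects[caller] |= new
                    changed = True
    return effects


effects = propagate(CALLS, DIRECT)
print(sorted(effects["pricing.PricingService.quote"]))  # ['Net']
```

The diff for such an edit would only touch PricingService.quote, yet the propagated map shows Net on it, which is what a pure-layer rule can then reject.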

The policy file is the architecture, written down [4]:

# .candor/policy
pure pricing
allow Net in billing   api.stripe.com  hooks.stripe.com

Net · Fs · Db · Llm · Exec · Env · Clock · Ipc · Log · Rand · Clipboard
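
As a rough illustration of how the two rules in the policy fragment above could be checked, the sketch below parses them and tests a hypothetical analysis result against them. The rule semantics, the REACHES data and every function name are assumptions for illustration; the real grammar and checker may differ.

```python
# Assumed semantics, for illustration only (the real grammar may differ):
#   pure <pkg>                 -> no function in <pkg> may reach any effect
#   allow Net in <pkg> <hosts> -> <pkg> may reach Net, but only the listed hosts
POLICY = """
pure pricing
allow Net in billing   api.stripe.com  hooks.stripe.com
"""

# Hypothetical analysis result: function -> (effects, network hosts reached).
REACHES = {
    "pricing.PricingService.quote": ({"Net"}, {"rates.example.com"}),
    "billing.Charge.create": ({"Net"}, {"api.stripe.com"}),
    "billing.Export.push": ({"Net"}, {"files.example.com"}),
}


def parse(text):
    """Split the policy text into pure packages and per-package host allowlists."""
    pure, allowed_hosts = set(), {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "pure":
            pure.add(words[1])
        elif words[:3] == ["allow", "Net", "in"]:
            allowed_hosts[words[3]] = set(words[4:])
    return pure, allowed_hosts


def violations(text, reaches):
    """List every function that breaks a pure rule or a host allowlist."""
    pure, allowed_hosts = parse(text)
    found = []
    for fn, (effects, hosts) in sorted(reaches.items()):
        pkg = fn.split(".")[0]
        if pkg in pure and effects:
            found.append(f"{fn}: {pkg} is pure but reaches {sorted(effects)}")
        # Packages without an allowlist are not host-checked in this sketch.
        extra = hosts - allowed_hosts.get(pkg, hosts)
        if extra:
            found.append(f"{fn}: Net to {sorted(extra)} is not on the {pkg} allowlist")
    return found


for problem in violations(POLICY, REACHES):
    print(problem)
```

A gate built this way would exit non-zero whenever the list is non-empty.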

03 A gate that fails safe

candor-java is built to fail safe: anything it can’t resolve comes back Unknown rather than pure, a deliberate over-approximation. The target is that an unresolved call never reads as clean. But we don’t claim more than that: purity is undecidable in general, so this is not a completeness proof. It’s a gate that catches far more than review and tells you where it couldn’t see. The syntactic gaps we keep finding and closing are tracked in the open, not papered over. (The other engines hold the same discipline; the quick-install Rust scanner deliberately under-reports rather than fabricates: what it shows is real, and its docs say what it omits.)
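
The fail-safe rule can be sketched in a few lines. This is an illustration of the idea, not the engine's code: the KNOWN table and the function names are hypothetical.

```python
# Hypothetical classifier table: call targets the analysis knows how to classify.
KNOWN = {
    "java.net.Socket.connect": {"Net"},
    "java.lang.Math.abs": set(),  # resolved and known to have no effect
}


def classify(target):
    """Classify one call target, falling back to Unknown when it cannot be resolved."""
    if target in KNOWN:
        return set(KNOWN[target])
    # Fail safe: an unresolved target is disclosed, never assumed clean.
    return {"Unknown"}


def effects_of(call_targets):
    """Union the effects of every call a function makes."""
    result = set()
    for target in call_targets:
        result |= classify(target)
    return result


def is_pure(call_targets):
    # Pure is only claimed when every call resolved and none had an effect.
    return not effects_of(call_targets)


print(effects_of(["java.lang.Math.abs", "com.example.Plugin.run"]))  # {'Unknown'}
```

The over-approximation is the point: a function that calls something unresolved may in fact be pure, but it is reported as Unknown rather than clean.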

The report itself carries the caveat, not just the tool’s output: a coverage field names every dependency the classifier doesn’t cover, and every verdict computed from that report re-discloses it — a gate pass over partially-covered code says so on the verdict, and the Swift engine’s privacy audit marks its answer conditional rather than clean. No answer pretends to be total.

Each engine ships an adversarial fuzzer in CI that holds it to this. The fuzzer threads a known effect through the call-hiding constructs below, and the build fails if any reachable function comes back pure:

operator overloads
?
.await
dyn dispatch
closures
RAII drops
macros
lambdas
method refs
static initialisers
interface dispatch
scheduled tasks
suspend functions
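
The shape of such a fuzzer can be sketched against a toy analyser. Everything below is an assumption made for illustration (the construct list, the chain generator and the analyser under test); it shows the checking idea, not how candor's fuzzers are built.

```python
import random

# A few call-hiding constructs, treated here as labels on call edges.
CONSTRUCTS = ["direct call", "closure", "lambda", "method ref", "interface dispatch"]


def build_chain(rng, length):
    """Thread a known Net effect through `length` hops, each via a random construct."""
    edges = [(f"f{i}", f"f{i + 1}", rng.choice(CONSTRUCTS)) for i in range(length)]
    sink = f"f{length}"  # the only function that performs the effect directly
    return edges, sink


def analyse(edges, sink, understood):
    """Toy analyser under test: it only follows edges whose construct it understands."""
    impure = {sink}
    for caller, callee, construct in reversed(edges):
        if callee in impure and construct in understood:
            impure.add(caller)
    return impure


def fuzz(understood, trials=200, seed=0):
    """Return the constructs behind which a reachable function was reported pure."""
    rng = random.Random(seed)
    missed = set()
    for _ in range(trials):
        edges, sink = build_chain(rng, length=rng.randint(1, 6))
        impure = analyse(edges, sink, understood)
        for caller, _callee, construct in edges:
            if caller not in impure and construct not in understood:
                missed.add(construct)
    return missed


# An analyser that does not follow lambdas lets the effect go missing behind them.
print(fuzz(set(CONSTRUCTS) - {"lambda"}))
```

In CI, a non-empty result would fail the build, which is what turns a missed construct into a tracked gap rather than a silent one.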

full disclosure

We publish our nulls

In the same locked-in-advance series, when the rule was spelled out in ARCHITECTURE.md, the model complied in 10 of 10 runs without candor. If your agents reliably read the doc, you don’t need a gate for that case.

candor matters when the consequence of an edit lands somewhere the diff doesn’t show. As agents write more of the code, that happens more often.

eval/bet2 · experiment 1 · 0/10 vs 0/10 · reported as designed

04 In your agent’s hands

candor · 143 fns · 54 Db, 16 Net, 27 Fs · 0 unresolved · fresh @8c4c9053 · coverage ✓

The receipt a Claude Code transcript gets whenever the code changes — deterministic, stamped with the engine commit, so it can’t quietly go stale.

Claude Code

A Stop hook refreshes the receipt on every turn that touches the code. Opt in to self-review and candor hands the agent the knock-on effects of its own edit: “your change gave foo a new Net — intended?”
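
The self-review prompt amounts to a diff between two effect maps. A minimal sketch, assuming each map is simply function name to set of effects (the real report format is not shown on this page):

```python
def effect_delta(before, after):
    """Map each function to the effects it gained between two analysis runs."""
    gained = {}
    for fn, effects in after.items():
        new = set(effects) - set(before.get(fn, ()))
        if new:
            gained[fn] = new
    return gained


# Hypothetical effect maps from before and after an edit.
before = {"foo": {"Fs"}, "bar": set()}
after = {"foo": {"Fs", "Net"}, "bar": set(), "baz": {"Db"}}

for fn, new in sorted(effect_delta(before, after).items()):
    print(f"your change gave {fn} a new {', '.join(sorted(new))}: intended?")
```

The same comparison, run against a dependency's previous release instead of the previous commit, is one way to picture a capability gain showing up in review.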

MCP server

candor’s queries as native tools, served from a cached report:

candor_impact · candor_where
candor_callers · candor_diff

Editors

candor-lsp shows effects and blast radius as CodeLens in any LSP editor; the JetBrains plugin is in review on the Marketplace.

CI

The gate itself — a checked-in .candor/config names the policy, and every engine reads the same file: forbidden effects, network host allowlists, and layer rules, checked on every push.

05 Not only for agents

Boundaries erode the same way when humans write the code; the agent just does it faster. Everything above works without an AI anywhere near the repo.

Architecture that stays enforced

Layer rules (the domain does no I/O, infra never calls up the stack) live in the policy file and run in CI, instead of in a document that drifts. The PR that erodes the boundary fails, whoever wrote it.
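
A rule like "infra never calls up the stack" reduces to a check on call edges. The sketch below assumes a three-layer ordering and package-prefixed function names; both are illustrative, not candor's rule syntax.

```python
# Hypothetical layering, top of the stack first. Package names are illustrative.
LAYERS = ["web", "domain", "infra"]
RANK = {name: i for i, name in enumerate(LAYERS)}

# Hypothetical call edges between fully qualified functions.
CALLS = [
    ("web.OrderController.post", "domain.OrderService.place"),
    ("domain.OrderService.place", "infra.OrderRepo.save"),
    ("infra.OrderRepo.save", "web.OrderController.notify"),  # calls up the stack
]


def layer(fn):
    """Take the layer from the leading package segment."""
    return fn.split(".")[0]


def upward_calls(calls):
    """Return the edges where a lower layer calls into a higher one."""
    return [
        (caller, callee)
        for caller, callee in calls
        if RANK[layer(caller)] > RANK[layer(callee)]
    ]


for caller, callee in upward_calls(CALLS):
    print(f"{caller} -> {callee}: {layer(caller)} must not call {layer(callee)}")
```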

Unfamiliar code, mapped

“What does this function actually touch?” and “who reaches the network?” answered instantly from the cached report — blast radius before a risky change, instead of an afternoon of grepping.
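
A blast-radius question is a reverse reachability query over the call graph. A minimal sketch, with a hypothetical graph standing in for the cached report:

```python
from collections import deque


def callers_of(calls, target):
    """Return every function that can reach `target`, directly or transitively."""
    reverse = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    seen, queue = set(), deque([target])
    while queue:
        for caller in reverse.get(queue.popleft(), ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical call graph: caller -> callees.
CALLS = {
    "ReportJob.run": ["ReportService.build"],
    "ReportController.get": ["ReportService.build"],
    "ReportService.build": ["OrderRepo.findAll"],
    "OrderRepo.findAll": [],
    "HealthCheck.ping": [],
}

print(sorted(callers_of(CALLS, "OrderRepo.findAll")))
```

Changing OrderRepo.findAll in this toy graph touches three callers and leaves HealthCheck.ping alone, which is the answer an afternoon of grepping would otherwise have to reconstruct.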

Reviewable security boundaries

Billing talks to Stripe and nothing else — as an enforced allowlist, not a hope. A dependency upgrade that quietly starts shelling out or phoning home shows up as an effect change you can see in review.

06 FAQ

How is candor different from Semgrep or CodeQL?

Those find code that matches a pattern you write. candor computes, for every function, what it actually reaches — network, filesystem, database, a subprocess — traced through the whole call graph, including framework code with no source to read (a Spring Data repository resolves to a database access). You declare a boundary once and candor turns it into a build gate, so the edit that crosses it fails CI.

Which languages does it support?

The JVM (Java, Kotlin, Scala, Groovy), Rust, JavaScript/TypeScript and Swift today. The effect spec is language-neutral and every engine is derived from it independently, checked by a shared conformance suite, so a new language is additive rather than a rewrite.

Won’t it be noisy with false positives?

candor discloses rather than guesses. When it can’t resolve a call (reflection, dynamic dispatch), it marks the result Unknown instead of inventing an effect, and the quick Rust scanner deliberately under-reports rather than fabricates. What it reports is real; where it can’t see, it says so.

Purity is undecidable — what does candor actually guarantee?

Not a completeness proof; that’s impossible in general. candor is a gate that catches far more than review and tells you where it couldn’t see, as an explicit Unknown, instead of guessing. The gaps it finds are tracked in the open. The claim is disclosure, not certainty.

How does it work with a coding agent?

The agent runs one query to get the complete blast radius of a change instead of re-deriving it. With Claude Code there’s an edit-time hook that surfaces that fallout as the agent works; it also speaks MCP, and any CLI agent can call candor-query directly.

Is it open source?

Yes: MIT or Apache-2.0, and every engine is on GitHub. Install it today via jbang (JVM), npm (TypeScript/JavaScript), crates.io (Rust) or GitHub releases (Swift) — plus the MCP, LSP and IDE surfaces.

Does my code leave my machine?

No. It runs locally and in your CI, with no model inside it and nothing sent anywhere. The analysis is a deterministic classifier and propagation over your own code. Your code stays on your machine.

Notes

[1] candor-java is the reference implementation: it reads JVM bytecode and knows Spring. Java is the most battle-tested; Kotlin is validated on real bytecode (lambdas, suspend functions, coroutine dispatch); Scala and Groovy run on real bytecode too, via scala-library self-analysis and Groovy’s metaclass dispatch. The same spec runs as full engines beyond the JVM: Rust (cargo install candor-scan), JavaScript & TypeScript (npx -y candor-ts, with --allow-js for plain JS/Node), and Swift. A shared conformance suite in CI checks they all infer the same effects on the same code. That’s the point: one mental model and one policy file work across your whole stack. The JVM engine leads; the others are first-class, not afterthoughts.
[2] All three panels are pre-registered and locked in advance: design, sample size and metric fixed before any trial ran. Measured on the flagship JVM engine (compiled bytecode); the completeness and boundary effects replicate on Rust too. Raw runs and pre-registrations: candor-java · candor-rust.
[3] You don’t have to take the numbers from us. Each engine ships a PROVE-IT self-experiment that your own agent runs on your own repo: it traces by hand first, verifies every miss at a file:line, and reports the negative if candor doesn’t help on your codebase.
[4] On the JVM, the pure/allow/deny/forbid grammar runs from the checked-in .candor/config alongside the analysis (jbang candor@tombaldwin/candor-java). In CI, an edit that breaks a rule fails the build, whichever package it came from. (The other engines accept the same policy file: e.g. candor-scan --policy on Rust.)

Work with the team behind candor

candor is built and run by Polymorphism — a small senior team that designs, builds and runs large systems for enterprise and government. The judgement we put into AI-code safety here is what we’d bring to your platform.

How we work · Talk to the team