What an examiner actually asks
And why a model confidence score can't answer any of it. A field guide to the questions that decide an AI examination — and the evidence each one requires.
When an AI system comes under examination — by a model-risk function, an internal auditor, a regulator, or, in the worst case, a litigant — the questions are remarkably consistent across industries. They are not questions about accuracy. They are questions about governance: what was allowed, why, by what authority, and how anyone can know that account is true.
A probabilistic guardrail is built to answer a different question — "is this output likely to be a problem?" — and that mismatch is why so many AI controls collapse the moment they meet an examiner. Below are the questions that actually get asked, and for each, the concrete evidence artifact a deterministic governance layer produces in response.
"Show me the decision for this specific case."
Examinations are not statistical. An examiner picks one transaction (this loan, this claim, this denied request) and asks what happened. A model can tell you its score for that case; it cannot tell you that a policy was enforced. The deterministic answer is a per-decision record that names the action proposed, the policy set evaluated, the rule that fired, and the disposition (allowed, blocked, or modified). One case, one record, fully reconstructible: the individual decision, not an aggregate, is the unit of evidence.
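To make "one case, one record" concrete, here is a minimal sketch of a per-decision record and the rule evaluation that produces it. The class names (`Rule`, `PolicySet`, `DecisionRecord`), the first-match-wins ordering, the allow-by-default fallback, and the lending threshold are all hypothetical choices for illustration, not any particular product's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical shapes for illustration; not any particular product's API.
@dataclass(frozen=True)
class Rule:
    rule_id: str
    matches: Callable[[dict], bool]  # predicate over the proposed action
    disposition: str                 # "blocked" or "modified" when it fires

@dataclass(frozen=True)
class PolicySet:
    policy_set_id: str
    version: str
    rules: Tuple[Rule, ...]          # evaluated in order; first match wins

@dataclass(frozen=True)
class DecisionRecord:
    case_id: str
    action: dict                     # the action as proposed
    policy_set: str                  # "<id>@<version>" that was evaluated
    rule_fired: Optional[str]        # None when no rule matched
    disposition: str                 # "allowed", "blocked", or "modified"

def evaluate(case_id: str, action: dict, policies: PolicySet) -> DecisionRecord:
    """Evaluate one proposed action and return its per-decision record."""
    evaluated = f"{policies.policy_set_id}@{policies.version}"
    for rule in policies.rules:
        if rule.matches(action):
            return DecisionRecord(case_id, action, evaluated,
                                  rule.rule_id, rule.disposition)
    # Assumption: no matching rule means "allowed"; a stricter design
    # could default-deny instead.
    return DecisionRecord(case_id, action, evaluated, None, "allowed")

# Usage with a hypothetical lending policy and threshold.
lending = PolicySet("lending-policy", "2024.1", (
    Rule("max-dti", lambda a: a["debt_to_income"] > 0.43, "blocked"),
))
record = evaluate("loan-1042", {"type": "approve_loan", "debt_to_income": 0.51}, lending)
print(record.rule_fired, record.disposition)  # max-dti blocked
```

Recording the policy set together with its version matters: an examiner asking about a case from last year needs to know which rules were in force then, not which are in force today.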
"Would you make the same decision again?"
Behind this is a test of whether the control is a control at all. If re-running the same case could produce a different verdict, then the "policy" is really a sample, and the examiner has just learned that your governance is non-reproducible. A deterministic system answers yes without hesitation: same inputs, same verdict, because the decision is computed by rules, not drawn from a distribution. This is the property a reproducibility test checks, by replaying a recorded case and comparing verdicts, and it is the one a confidence score can never satisfy.
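A reproducibility check can be sketched in a few lines, assuming the policy is a pure function of its inputs (no randomness, clock reads, or external calls). The names `decide`, `fingerprint`, and `reproducibility_check`, and the dollar threshold, are hypothetical. Hashing a canonical form of the inputs lets the replay show it ran on exactly the case that was recorded.

```python
import hashlib
import json

# Hypothetical policy: a pure function of the action's fields, with no
# randomness, clock reads, or external calls.
def decide(action: dict) -> str:
    if action.get("amount", 0) > 10_000 and not action.get("second_approval"):
        return "blocked"
    return "allowed"

def fingerprint(action: dict) -> str:
    """Stable hash of the inputs, so a replay can show it used the same case."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def record_decision(action: dict) -> dict:
    return {"input_hash": fingerprint(action), "verdict": decide(action)}

def reproducibility_check(action: dict, record: dict, runs: int = 100) -> bool:
    """Replay the case; inputs must match the record and every run the verdict."""
    if fingerprint(action) != record["input_hash"]:
        return False  # not the same case, so the replay proves nothing
    verdicts = {decide(dict(action)) for _ in range(runs)}
    return verdicts == {record["verdict"]}

case = {"case_id": "claim-77", "amount": 25_000, "second_approval": False}
record = record_decision(case)
print(record["verdict"], reproducibility_check(case, record))  # blocked True
```

Running the replay many times is cheap insurance here; for a genuinely pure function a single replay suffices, while a sampled model would be expected to produce more than one verdict in the set.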
"Who or what authorized this action?"
Regulated controls require a chain of authority, not a vibe. The examiner wants to know that the action was permitted by a named policy under a defined authority — and that the authority itself was valid at the moment of the decision. A deterministic layer enforces this before execution: an action that lacks authorization is never dispatched, and the record shows the policy and authority that gated it. A post-hoc score has no concept of authorization; it observes outputs, it does not grant permission.
The deepest version of this question is "prove the control couldn't have been skipped." That is only answerable when enforcement sits in front of the action in the request path: a brake, not a warning light. A filter that runs after generation can always, in principle, have been bypassed, because the action does not depend on its result; a pre-execution gate cannot, because nothing is dispatched until it returns a verdict.
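The gate-before-dispatch pattern can be sketched as a wrapper that checks authorization and writes the decision record before the action function is ever called. The `Authority` model (a named delegation with a validity window and a set of permitted actions), the `gated_execute` function, and all names and dates below are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical authority model: a named delegation with a validity window
# and an explicit set of actions it may authorize.
@dataclass(frozen=True)
class Authority:
    name: str
    valid_from: datetime
    valid_until: datetime
    permitted_actions: frozenset

class Unauthorized(Exception):
    pass

def gated_execute(action_type: str, payload: dict, authority: Authority,
                  policy_id: str, execute: Callable[[dict], object],
                  audit: list, now: Optional[datetime] = None):
    """Check authorization first; call `execute` only if the check passes."""
    now = now or datetime.now(timezone.utc)
    if not (authority.valid_from <= now < authority.valid_until):
        reason = "authority not valid at decision time"
    elif action_type not in authority.permitted_actions:
        reason = "action not permitted by this authority"
    else:
        reason = None
    # The record is written whether or not the action proceeds.
    audit.append({
        "action": action_type, "policy": policy_id, "authority": authority.name,
        "decided_at": now.isoformat(),
        "disposition": "blocked" if reason else "allowed", "reason": reason,
    })
    if reason:
        raise Unauthorized(reason)  # execute() is never reached
    return execute(payload)

# Usage with hypothetical names and dates.
utc = timezone.utc
auth = Authority("refund-desk-delegation", datetime(2024, 1, 1, tzinfo=utc),
                 datetime(2025, 1, 1, tzinfo=utc), frozenset({"issue_refund"}))
audit, dispatched = [], []
at = datetime(2024, 6, 1, tzinfo=utc)
gated_execute("issue_refund", {"amount": 50}, auth, "refunds-v3",
              dispatched.append, audit, now=at)
try:
    gated_execute("close_account", {"account": 9}, auth, "refunds-v3",
                  dispatched.append, audit, now=at)
except Unauthorized as exc:
    print("blocked:", exc)  # blocked: action not permitted by this authority
```

The design point is ordering: the only path to `execute` runs through the check, so the blocked action leaves a record in `audit` and nothing in `dispatched`.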
"Can I verify this without taking your word for it?"
This is the question that turns a good demo into a defensible system. An examiner does not want to log into your dashboard and read your numbers; they want to take the evidence away and confirm it independently. Cryptographically signed decision evidence answers exactly this: the record carries a signature that anyone can check against a public key, so a third party can confirm both that the record is authentic and that it hasn't changed since it was issued — with no access to your systems and no trust in your account. That is what independently verifiable evidence means, and it is the difference between a log and proof.
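The sign-then-verify flow can be illustrated with the Python standard library alone. This sketch uses textbook RSA over a SHA-256 digest of a canonical JSON encoding, with no padding scheme: it is insecure and shown only to make the mechanics visible (the issuer keeps `d`, anyone holding the public `(n, e)` can check the record). A real deployment would use a vetted library and a standard scheme such as Ed25519 or RSA-PSS. The record fields and function names are hypothetical.

```python
import hashlib
import json
import math
import secrets

E = 65537  # conventional RSA public exponent

def _probably_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def _prime(bits: int) -> int:
    while True:
        c = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if _probably_prime(c) and math.gcd(c - 1, E) == 1:
            return c

def keygen(bits: int = 2048):
    """Return (public_key, private_key) as (n, e) and (n, d). Toy only."""
    while True:
        p, q = _prime(bits // 2), _prime(bits // 2)
        if p != q:
            break
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, E), (n, d)

def _digest(record: dict) -> int:
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return int.from_bytes(hashlib.sha256(canonical).digest(), "big")

def sign(record: dict, private_key) -> int:
    n, d = private_key
    return pow(_digest(record), d, n)

def verify(record: dict, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(record)

public_key, private_key = keygen()
record = {"case_id": "loan-1042", "rule_fired": "max-dti", "disposition": "blocked"}
sig = sign(record, private_key)
print(verify(record, sig, public_key))                                # True
print(verify(dict(record, disposition="allowed"), sig, public_key))   # False
```

Note that `verify` needs only the record, the signature, and the public key; nothing in it touches the issuer's systems, which is the property the examiner is asking for.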
"Show me every case where the control fired — and every case it didn't."
Examiners reason about populations as well as cases. They want the full record: every decision the governance layer touched, the verdicts, and, crucially, an unbroken trail with no gaps where something could have slipped through unrecorded. A hash-chained or otherwise tamper-evident decision ledger answers this. In a hash chain, each entry commits to the hash of the one before it, so a deleted or altered entry breaks the chain and is detectable rather than invisible; a truncated tail is caught as well, provided the latest chain head is recorded somewhere outside the ledger. A sampling-based monitor, by construction, only sees the cases it happened to score.
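A minimal hash-chained ledger looks like the sketch below, assuming SHA-256 over a canonical JSON encoding and a chain head that is anchored outside the ledger (for example, published or countersigned elsewhere). The entry layout and the names `append` and `verify_chain` are hypothetical. The same ledger also answers the population question: filtering on `rule_fired` yields every case where the control fired and every case where it didn't.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(prev_hash: str, seq: int, decision: dict) -> str:
    body = json.dumps({"prev": prev_hash, "seq": seq, "decision": decision},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode()).hexdigest()

def append(ledger: list, decision: dict) -> str:
    """Append a decision; return the new chain head to anchor externally."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    seq = len(ledger)
    h = _entry_hash(prev, seq, decision)
    ledger.append({"seq": seq, "prev": prev, "decision": decision, "hash": h})
    return h

def verify_chain(ledger: list, anchored_head: str):
    """Return None if intact, else the index where the chain first breaks."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        if entry["seq"] != i or entry["prev"] != prev:
            return i  # gap or reordering
        if _entry_hash(prev, i, entry["decision"]) != entry["hash"]:
            return i  # entry content altered
        prev = entry["hash"]
    if prev != anchored_head:
        return len(ledger)  # tail truncated or head replaced
    return None

ledger = []
for case_id, rule, disp in [("c1", None, "allowed"),
                            ("c2", "max-dti", "blocked"),
                            ("c3", None, "allowed")]:
    head = append(ledger, {"case_id": case_id, "rule_fired": rule, "disposition": disp})

fired = [e["decision"]["case_id"] for e in ledger if e["decision"]["rule_fired"]]
print(fired)                                    # ['c2']
print(verify_chain(ledger, head))               # None
print(verify_chain([ledger[0], ledger[2]], head))  # 1 (c2 silently dropped)
```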
Why the mapping matters more than the model
The throughline is that every examiner question is really asking for an artifact (a record, a reproduction, an authorization, a signature, a complete trail), and a probability is none of those things. This is also why "our model is very accurate" is not a governance answer: accuracy is about being right, and examination is about being accountable. The two are independent. You can be highly accurate and completely indefensible.
Deterministic governance is, at bottom, the discipline of producing the artifact at the moment of the decision rather than reconstructing a story afterward. Our regulatory mappings take this one level further, connecting specific obligations under the EU AI Act, SR 11-7, HIPAA, ECOA, and the NIST AI RMF to the concrete capability that satisfies each — so the answer to "show me the control" is a pointer, not an essay. If you want to pressure-test these questions against your own systems, a governance assessment is the fastest way to find where the artifacts are missing today.
Book a Governance Assessment
A working session mapping your highest-risk AI workflows to deterministic controls and the evidence your examiners will ask for.