Grounded Confidence

A method for making AI confidence depend on where the system has historically performed well, measured against expert-labeled evals.

We want our AI Agents to always be right.
Our dream is to get them to 100% accuracy.
[Image: AI agent producing only correct answers]
The dream is 100% accuracy.

In security, we aim for 100% accuracy because the stakes are high.

If an AI Agent that can act is 90% accurate (which is pretty good), then in the other 10% of cases it is wreaking havoc in your environment.
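To put the 90% figure in numbers, here is a small worked calculation. It is a sketch that assumes each action is an independent decision with the same accuracy. Real-world errors are often correlated, so treat the results as rough.

```python
def expected_wrong_actions(accuracy: float, n_actions: int) -> float:
    """Expected number of wrong actions out of n_actions."""
    return (1.0 - accuracy) * n_actions


def p_at_least_one_wrong(accuracy: float, n_actions: int) -> float:
    """Probability that at least one of n independent actions is wrong."""
    return 1.0 - accuracy ** n_actions


accuracy = 0.90  # the "pretty good" accuracy from the text
for n in (1, 10, 100):
    print(
        f"{n:>3} actions: expected wrong = {expected_wrong_actions(accuracy, n):.1f}, "
        f"P(at least one wrong) = {p_at_least_one_wrong(accuracy, n):.3f}"
    )
```

With ten actions, the chance that at least one of them is wrong is already about 65%.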
[Image: AI agent taking action while one out of ten answers is wrong]
1 in 10 is wrong, and it's taking action, shutting down big legitimate business flows (while you're on your 4th of July weekend).
“Then don’t let it act.”
Well, if you put overly strict guardrails on it, you block the good along with the bad. If you won’t let it revoke a token or block a port on the machine, you’ve defanged it, and it won’t defend in time (while it waits for a human to pick up the phone, the red AI agent goes from initial access to impact, and the game is over).
[Image: red AI agent racing from initial access to impact while the human response is too slow]
By the time a human picks up, the AI attackers have already reached the org's crown jewels. Your AI defender had its AI arms tied behind its back, waiting for HITL (human-in-the-loop) approval.
So you need your AI Agent to act, but you need it to be right.

There are 2 levers you can pull:

Lever 1: improve the AI Agent. Make it accurate.

The classic way to go about this is to tweak the agent (its prompt, its context, its flow). But 100% accuracy is only a dream, and you won’t get there, for several reasons:

  1. Out of distribution - after you build your AI Agent in your own little world, you send it out into the wild, where it will face new data patterns and repercussions you haven’t predicted or tested for.

  2. Not enough context - every decision is a choice between several options, and there are stakes. In important decisions (isolate the network and lose 150K USD because you think there’s an attacker on the loose, vs. do nothing and potentially suffer tens of millions in damage), there needs to be enough evidence to justify the action. Sometimes that data doesn’t exist.

    You just can’t get to 100% accuracy in a world where the determining piece of information is out of your sight.

  3. Cost efficiency - when you run AI at scale, running the Thinking-Pro-Ultimate-Magnum-Opus models on everything is usually impossible for your wallet (and anyone’s). You’ll usually run cheaper models that make more mistakes, especially when the job gets cognitively hard. There are great ways to blend advanced models with cheap ones (upcoming post on Cascading Models), but they rely on knowing when the cheap model is going to succeed and when it will fail.

  4. Life is hard. Some problems are very hard. AI Agents aren’t great at solving all of them, especially in certain segments of the problem (more on that below).
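Reason 2 above is, at heart, an expected-cost decision. The sketch below frames it that way under two loudly hypothetical assumptions. First, isolating the network fully stops the attack. Second, "tens of millions" is stood in for by 20M USD, an illustrative number not taken from any real incident. Under those assumptions it computes the attack probability above which isolating is the cheaper bet in expectation. The hard part the text points at is still there: estimating that probability needs evidence you may not have.

```python
ISOLATION_COST_USD = 150_000       # from the text: cost of isolating the network
BREACH_DAMAGE_USD = 20_000_000     # hypothetical stand-in for "tens of millions"


def expected_cost(action: str, p_attack: float) -> float:
    """Expected cost of an action, given the probability an attacker is present.

    Simplifying assumption: isolation stops the attack completely,
    so its cost is fixed regardless of p_attack.
    """
    if action == "isolate":
        return float(ISOLATION_COST_USD)
    if action == "do_nothing":
        return p_attack * BREACH_DAMAGE_USD
    raise ValueError(f"unknown action: {action}")


def break_even_probability() -> float:
    """Attack probability at which both actions have equal expected cost."""
    return ISOLATION_COST_USD / BREACH_DAMAGE_USD


def best_action(p_attack: float) -> str:
    """Pick the action with the lower expected cost."""
    costs = {a: expected_cost(a, p_attack) for a in ("isolate", "do_nothing")}
    return min(costs, key=costs.get)


print(break_even_probability())  # 0.0075 under these assumptions
print(best_action(0.001), best_action(0.05))
```

With these numbers the break-even point is 0.75%. Any belief above that favors isolation, but a belief that precise only means something if the evidence behind it is real.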

[Image: AI agent moving from a controlled environment into the wild]
In the lab, every answer is right. Ship it to the wild and it meets patterns it never trained on.
Lever 2: Confidence. Have the agent act only when it’s very likely to be right.

This dream is more attainable. It’s the way human beings act when they make high-stakes decisions.
Once you realize AI agents are additive to your existing workflows (human work, ML, deterministic if-this-then-that tools), you can have them participate only when they add value, and not when they cause damage. That is, provided you can build that elusive Confidence Component.
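One way to build such a Confidence Component, in the spirit of the subtitle, is to look up how the agent has historically performed on similar cases, as judged by expert labels, rather than asking the agent how sure it is. This is a minimal illustration, not the author's method. The segment keys, the `EvalRecord` type, the 0.95 bar, and the use of a Wilson score lower bound (a standard conservative estimate for a proportion) are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalRecord:
    segment: str          # hypothetical segment key, e.g. "phishing"
    agent_correct: bool   # did the agent's answer match the expert label?


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def segment_confidence(records: list[EvalRecord]) -> dict[str, float]:
    """Conservative historical accuracy per segment, from expert-labeled evals."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [correct, total]
    for r in records:
        counts[r.segment][0] += int(r.agent_correct)
        counts[r.segment][1] += 1
    return {seg: wilson_lower_bound(c, n) for seg, (c, n) in counts.items()}


def may_act(segment: str, confidence: dict[str, float],
            min_accuracy: float = 0.95) -> bool:
    """Let the agent act only where its track record clears the bar.

    Unseen segments default to 0.0, so they get no autonomy and fall back
    to the existing workflow (human, ML, or deterministic rules).
    """
    return confidence.get(segment, 0.0) >= min_accuracy


records = [EvalRecord("phishing", True)] * 200 + [EvalRecord("lateral", True)] * 10
conf = segment_confidence(records)
print(may_act("phishing", conf), may_act("lateral", conf), may_act("unseen", conf))
```

Under these assumptions, a segment where the agent was right on all 10 labeled cases still gets a lower bound of about 0.72 and stays gated. A segment with 200 out of 200 scores about 0.98 and may act. The design choice is to reward a sustained track record over a short lucky streak, and to leave unseen segments to the rest of the workflow.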

The problem is, the common way many people go about AI Agent Confidence is:

Self-reported Confidence. Make the AI Agents act only when they believe they’re right.

That’s a terrible, terrible, terrible, terrible approach.
Some dress it up - asking it for a number on a scale of 1-10. It’s still a terrible approach.

Here’s an example; the internet LOVES this classic AI mistake:

- The full article will be live on July 6 -