Grounded Confidence
A method for making AI confidence depend on where the system has historically performed well, measured against expert-labeled evals.
Our dream is to get AI agents to 100% accuracy.
In security, the ambition to reach 100% accuracy comes from the high stakes.
Well, if you put too-strict guardrails on the agent - you’re blocking the good along with the bad. If you won’t let it revoke a token or block a port on the machine - you’ve defanged it, and it won’t defend in time (while it waits for a human to pick up the phone, the red AI agent has gone from initial access to impact and the game is over).
There are 2 levers you can pull:
The classic way to go about this is to tweak the agent (its prompt, its context, its flow) - but you won’t get to 100% accuracy that way. The dream of 100% accuracy is only a dream, for several reasons:
- Out of distribution: after you build your AI agent in your own little world, you’re going to send it out there, into the wild, where it will face new data patterns and repercussions you haven’t predicted or tested for.
- Not enough context: every decision chooses between several options, and there are stakes. In important decisions (isolate the network and lose 150K USD because you think there’s an attacker on the loose vs. do nothing and potentially suffer tens of millions in damage) - there needs to be enough evidence to determine the action. Sometimes the data doesn’t exist. You just can’t get to 100% accuracy in a world where the determining piece of information is out of your sight.
- Cost efficiency: when you run AI at scale, running the Thinking-Pro-Ultimate-Magnum-Opus models on everything is usually impossible for your wallet (and anyone’s). You’ll usually run cheaper models that make more mistakes, especially when the job gets cognitively hard. There are great ways to blend advanced models with cheap ones (upcoming post on Cascading Models), but they rely on knowing when the cheap model is going to succeed and when it is going to fail.
- Life is hard: some problems are very hard. AI agents aren’t great at solving all of them, and certainly not on certain segments of the problem (more on that below).
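The trade-off in the "not enough context" item can be made concrete with a bit of expected-cost arithmetic. This is only a minimal sketch: the 150K USD isolation cost comes from the example above, the 20M USD breach figure is an assumed stand-in for "tens of millions", and the function names are hypothetical.

```python
def break_even_probability(action_cost, breach_cost):
    """Attack probability above which acting is cheaper, in expectation, than waiting."""
    return action_cost / breach_cost


def cheaper_to_isolate(p_attack, action_cost, breach_cost):
    """Compare the fixed cost of isolating with the expected cost of doing nothing."""
    return action_cost < p_attack * breach_cost


ISOLATE_COST = 150_000    # from the example above
BREACH_COST = 20_000_000  # assumption: stand-in for "tens of millions"

print(break_even_probability(ISOLATE_COST, BREACH_COST))     # 0.0075
print(cheaper_to_isolate(0.05, ISOLATE_COST, BREACH_COST))   # True
print(cheaper_to_isolate(0.001, ISOLATE_COST, BREACH_COST))  # False
```

With these assumed figures the break-even sits below 1%, so the decision hinges on whether the available evidence can actually tell a 0.1% alert apart from a 5% one. When that evidence is missing, no amount of agent tuning settles it.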
This dream is more attainable. It’s the way human beings act when they make high-stakes decisions.
When you realize AI agents are additive to your existing workflows (human work, ML, deterministic if-this-then-that tools) - you can make them participate only when they add value, and not when they create damage, provided you can create that elusive Confidence Component.
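To make the idea concrete, here is a minimal sketch of one way such a confidence component could look, assuming you have expert-labeled eval records tagged by problem segment. The names (`EvalRecord`, `segment_accuracy`, `should_agent_act`), the segments, the sample data, and both thresholds are illustrative assumptions, not a reference implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRecord:
    segment: str         # hypothetical problem segment, e.g. "phishing"
    agent_correct: bool  # did the agent match the expert label?


def segment_accuracy(records):
    """Return {segment: (accuracy, sample_count)} from expert-labeled evals."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r.segment] += 1
        hits[r.segment] += int(r.agent_correct)
    return {s: (hits[s] / totals[s], totals[s]) for s in totals}


def should_agent_act(segment, stats, min_accuracy=0.95, min_samples=50):
    """Let the agent act only where it has a proven track record.

    Unknown or thinly sampled segments fall back to the existing workflow.
    Both thresholds are illustrative assumptions.
    """
    accuracy, count = stats.get(segment, (0.0, 0))
    return count >= min_samples and accuracy >= min_accuracy


# Made-up eval history: strong on phishing, weak on lateral movement.
history = (
    [EvalRecord("phishing", True)] * 98
    + [EvalRecord("phishing", False)] * 2
    + [EvalRecord("lateral_movement", True)] * 60
    + [EvalRecord("lateral_movement", False)] * 40
)
stats = segment_accuracy(history)

print(should_agent_act("phishing", stats))          # True
print(should_agent_act("lateral_movement", stats))  # False
print(should_agent_act("never_seen", stats))        # False
```

The design choice worth noting is the default: a segment with no eval history is treated as "don't act", so the agent only participates where the evals say it adds value.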
The problem is, the common way many people go about AI Agent Confidence is:
That’s a terrible, terrible, terrible, terrible approach.
Some dress it up - asking it for a number on a scale of 1-10. It’s still a terrible approach.
Here’s an example. The internet LOVES this classic AI mistake: