The Levels of Trust
object, process, and system
Continuing my series of posts on trustworthy AI advisors, I want to examine three different levels of trust - Object, Process, and System - and describe why the System level requires different guarantees.
Object
When we’re trusting an AI system to give us good information, there are certain foundational elements we expect it to have.
It’s factual: It gives you true pieces of information about the world.
It’s reliable: It has a track record of producing factual answers.
It’s comprehensive: It provides all the relevant facts needed to give the whole truth, and avoids key omissions.
This is the “object level” of trust. We are evaluating the individual output, or series of outputs, from the system.
It’s like if you’re asking a chatbot questions about a trip you’re planning, and it tells you that Paris is the capital of France (factual), every time you ask it says Paris is the capital of France (reliable), and it’s not eliding important details like whether there was a recent civil war that made Paris only nominally the capital of France (comprehensive).
Process
In the next level up, we start to evaluate not just the individual outputs, but the process that generated them:
It’s verifiable: The answers it provides have a rationale or sourcing such that the user or a third party can determine the provenance of the answer.
It’s fair: If a question is complex and there are multiple sides, the AI system presents the fullness of competing views, in a way that proponents of opposing sides would agree accurately represents their beliefs.
It’s neutral: The AI system (or the process by which it’s created) provides answers that describe the issue but does not take sides. While it might have values and preferences, those do not influence the deliberative and evaluative process.
Alternatively, it’s transparent in its advocacy: Rather than being a neutral third party, it’s explicitly but transparently being employed to generate a specific output. It’s the same way that I trust a lawyer to make the best case possible for their client, rather than to weigh both sides neutrally.
The “process level” of trust examines the internals of the system, where you gain confidence in the mechanism that constructed the output.
For instance, as you continue to plan your trip, you ask it for advice on where you should travel in France. It gives you a full travel itinerary, discussing the different pros and cons of popping into the Louvre instead of the Petit Palais (fair + neutral), and provides a rationale for its reason to avoid both and just eat a bunch of croissants (verifiable because it knows you actually hate Art).
The object and process levels are relatively straightforward - I don’t think I’ve said anything that you wouldn’t see on some poster in a middle school class on media literacy. There are many existing or in-development epistemic evaluations to assess AI systems at these levels.
System
But as we move further, into the more rarefied air of high-trust systems, the dynamics become more complex. We need to model the AI system and its interactions within a larger world, where other actors and groups might influence or manipulate the outputs.
I think of this as the “system level” of trust. It doesn’t have clear-cut attributes; rather, it has scenarios:
Imagine we’ve been using our AI system to get good, truthful answers. It is giving us all the information we need to get on the flight to Paris and carbo-load for days.
But then, all of a sudden, it swerves; it starts to give us actively harmful information, like telling us to rebook our trip to Des Moines and put all our money into Des Moines merchandise, and huh coincidentally that’s where the company that runs the AI system is based.
It’s akin to a Pig Butchering scam where trust is built over time and then, when the mark has come to deeply trust the conman, the trust is burnt for a payday.
Or perhaps 99% of the time the system gives us the true answer, but 1% of the time, in an important moment, the system provides a biased answer. These moments don’t come up that often, so the bias is hard to detect, or the misalignment is subtle. It’s like a scheming vizier that manipulates important decisions for its own or its creators’ benefit.
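To put rough numbers on why a 1% bias rate is hard to catch, here is a minimal sketch. It rests on assumptions that are mine, not part of the scenario: an auditor spot-checks answers chosen uniformly at random, each answer is biased independently at the same rate, and a biased answer is always recognized when it is checked. The function names are hypothetical.

```python
import math


def detection_probability(bias_rate: float, checks: int) -> float:
    """Chance that at least one of `checks` randomly sampled answers is biased.

    Assumes independent answers and an auditor who never misses a biased one.
    """
    return 1.0 - (1.0 - bias_rate) ** checks


def checks_needed(bias_rate: float, target: float) -> int:
    """Smallest number of random spot-checks that reaches the target chance."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - bias_rate))


if __name__ == "__main__":
    # How detection odds grow with the number of spot-checks at a 1% bias rate.
    for n in (10, 100, 300):
        print(n, round(detection_probability(0.01, n), 3))
    # Spot-checks needed for a 95% chance of seeing at least one biased answer.
    print(checks_needed(0.01, 0.95))
```

Under these assumptions, 100 random spot-checks catch at least one biased answer only about 63% of the time, and it takes 299 to reach 95%. The real situation is likely worse: in the vizier scenario the biased answers are concentrated in rare, important moments and may be subtle, so random sampling and perfect recognition are both generous assumptions.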
And, even if the AI system is entirely faithful and useful as an advisor, and resistant to external compromise, you might worry that fully trusting it would let your own evaluative faculties or skills degrade. In essence, you’d end up a figurehead, which lowers your value. This risk of disintermediation might prevent you from fully trusting the system.
At the system level it’s not enough to evaluate trustworthiness case by case. We’re engaging in more sophisticated systems thinking, understanding how the AI system behaves under different incentives, governance, and adversarial pressure.
More examples:
If everyone starts to use the AI system, and it becomes an important source of voting advice, governments would be incentivized to apply political pressure to censor the outputs.
If a company creates an AI system that supports advertising, groups within the company would be incentivized to shape the outputs to benefit and flatter sponsors.
We make these kinds of system level assessments of trust all the time, through a messy social process of observation and gossip and default skepticism of powerful actors.
But, because of how hard it is to attain the highest levels of trust, the type you’d want in an advisor that you could turn to for your most important decisions, we’ll need significantly better ways of generating trust. We’ll require things like:
Frontier developers creating structured arguments for why a trusted advisor system in a given environment can be trusted (aka safety cases).
Public institutions understanding and verifying these arguments.
A robust tech stack (e.g., structured transparency algorithms, automated audits) to make this all possible at scale.
I am actually quite optimistic that this is possible. We can use the same AI systems we’re employing to speed up coding to also speed up creating these prerequisites, and perhaps ascend to levels of trust that have heretofore been impossible.



