Epistemics at Scale
if brute force isn't working, you aren't using enough of it.
Welcome to all the new subscribers! Apologies in advance: this post does not contain any references to maritime law. Management regrets the oversight.
I want to be mindful not to spam people’s inboxes as I continue my foolhardy quest to post every day in November. To that end, some of my more workman-like posts might be web-only. For instance, Automating Five Years of Link Extraction with Claude Code, which walked through a nifty little side project of mine, was web-only. When I do this, I’ll make sure to include a link to the post in a housekeeping section.
Continuing my exploration of epistemically sound, truthful AI advisors (1, 2), I had a helpful conversation with one reader who expressed polite skepticism about the potential of using trusted AI systems to resolve disputes in collective epistemics. Paraphrasing:
“Given how often collective truthseeking fails to bottom out in agreement now, why would we think that smarter AI assistants are going to lead people towards the truth?”
The hope spot for me comes from a few places, but most notably because Quantity has a Quality all its own.
This optimism stems from a belief that many of our collective epistemic problems are not impossible, but only intractable with available levels of time and energy and attention; it’s often not intuitive how orders of magnitude more of these resources would make a qualitative difference on our epistemic affordances.
For example, I’ve often attempted structured conversational techniques like double crux:
Double-Crux is a technique for addressing complex disagreements by systematically uncovering the cruxes upon which the disagreement hinges. A crux for an individual is any fact that if they believed differently about it, they would change their conclusion in the overall disagreement. A double-crux is a crux for both parties. Perhaps we disagree on whether swimming in a lake is safe. A crux for each of us is the presence of crocodiles in water: I believe there aren’t, you believe there are. Either of us would change our mind about the safety if we were persuaded about this crux.
Double-Crux differs from typical debates which are usually adversarial (your opinion vs mine), and instead attempt to be a collaborative attempt to uncover the true structure of the disagreement and what would change the disputants minds.
It rarely works. In real-world situations, even with two well-meaning, truth-seeking people, it’s extremely difficult to find a true double crux in a reasonable amount of time. After an hour people tend to shrug and remark “at least we understand each other better” (which is its own kind of success!).
Or the participants realize that the disagreement stems from very broad differences in frames and models of the world, and that no single crux, or even a reasonable number of cruxes, would update both people. For instance, I think free markets are good. If you asked me what a crux-y sub-belief for this view was, I would demur and point to a thousand different examples of the ways they work pretty well, any one of which is illustrative but not load-bearing.
We are bounded rational agents, and there’s a limit to how much time even I am willing to spend in small group workshops sketching argument trees.
But! Our bounds are expanding fast. The Pareto frontier of mutual understanding will grow as we develop delegate agents.
Delegate agents are my catch-all for AI agents which act as representatives on behalf of a principal (the user). This is a continuum, starting with the AI purchasing agents every startup and frontier lab in the Bay Area is building and extending to a future state of agents which could represent the user’s preferences and interests in high-dimensional, multi-turn negotiations. I like this description from a recent paper, The Coasean Singularity?:
We envision agents as having the ability to harness computational resources, to communicate with other agents and humans, to receive and send money, and to access and interact with the Internet…. A prototypical example of an AI agent operating autonomously is Deep Research… We take seriously the idea that, rather than hiring a human agent, one could instead rely on an AI agent.
Consumers will want the same core attributes that they seek in human agents: capability sufficient to act on their preferences successfully, knowledge of their preferences, and alignment sufficient to act on their preferences to their benefit. In essence, they will want capable, knowledgeable, and faithful agents
A delegate agent which understands my preferences and knowledge state well should be able to:
Search for relevant information.
Conduct (with other models or in simulation) extensive truth-seeking conversations like double crux.
Summarize the key takeaways and points.
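As a toy illustration of the second step, here is a minimal Python sketch of what "finding a double crux" means mechanically, using the crocodile example from the definition quoted earlier. The `Position` structure and the `double_cruxes` function are hypothetical names of my own, and the sketch assumes each party's beliefs and cruxes have already been written down, which is of course the hard part that a delegate agent would be spending its hours on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Position:
    """One party's stance on a disputed question (hypothetical structure)."""
    conclusion: bool                 # e.g. "swimming in the lake is safe"
    beliefs: dict[str, bool]         # claim -> what this party believes about it
    # Claims that would flip this party's conclusion if believed differently.
    cruxes: set[str] = field(default_factory=set)


def double_cruxes(a: Position, b: Position) -> set[str]:
    """Return claims that are cruxes for both parties and on which they disagree."""
    shared = a.cruxes & b.cruxes
    return {
        claim
        for claim in shared
        if claim in a.beliefs
        and claim in b.beliefs
        and a.beliefs[claim] != b.beliefs[claim]
    }


me = Position(
    conclusion=True,
    beliefs={"crocodiles in the lake": False, "water is cold": True},
    cruxes={"crocodiles in the lake"},
)
you = Position(
    conclusion=False,
    beliefs={"crocodiles in the lake": True, "water is cold": True},
    cruxes={"crocodiles in the lake", "water is cold"},
)

print(double_cruxes(me, you))
```

With two people and a handful of claims this is trivial. The bet is that agents can populate and compare structures like this across thousands of claims, which no pair of humans will sit through.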
As an intuition pump, consider the scale of time, attention, and energy put into preparing a Supreme Court case. Gemini 2.5 estimates ~750 to 2,100 hours spent preparing. So many hours of skilled labor investigating, preparing arguments, and grilling one another. Note: The transcripts of recent court arguments are quite good (h/t Nuno).
I cannot personally deploy 2,000 hours of Supreme Court-level researchers on an arbitrary topic. But I can envision a not-too-distant future where delegate agents, riding the tech uplift afforded by all the effort Silicon Valley is putting into purchasing agents, could function in this role, using that time (or the compute equivalent) to build massive argument trees, do literature reviews, and investigate and identify cruxes between different agents (or simulations of them).
This addresses the bounded time and attention of people, and I think also starts to tackle the scenario where disagreements stem from broadly different frames/world models/aesthetics. This too might be solvable through scale. Perhaps a thousand different data points on how free markets are good can be cruxed with a thousand different data points on the beauty of government-managed economies, with the results presentable as distilled but inspectable summaries. Or, as before, identifying that even with extended-extended-thinking mode the difference is too broad (again, its own kind of win).
I think the use of Grok for social, AI-powered fact checking as an element of mini-Twitter debates is an enactment of a very limited form of this type of process. But we can do so much more. We can deploy so much more dakka.
One more intuition pump, from Scott Aaronson, criticizing the Chinese Room argument for failing to consider scale:
We’re invited to imagine someone pushing around slips of paper with zero understanding or insight, much like the doofus freshmen who write (a + b)^2 = a^2 + b^2 on their math tests. But how many slips of paper are we talking about! How big would the rule book have to be, and how quickly would you have to consult it, to carry out an intelligent Chinese conversation in anything resembling real time? If each page of the rule book corresponded to one neuron of a native speaker’s brain, then probably we’d be talking about a “rule book” at least the size of the Earth, its pages searchable by a swarm of robots traveling at close to the speed of light. When you put it that way, maybe it’s not so hard to imagine this enormous Chinese-speaking entity that we’ve brought into being might have something we’d be prepared to call understanding or insight.
This is the image I conjure when visualizing how to solve our institutional truth-seeking problems: swarms of robots building gigantic rule books, creating massive argument trees, all to determine whether a hot dog is or is not a sandwich. A beautiful future.
Appendix:
For those familiar with the AI Safety literature, you might be getting a feeling of déjà vu for Debate, a technique for scalable oversight of potentially misaligned superhuman AIs. While there are clear parallels, this is explicitly aimed at not-superhuman-level systems and at human-level epistemic processes, which require weaker safety guarantees.
A term I want to coin is extended interaction modes, where the delegate agent is building up its understanding of your specifications and preferences through structured Q&A. I like Midjourney’s personalization feature as an example of an extended interaction HCI pattern, which lets you pairwise compare images to create custom style profiles you can apply to influence future images.



