“We don’t write prompts anymore. We design loops.” — someone at Anthropic in June 2026
In an agent loop, self-critique did no better than doing nothing. A deterministic, source-anchored verifier cut the hallucination rate roughly in half.
The line is from a few weeks ago and already feels true. We stopped tuning one perfect prompt and started building systems that try, check their own work, and improve over several steps. A model that can revise is worth more than a model that answers once and stops. On this, the line is right.
What it leaves out is the bill. A loop is far harder to verify than a single call: with one call you check one output, but in a loop every step can drift, and the ways it can go wrong multiply with each iteration. The hard part stops being generation and becomes verification: knowing whether the loop is getting it right. And the default way to verify, letting the model check its own work, turns out to be the weakest link in the chain.
So this is not a quarrel with “design loops, not prompts.” It is the catch it hides, measured: the experiment that convinced me, with the numbers and the method, so you can check it yourself.
The verification surface grows with every step
A single call has one place to be wrong: the answer. A three-step loop has the first draft, the critique of the draft, the revision, the critique of the revision, and the decision to stop. Each of those is a model output, and each can be confidently wrong. You did not remove the verification problem by adding a loop. You multiplied it.
The loop acts on its own verdicts. If the check says “good,” the loop stops and ships. If the check is wrong, the loop ships a mistake — and worse, it may keep polishing that mistake across iterations until it reads convincingly. A loop is only as trustworthy as the thing it verifies against.
The weakest link: a model grading its own work
The most common verifier is the model itself. After drafting, you ask it: “Is this correct?” It is cheap, it needs no extra infrastructure, and it feels like reflection.
The problem is what the model optimizes for. When an LLM grades its own output, it rewards answers that sound right. A confident, fluent, wrong answer sounds right too. So self-critique tends to wave through exactly the failures you most want to catch, and occasionally it talks itself out of a correct answer. There is no external truth in the loop — only the same distribution that produced the error, now asked to detect it.
I wanted to measure it.
A different kind of check: deterministic and source-anchored
The alternative is a verifier that does not ask the model’s opinion at all. Two properties matter:
- Source-anchored. The check measures whether an answer is grounded in a real source, not whether it reads well. If the answer drifts away from the source material, the verifier flags it, however confident the prose sounds.
- Deterministic. Same input, same verdict, every time. You can inspect it, log it, and trust it across runs. A stochastic judge that changes its mind is not a foundation a loop can stand on.
The verifier I used is geometric. It embeds the question, the candidate answer, and the source on a vector hypersphere and reads the angles between them. A grounded answer sits close to its source; a hallucinated one drifts toward the question and away from the source. The Semantic Grounding Index (SGI) is a ratio of two such angles; a companion score (DGI) is a distributional grounding measure calibrated on held-out grounded pairs. Both are pure geometry over a fixed encoder, so they are deterministic by construction. The implementation is open source (Groundlens); the point of this article is not the math but what happens when you put such a check inside a loop.
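The angle reading can be sketched in a few lines. This is a minimal illustration, not the Groundlens implementation: the names `angle` and `angle_ratio` are hypothetical, the toy 3-dimensional vectors stand in for real encoder embeddings, and the orientation of the ratio (answer-to-question angle over answer-to-source angle, so a larger value means the answer sits closer to the source) is an assumption about how the two angles are combined.

```python
import math

def angle(u, v):
    """Angle in radians between two vectors (their angular distance on the unit sphere)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)

def angle_ratio(question, answer, source, eps=1e-9):
    """Assumed form: angle(answer, question) / angle(answer, source).
    Above 1 the answer is nearer the source; below 1 it is nearer the question."""
    return angle(answer, question) / max(angle(answer, source), eps)

# Toy embeddings (hypothetical values, not encoder output).
question = [1.0, 0.0, 0.0]
source = [0.0, 1.0, 0.0]
grounded_answer = [0.2, 0.9, 0.1]      # points mostly toward the source
hallucinated_answer = [0.9, 0.2, 0.1]  # points mostly toward the question

grounded_score = angle_ratio(question, grounded_answer, source)
hallucinated_score = angle_ratio(question, hallucinated_answer, source)
```

Because nothing here samples or calls a model, the same three vectors always give the same score, which is the determinism the text relies on.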
First, does the geometry even discriminate hallucinations? On the HaluEval QA benchmark, scoring grounded against hallucinated answers:
| Verifier signal | AUROC | 95% CI |
|---|---|---|
| SGI | 0.769 | [0.715, 0.821] |
| DGI | 0.939 | [0.911, 0.964] |
| SGI + DGI | 0.949 | [0.926, 0.971] |
Table 1: Detection on n = 300 answer pairs; bootstrap confidence intervals.
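For readers who want to recompute numbers of this kind, here is a minimal sketch of AUROC with a percentile bootstrap interval. The resampling scheme (grounded and hallucinated scores resampled independently, 1,000 resamples, percentile interval) is an assumption; the article only says the intervals are bootstrap. The function names and the toy scores are hypothetical.

```python
import random

def auroc(grounded, hallucinated):
    """Probability that a random grounded score beats a random hallucinated one.
    Ties count as half a win."""
    wins = 0.0
    for g in grounded:
        for h in hallucinated:
            if g > h:
                wins += 1.0
            elif g == h:
                wins += 0.5
    return wins / (len(grounded) * len(hallucinated))

def bootstrap_ci(grounded, hallucinated, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed scheme, see lead-in)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    stats = []
    for _ in range(n_boot):
        g = rng.choices(grounded, k=len(grounded))          # resample with replacement
        h = rng.choices(hallucinated, k=len(hallucinated))
        stats.append(auroc(g, h))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy verifier scores (hypothetical), higher meaning "more grounded".
toy_grounded = [0.9, 0.8, 0.75, 0.6, 0.55]
toy_hallucinated = [0.7, 0.5, 0.4, 0.3, 0.2]
toy_auroc = auroc(toy_grounded, toy_hallucinated)
toy_ci = bootstrap_ci(toy_grounded, toy_hallucinated, n_boot=200)
```

With paired data like HaluEval, resampling whole question pairs instead of the two score lists independently would be a reasonable alternative.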
The combined signal separates grounded from hallucinated answers cleanly. That is the precondition. Now the question is whether a check this accurate, placed inside a loop, actually makes the loop’s final answers better than self-critique does.
The experiment
The design isolates one variable: what the loop verifies against (Figure 1).
A generator answers factual questions closed-book — from its own memory, with no source in front of it — so it hallucinates often and a verifier has something to fix. Each question runs through four arms, and a cross-model referee grades every final answer, so no model judges itself in the scoring:
- Open-book reference — the generator is simply handed the source. No check. This is the ceiling.
- Single (closed-book) — one answer, no check. This is the floor.
- Self-critique — closed-book; the model judges its own answer and revises until it is satisfied (up to three iterations).
- Source-anchored — closed-book; the geometric verifier scores the answer, and on a flag it injects the source and asks for a grounded rewrite (up to three iterations).
Setup, for reproduction: generator Claude Opus 4.8; referee GPT-5.5 (cross-model grading); benchmark HaluEval QA; encoder all-MiniLM-L6-v2; temperature=0 (if available); seed=0; loop thresholds calibrated on the model’s own closed-book training drafts; items through the loops.
One asymmetry is deliberate, and it is the whole point: the source-anchored arm has access to a source of truth through its verifier, and the self-critique arm does not.
The hypothesis under test is not “geometry beats self-critique with the same information.” It is “a source-anchored verifier turns a hallucinating closed-book generator into a grounded one, while self-critique on its own cannot.” The open-book and single arms bound what is possible at the top and bottom.
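The two looped arms differ only in the verifier they plug in, which a short sketch makes concrete. Everything below is illustrative: `run_loop`, `Verdict`, and the toy generator and verifiers are hypothetical names, the string-containment check is a crude stand-in for the geometric score, and counting the first draft as iteration 1 of 3 is an assumption about how iterations were tallied.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Verdict:
    ok: bool                        # True means "stop and ship"
    evidence: Optional[str] = None  # source text to inject on a flag, if the verifier has one

def run_loop(question, generate, verify, max_iters=3):
    """Draft, check, revise. Returns (final_answer, iterations_used).
    generate(question, previous_answer, evidence) -> answer
    verify(question, answer) -> Verdict"""
    answer = generate(question, None, None)  # closed-book first draft
    iterations = 1
    while iterations < max_iters:
        verdict = verify(question, answer)
        if verdict.ok:
            break
        answer = generate(question, answer, verdict.evidence)
        iterations += 1
    return answer, iterations

SOURCE = "The capital of France is Paris."

def toy_generate(question, previous, evidence):
    # Stand-in generator: wrong from memory, right once the source is injected.
    return "Paris" if evidence and "Paris" in evidence else "Lyon"

def toy_self_critique(question, answer):
    # Stand-in for a self-grader that approves any fluent answer; it has no source to inject.
    return Verdict(ok=True)

def toy_source_anchored(question, answer):
    # Stand-in for the geometric check: flag answers not supported by the source.
    grounded = answer in SOURCE
    return Verdict(ok=grounded, evidence=None if grounded else SOURCE)

q = "What is the capital of France?"
self_result = run_loop(q, toy_generate, toy_self_critique)
anchored_result = run_loop(q, toy_generate, toy_source_anchored)
```

The stubs are rigged to show the wiring, not to reproduce the result: the self-critique arm ships its first draft, while the anchored arm flags it, injects the source, and stops after one rewrite.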
Results
| Arm | Sees the source? | Hallucination rate | 95% CI (Wilson) | Mean iterations |
|---|---|---|---|---|
| Open-book reference (ceiling) | yes | 5.8% | [2.9%, 11.6%] | 1.00 |
| Single, closed-book (floor) | no | 40.0% | [31.7%, 48.9%] | 1.00 |
| Self-critique (Claude → Claude) | no | 43.3% | [34.8%, 52.3%] | 1.62 |
| Source-anchored verifier (SGI/DGI) | via the check | 19.2% | [13.1%, 27.1%] | 1.59 |
Two readings, and the confidence intervals decide both.
Self-critique did not help. At 43.3% it is, if anything, slightly worse than the 40.0% floor, and its interval [34.8%, 52.3%] overlaps the floor’s [31.7%, 48.9%] almost entirely. The extra iterations bought nothing. A model checking itself spent more compute to land where it started — and the small upward drift is consistent with self-critique occasionally overturning correct answers, exactly the failure mode you would predict when there is no external truth in the loop.
Source-anchored verification roughly halved the error rate. It cut the hallucination rate from the 40.0% floor to 19.2%, a 52% relative reduction ((40.0 − 19.2) / 40.0), in about the same number of iterations the self-critique loop used (1.59 versus 1.62 on average). This is not within noise: the anchored interval tops out at 27.1%, below the 31.7% where the floor’s interval begins, so the two do not overlap. The improvement is real signal, not a lucky run.
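The interval arithmetic is easy to check by hand. The sketch below computes Wilson score intervals and the relative reduction. One loud assumption: the counts 48 of 120 (floor) and 23 of 120 (anchored) are inferred from the reported percentages, because the per-arm sample size is not stated in the text; with those counts the formula gives intervals that match the table to the reported precision. The function names are hypothetical.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def relative_reduction(baseline, treated):
    """Fraction of the baseline error rate that was removed."""
    return (baseline - treated) / baseline

# Counts are an inference from the reported rates, not figures given in the article.
floor_ci = wilson_interval(48, 120)     # 40.0% closed-book floor
anchored_ci = wilson_interval(23, 120)  # 19.2% source-anchored arm

intervals_overlap = anchored_ci[1] >= floor_ci[0]
reduction = relative_reduction(0.400, 0.192)
```

Non-overlapping 95% intervals are a conservative test: intervals can overlap slightly and the difference still be significant, so a clean gap like this one is a strong sign.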
The shape of the result is the story. Same generator, same loop budget, same closed-book handicap. The only thing that changed was what the loop trusted — its own judgment, or a deterministic measurement against the source. One of those moved the needle by half. The other did not move it at all.
You can’t lie to a loop
The intuition is simple. An agent learns from its feedback, so you cannot lie to it. A check that rewards a confident wrong answer is doing exactly that — feeding the loop a reward signal that is correlated with fluency rather than truth. The loop dutifully optimizes the signal it is given and polishes the prose. A source-anchored check gives the loop a reward correlated with grounding instead, and the loop optimizes that.
We do not say that geometry knows the truth. The verifier measures whether an answer is engaged with its source, not whether the source is right and not whether the answer is true in some absolute sense. On a benchmark built to test truthfulness rather than grounding, the same signal is near chance. Grounding and truth are different targets, and this method only addresses the first. The win is a modest one: a source-anchored verifier is a better foundation for a loop than self-critique, not an oracle.
Limitations
I would not publish a result about verification without stating where it stops.
- The asymmetry is real and intended. The anchored arm can reach the source; the self-critique arm cannot. The finding is about giving a loop an external, deterministic anchor, not about geometry outperforming self-critique on equal information.
- Grounding is not truth. SGI measures source-engagement. On a truthfulness benchmark the same signal is roughly chance (AUROC ≈ 0.48). If your failure mode is a wrong source rather than an ungrounded answer, this does not help.
- One generator, one benchmark, one encoder. The strong result is Claude Opus 4.8 on HaluEval QA with a single sentence-embedding model. I have not shown it holds across generators and domains; an early run with a different generator and configuration did not show the same gain, which is exactly why cross-generator replication is the next step rather than a footnote.
- Closed-book is a headroom setting. Forcing the model to answer from memory inflates the base error rate so a verifier has room to work. In a normal RAG pipeline where the source is already in context, the absolute numbers will be smaller — though that is also the regime where a grounding check is cheapest to add.
- Single-seed point estimates. Intervals are Wilson; averaging across seeds would tighten them further.
What to take away
“Design loops, not prompts” is right. But a loop is only as safe as the thing it verifies against, and the convenient default — the model’s own judgment — is the part most likely to fail. In this experiment self-verification did not beat doing nothing, while a deterministic, source-anchored check cut the error rate in half on the same budget.
If you are building agent loops, the practical move is to point the loop’s verifier at something outside the model’s opinion: a deterministic, inspectable check against a real source. You get loops that are both more effective and more trustworthy, and you get a verdict you can log and reproduce instead of a vibe.
The verifier used here is open source, and the full notebook reproduces every number above (generator, referee keys, and a single PROVIDER switch): github.com/groundlens-dev/groundlens. Disagreement is welcome — it is deterministic, so you can check it yourself.
References
- Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A., Song, X., & Zhou, D. (2024, May). Large language models cannot self-correct reasoning yet. In International Conference on Learning Representations (Vol. 2024, pp. 32808-32824).
- Kamoi, R., Zhang, Y., Zhang, N., Han, J., & Zhang, R. (2024). When can LLMs actually correct their own mistakes? A critical survey of self-correction of LLMs. Transactions of the Association for Computational Linguistics, 12, 1417-1440.
- Marín, J. (2025). Semantic grounding index: Geometric bounds on context engagement in RAG systems. arXiv preprint arXiv:2512.13771.
- Chen, K. Y., Su, F. Y., & Chiang, J. H. (2026). The Self-Correction Illusion: LLMs Correct Others but Not Themselves. arXiv preprint arXiv:2606.05976.
- Marín, J. (2026). A Geometric Taxonomy of Hallucinations in LLMs. arXiv preprint arXiv:2602.13224.





