    Hallucinations in LLMs Are Not a Bug in the Data

By ProfitlyAI · March 16, 2026 · 11 min read


Hallucination in LLMs is not a data quality problem. It is not a training problem. It is not something you can solve with more RLHF, better filtering, or a larger context window. It is a structural property of what these systems are optimized to do.

I have held this position for months, and the response is predictable: researchers working on retrieval augmentation, fine-tuning pipelines, and alignment techniques would prefer a more optimistic framing. I understand why.

What has been missing from this argument is geometry. Intuition about objectives and architecture is necessary but not sufficient. We need to open the model and look at what is actually happening inside when a system produces a confident wrong answer. Not at the logits. Not at the attention patterns. At the internal trajectory of the representation itself, layer by layer, from input to output. That is what the work I am presenting here did.

What the Residual Stream Knows Before the Model Lies

The setup is very simple. We take a factual prompt — the kind where a transformer should retrieve a stored association — and run it under two conditions: one where the model produces the correct answer, and one where it produces a confident wrong answer (a hallucination). Then we track the trajectory of the residual stream — the internal representation vector — layer by layer through the network. The question is: do these two trajectories diverge because the model simply lacks the relevant association? Or is something more specific happening?

To understand what that means, think of the model's internal state at each layer as a point in a high-dimensional space. As the model processes a prompt, that point moves. It traces a path. What the experiment measures is whether the path taken during a correct answer and the path taken during a hallucination diverge because one path is shorter — the model running out of information — or because they head in different directions while covering the same distance.

The answer is the second. The paths are the same length. They point to different places. That is what Figure 1 shows: two trajectories leaving the same origin, traveling the same distance, arriving at different ends of the space. One toward the correct answer. One away from it.

Figure 1. When an LLM hallucinates, the internal representation doesn't go blank. It rotates. Both paths — correct and incorrect — travel the same distance through the model's representation space. What separates them is direction, not magnitude. The geometry is telling you something the output logits can't: the model knew where the correct answer was. It went somewhere else. Image by author.
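The length-versus-direction comparison is easy to compute once you have extracted the per-layer residual-stream vectors for the final token of each prompt. This is a minimal sketch, not the paper's code; `trajectory_stats` is a hypothetical helper name:

```python
import numpy as np

def trajectory_stats(states_a, states_b):
    """Compare two layer-by-layer residual-stream trajectories.

    states_a, states_b: arrays of shape (num_layers, hidden_dim) holding
    the residual-stream vector after each layer, one array per prompt.
    Returns (path_length_a, path_length_b, cosine_of_net_directions).
    """
    # Path length: sum of per-layer step magnitudes.
    len_a = np.linalg.norm(np.diff(states_a, axis=0), axis=1).sum()
    len_b = np.linalg.norm(np.diff(states_b, axis=0), axis=1).sum()
    # Net displacement direction from the first layer to the last.
    d_a = states_a[-1] - states_a[0]
    d_b = states_b[-1] - states_b[0]
    cos = d_a @ d_b / (np.linalg.norm(d_a) * np.linalg.norm(d_b))
    return len_a, len_b, cos
```

Two trajectories with equal path length but orthogonal net directions — the hallucination signature described above — would come back with matching lengths and a cosine near zero.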

The Commitment Ratio: Where Suppression Becomes Visible

The paper introduces a metric called the commitment ratio κ — essentially, how much of the model's probability mass is being actively directed toward or away from the correct token at each layer.

In correct processing, κ rises monotonically through the network (Figure 2 — purple, blue, and dark gray curves). The model builds commitment to the correct answer progressively. That is what you would expect from a system retrieving a learned association.

In hallucination, something different happens. κ does not simply stay flat, which would indicate retrieval failure — the absence of the relevant statistical pattern. Instead, κ collapses (dashed curves in Figure 2). In all models tested, κ reaches a minimum significantly below its starting value before recovering slightly in the final layers. In LLaMA-2 13B and Mistral 7B, it drops to κ_min = 0.08. The p-values are below 10⁻¹⁰⁰. This is not a "subtle" effect.

Figure 2: Six models, the same pattern. The dashed line in each panel is a hallucination run. Every other curve — correct processing under different prompt conditions — rises through the network. The hallucination curve falls, reaches a floor near zero, then partially recovers at the output layer. In LLaMA-2 13B and Mistral 7B that floor is κ = 0.08. In Gemma 2 2B — a model with a fraction of their parameters — it reaches the same depth. The model is not failing to retrieve the correct answer. It is actively moving probability away from it. That is not a retrieval failure. That is a decision. Image by author.
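The article does not reproduce the exact formula for κ, but the rise-versus-collapse shape it describes can be seen with a common proxy: decode each layer's residual stream through the output embedding (the "logit lens") and track the probability assigned to the correct token. A minimal numpy sketch, with hypothetical function names:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def commitment_curve(hidden_states, unembed, answer_id):
    """Logit-lens proxy for the per-layer commitment to the correct token.

    hidden_states: list of (hidden_dim,) residual-stream vectors for the
    final prompt token, one per layer. unembed: (vocab, hidden_dim) output
    embedding matrix. Returns the probability the unembedding assigns to
    answer_id at each layer — rising in correct runs, collapsing mid-network
    in the hallucination pattern described above.
    """
    return [softmax(unembed @ h)[answer_id] for h in hidden_states]
```

This is a proxy, not the paper's κ (which is a ratio); the point is only that a monotone rise versus a mid-network collapse is directly measurable from the per-layer states.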

What is going on? The model is not failing to find the correct answer. It is actively moving probability mass away from the correct token at the same layers where it would be moving probability mass toward it in the correct condition. The failure is, in essence, an override.

The model has encoded the correct answer. That is what makes the κ collapse significant. If the model simply lacked the relevant association — if "Paris" was never statistically linked to "capital of France" in the weights — we would see a flat or noisy trajectory. There would be nothing to suppress. The geometry would be uninformative.

What we see instead is a trajectory that starts in the right direction (all curves in Figure 2 begin at essentially the same point) but then turns. The correct token accumulates probability in the early layers, as in the correct run, and then loses it in the middle layers, at exactly the depth where it should be rising in the correct condition (purple, blue, and dark gray curves in Figure 2). Why? The honest answer is that the paper establishes the what with precision and leaves the why open. But the most plausible interpretation is competition. These models are not retrieving isolated facts. They are predicting the next token in a context, and context generates its own pressure. A sentence that has been heading in a particular direction — stylistically, topically, syntactically — creates a strong prior for how it should continue. When the factually correct answer conflicts with that contextual attractor, the model does not flip a coin. The contextual signal, which is dense and continuous across the entire sequence, can outweigh the factual signal, which may be sparse in the training data.

The training signal never explicitly told the model to favor coherence over accuracy. It told the model to predict the next token. Coherence and accuracy usually align. When they do not, what we get is the dashed gray line in Figure 2.

The model is not lying. It is doing exactly what it was optimized to do. That is the uncomfortable part.

    Three Regimes

One of the cleaner empirical findings is that the seven models do not distribute continuously along any axis of hallucination behavior. They fall into three distinct clusters:

• Models at ~1B parameters show the beginnings of attention reallocation — some geometric separation — but incomplete suppression.
• Models at 1.6B–3B show intermediate suppression. The κ collapse is present but shallower: StableLM-2 1.6B reaches κ_min = 0.32 rather than 0.08.
• Then there is Gemma 2 2B, which matches the suppression depth of LLaMA-2 13B and Mistral 7B despite having a fraction of their parameters (κ_min = 0.08, p < 10⁻⁹¹).

Something real is going on architecturally, not just as a function of scale. Architectural choices — attention mechanisms, normalization, layer design — set the ceiling on suppression depth independently of parameter count. This is a phase structure.
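The clusters can be made concrete with the κ_min values quoted in the text — note that Gemma 2 2B lands in the same bucket as models several times its size. The thresholds below are illustrative assumptions chosen only to separate those quoted values; the paper's actual clustering method is not reproduced here:

```python
# κ_min values as reported in the text (the ~1B models are described
# qualitatively, so they are omitted from this illustrative table).
KAPPA_MIN = {
    "LLaMA-2 13B": 0.08,
    "Mistral 7B": 0.08,
    "Gemma 2 2B": 0.08,
    "StableLM-2 1.6B": 0.32,
}

def regime(kappa_min, deep=0.15, intermediate=0.5):
    """Bucket a model by suppression depth. Threshold values are
    hypothetical, picked only to separate the reported κ_min values."""
    if kappa_min <= deep:
        return "deep suppression"
    if kappa_min <= intermediate:
        return "intermediate"
    return "incomplete"
```

Under this toy grouping, `regime(KAPPA_MIN["Gemma 2 2B"])` matches `regime(KAPPA_MIN["LLaMA-2 13B"])`, which is the scale-independence point: suppression depth tracks architecture, not parameter count.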

    Detecting Hallucinations

We have mapped, with geometric precision, how a particular class of system fails. The causal question — which specific circuits implement the suppression, and why — remains open. That is the next problem. What the geometry establishes is that the suppression is not accidental. It is not a calibration error you can tune away with better prompting or a different learning rate. It is an emergent property of systems optimized for next-token prediction. Contextual coherence and factual accuracy are different objectives. When they conflict, the training signal does not adjudicate between them. The override is what that conflict looks like from the inside.

The practical implication is direct. You can use this geometric signature to build hallucination detectors — probes that identify suppression events before they reach the output. They work well. But they are local. A probe trained on factual retrieval does not transfer cleanly to reasoning tasks or to different knowledge domains. The geometry shifts enough that detection degrades. That is not a flaw in the method. It is information. It tells you that monitoring needs to be domain-specific, calibrated per deployment context, not installed once and forgotten.

For anyone building production systems at scale, that is the operational conclusion: one monitor per domain, trained on representative data from that domain. The alternative — a single universal detector — is not supported by the evidence.
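A per-domain monitor of this kind can be as simple as a linear probe on mid-layer residual-stream states. The sketch below is a plain logistic regression in numpy — the function names, training setup, and label convention (1 = the run later hallucinated) are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def train_suppression_probe(states, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe by gradient descent.

    states: (n_samples, hidden_dim) mid-layer residual-stream vectors.
    labels: (n_samples,) with 1 for runs that later hallucinated.
    Returns the probe weights and bias.
    """
    X = np.asarray(states, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        grad = p - y                            # d(log-loss)/d(logit)
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_predict(w, b, states):
    """Flag states whose probe logit is positive as likely suppression events."""
    return (np.asarray(states) @ w + b) > 0
```

The domain-specificity point from the text maps directly onto this setup: `w` and `b` are fit per domain, and a probe trained on one domain's states should be expected to degrade on another's.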

What the Geometry Can't Fix

The override mechanism this work documents is not a "bug waiting to be patched". It is a direct consequence of the objective function used to train LLMs. Next-token prediction over discrete sequences gives a model no mechanism to privilege factual accuracy over contextual coherence. The training signal cannot differentiate between them. The model learns to be fluent, which is quite remarkable. The problem is that fluency and accuracy only usually coincide. When they do not, fluency wins. It is a conflict-resolution mechanism producing the wrong outcome. The geometry shows you the moment that decision happens.

To answer the causal question — which specific circuits implement the suppression, and whether they can be modified — we need activation patching at scale, circuit-level analysis, and ideally causal intervention experiments that go beyond the correlational evidence this paper provides. That is the next step. Several groups are working on it.

Whether the answer to that causal question would allow us to fix hallucination within the current architectural paradigm is a different matter. My view is that it would not — not fundamentally. We can suppress the suppression. We can add a monitoring layer that catches the κ collapse before it reaches the output. We can fine-tune on domains where the conflict is most acute. These are real improvements. But the underlying tension between contextual prediction and factual grounding does not go away until the model has representations of the world that are not derived from token co-occurrence. That requires a different architecture.

Why This Work Matters Anyway

Infrastructure that accurately characterizes the failure modes of current LLMs is a necessary step in the transition to better ones. We cannot design a successor architecture without understanding, in detail, what the predecessor is actually doing inside. This work tells us something specific:

• In autoregressive LLMs (transformer architectures), the geometry of correct and incorrect factual processing diverges rotationally, not in magnitude;
• the divergence is active rather than passive;
• the depth of suppression is architecturally gated, not purely a function of scale;
• the geometric signature transfers across domains with systematic but bounded degradation.

The geometry does not lie. What we choose to do with it is a different question.

Code, data, and related papers will be available at cert-framework.com soon.

Recommended reading

• Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter. 2020. Zoom in: An introduction to circuits. Distill, 5(3):e00024–001.
• Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. 2021. A mathematical framework for transformer circuits. Transformer Circuits Thread. https://transformercircuits.pub/2021/framework/index.html
• Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, virtual.
• Leonard Bereska and Efstratios Gavves. 2024. Mechanistic interpretability for AI safety — a review. arXiv preprint arXiv:2404.14082.
• Guillaume Alain and Yoshua Bengio. 2016. Understanding intermediate layers using linear classifier probes. ICLR.


