Introduction
Retrieval-Augmented Generation (RAG) may have been essential for the first wave of enterprise AI, but it’s quickly evolving into something much larger. Over the past two years, organizations have learned that simply retrieving text snippets using vector search isn’t enough. Context must be governed, explainable, and adaptive to an agent’s goal.
This post explores how that evolution is taking shape and what it means for data and AI leaders building systems that can reason responsibly.
You’ll come away with answers to some key questions:
How do knowledge graphs improve RAG?
They provide structure and meaning to enterprise data, linking entities and relationships across documents and databases to make retrieval more accurate and explainable for both humans and machines.
How do semantic layers help LLMs retrieve better answers?
Semantic layers standardize data definitions and governance policies so AI agents can understand, retrieve, and reason over all kinds of data as well as AI tools, memories, and other agents.
How is RAG evolving in the age of agentic AI?
Retrieval is becoming one step in a broader reasoning loop (increasingly referred to as “context engineering”) in which agents dynamically write, compress, isolate, and select context across data and tools.
TL;DR
Retrieval-Augmented Generation (RAG) rose to prominence following the launch of ChatGPT and the realization that there is a limit on the context window: you can’t just copy all of your data into the chat interface. Teams used RAG, and variants like GraphRAG (RAG using a graph database), to bring more context into prompts at query time. RAG’s popularity quickly exposed its weaknesses: putting incorrect, irrelevant, or simply too much information into the context window can actually degrade rather than improve results. New techniques like re-rankers were developed to overcome these limitations, but RAG wasn’t built to survive in the new agentic world.
As AI shifts from single prompts to autonomous agents, retrieval and its variants are just one tool in an agent’s toolbelt, alongside writing, compressing, and isolating context. As the complexity of workflows, and the knowledge required to complete those workflows, grows, retrieval will continue to evolve (though it may be called context engineering, RAG 2.0, or agentic retrieval). The next era of retrieval (or context engineering) will require metadata management across data structures (not just relational) as well as tools, memories, and agents themselves. We’ll evaluate retrieval not only for accuracy but also for relevance, groundedness, provenance, coverage, and recency. Knowledge graphs will be key for retrieval that is context-aware, policy-aware, and semantically grounded.
The Rise of RAG
What’s RAG?
RAG, or Retrieval-Augmented Generation, is a technique for retrieving relevant information to augment a prompt that is sent to an LLM, in order to improve the model’s response.
Shortly after ChatGPT went mainstream in November 2022, users realized that LLMs weren’t (hopefully) trained on their own data. To bridge that gap, teams began building systems to retrieve relevant data at query time to augment the prompt – an approach known as retrieval-augmented generation (RAG). The term came from a 2020 Meta paper, but the popularity of the GPT models brought the term, and the practice, into the limelight.
Tools like LangChain and LlamaIndex helped developers build these retrieval pipelines. LangChain was released at around the same time as ChatGPT as a way of chaining components like prompt templates, LLMs, agents, and memory together for generative AI applications. LlamaIndex was also released around the same time, as a way to address the limited context window in GPT-3, thus enabling RAG. As developers experimented, they realized that vector databases provide a fast and scalable way to power the retrieval part of RAG, and vector databases like Weaviate, Pinecone, and Chroma became standard components of the RAG architecture.
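To make the pattern concrete, here is a minimal sketch of the retrieval step in plain Python. It uses hand-made toy vectors and cosine similarity in place of a real embedding model and vector database; all chunks and numbers are illustrative, not any particular library’s API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": (chunk, embedding) pairs with hand-made vectors.
store = [
    ("Our refund window is 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on Fridays.", [0.1, 0.8, 0.1]),
    ("Shipping takes 5-7 business days.", [0.7, 0.2, 0.1]),
]

def retrieve(query_vec, k=2):
    """Return the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Augment the prompt with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(query_vec))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is the refund policy?", [1.0, 0.0, 0.0])
```

A production pipeline swaps the hand-made vectors for an embedding model and the list for a vector database, but the shape of the flow (embed, rank, stuff the top results into the prompt) is the same.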
What’s GraphRAG?
GraphRAG is a variation of RAG where the underlying database used for retrieval is a knowledge graph or a graph database.
One variation of RAG became especially popular: GraphRAG. The idea here is that the underlying data used to enrich LLM prompts is stored in a knowledge graph. This allows the model to reason over entities and relationships rather than flat text chunks. In early 2023, researchers began publishing papers exploring how knowledge graphs and LLMs could complement each other. In late 2023, Juan Sequeda, Dean Allemang, and Bryon Jacob from data.world released a paper demonstrating how knowledge graphs can improve LLM accuracy and explainability. In July 2024, Microsoft open-sourced its GraphRAG framework, which made graph-based retrieval accessible to a wider developer audience and solidified GraphRAG as a recognizable category within RAG.
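A rough sketch shows what distinguishes graph-based retrieval: instead of matching flat text chunks, retrieval walks the neighborhood of an entity in a knowledge graph and verbalizes the connected facts. The triples and entity names below are hypothetical, and this is a toy illustration rather than any framework’s actual implementation.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
triples = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "headquartered_in", "Berlin"),
    ("Acme Corp", "ceo", "J. Smith"),
    ("Globex", "competitor_of", "Acme Corp"),
]

def neighborhood(entity, hops=1):
    """Collect all triples within `hops` edges of an entity."""
    frontier, facts = {entity}, set()
    for _ in range(hops):
        new = set()
        for s, p, o in triples:
            if s in frontier or o in frontier:
                facts.add((s, p, o))
                new.update({s, o})
        frontier |= new
    return facts

# Graph-based retrieval: verbalize the subgraph around the query entity
# so it can be fed to an LLM as context.
context = [f"{s} {p.replace('_', ' ')} {o}" for s, p, o in sorted(neighborhood("Acme Corp"))]
```

The payoff is that relationships come along for free: a question about Acme Corp retrieves its acquisition, its CEO, and its competitor in one hop, facts that might live in three unrelated documents in a chunk-based store.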
The rise of GraphRAG reignited interest in knowledge graphs not seen since Google launched its Knowledge Graph in 2012. The sudden demand for structured context and explainable retrieval gave them new relevance.
From 2023–2025, the market responded quickly:
- January 23, 2023 – Digital Science acquired metaphacts, creators of the metaphactory platform: “a platform that supports customers in accelerating their adoption of knowledge graphs and driving knowledge democratization.”
- February 7, 2023 – Progress acquired MarkLogic, a multi-model NoSQL database with a particular strength in managing RDF data, a core data format for graph technology.
- July 18, 2024 – Samsung acquired Oxford Semantic Technologies, makers of the RDFox graph database, to power on-device reasoning and personal knowledge capabilities.
- October 23, 2024 – Ontotext and Semantic Web Company merged to form Graphwise, explicitly positioning around GraphRAG. “The announcement is significant for the graph industry, as it elevates Graphwise as the most comprehensive knowledge graph AI organization and establishes a clear path towards democratizing the evolution of Graph RAG as a category.”
- May 7, 2025 – ServiceNow announced its acquisition of data.world, integrating a graph-based data catalog and semantic layer into its enterprise workflow platform.
These are just the events related to knowledge graphs and adjacent semantic technology. If we broaden the lens to include metadata management and/or semantic layers more generally, there are more deals, most notably Salesforce’s $8 billion acquisition of metadata leader Informatica.
These moves mark a clear shift: knowledge graphs are no longer just metadata management tools; they have become the semantic backbone for AI, closer to their origins as expert systems. GraphRAG made knowledge graphs relevant again by giving them a critical role in retrieval, reasoning, and explainability.
In my day job as the product lead for a semantic knowledge and AI company, we work to close the gap between data and its actual meaning for some of the world’s biggest companies. Making their data AI-ready is a combination of making it interoperable, discoverable, and usable so it can feed LLMs contextually relevant information in order to produce safe, accurate results. That is no small order for large, highly regulated, and complex enterprises managing exponentially growing amounts of data.
The fall of RAG and the rise of context engineering
Is RAG dead? No, but it has evolved. The original version of RAG relied on a single dense vector search and took the top results to feed directly into an LLM. GraphRAG built on this by adding some graph analytics and entity and/or relationship filters. These implementations almost immediately ran into constraints around relevance, scalability, and noise. Those constraints pushed RAG forward into new evolutions known by many names: agentic retrieval, RAG 2.0, and most recently, context engineering. The original, naive implementation is largely dead, but its descendants are thriving, and the term itself is still extremely popular.
Following the RAG hype cycle in 2024, there was inevitable disillusionment. While it’s possible to build a RAG demo in minutes, and many people did, getting your app to scale in an enterprise becomes quite a bit dicier. “People think that RAG is easy because you can build a nice RAG demo on a single document very quickly now and it will be pretty good. But getting this to actually work at scale on real world data where you have business constraints is a very different problem,” said Douwe Kiela of Contextual AI, one of the authors of the original RAG paper from Meta in 2020.
One issue with scaling a RAG app is the volume of data needed at retrieval time. “I think the issue that people get into with it is scaling it up. It’s great on 100 documents, but now all of a sudden I have to go to 100,000 or 1,000,000 documents,” says Rajiv Shah. But as LLMs matured, their context windows grew. The size of the context window was the original pain point that RAG was built to address, raising the question of whether RAG is still necessary or useful. As Dr. Sebastian Gehrmann from Bloomberg points out, “If I’m able to just paste in more documents or more context, I don’t have to rely on as many tricks to narrow down the context window. I can just rely on the large language model. There’s a tradeoff here though,” he notes, “where longer context usually comes at a cost of significantly increased latency and cost.”
It isn’t just cost and latency that you risk by arbitrarily dumping more information into the context window; you can also degrade performance. RAG can improve responses from LLMs, provided the retrieved context is relevant to the initial prompt. If the context isn’t relevant, you can get worse results, something referred to as “context poisoning” or “context clash,” where misleading or contradictory information contaminates the reasoning process. Even if you are retrieving relevant context, you can overwhelm the model with sheer volume, leading to “context confusion” or “context distraction.” While terminology varies, multiple studies show that model accuracy tends to decline beyond a certain context size. This was found in a Databricks paper back in August of 2024 and reinforced by recent research from Chroma, something they termed “context rot.” Drew Breunig’s post usefully categorizes these issues as distinct “context fails.”
To address the problem of overwhelming the model, or providing incorrect or irrelevant information, re-rankers have grown in popularity. As Nikolaos Vasiloglou from RelationalAI states, “a re-ranker is, after you bring the data, how do you decide what to keep and what to throw away, [and that] has a huge impact.” Popular re-rankers include Cohere Rerank, Voyage AI Rerank, Jina Reranker, and BGE Reranker. But re-ranking isn’t enough in today’s agentic world. The latest generation of RAG has become embedded into agents, something increasingly known as context engineering.
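The re-ranking step itself is simple to sketch: score each retrieved candidate against the query, then keep only the best. This toy version fakes the relevance score with word overlap; real re-rankers like the ones named above use trained cross-encoder models, and the candidate passages here are invented for illustration.

```python
def overlap_score(query, passage):
    """Stand-in for a cross-encoder relevance score: fraction of
    query words that also appear in the passage."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q)

def rerank(query, candidates, keep=2):
    """Re-score retrieved candidates and keep only the top `keep`."""
    scored = sorted(candidates, key=lambda c: overlap_score(query, c), reverse=True)
    return scored[:keep]

# Candidates as a first-pass retriever might return them.
candidates = [
    "The refund policy allows returns within 30 days.",
    "Our cafeteria menu changes weekly.",
    "Refund requests require the original receipt.",
]
top = rerank("what is the refund policy", candidates)
```

The design point is that re-ranking is a second, more precise filter applied after a fast first-pass retrieval, trading a little latency for much cleaner context.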
What’s Context Engineering?
“The art and science of filling the context window with just the right information at each step of an agent’s trajectory.” – Lance Martin of LangChain
I want to focus on context engineering for two reasons: the originators of the terms RAG 2.0 and Agentic Retrieval (Contextual AI and LlamaIndex, respectively) have started using the term context engineering; and it’s a far more popular term based on Google search trends. Context engineering can also be thought of as an evolution of prompt engineering. Prompt engineering is about crafting a prompt in a way that gets you the results you want; context engineering is about supplementing that prompt with the right context.
RAG grew to prominence in 2023, eons ago in the timeline of AI. Since then, everything has become ‘agentic’. RAG was created under the assumption that the prompt would be generated by a human, and the response would be read by a human. With agents, we need to rethink how this works. Lance Martin breaks down context engineering into four categories: write, compress, isolate, and select. Agents need to write (or persist, or remember) information from task to task, just like humans. Agents will often have too much context as they move from task to task and need to compress or condense it somehow, usually through summarization or ‘pruning’. Rather than giving all of the context to the model, we can isolate it, or split it across agents, so they can, as Anthropic describes it, “explore different parts of the problem simultaneously.” Rather than risk context rot and degraded results, the idea here is to not give the LLM enough rope to hang itself.
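One of those four moves, compression, can be sketched in a few lines. This toy version keeps the newest messages that fit a crude word-count budget and collapses the rest into a placeholder; a real agent would use an actual tokenizer and summarize the dropped messages with an LLM rather than just marking them.

```python
def compress(messages, budget=50):
    """Compress a message history to a crude word-count budget
    (a stand-in for a real tokenizer): keep the newest messages
    that fit, and collapse the rest into a summary placeholder."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = len(msg.split())
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    dropped = len(messages) - len(kept)
    if dropped:
        # A real agent would summarize these with an LLM; we just mark them.
        kept.append(f"[summary of {dropped} earlier messages]")
    return list(reversed(kept))

history = [f"step {i} " + "detail " * 20 for i in range(5)]
compressed = compress(history, budget=50)
```

Keeping recency and summarizing the remainder is only one policy; pruning by relevance to the current task is another, and the two are often combined.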
Agents need to use their memories when needed or call upon tools to retrieve more information, i.e. they need to select (retrieve) what context to use. One of those tools could be vector-based retrieval, i.e. traditional RAG. But that is just one tool in the agent’s toolbox. As Marc Brooker from AWS put it, “I do expect what we’re going to see is some of the flashy newness around vector kind of cool down and us go to a world where we have this new tool in our toolbox, but a lot of the agents we’re building are using relational interfaces. They’re using these document interfaces. They’re using lookup by primary key, lookup by secondary index. They’re using lookup by geo. All of these things that have existed in the database space for decades, now we also have this one more, which is kinda lookup by semantic meaning, which is very exciting and new and powerful.”
Those at the forefront are already doing this. Martin quotes Varun Mohan of Windsurf, who says, “we […] rely on a combination of techniques like grep/file search, knowledge graph based retrieval, and … a re-ranking step where [context] is ranked in order of relevance.”
Naive RAG may be dead, and we’re still figuring out what to call the modern implementations, but one thing seems certain: the future of retrieval is bright. How do we ensure agents are able to retrieve different datasets across an enterprise, from relational data to documents? The answer is increasingly being called the semantic layer.
Context engineering needs a semantic layer
What’s a Semantic Layer?
A semantic layer is a way of attaching metadata to all data in a form that is both human- and machine-readable, so that people and computers can consistently understand, retrieve, and reason over it.
There is a recent push from those in the relational data world to build a semantic layer over relational data. Snowflake even created an Open Semantic Interchange (OSI) initiative to attempt to standardize the way companies document their data to make it ready for AI.
But focusing solely on relational data is a narrow view of semantics. What about unstructured and semi-structured data? That’s the kind of data that large language models excel at, and what started all the RAG rage. If only there were a precedent for retrieving relevant search results across a ton of unstructured data 🤔.
Google has been retrieving relevant information across the entire web for decades using structured data. By structured data, here, I mean machine-readable metadata, or as Google describes it, “a standardized format for providing information about a page and classifying the page content.” Librarians, information scientists, and SEO practitioners have been tackling the unstructured data retrieval problem through knowledge organization, information retrieval, structured metadata, and Semantic Web technologies. Their techniques for describing, linking, and governing unstructured data underpin today’s search and discovery systems, both publicly and in the enterprise. The future of the semantic layer will bridge the relational and unstructured data worlds by combining the rigor of relational data management with the contextual richness of library science and knowledge graphs.
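Google’s flavor of machine-readable metadata is typically expressed as schema.org JSON-LD embedded in a page. The sketch below builds a small example of that shape in Python; the headline, author, and topic values are invented for illustration.

```python
import json

# A schema.org-style JSON-LD description: the kind of machine-readable
# metadata search engines use to classify page content.
article_metadata = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Quarterly refund policy update",
    "author": {"@type": "Person", "name": "J. Smith"},
    "datePublished": "2025-01-15",
    "about": ["refunds", "customer service"],
}

# Serialized, this would be embedded in the page inside a
# <script type="application/ld+json"> tag.
payload = json.dumps(article_metadata, indent=2)
```

The same idea (small, standardized, machine-readable descriptions attached to content) is what a semantic layer generalizes from web pages to all enterprise data.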
The future of RAG
Here are my predictions for the future of RAG.
RAG will continue to evolve into more agentic patterns. This means retrieval of context is just one part of a reasoning loop that also includes writing, compressing, and isolating context. Retrieval becomes an iterative process, rather than one-shot. Anthropic’s Model Context Protocol (MCP) treats retrieval as a tool that can be given to an agent via MCP. OpenAI offers File Search as a tool that agents can call. LangChain’s agent framework LangGraph lets you build agents using a node-and-edge pattern (like a graph). In their quickstart guide, you can see that retrieval (in this case a web search) is just one of the tools the agent can be given to do its job. Here, they list retrieval as one of the actions an agent or workflow can take. Wikidata also has an MCP server that allows users to interact directly with public data.
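The shape of “retrieval as one tool among several” can be sketched as a trivial router. Here the routing is a keyword heuristic and the tool bodies are stubs returning placeholder strings; in a real agent the LLM itself chooses the tool, for example from MCP tool schemas, and the tool names below are hypothetical.

```python
# Toy "agent" that treats retrieval as just one tool among several,
# routed by a trivial keyword heuristic (a real agent would let the
# LLM pick the tool from its registered tool descriptions).
def sql_lookup(query):
    return "rows from the orders table"

def vector_retrieve(query):
    return "semantically similar document chunks"

def memory_lookup(query):
    return "notes persisted from earlier steps"

TOOLS = {
    "orders": sql_lookup,
    "document": vector_retrieve,
    "earlier": memory_lookup,
}

def route(query):
    for keyword, tool in TOOLS.items():
        if keyword in query.lower():
            return tool(query)
    return vector_retrieve(query)  # default: semantic retrieval

answer_context = route("Summarize what we decided earlier")
```

The point is architectural: vector search, SQL, and memory lookups sit behind the same dispatch surface, so adding a new retrieval mode means registering another tool, not rebuilding the pipeline.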
Retrieval will broaden to encompass all kinds of data (aka multimodal retrieval): relational, content, and then images, audio, geodata, and video. LlamaIndex offers four ‘retrieval modes’: chunks, files_via_metadata, files_via_content, and auto_routed. They also offer composite retrieval, allowing you to retrieve from multiple sources at once. Snowflake offers Cortex Search for content and Cortex Analyst for relational data. LangChain offers retrievers over relational data, graph data (Neo4j), lexical, and vector.
Retrieval will expand to include metadata about tools themselves, as well as “memories.” Anthropic’s MCP standardized how agents call tools using a registry of tools, i.e. tool metadata. OpenAI, LangChain, LlamaIndex, AWS Bedrock, Azure, Snowflake, and Databricks all have capabilities for managing tools, some via MCP directly, others via their own registries. On the memory side, both LlamaIndex and LangChain treat memories as retrievable data (short-term and long-term) that agents can query across workflows. Projects like Cognee push this further with dedicated, queryable agent memory.
Knowledge graphs will play a key role as a metadata layer between relational and unstructured data, replacing the narrow definition of semantic layer currently in use with a more robust metadata management framework. The market consolidation we’ve seen over the past couple of years, described above, is, I believe, a signal of the market’s growing recognition that knowledge graphs and metadata management are going to be essential as agents are asked to do more complicated tasks across enterprise data. Gartner’s May 2025 report, “Pivot Your Data Engineering Discipline to Efficiently Support AI Use Cases,” recommends data engineering teams adopt semantic techniques (such as ontologies and knowledge graphs) to support AI use cases. Knowledge graphs, metadata management, and reference data management are already ubiquitous in large life sciences and financial services companies, largely because they are highly regulated and require fact-based, grounded data to power their AI initiatives. Other industries are going to start adopting the tried-and-true techniques of semantic technology as their use cases mature and require explainable answers.
Evaluation metrics for context retrieval will gain popularity. Ragas, Databricks Mosaic AI Agent Evaluation, and TruLens all provide frameworks for evaluating RAG. Evidently offers open source libraries and tutorial material on RAG evaluation. LangChain’s evaluation product LangSmith has a module focused on RAG. What’s important is that these frameworks are not just evaluating the accuracy of the answer given the prompt; they evaluate context relevance and groundedness (how well the response is supported by the context). Some vendors are building out metrics to evaluate provenance (citations and sourcing) of the retrieved context, coverage (did we retrieve enough?), and freshness or recency.
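To show what a groundedness check measures, here is a deliberately crude sketch: the fraction of answer sentences whose content words mostly appear in the retrieved context. Frameworks like the ones named above use LLM judges or NLI models instead of word overlap; the threshold and example strings here are arbitrary.

```python
def groundedness(answer, context):
    """Fraction of answer sentences whose words mostly appear in the
    retrieved context -- a crude stand-in for LLM-judged groundedness."""
    ctx_words = set(context.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = 0
    for s in sentences:
        words = set(s.lower().split())
        # Call a sentence "supported" if half its words are in the context.
        if words and len(words & ctx_words) / len(words) >= 0.5:
            supported += 1
    return supported / len(sentences)

context = "refunds are accepted within 30 days of purchase"
good = groundedness("Refunds are accepted within 30 days", context)
bad = groundedness("The CEO resigned yesterday", context)
```

Even this toy metric separates an answer drawn from the context (score 1.0) from a hallucinated one (score 0.0), which is exactly the signal a production evaluation pipeline tracks at scale.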
Policy-as-code guardrails will ensure retrieval respects access control, policies, regulations, and best practices. Snowflake and Databricks already enable row-level access control and column masking. Policy engines like Open Policy Agent (OPA) and Oso are embedding access control into agentic workflows. As Dr. Sebastian Gehrmann of Bloomberg has found, “RAG isn’t necessarily safer,” and can introduce new governance risks. I expect the need for guardrails to grow to include more complicated governance rules (beyond access control), policy requirements, and best practices.
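The core of a retrieval guardrail is simple: filter retrieved chunks against the caller’s entitlements before they ever reach the prompt. The sketch below hard-codes a tiny policy table with hypothetical roles and sensitivity labels; a real deployment would delegate the decision to an engine like OPA or Oso.

```python
# Policy-as-code sketch: map roles to the sensitivity labels they may see.
# Roles, labels, and chunks are all invented for illustration.
POLICIES = {
    "analyst": {"public", "internal"},
    "contractor": {"public"},
}

chunks = [
    {"text": "Press release draft", "label": "public"},
    {"text": "Unreleased earnings", "label": "restricted"},
    {"text": "Internal runbook", "label": "internal"},
]

def authorized_context(role, retrieved):
    """Drop any retrieved chunk the caller's role is not entitled to see."""
    allowed = POLICIES.get(role, set())
    return [c["text"] for c in retrieved if c["label"] in allowed]

analyst_view = authorized_context("analyst", chunks)
contractor_view = authorized_context("contractor", chunks)
```

Enforcing the policy at retrieval time, rather than trusting the LLM to withhold restricted text it has already seen, is the design choice that makes this a guardrail rather than a suggestion.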
Conclusion
RAG was never the end goal, just the starting point. As we move into the agentic era, retrieval is evolving into part of a broader discipline: context engineering. Agents don’t just need to find documents; they need to understand which data, tools, and memories are relevant for each step of their reasoning. This understanding requires a semantic layer, a way to understand, retrieve, and govern across the entire enterprise. Knowledge graphs, ontologies, and semantic models will provide that connective tissue. The next generation of retrieval won’t just be about speed and accuracy; it will also be about explainability and trust. The future of RAG isn’t retrieval alone, but retrieval that is context-aware, policy-aware, and semantically grounded.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for TopBraid EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.