generate tons of phrases and responses based on general knowledge, but what happens when we need answers requiring accurate and specific knowledge? Purely generative models frequently struggle to produce answers to domain-specific questions for a bunch of reasons; maybe the data they were trained on is outdated, maybe what we're asking for is really specific and specialized, maybe we want responses that take into account personal or corporate data that just isn't public… 🤷‍♀️ the list goes on.
So, how can we leverage generative AI while keeping our responses accurate, relevant, and down-to-earth? A good answer to this question is the Retrieval-Augmented Generation (RAG) framework. RAG is a framework that consists of two key components: retrieval and generation (duh!). Unlike purely generative models that are pre-trained on specific data, RAG incorporates an extra retrieval step that allows us to push additional information into the model from an external source, such as a database or a document. To put it differently, a RAG pipeline produces coherent and natural responses (provided by the generation step) that are also factually accurate and grounded in a knowledge base of our choice (provided by the retrieval step).
In this way, RAG can be an extremely valuable tool for applications where highly specialized knowledge is required, such as customer support, legal advice, or technical documentation. One typical example of a RAG application is a customer support chatbot, answering customer issues based on a company's database of support documents and FAQs. Another example would be complex software or technical products with extensive troubleshooting guides. Yet another example would be legal advice, where a RAG model would access and retrieve custom data from law libraries, previous cases, or firm guidelines. The examples are really endless; however, in all these cases, access to external, specific, and context-relevant data enables the model to provide more precise and accurate responses.
So, in this post, I walk you through building a simple RAG pipeline in Python, using the ChatGPT API, LangChain, and FAISS.
What about RAG?
From a more technical perspective, RAG is a technique used to enhance an LLM's responses by injecting additional, domain-specific information into it. In essence, RAG allows a model to also take into account extra external information, like a recipe book, a technical manual, or a company's internal knowledge base, while forming its responses.
This is important because it allows us to eliminate a bunch of problems inherent to LLMs, such as:
- Hallucinations: making things up
- Outdated information: when the model wasn't trained on recent data
- Lack of transparency: not knowing where responses are coming from
To make this work, the external documents are first processed into vector embeddings and stored in a vector database. Then, when we submit a prompt to the LLM, any relevant data is retrieved from the vector database and passed to the LLM along with our prompt. As a result, the LLM's response is formed by considering both our prompt and any relevant information present in the vector database in the background. Such a vector database can be hosted locally or in the cloud, using a service like Pinecone or Weaviate.
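To make this flow a bit more concrete, here is a minimal sketch of the retrieve-then-generate loop, assuming a LangChain-style retriever and chat model like the ones we set up later in this post (the function name and prompt wording are purely illustrative):

# minimal sketch of the retrieve-then-generate flow (illustrative only)
def answer_with_rag(question, retriever, llm):
    docs = retriever.get_relevant_documents(question)         # retrieval step
    context = "\n\n".join(doc.page_content for doc in docs)   # gather relevant text
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content                         # generation step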
What about ChatGPT API, LangChain, and FAISS?
The main component of a RAG pipeline is the LLM that will generate the responses. This can be any LLM, like Gemini or Claude, but in this post, I will be using OpenAI's ChatGPT models via their API platform. In order to use their API, we need to sign up and obtain an API key. We also need to make sure the respective Python libraries are installed.
pip install openai
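As a side note, instead of hard-coding the key in our scripts, a common approach is to export it as an environment variable and read it at runtime. A minimal sketch, assuming the key has been exported as OPENAI_API_KEY:

import os

# read the API key from the environment instead of hard-coding it
api_key = os.environ["OPENAI_API_KEY"]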
The other major component of building a RAG pipeline is processing the external data: generating embeddings from documents and storing them in a vector database. The most popular framework for this task is LangChain. In particular, LangChain allows us to:
- Load and extract text from various document types (PDFs, DOCX, TXT, etc.)
- Split the text into chunks suitable for generating the embeddings
- Generate vector embeddings (in this post, with the help of OpenAI's API)
- Store and search embeddings via vector databases like FAISS, Chroma, and Pinecone
We can easily install the required LangChain libraries with:
pip install langchain langchain-community langchain-openai
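To give an idea of what this looks like in practice, here is a minimal loading-and-chunking sketch (the file name and chunk sizes are just illustrative, and it assumes a recent LangChain version where the splitters live in the langchain-text-splitters package):

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# load a text file and split it into overlapping chunks, ready for embedding
docs = TextLoader("my_notes.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
print(f"{len(chunks)} chunks created")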
In this post, I'll be using LangChain together with FAISS, a local vector database developed by Facebook AI Research. FAISS is a very lightweight package and is thus appropriate for building a simple/small RAG pipeline. It can be easily installed with:
pip install faiss-cpu
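Just to illustrate what FAISS gives us, here is a tiny, self-contained sketch that indexes a couple of strings in memory and runs a similarity search (the example strings and query are made up):

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# build a small in-memory FAISS index from raw strings and query it
embeddings = OpenAIEmbeddings(openai_api_key="your key")
index = FAISS.from_texts(["FAISS keeps the vectors locally.", "RAG retrieves relevant context."], embeddings)
print(index.similarity_search("Where are the vectors stored?", k=1))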
Putting everything together
So, in summary, I'll use:
- ChatGPT models via OpenAI's API as the LLM
- LangChain, along with OpenAI's API, to load the external files, process them, and generate the vector embeddings
- FAISS to build a local vector database
The file that I will be feeding into the RAG pipeline for this post is a text file with some facts about me. This text file is located in the folder 'rag_files'.

Now we're all set up, and we can start by specifying our API key and initializing our model:
from langchain_openai import ChatOpenAI

# ChatGPT API key
api_key = "your key"

# initialize the LLM
llm = ChatOpenAI(openai_api_key=api_key, model="gpt-4o-mini", temperature=0.3)
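Before wiring up retrieval, it can be worth a quick sanity check that the key and model respond; something along these lines (the test prompt is arbitrary):

# quick sanity check that the API key and model work
print(llm.invoke("Say hello in one short sentence.").content)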
Then we can load the files we want to use for RAG, generate the embeddings, and store them in a vector database as follows:
import os

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# load the documents to be used for RAG
text_folder = "rag_files"
all_documents = []
for filename in os.listdir(text_folder):
    if filename.lower().endswith(".txt"):
        file_path = os.path.join(text_folder, filename)
        loader = TextLoader(file_path)
        all_documents.extend(loader.load())

# generate embeddings
embeddings = OpenAIEmbeddings(openai_api_key=api_key)

# create the vector database with FAISS
vector_store = FAISS.from_documents(all_documents, embeddings)
retriever = vector_store.as_retriever()
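At this point, we can also sanity-check the retriever on its own by asking for the documents most relevant to a sample query (the question below is just an example; any question about the indexed files will do):

# quick check of what the retriever pulls back for a sample query
sample_docs = retriever.get_relevant_documents("What are some facts about Maria?")
for doc in sample_docs:
    print(doc.page_content[:200])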
Finally, we can wrap everything in a simple executable Python file:
def main():
    print("Welcome to the RAG Assistant. Type 'exit' to quit.\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("Exiting…")
            break

        # get relevant documents
        relevant_docs = retriever.get_relevant_documents(user_input)
        retrieved_context = "\n\n".join([doc.page_content for doc in relevant_docs])

        # system prompt
        system_prompt = (
            "You are a helpful assistant. "
            "Use ONLY the following knowledge base context to answer the user. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{retrieved_context}"
        )

        # messages for the LLM
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ]

        # generate response
        response = llm.invoke(messages)
        assistant_message = response.content.strip()
        print(f"\nAssistant: {assistant_message}\n")


if __name__ == "__main__":
    main()
Notice how the system prompt is defined. Essentially, a system prompt is an instruction given to the LLM that sets the behavior, tone, or constraints of the assistant before the user interacts with it. For example, we could set the system prompt to make the LLM respond as if talking to a 4-year-old or to a rocket scientist. Here, we ask it to provide responses based only on the external data we provided, the 'Maria facts'.
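For instance, swapping in a different system prompt (a purely illustrative one) would change the assistant's persona without touching the rest of the pipeline:

# illustrative alternative: same pipeline, different persona
system_prompt = (
    "You are a helpful assistant. "
    "Explain everything as if you were talking to a four-year-old."
)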
So, let's see what we've cooked! 🍳
Firstly, I ask a question that is irrelevant to the provided external data source, to make sure that the model only uses the provided data source when forming its responses, and not general knowledge.

… and then I asked some questions specifically about the file I provided…

✨✨✨✨
On my mind
Admittedly, this is a very simplistic example of a RAG setup; there is much more to consider when implementing it in a real business environment, such as security concerns around how data is handled, or performance issues when dealing with a larger, more realistic knowledge corpus and increased token usage. Nonetheless, I believe OpenAI's API is truly impressive and offers immense, untapped potential for building custom, context-specific AI applications.
Loved this post? Let's be friends! Join me on