    You Probably Don’t Need a Vector Database for Your RAG — Yet

    By ProfitlyAI | January 20, 2026 | 15 min read


    Off the back of Retrieval Augmented Generation (RAG), vector databases are getting a lot of attention in the AI world.

    Many people say you need tools like Pinecone, Weaviate, Milvus, or Qdrant to build a RAG system and manage your embeddings. If you are working on enterprise applications with hundreds of millions of vectors, then tools like these are essential. They let you perform CRUD operations, filter by metadata, and use disk-based indexing that goes beyond your computer's memory.

    But for most internal tools, documentation bots, or MVP agents, adding a dedicated vector database might be overkill. It increases complexity, adds network delays and serialisation costs, and makes things more complicated to manage.

    The truth is that "Vector Search" (i.e. the Retrieval part of RAG) is just matrix multiplication. And Python already has some of the world's best tools for that.

    In this article, we'll show how to build a production-ready retrieval component of a RAG pipeline for small-to-medium data volumes using only NumPy and scikit-learn. You'll see that it's possible to search millions of text strings in milliseconds, all in memory and without any external services.

    Understanding Retrieval as Matrix Math

    Typically, RAG involves four main steps:

    1. Embed: Turn the text of your source data into vectors (lists of floating-point numbers).
    2. Store: Squirrel those vectors away into a database.
    3. Retrieve: Find vectors that are mathematically "close" to the query vector.
    4. Generate: Feed the corresponding text to an LLM and get your final answer.

    Steps 1 and 4 rely on models: an embedding model for step 1 and a large language model for step 4. Steps 2 and 3 are the domain of the Vector DB. We'll focus on steps 2 and 3 and how we can avoid using vector DBs entirely.

    But when we're searching our vector database, what actually is "closeness"? Usually, it's Cosine Similarity. If your two vectors are normalised to have a magnitude of 1, then cosine similarity is just the dot product of the two.

    If you have a query vector of size N, Q(1xN), and a database of document vectors of size M by N, D(MxN), finding the best matches is not a database query; it's a matrix multiplication operation, the product of D with the transpose of Q. The result is a column of M scores, one per document.

    Scores = D · Q^T

    NumPy is designed to perform this kind of operation efficiently, using routines that leverage modern CPU features such as vectorisation.
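To make the Scores = D · Q^T idea concrete, here is a minimal sketch on toy data. The four 3-dimensional "document" vectors and the query are made-up numbers, and the helper name unit_rows is hypothetical; real embeddings would have hundreds of dimensions.

```python
import numpy as np

# Toy data: 4 "document" vectors and 1 query vector in 3 dimensions.
# These values are invented purely for illustration.
D = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
], dtype=np.float32)                               # shape (M, N) = (4, 3)
Q = np.array([[2.0, 1.0, 0.0]], dtype=np.float32)  # shape (1, N) = (1, 3)

def unit_rows(a):
    """Scale every row to length 1 (hypothetical helper name)."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)

D_unit = unit_rows(D)
Q_unit = unit_rows(Q)

# One matrix product scores every document against the query.
scores = (D_unit @ Q_unit.T).ravel()               # shape (M,)

# Indices of the documents, best match first.
ranking = np.argsort(scores)[::-1]
print(ranking, scores[ranking])
```

Document 2, which points in nearly the same direction as the query, ranks first, and document 3, which is orthogonal to it, ranks last.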

    The Implementation

    We'll create a class called SimpleVectorStore to handle ingestion, indexing, and retrieval. Our input data will consist of one or more files containing the text we want to search on. Using Sentence Transformers for local embeddings means everything works offline once the model has been downloaded.

    Prerequisites

    Set up a new development environment, install the required libraries, and start a Jupyter notebook.

    Type the following commands into a command shell. I'm using UV as my package manager; change to suit whatever tool you're using.

    $ uv init ragdb
    $ cd ragdb
    $ uv venv ragdb
    $ source ragdb/bin/activate
    $ uv pip install numpy scikit-learn sentence-transformers jupyter
    $ jupyter notebook

    The In-Memory Vector Store

    We don't need a complicated server. All we need is a function to load our text data from the input files and chunk it into bite-sized pieces, and a class with two containers: a list for the raw text chunks and a NumPy array for the embedding matrix. Here's the code.

    import numpy as np
    import os
    from sentence_transformers import SentenceTransformer
    from typing import List, Dict, Any
    from pathlib import Path

    class SimpleVectorStore:
        def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
            print(f"Loading embedding model: {model_name}...")
            self.encoder = SentenceTransformer(model_name)
            self.documents = []  # Stores the raw text and metadata
            self.embeddings = None  # Will become a NumPy array of shape (n_docs, dim)

        def add_documents(self, docs: List[Dict[str, Any]]):
            """
            Ingests documents.
            docs format: [{'text': '...', 'metadata': {...}}, ...]
            """
            if not docs:
                return
            texts = [d['text'] for d in docs]

            # 1. Generate Embeddings
            print(f"Embedding {len(texts)} documents...")
            new_embeddings = self.encoder.encode(texts)

            # 2. Normalize Embeddings
            # (Critical optimization: for unit-length vectors the dot product
            # equals cosine similarity)
            norm = np.linalg.norm(new_embeddings, axis=1, keepdims=True)
            new_embeddings = new_embeddings / norm

            # 3. Update Storage
            # (np.vstack copies the whole matrix, so prefer a few large batches)
            if self.embeddings is None:
                self.embeddings = new_embeddings
            else:
                self.embeddings = np.vstack([self.embeddings, new_embeddings])

            self.documents.extend(docs)
            print(f"Store now contains {len(self.documents)} documents.")

        def search(self, query: str, k: int = 5):
            """
            Retrieves the top-k most similar documents.
            """
            if self.embeddings is None or len(self.documents) == 0:
                print("Warning: Vector store is empty. No documents to search.")
                return []

            # 1. Embed and Normalize Query
            query_vec = self.encoder.encode([query])
            norm = np.linalg.norm(query_vec, axis=1, keepdims=True)
            query_vec = query_vec / norm

            # 2. Vectorized Search (Matrix Multiplication)
            # (n_docs, dim) x (dim, 1) -> (n_docs, 1), flattened to (n_docs,)
            scores = np.dot(self.embeddings, query_vec.T).flatten()

            # 3. Get Top-K Indices
            # argsort sorts ascending, so we take the last k and reverse them
            # Ensure k doesn't exceed the number of documents
            k = min(k, len(self.documents))
            top_k_indices = np.argsort(scores)[-k:][::-1]

            results = []
            for idx in top_k_indices:
                results.append({
                    "score": float(scores[idx]),
                    "text": self.documents[idx]['text'],
                    "metadata": self.documents[idx].get('metadata', {})
                })

            return results
    
    def load_from_directory(directory_path: str, chunk_size: int = 1000, overlap: int = 200):
        """
        Reads .txt files and splits them into overlapping chunks.
        """
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")

        docs = []
        # Use pathlib for robust path handling and resolution
        path = Path(directory_path).resolve()

        if not path.exists():
            print(f"Error: Directory '{path}' not found.")
            print(f"Current working directory: {os.getcwd()}")
            return docs

        print(f"Loading documents from: {path}")
        for file_path in path.glob("*.txt"):
            try:
                with open(file_path, "r", encoding="utf-8") as f:
                    text = f.read()

                # Simple sliding window chunking
                # We iterate through the text with a step size smaller than the chunk size
                # to create overlap (preserving context between chunks).
                step = chunk_size - overlap
                for i in range(0, len(text), step):
                    chunk = text[i : i + chunk_size]

                    # Skip chunks that are too small (e.g., leftover whitespace)
                    if len(chunk) < 50:
                        continue

                    docs.append({
                        "text": chunk,
                        "metadata": {
                            "source": file_path.name,
                            # Character offset where the chunk starts in the file
                            "chunk_index": i
                        }
                    })
            except Exception as e:
                print(f"Warning: Could not read file {file_path.name}: {e}")

        print(f"Successfully loaded {len(docs)} chunks from {len(list(path.glob('*.txt')))} files.")
        return docs
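To see what the sliding-window chunking in load_from_directory produces, here is a small standalone sketch of the same loop applied to a synthetic 2,500-character string. The function name chunk_text and the sample string are illustrative assumptions; the default sizes mirror the ones used above.

```python
def chunk_text(text, chunk_size=1000, overlap=200, min_len=50):
    """Split text into overlapping windows; returns (start_offset, chunk) pairs."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap          # 800 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if len(piece) < min_len:         # drop tiny leftovers at the end
            continue
        chunks.append((start, piece))
    return chunks

# Synthetic input: 2,500 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)

offsets = [start for start, _ in chunks]
lengths = [len(piece) for _, piece in chunks]
print(offsets)   # start offsets advance by chunk_size - overlap
print(lengths)   # the final windows are shorter than chunk_size
```

Each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next. This is also why the chunk_index values in the outputs later in this article (122400 and 1396000) are multiples of 800: they are character offsets, not sequence numbers.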

    The embedding model used

    The all-MiniLM-L6-v2 model used in the code is from the Sentence Transformers library. This was chosen because:

    1. It's fast and lightweight.
    2. It produces 384-dimensional vectors that use less memory than those of larger models.
    3. It performs well on a wide variety of English-language tasks without needing specialised fine-tuning.

    This model is just a suggestion. You can use any embedding model you want if you have a particular favourite.

    Why Normalise?

    You might notice the normalisation steps in the code. We mentioned it before, but to be clear, given two vectors X and Y, cosine similarity is defined as

    Similarity = (X · Y) / (||X|| * ||Y||)

    The place:

    • X · Y is the dot product of vectors X and Y
    • ||X|| is the magnitude (size) of vector X
    • ||Y|| is the magnitude of vector Y

    Computing the norms and the division for every comparison takes extra work. If all our vectors already have unit magnitude, the denominator is 1, so the formula reduces to the dot product of X and Y. We pay the normalisation cost once at ingestion time, which makes each search faster.
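A quick numerical check of that claim, as a sketch: the two random 384-dimensional vectors below merely stand in for real embeddings, and the function names are hypothetical.

```python
import numpy as np

def cosine_similarity_full(x, y):
    """Cosine similarity straight from the definition above."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def normalise(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

# Random stand-ins for two embeddings (384 dimensions, as with all-MiniLM-L6-v2).
rng = np.random.default_rng(0)
x = rng.standard_normal(384)
y = rng.standard_normal(384)

full = cosine_similarity_full(x, y)               # definition, with the division
fast = float(np.dot(normalise(x), normalise(y)))  # plain dot product of unit vectors

print(abs(full - fast))  # differs only by floating-point rounding
```

In the store, the document vectors are normalised once in add_documents, so at query time only the single query vector needs normalising before the matrix product.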

    Testing the Performance

    The first thing we need to do is get some input data to work with. You can use any input text file for this. For previous RAG experiments, I used a book I downloaded from Project Gutenberg. The consistently riveting:

    "Diseases of cattle, sheep, goats, and swine by Jno. A. W. Dollar & G. Moussu"

    Note that you can view the Project Gutenberg Permissions, Licensing and other Common Requests page using the following link.

    https://www.gutenberg.org/policy/permission.html

    But to summarise, the vast majority of Project Gutenberg eBooks are in the public domain in the US and other parts of the world. This means that nobody can grant or withhold permission to do with this item as you please.

    "… as you please" includes any commercial use, republishing in any format, making derivative works or performances.

    I downloaded the text of the book from the Project Gutenberg website to my local PC using this link:

    https://www.gutenberg.org/ebooks/73019.txt.utf-8

    This book contains approximately 36,000 lines of text. Querying the book takes only six lines of code. For my sample question, line 2315 of the book discusses a disease called CONDYLOMATA. Here is the excerpt:

    INFLAMMATION OF THE INTERDIGITAL SPACE.

    (CONDYLOMATA.)

    Condylomata result from chronic inflammation of the skin covering the
    interdigital ligament. Any injury to this region causing even
    superficial damage may result in chronic inflammation of the skin and
    hypertrophy of the papillæ, the first stage in the production of
    condylomata.

    Injuries produced by cords slipped into the interdigital space for the
    purpose of lifting the feet when shoeing working oxen are also fruitful
    causes.

    So that's what we'll ask: "What is Condylomata?" Note that we won't get a proper answer, as we're not feeding our search result into an LLM, but we should see that our search returns a text snippet that would give the LLM all the required information to formulate an answer had we done so.

    %%time
    # 1. Initialize
    store = SimpleVectorStore()

    # 2. Load Documents
    real_docs = load_from_directory("/mnt/d/book")

    # 3. Add to Store
    if real_docs:
        store.add_documents(real_docs)

    # 4. Search
    results = store.search("What is Condylomata?", k=1)

    results

    And here is the output.

    Loading embedding model: all-MiniLM-L6-v2...
    Loading documents from: /mnt/d/book
    Successfully loaded 2205 chunks from 1 files.
    Embedding 2205 documents...
    Store now contains 2205 documents.
    CPU times: user 3.27 s, sys: 377 ms, total: 3.65 s
    Wall time: 3.82 s
    
    [{'score': 0.44883957505226135,
      'text': 'two last\nphalanges, the latter operation being easier than 
    the former, and\nproviding flaps of more regular shape and better adapted 
    for the\nproduction of a satisfactory stump.\n\n\n                
    INFLAMMATION OF THE INTERDIGITAL SPACE.\n\n(CONDYLOMATA.)\n\n
    Condylomata result from chronic inflammation of the skin covering 
    the\ninterdigital ligament. Any injury to this region causing 
    even\nsuperficial damage may result in chronic inflammation of the 
    skin and\nhypertrophy of the papillæ, the first stage in the production 
    of\ncondylomata.\n\nInjuries produced by cords slipped into the 
    interdigital space for the\npurpose of lifting the feet when shoeing 
    working oxen are also fruitful\ncauses.\n\nInflammation of the 
    interdigital space is also a common complication of\naphthous eruptions 
    around the claws and in the space between them.\nContinual contact with 
    litter, dung and urine favour infection of\nsuperficial or deep wounds, 
    and by causing exuberant granulation lead to\nhypertrophy of the papillary 
    layer of ',
      'metadata': {'source': 'cattle_disease.txt', 'chunk_index': 122400}}]

    Under 4 seconds to read, chunk, embed, store, and correctly query a 36,000-line text document is pretty good going.

    scikit-learn: The Upgrade Path

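    NumPy works well for brute-force searches. But what if you have dozens or hundreds of documents, and brute-force is too slow? Before switching to a vector database, you can try scikit-learn's NearestNeighbors. It can use tree-based structures like KD-Tree and Ball-Tree, which bring search time down to roughly O(log N) instead of O(N) for low-dimensional data. For high-dimensional embeddings such as ours, the trees lose most of that advantage, and scikit-learn's tree algorithms do not accept the cosine metric, which is why the class below sticks with algorithm='brute'.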
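One cheap tweak to the NumPy version is worth knowing before changing the index at all. SimpleVectorStore.search sorts every score with argsort, which is O(N log N). The sketch below, which is not part of the original code, uses np.argpartition to pull out the top k in roughly linear time and then sorts only those k. The function name top_k_indices and the random scores are illustrative assumptions.

```python
import numpy as np

def top_k_indices(scores, k):
    """Indices of the k largest scores, best first, without a full sort."""
    k = min(k, scores.shape[0])
    if k <= 0:
        return np.empty(0, dtype=np.intp)
    # argpartition moves the k largest scores into the last k slots (unordered).
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort just those k candidates, highest score first.
    return candidates[np.argsort(scores[candidates])[::-1]]

# Random stand-in for a vector of similarity scores.
rng = np.random.default_rng(42)
scores = rng.random(100_000)

fast = top_k_indices(scores, 5)
slow = np.argsort(scores)[-5:][::-1]   # the approach used in SimpleVectorStore

print(scores[fast])
print(scores[slow])
```

Both approaches return the same top scores; the difference only becomes noticeable once the store holds a large number of vectors.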

    To test this out, I downloaded a bunch of other books from Gutenberg, including:

    • A Christmas Carol by Charles Dickens
    • The Life and Adventures of Santa Claus by L. Frank Baum
    • War and Peace by Tolstoy
    • A Farewell to Arms by Hemingway

    In total, these books contain around 120,000 lines of text. I copied and pasted all five input book files until there were ten of each, resulting in fifty files and 1.2 million lines of text. That's around 12 million words, assuming an average of 10 words per line. To provide some context, this article contains approximately 2800 words, so the data volume we're testing with is equivalent to over 4000 times the volume of this text.

    $ dir
    
    achristmascarol - Copy (2).txt  cattle_disease - Copy (9).txt  santa - Copy (6).txt
    achristmascarol - Copy (3).txt  cattle_disease - Copy.txt       santa - Copy (7).txt
    achristmascarol - Copy (4).txt  cattle_disease.txt                santa - Copy (8).txt
    achristmascarol - Copy (5).txt  farewelltoarms - Copy (2).txt  santa - Copy (9).txt
    achristmascarol - Copy (6).txt  farewelltoarms - Copy (3).txt  santa - Copy.txt
    achristmascarol - Copy (7).txt  farewelltoarms - Copy (4).txt  santa.txt
    achristmascarol - Copy (8).txt  farewelltoarms - Copy (5).txt  warandpeace - Copy (2).txt
    achristmascarol - Copy (9).txt  farewelltoarms - Copy (6).txt  warandpeace - Copy (3).txt
    achristmascarol - Copy.txt       farewelltoarms - Copy (7).txt  warandpeace - Copy (4).txt
    achristmascarol.txt                farewelltoarms - Copy (8).txt  warandpeace - Copy (5).txt
    cattle_disease - Copy (2).txt   farewelltoarms - Copy (9).txt  warandpeace - Copy (6).txt
    cattle_disease - Copy (3).txt   farewelltoarms - Copy.txt       warandpeace - Copy (7).txt
    cattle_disease - Copy (4).txt   farewelltoarms.txt                warandpeace - Copy (8).txt
    cattle_disease - Copy (5).txt   santa - Copy (2).txt           warandpeace - Copy (9).txt
    cattle_disease - Copy (6).txt   santa - Copy (3).txt           warandpeace - Copy.txt
    cattle_disease - Copy (7).txt   santa - Copy (4).txt           warandpeace.txt
    cattle_disease - Copy (8).txt   santa - Copy (5).txt

    Let's say we were ultimately looking for an answer to the following question:

    Who, after the Christmas holidays, did Nicholas tell his mother of his love for?

    In case you didn't know, this comes from the novel War and Peace.

    Let's see how our new search does against this large body of data.

    Here is the code using scikit-learn.

    First off, we have a new class that implements scikit-learn's nearest neighbour algorithm.

    from sklearn.neighbors import NearestNeighbors

    class ScikitVectorStore(SimpleVectorStore):
        def __init__(self, model_name='all-MiniLM-L6-v2'):
            super().__init__(model_name)
            # Brute force is often faster than trees for high-dimensional data.
            # scikit-learn's 'ball_tree' and 'kd_tree' algorithms do not accept
            # metric='cosine', so 'brute' is required with this metric.
            self.knn = NearestNeighbors(n_neighbors=5, metric='cosine', algorithm='brute')
            self.is_fit = False

        def build_index(self):
            # Call this again after adding more documents to refresh the index.
            print("Building Scikit-Learn Index...")
            self.knn.fit(self.embeddings)
            self.is_fit = True

        def search(self, query: str, k: int = 5):
            if self.embeddings is None or len(self.documents) == 0:
                print("Warning: Vector store is empty. No documents to search.")
                return []
            if not self.is_fit:
                self.build_index()

            query_vec = self.encoder.encode([query])
            # Note: cosine distance ignores vector length, so the query
            # does not need to be normalised explicitly here.

            # Ensure k doesn't exceed the number of documents
            k = min(k, len(self.documents))
            distances, indices = self.knn.kneighbors(query_vec, n_neighbors=k)

            results = []
            for dist, idx in zip(distances[0], indices[0]):
                results.append({
                    # Convert distance back to similarity score (1 - dist)
                    "score": float(1 - dist),
                    "text": self.documents[idx]['text'],
                    "metadata": self.documents[idx].get('metadata', {})
                })
            return results

    And our search code is just as simple as for the NumPy version.

    %%time

    # 1. Initialize
    store = ScikitVectorStore()

    # 2. Load Documents
    real_docs = load_from_directory("/mnt/d/book")

    # 3. Add to Store
    if real_docs:
        store.add_documents(real_docs)

    # 4. Search
    results = store.search("Who, after the Christmas holidays, did Nicholas tell his mother of his love for", k=1)

    results

    And our output.

    Loading embedding model: all-MiniLM-L6-v2...
    Loading documents from: /mnt/d/book
    Successfully loaded 73060 chunks from 50 files.
    Embedding 73060 documents...
    Store now contains 73060 documents.
    Building Scikit-Learn Index...
    CPU times: user 1min 46s, sys: 18.3 s, total: 2min 4s
    Wall time: 1min 13s
    
    [{'score': 0.6972659826278687,
      'text': '\nCHAPTER XIII\n\nSoon after the Christmas holidays Nicholas told 
    his mother of his love\nfor Sónya and of his firm resolve to marry her. The 
    countess, who\nhad long noticed what was going on between them and was 
    expecting this\ndeclaration, listened to him in silence and then told her son 
    that he\nmight marry whom he pleased, but that neither she nor his father 
    would\ngive their blessing to such a marriage. Nicholas, for the first time,
    \nfelt that his mother was displeased with him and that, despite her love\n
    for him, she would not give way. Coldly, without looking at her son,\nshe 
    sent for her husband and, when he came, tried briefly and coldly to\ninform 
    him of the facts, in her son's presence, but unable to restrain\nherself she 
    burst into tears of vexation and left the room. The old\ncount began 
    irresolutely to admonish Nicholas and beg him to abandon his\npurpose. 
    Nicholas replied that he could not go back on his word, and his\nfather, 
    sighing and evidently disconcerted, very soon became silent ',
      'metadata': {'source': 'warandpeace - Copy (6).txt',
       'chunk_index': 1396000}}]

    Almost all of the 1m 13s it took to do the above processing was spent on loading, chunking, and embedding our input data. The actual search part, when I ran it separately, took less than one-tenth of a second!

    Not too shabby at all.
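If you want to get a feel for the raw search cost on your own machine, the following sketch times the scoring step in isolation. It uses random unit vectors as a stand-in for real embeddings (an assumption; the matrix has the same 73,060 x 384 shape as the run above) and it deliberately excludes the time the model needs to embed the query text.

```python
import time
import numpy as np

M, N = 73_060, 384   # chunk count from the run above; all-MiniLM-L6-v2 dimension
rng = np.random.default_rng(0)

# Synthetic stand-ins for real embeddings: random vectors scaled to unit length.
docs = rng.standard_normal((M, N), dtype=np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.standard_normal(N, dtype=np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = docs @ query            # one matrix-vector product scores every chunk
best = int(np.argmax(scores))    # index of the closest chunk
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Scored {M} vectors in {elapsed_ms:.1f} ms; best index = {best}")
```

The exact timing depends on your CPU and the BLAS library NumPy is linked against, so treat any number it prints as indicative only.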

    Summary

    I'm not arguing that Vector Databases are not needed. They solve specific problems that NumPy and scikit-learn don't handle. You should migrate from something like our SimpleVectorStore or ScikitVectorStore to Weaviate, Pinecone, pgvector, etc., when any of the following conditions apply.

    Persistence: You need data to survive a server restart without rebuilding the index from source files every time, although np.save or pickling works for simple persistence.

    RAM is the bottleneck: Your embedding matrix exceeds your server's memory. Note: 1 million vectors of 384 dimensions [float32] is only ~1.5GB of RAM (1,000,000 x 384 x 4 bytes), so you can fit a lot in memory.

    CRUD frequency: You need to constantly update or delete individual vectors while reading. NumPy arrays, for example, have a fixed size, so appending or deleting rows requires copying the whole array, which is slow.

    Metadata Filtering: You need complex queries like "Find vectors near X where user_id=10 AND date > 2023". Doing this in NumPy requires boolean masks that can get messy.
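Two of the conditions above have lightweight stopgaps that may be enough for a while. The sketch below shows a save-and-reload round trip with np.save and a simple boolean-mask filter for a "user_id=10 AND year > 2023" style query. The toy 2-dimensional vectors, the metadata arrays user_id and year, and the file name are all made-up assumptions for illustration.

```python
import os
import tempfile
import numpy as np

# Toy store: 5 unit vectors in 2-D plus per-row metadata (all values invented).
embeddings = np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
    [0.6, 0.8],
    [-1.0, 0.0],
], dtype=np.float32)
user_id = np.array([10, 10, 7, 10, 10])
year = np.array([2022, 2024, 2024, 2025, 2024])

# Persistence: write the matrix to disk and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.npy")   # hypothetical file name
    np.save(path, embeddings)
    restored = np.load(path)

# Metadata filtering: "vectors near the query where user_id=10 AND year > 2023".
query = np.array([1.0, 0.0], dtype=np.float32)
mask = (user_id == 10) & (year > 2023)
candidate_rows = np.flatnonzero(mask)            # rows that pass the filter
scores = restored[candidate_rows] @ query        # score only those rows
ranked_rows = candidate_rows[np.argsort(scores)[::-1]]

print(ranked_rows, np.sort(scores)[::-1])
```

Row 0 would be the best match on similarity alone, but it fails the year filter, so row 1 comes out on top. This works for a couple of simple fields; once filters multiply or metadata changes frequently, the bookkeeping is exactly the mess that a real database takes off your hands.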

    Engineering always involves trade-offs. Using a vector database adds complexity to your setup in exchange for scalability you may not need right now. If you start with a more straightforward RAG setup using NumPy and/or scikit-learn for the retrieval process, you get:

    • Lower Latency. No network hops.
    • Lower Costs. No SaaS subscriptions or extra instances.
    • Simplicity. It's just a Python script.

    Just as you don't need a sports car to go to the grocery store, in many cases NumPy or scikit-learn may be all the RAG search you need.


