    When (Not) to Use Vector DB

    By ProfitlyAI · December 16, 2025 · 9 min read


    Vector databases solve a real problem, and in many cases they're the right choice for RAG systems. But here's the thing: just because you're using embeddings doesn't mean you need a vector database.

    We've seen a growing trend where every RAG implementation starts by plugging in a vector DB. That may make sense for large-scale, persistent knowledge bases, but it's not always the most efficient path, especially when your use case is more dynamic or time-sensitive.

    At Planck, we use embeddings to enhance LLM-based systems. However, in one of our real-world applications, we opted to avoid a vector database and instead used a simple key-value store, which turned out to be a much better fit.

    Before I dive into that, let's walk through a simple, generalized version of our scenario to explain why.

    Foo Example

    Let's imagine a simple RAG-style system. A user uploads a few text files, maybe some reports or meeting notes. We split these files into chunks, generate embeddings for each chunk, and use those embeddings to answer questions. The user asks a handful of questions over the next few minutes, then leaves. At that point, both the files and their embeddings are useless and can be safely discarded.

    In other words, the data is ephemeral, the user will ask only a couple of questions, and we want to answer them as fast as possible.
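    The whole flow fits in a few lines. Here is a minimal sketch; `toy_embed` is a stand-in for a real embedding model (it just derives a deterministic pseudo-random unit vector from the text), and the chunk list is made up:

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real embedding model: a deterministic
    # pseudo-random unit vector seeded by the text's hash.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(dim).astype("float32")
    return vec / np.linalg.norm(vec)

# Chunks from the user's uploaded files, embedded and kept only in memory.
chunks = ["meeting notes about the Q3 budget",
          "report on last year's sales",
          "notes from the hiring sync"]
chunk_embs = np.stack([toy_embed(c) for c in chunks])

def answer_context(query: str, top_k: int = 1) -> list[str]:
    # Brute-force similarity scan; no index, nothing persisted.
    sims = chunk_embs @ toy_embed(query)
    return [chunks[i] for i in sims.argsort()[-top_k:][::-1]]
```

    Once the user leaves, the arrays simply go out of scope; there is nothing to clean up.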

    Now pause for a moment and ask yourself:

    Where should I store these embeddings?


    Most people's instinct is: "I have embeddings, so I need a vector database." But pause for a moment and think about what's actually happening behind that abstraction. When you send embeddings to a vector DB, it doesn't just "store" them. It builds an index that accelerates similarity searches. That indexing work is where a lot of the magic comes from, and also where a lot of the cost lives.

    In a long-lived, large-scale knowledge base, this trade-off makes perfect sense: you pay an indexing cost once (or incrementally as data changes), and then spread that cost over millions of queries. In our Foo example, that's not what's happening. We're doing the opposite: constantly adding small, one-off batches of embeddings, answering a tiny number of queries per batch, and then throwing everything away.

    So the real question is not "should I use a vector database?" but "is the indexing work worth it?" To answer that, we can look at a simple benchmark.

    Benchmarking: No-Index Retrieval vs. Indexed Retrieval


    This section is more technical. We'll look at Python code and explain the underlying algorithms. If the exact implementation details aren't relevant to you, feel free to skip ahead to the Results section.

    We want to compare two strategies:

    1. No indexing at all: just keep the embeddings in memory and scan them directly.
    2. A vector database, where we pay an indexing cost upfront to make each query faster.

    First, consider the "no vector DB" approach. When a query comes in, we compute similarities between the query embedding and all stored embeddings, then pick the top-k. That's just k-nearest neighbors without any index.

    import numpy as np
    
    def run_knn(embeddings: np.ndarray, query_embedding: np.ndarray, top_k: int) -> np.ndarray:
        # Dot product equals cosine similarity for unit-normalized vectors.
        sims = embeddings @ query_embedding
        # Indices of the top-k most similar embeddings, best first.
        return sims.argsort()[-top_k:][::-1]

    The code uses the dot product as a proxy for cosine similarity (assuming normalized vectors) and sorts the scores to find the best matches. It literally just scans all vectors and picks the closest ones.
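    The "normalized vectors" assumption matters: only then does the dot product equal cosine similarity. A quick check with two made-up vectors:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Cosine similarity computed directly from the raw vectors...
cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
# ...equals the plain dot product once both vectors are unit-normalized.
dot = (a / np.linalg.norm(a)) @ (b / np.linalg.norm(b))
print(cos, dot)  # both 0.6
```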

    Now, let's look at what a vector DB typically does. Under the hood, most vector databases rely on an approximate nearest neighbor (ANN) index. ANN methods trade a bit of accuracy for a large boost in search speed, and one of the most widely used algorithms for this is HNSW. We'll use the hnswlib library to simulate the index behavior.

    import numpy as np
    import hnswlib
    
    def create_hnsw_index(embeddings: np.ndarray, num_dims: int) -> hnswlib.Index:
        index = hnswlib.Index(space='cosine', dim=num_dims)
        index.init_index(max_elements=embeddings.shape[0])
        index.add_items(embeddings)
        return index
    
    def query_hnsw(index: hnswlib.Index, query_embedding: np.ndarray, top_k: int) -> np.ndarray:
        labels, distances = index.knn_query(query_embedding, k=top_k)
        return labels[0]

    To see where the trade-off lands, we can generate some random embeddings, normalize them, and measure how long each step takes:

    import time
    import numpy as np
    import hnswlib
    from tqdm import tqdm
    
    def run_benchmark(num_embeddings: int, num_dims: int, top_k: int, num_iterations: int) -> None:
        print(f"Benchmarking with {num_embeddings} embeddings of dimension {num_dims}, retrieving top-{top_k} nearest neighbors.")
    
        knn_times: list[float] = []
        index_times: list[float] = []
        hnsw_query_times: list[float] = []
    
        for _ in tqdm(range(num_iterations), desc="Running benchmark"):
            # Generate random unit-normalized embeddings and a query vector.
            embeddings = np.random.rand(num_embeddings, num_dims).astype('float32')
            embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
            query_embedding = np.random.rand(num_dims).astype('float32')
            query_embedding = query_embedding / np.linalg.norm(query_embedding)
    
            # Time the brute-force scan.
            start_time = time.time()
            run_knn(embeddings, query_embedding, top_k)
            knn_times.append((time.time() - start_time) * 1e3)
    
            # Time the one-off HNSW index construction.
            start_time = time.time()
            vector_db_index = create_hnsw_index(embeddings, num_dims)
            index_times.append((time.time() - start_time) * 1e3)
    
            # Time a single query against the built index.
            start_time = time.time()
            query_hnsw(vector_db_index, query_embedding, top_k)
            hnsw_query_times.append((time.time() - start_time) * 1e3)
    
        print(f"BENCHMARK RESULTS (averaged over {num_iterations} iterations)")
        print(f"[Naive KNN] Average search time without indexing: {np.mean(knn_times):.2f} ms")
        print(f"[HNSW Index] Average index construction time: {np.mean(index_times):.2f} ms")
        print(f"[HNSW Index] Average query time with indexing: {np.mean(hnsw_query_times):.2f} ms")
    
    run_benchmark(num_embeddings=50000, num_dims=1536, top_k=5, num_iterations=20)

    Results

    In this example, we use 50,000 embeddings with 1,536 dimensions (matching OpenAI's text-embedding-3-small) and retrieve the top-5 neighbors. The exact results will vary with different configurations, but the pattern we care about is the same.

    I encourage you to run the benchmark with your own numbers; it's the best way to see how the trade-offs play out in your specific use case.

    On average, the naive KNN search takes 24.54 milliseconds per query. Building the HNSW index for the same embeddings takes around 277 seconds. Once the index is built, each query takes about 0.47 milliseconds.

    From this, we can estimate the break-even point. The difference between naive KNN and indexed queries is 24.07 ms per query. That means you need about 11,510 queries before the time saved on each query compensates for the time spent building the index.
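    The break-even arithmetic is just one division (using the rounded figures above; the article's 11,510 comes from unrounded timings):

```python
naive_ms = 24.54       # per-query cost of the brute-force scan
index_ms = 277_000.0   # one-time HNSW build cost (277 s)
hnsw_ms = 0.47         # per-query cost once the index exists

saved_per_query = naive_ms - hnsw_ms   # 24.07 ms saved per query
break_even = index_ms / saved_per_query
print(round(break_even))  # ~11,508 queries before indexing pays off
```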

    Generated using the benchmark code: a graph comparing naive KNN and indexed search efficiency

    Moreover, even with different values for the number of embeddings and top-k, the break-even point stays in the thousands of queries and remains within a fairly narrow range. You don't get a scenario where indexing starts to pay off after just a few dozen queries.

    Generated using the benchmark code: a graph showing break-even points for various embedding counts and top-k settings (image by author)

    Now compare that to the Foo example. A user uploads a small set of files and asks a few questions, not thousands. The system never reaches the point where the index pays off. Instead, the indexing step merely delays the moment when the system can answer the first question, and adds operational complexity.

    For this kind of short-lived, per-user context, the simple in-memory KNN approach is not only easier to implement and operate, it is also faster end-to-end.

    If in-memory storage is not an option, either because the system is distributed or because we need to preserve the user's state for a few minutes, we can use a key-value store like Redis. We can use a unique identifier for the user's request as the key and store all the embeddings as the value.

    This gives us a lightweight, low-complexity solution that's well-suited to our use case of short-lived, low-query contexts.
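    A sketch of that approach, assuming redis-py; the key scheme (`ctx:<request_id>`) and the 5-minute TTL are made-up illustrations. The embeddings travel as raw float32 bytes:

```python
import numpy as np

def pack_embeddings(emb: np.ndarray) -> bytes:
    # Serialize the (num_chunks, num_dims) matrix as raw float32 bytes.
    return emb.astype("float32").tobytes()

def unpack_embeddings(blob: bytes, num_dims: int) -> np.ndarray:
    return np.frombuffer(blob, dtype="float32").reshape(-1, num_dims)

# With a Redis client the round trip would look like:
#   r = redis.Redis()
#   r.set(f"ctx:{request_id}", pack_embeddings(emb), ex=300)  # expire in 5 min
#   emb = unpack_embeddings(r.get(f"ctx:{request_id}"), num_dims)
# Retrieval then runs the same in-memory KNN scan from earlier.
```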

    Real-World Example: Why We Chose a Key-Value Store


    At Planck, we answer insurance-related questions about businesses. A typical request starts with a business name and address, after which we retrieve real-time data about that specific business, including its online presence, registrations, and other public records. This data becomes our context, and we use LLMs and algorithms to answer questions based on it.

    The important bit is that every time we get a request, we generate a fresh context. We're not reusing existing data; it's fetched on demand and stays relevant for a few minutes at most.

    If you think back to the earlier benchmark, this pattern should already be triggering your "this isn't a vector DB use case" sensor.

    Every time we receive a request, we generate fresh embeddings for short-lived data that we'll likely query only a few hundred times. Indexing those embeddings in a vector DB adds unnecessary latency. In contrast, with Redis, we can immediately store the embeddings and run a quick similarity search in the application code with almost no indexing delay.

    That's why we chose Redis instead of a vector database. While vector DBs are excellent at handling large volumes of embeddings and supporting fast nearest-neighbor queries, they introduce indexing overhead, and in our case, that overhead is not worth it.

    In Conclusion

    If you need to store millions of embeddings and support high-query workloads across a shared corpus, a vector DB may well be the better fit. And yes, there are definitely use cases out there that genuinely need and benefit from a vector DB.

    But just because you're using embeddings or building a RAG system doesn't mean you should default to a vector DB.

    Every database technology has its strengths and trade-offs. The best choice starts with a deep understanding of your data and use case, rather than mindlessly following the trend.

    So, the next time you need to choose a database, pause for a moment and ask: am I picking the right one based on objective trade-offs, or am I just going with the trendiest, shiniest choice?


