You Should Probably Be Using Hybrid Search


In our journey through building effective Retrieval-Augmented Generation (RAG) systems, we've discussed chunking strategies and embedding models. Now let’s discuss a fundamental part of the R in RAG.

Many RAG prototypes implement a single retrieval method: usually pure semantic (vector) search if they were built in the last two years or so. Alternatively, traditional keyword search is often already in place wherever you want to build RAG, so a generation step sometimes just gets tacked onto it. While each method has its strengths, relying solely on one is a common pitfall that leads to ineffective retrieval. Let's explore why, and why hybrid search strategies consistently win out.

Semantic Search

Vector search, powered by embedding models, is often the canonical method for retrieval in RAG. It aims to understand the meaning behind a query. The idea is that the query and the target passage containing the answer should be semantically similar because they are talking about the same thing.
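To make this concrete, here is a minimal brute-force sketch in Python: the query and each chunk are represented as vectors, and chunks are ranked by cosine similarity to the query. The three-dimensional vectors are made-up stand-ins for real embedding-model output, and `semantic_search` is a hypothetical helper; production systems typically use an approximate nearest-neighbour index in a vector store rather than scoring every chunk.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float],
                    chunk_vecs: dict[str, list[float]],
                    k: int = 3) -> list[tuple[str, float]]:
    """Rank chunk IDs by cosine similarity to the query embedding (brute force)."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embedding-model output (hypothetical values).
chunks = {
    "laptop-review": [0.9, 0.1, 0.0],
    "notebook-specs": [0.8, 0.2, 0.1],
    "gardening-tips": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "best portable computer"
top = semantic_search(query_vec, chunks, k=2)
```

Because "laptop" and "notebook" chunks sit close together in the (toy) vector space, both are retrieved even though the query shares no words with either.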

Strengths

  • Understands synonyms, related concepts, and user intent.

  • Well suited to natural-language, conversational queries.

  • Can find relevant information even with different phrasing.

Weaknesses

  • Keyword Blindness: Struggles with specific, literal terms like product IDs, error codes, acronyms, or unique names that don't have strong semantic overlap with related concepts in the model's training data.

  • Embedding Dependency: Its effectiveness is entirely dependent on the quality and domain-specificity of the underlying embedding model. A suboptimal model leads to suboptimal semantic search.

  • Subtle Nuance Issues: Can sometimes retrieve chunks that are conceptually similar but miss the specific entity or detail the user actually asked for.

  • Example Failure: A user asks, "What were the findings of 'Operation Chimera'?" Semantic search might retrieve documents about mythical creatures or general operational reports if the embedding model doesn't strongly associate the specific code name "Operation Chimera" with its corresponding documents.

Keyword Search

In RAG systems, keyword search is usually handled by methods like BM25 or TF-IDF, which match the literal words in the query to words in the documents. This lexical approach counts how often each query term appears in a document (TF, Term Frequency) and then down-weights terms that are common across the whole corpus (IDF, Inverse Document Frequency). BM25 refines TF-IDF by saturating term frequency, so repeated occurrences add diminishing value, and by normalizing for document length.
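A minimal BM25 scorer makes the TF, IDF and length-normalization pieces visible. This is a sketch rather than any particular library's implementation: the tokenizer is deliberately naive, `k1 = 1.5` and `b = 0.75` are commonly used defaults that you would tune for your corpus, and the IDF uses the smoothed, non-negative form used by Lucene.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: split on whitespace, trim punctuation, lowercase.
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


class BM25:
    """Minimal Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(doc) for doc in docs]
        self.doc_lens = [len(toks) for toks in self.doc_tokens]
        self.avgdl = sum(self.doc_lens) / len(docs)
        self.term_freqs = [Counter(toks) for toks in self.doc_tokens]
        # Document frequency: how many documents contain each term at least once.
        self.doc_freq = Counter(term for toks in self.doc_tokens for term in set(toks))
        self.n_docs = len(docs)

    def idf(self, term: str) -> float:
        # Rare terms get a high weight, common terms a low one; the +1 keeps it non-negative.
        n = self.doc_freq.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, query: str, i: int) -> float:
        tf, doc_len = self.term_freqs[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # k1 caps how much repeated occurrences help; b controls the length penalty.
            norm = self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        scores = [(i, self.score(query, i)) for i in range(self.n_docs)]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


docs = [
    "Error E1234 occurs when the disk is full.",
    "General tips for keeping servers healthy.",
    "Disk usage monitoring and alerting.",
]
bm25 = BM25(docs)
results = bm25.search("What does error E1234 mean?", k=1)
```

The literal token `E1234` appears only in the first document, so it receives a high IDF weight and that document wins outright: exactly the exact-identifier case where keyword search shines.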

Strengths

  • Excellent at finding exact matches for specific keywords, names, codes, acronyms, and identifiers.

  • Fast, efficient, and well-established technology.

  • Not dependent on complex embedding models.

Weaknesses

  • Lack of Understanding: Doesn't grasp synonyms, context, or user intent. "Laptop" won't match "notebook." "How to secure a web server" won't necessarily find documents about "hardening Apache."

  • Requires Precise Queries: Users often need to guess the exact terminology used in the documents.

  • Context Insensitivity: Can retrieve documents where keywords appear frequently but in an irrelevant context.

  • Example Failure: A user asks, "Tell me about ways to mitigate risks when deploying new code." Keyword search might return documents mentioning "risks" and "code" but completely miss valuable resources discussing "deployment safety protocols" or "canary release strategies" because the exact keywords aren't present.

Hybrid Search

As the examples show, semantic and keyword search have complementary strengths and weaknesses. Relying on only one means you inherit its limitations, leading to missed information and irrelevant results depending on the query type.

  • Queries needing conceptual understanding benefit most from semantic search.

  • Queries needing specific entity matching benefit most from keyword search.

Since users will inevitably ask both types of questions, a single-method approach is inherently brittle.

Hybrid search explicitly acknowledges that no single method is perfect and combines the strengths of multiple approaches. The most common approach runs a semantic (vector) search and a keyword search (e.g., BM25) in parallel for the same query, then combines the two result lists using a fusion algorithm.
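The orchestration can be sketched as follows, assuming two hypothetical retriever functions that each return chunk IDs ranked best-first (in practice, a vector-store query and a BM25 index query). Running them in a thread pool means total latency is roughly that of the slower search when both calls are I/O-bound. The `interleave` fusion is only a placeholder; any function that merges ranked lists can be passed as `fuse`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A retriever takes a query and returns chunk IDs ranked best-first.
Retriever = Callable[[str], list[str]]


def interleave(*rankings: list[str]) -> list[str]:
    """Placeholder fusion: alternate between the lists, skipping duplicates."""
    fused: list[str] = []
    seen: set[str] = set()
    longest = max((len(r) for r in rankings), default=0)
    for i in range(longest):
        for ranking in rankings:
            if i < len(ranking) and ranking[i] not in seen:
                seen.add(ranking[i])
                fused.append(ranking[i])
    return fused


def hybrid_search(query: str,
                  semantic: Retriever,
                  keyword: Retriever,
                  fuse: Callable[..., list[str]] = interleave) -> list[str]:
    # Submit both searches at once instead of running them one after the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        semantic_future = pool.submit(semantic, query)
        keyword_future = pool.submit(keyword, query)
        return fuse(semantic_future.result(), keyword_future.result())


# Hypothetical stand-ins for a vector-store query and a BM25 index query.
def fake_semantic(query: str) -> list[str]:
    return ["doc-a", "doc-b", "doc-c"]


def fake_keyword(query: str) -> list[str]:
    return ["doc-d", "doc-a"]


fused = hybrid_search("CVE-2024-12345 mitigation", fake_semantic, fake_keyword)
```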

Reciprocal Rank Fusion (RRF) is a popular technique that scores results by their rank in each list rather than by their raw scores, which sidesteps the problem that scores from different systems (e.g., BM25 scores and cosine similarities) are not directly comparable. Other methods weight (normalized) scores or simply interleave the lists.
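RRF scores each document as the sum, over the result lists, of 1 / (k + rank), so only positions matter and raw BM25 scores never have to be compared with cosine similarities. The sketch below uses k = 60, the value commonly used as a default, and also shows a score-weighting alternative that first min-max normalizes each system's scores; `alpha` is a hypothetical tuning knob for how much to trust semantic search.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """RRF: each list contributes 1 / (k + rank) for every ID it contains (rank starts at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def min_max(scores: dict[str, float]) -> dict[str, float]:
    # Rescale one system's raw scores to [0, 1] so they can be combined with another's.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(semantic: dict[str, float], keyword: dict[str, float],
                    alpha: float = 0.5) -> list[tuple[str, float]]:
    """Score-based alternative: alpha * semantic + (1 - alpha) * keyword, after normalizing."""
    sem, kw = min_max(semantic), min_max(keyword)
    fused = {doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
             for doc_id in sem.keys() | kw.keys()}
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


semantic_ranking = ["a", "b", "c"]  # e.g. chunk IDs from vector search
keyword_ranking = ["c", "a", "d"]   # e.g. chunk IDs from BM25
rrf = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
```

Here `a` (ranked 1st and 2nd) narrowly beats `c` (ranked 3rd and 1st), and both documents found by the two systems outrank `b` and `d`, which each appear in only one list.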

Hybrid search gets the best of both worlds. It can handle the intent behind a query like "ways to reduce software vulnerabilities" (semantic) while also precisely finding documents mentioning the specific vulnerability identifier "CVE-2024-12345" (keyword).

Misc Techniques To Keep In Mind

Query Routing

An alternative to hybrid search is query routing: inspecting the query, usually with an LLM (though smaller, purpose-trained classifiers also work), and deciding which search method is most suitable. Although this step adds latency and complexity, it can avoid unnecessary embedding calls for queries that are clearly lexical.
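As a cheap illustration of routing, the sketch below swaps the LLM or trained classifier for hand-written regular expressions: queries that are little more than an identifier go straight to keyword search, questions that mention an identifier go to hybrid search, and everything else goes to semantic search. The patterns, the `short_query_words` threshold and the three route names are all assumptions for illustration.

```python
import re

# Illustrative patterns that suggest a literal, identifier-style lookup (not exhaustive).
LEXICAL_PATTERNS = [
    re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),  # CVE IDs such as CVE-2024-12345
    re.compile(r"\b[A-Z]{2,}-?\d{2,}\b"),                 # codes such as ERR404 or SKU-1234
    re.compile(r'"[^"]+"'),                               # double-quoted exact phrases
]


def route_query(query: str, short_query_words: int = 4) -> str:
    """Heuristic router returning 'keyword', 'hybrid', or 'semantic'."""
    has_identifier = any(pattern.search(query) for pattern in LEXICAL_PATTERNS)
    if has_identifier and len(query.split()) <= short_query_words:
        return "keyword"   # bare identifier lookup: skip the embedding call entirely
    if has_identifier:
        return "hybrid"    # a natural-language question that also names a specific entity
    return "semantic"      # no literal anchor detected
```

A rule set like this is brittle, which is one reason the LLM or classifier approach is usually preferred; its appeal is that it costs microseconds rather than a model call.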

Query Rewriting

Another technique is to transform the raw user query before searching with it. This might mean using a fast LLM to rewrite the query to better match the target corpus, swapping words in the query for synonyms and running multiple searches, or even converting the query into a query (or set of queries) suited to semantic search plus a separate query suited to lexical search.
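The synonym-replacement variant can be sketched as below. The synonym map is a hypothetical placeholder (it could come from a domain thesaurus or an LLM), `search` stands in for whichever retriever you use, and keeping each chunk's best score across variants is one simple merging choice that assumes all variants are scored by the same system; the variants' result lists could equally be fused with RRF.

```python
from itertools import islice, product
from typing import Callable

# Hypothetical synonym map; in practice this could come from a domain thesaurus or a fast LLM.
SYNONYMS: dict[str, list[str]] = {
    "laptop": ["notebook"],
    "secure": ["harden"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS,
                 max_variants: int = 8) -> list[str]:
    """Build query variants by swapping words for synonyms; the original comes first."""
    # Naive whitespace split, so punctuation attached to a word blocks a synonym match.
    options = [[word] + synonyms.get(word.lower(), []) for word in query.split()]
    # islice stops early, so the full (possibly huge) product is never materialized.
    return [" ".join(combo) for combo in islice(product(*options), max_variants)]


def multi_search(query: str, search: Callable[[str], list[tuple[str, float]]],
                 k: int = 5) -> list[tuple[str, float]]:
    # One search per variant; keep each chunk's best score across the variants.
    best: dict[str, float] = {}
    for variant in expand_query(query):
        for chunk_id, score in search(variant):
            best[chunk_id] = max(score, best.get(chunk_id, float("-inf")))
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

For example, `expand_query("secure laptop")` yields four variants, from "secure laptop" through "harden notebook", so a keyword index that only ever says "notebook" can still be reached.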

Metadata Extraction

Inferring filters from the query can give a search system a huge boost. If your search system supports pre-filtering, this can reduce latency whilst also returning more relevant results. Queries often imply a relevant time range, for example “What were the outcomes of our landing page design discussion last month?” Metadata extraction lets you filter down to just the results that matter.
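A rule-based sketch of time-range extraction and pre-filtering follows. In practice an LLM would more likely extract structured filters from the query; here a few relative phrases are mapped to date ranges instead. The `date` field on each chunk is a hypothetical metadata schema, and real vector stores each have their own filter syntax for the pre-filtering step.

```python
from datetime import date, timedelta
from typing import Optional

DateRange = tuple[date, date]


def extract_date_range(query: str, today: date) -> Optional[DateRange]:
    """Map a few relative time phrases to an inclusive (start, end) date range."""
    q = query.lower()
    if "yesterday" in q:
        day = today - timedelta(days=1)
        return day, day
    if "last week" in q:
        # The previous Monday-to-Sunday calendar week.
        start = today - timedelta(days=today.weekday() + 7)
        return start, start + timedelta(days=6)
    if "last month" in q:
        # The whole previous calendar month.
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    return None  # no time filter implied


def prefilter(chunks: list[dict], date_range: Optional[DateRange]) -> list[dict]:
    # Apply the filter before search so only in-range chunks are scored at all.
    if date_range is None:
        return chunks
    start, end = date_range
    return [chunk for chunk in chunks if start <= chunk["date"] <= end]
```

For the landing-page query above asked on 15 March 2024, `extract_date_range` returns 1 to 29 February 2024, and only chunks dated in that window are passed on to search.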

Reranking

Reranking is certainly worthy of its own post, but the idea is to apply a second relevance-scoring stage after the initial search. In our hybrid search case, the first-stage search would retrieve a reasonably sized set of candidate chunks (e.g., the top 20-50, to “cast a wide net”). Then a more sophisticated, and often more computationally expensive, cross-encoder model examines each query-chunk pair and assigns it a relevance score. The passages are then reranked by this score.

  • The Benefit: Rerankers can capture fine-grained relevance details that the initial retrieval might miss, further improving the quality of context provided to the LLM.

  • The Drawback: A second stage adds latency, and the improved ranking is not always worth the extra wait for results.
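The two-stage flow can be sketched with the scoring model passed in as a function that scores a batch of (query, passage) pairs. The word-overlap scorer below is only a toy stand-in so the example runs without a model; it is not a cross-encoder, and `rerank` and `overlap_scorer` are hypothetical names.

```python
from typing import Callable, Sequence

# Scores a batch of (query, passage) pairs, one float per pair. A cross-encoder fits this shape.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: PairScorer,
           top_n: int = 5) -> list[tuple[str, float]]:
    """Second stage: score each (query, candidate) pair jointly, then re-sort."""
    pairs = [(query, chunk) for chunk in candidates]
    scores = score_pairs(pairs)  # one batched call rather than one call per candidate
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    # Toy stand-in for a cross-encoder: fraction of query words present in the passage.
    scores = []
    for query, passage in pairs:
        q_words, p_words = set(query.lower().split()), set(passage.lower().split())
        scores.append(len(q_words & p_words) / len(q_words) if q_words else 0.0)
    return scores
```

If you use the sentence-transformers library, a `CrossEncoder` model's `predict` method takes a list of (query, passage) pairs and returns one score per pair, so it should slot in as `score_pairs`; check its documentation for model choices and batching options.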

We’ll expand on reranking at a later date; it’s certainly not all roses in that particular garden.

Conclusion

Effective retrieval is the engine of your RAG system. While semantic search is powerful, relying on it alone often leads to disappointment when specific terms or entities are crucial. Similarly, keyword search alone lacks the necessary understanding for many natural language queries. Hybrid search, by combining the strengths of both semantic and keyword retrieval, offers a significantly more robust, versatile, and effective solution. It ensures you're better equipped to handle the diverse ways users search for information, ultimately leading to more relevant context and higher-quality generated responses from your LLM. Consider hybrid search not just an option, but a foundational element for production-grade RAG.
