aka BM25 + Dense, Lexical + Semantic Retrieval
Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
A coding assistant searches its codebase for 'how do we authenticate with Stripe?' Pure semantic search misses files that mention 'stripe-api-key' verbatim; pure keyword search misses files that talk about 'payment processor authentication'. Hybrid search runs both at once: a keyword scorer catches the exact tokens, an embedding scorer catches the conceptual matches, and a fusion step blends the two ranked lists.
Queries vary: some are keyword-matchable (exact identifiers, names), others are semantic; pure dense or pure sparse retrieval misses one or the other.
Dense retrieval misses exact matches; sparse retrieval misses paraphrases. Each alone leaves recall on the table.
Index the corpus twice: BM25 for sparse, dense embeddings for semantic. At query time, retrieve top-k from each, fuse with Reciprocal Rank Fusion or weighted aggregation. Pass the fused top-N forward (typically into a reranker). Do not weight raw scores directly; use rank-based fusion (RRF) or score-normalised aggregation, since BM25 and dense scores live on incompatible scales.
```mermaid
flowchart LR
    Q[Query] --> S[Sparse retrieval BM25]
    Q --> D[Dense retrieval embeddings]
    S --> F[Fusion / Reciprocal Rank Fusion]
    D --> F
    F --> R[Top-k results]
```
Hybrid Search runs sparse and dense retrievers in parallel and fuses their rankings before downstream stages.
Each retriever produces a ranked list; a document's final score is the sum of 1/(k + rank) across lists. The only tuning knob is the constant k (~60 in published work).
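A minimal sketch of this fusion rule. The document names are hypothetical; the inputs are just ranked lists of IDs, so the function is agnostic to where the rankings came from.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical output of a sparse and a dense retriever for the Stripe query:
sparse = ["stripe_keys.md", "billing.py", "auth.md"]
dense = ["auth.md", "payments_overview.md", "billing.py"]
fused = reciprocal_rank_fusion([sparse, dense])
# auth.md and billing.py rise to the top because both retrievers rank them.
```

Note that RRF never looks at the raw scores, only at positions, which is exactly what makes it safe across incompatible score scales.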
Each retriever returns scores normalised to [0,1]; final score is a tunable convex combination (e.g. 0.4 * sparse + 0.6 * dense).
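A sketch of the weighted variant, assuming min-max normalisation per retriever; the score maps and the 0.6 dense weight are illustrative, not prescribed values.

```python
def min_max(scores):
    """Normalise a {doc: score} map to [0, 1]; a constant map collapses to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def weighted_fusion(sparse_scores, dense_scores, w_dense=0.6):
    """Convex combination: (1 - w_dense) * sparse + w_dense * dense."""
    s, d = min_max(sparse_scores), min_max(dense_scores)
    fused = {doc: (1 - w_dense) * s.get(doc, 0.0) + w_dense * d.get(doc, 0.0)
             for doc in set(s) | set(d)}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical raw scores: BM25 on its unbounded scale, cosine in [-1, 1].
sparse = {"stripe_keys.md": 12.3, "billing.py": 8.1, "auth.md": 2.0}
dense = {"auth.md": 0.91, "payments_overview.md": 0.88, "billing.py": 0.52}
ranking = weighted_fusion(sparse, dense)
```

Normalising before mixing is the whole point: without it, BM25's larger magnitudes would silently dominate any weight you pick.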
A classifier inspects the query and routes to sparse OR dense retrieval (not both). 'How do I X' goes dense; 'order #4471' goes sparse.
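The router need not be a learned model; a rule-based sketch like the following often suffices as a baseline. The patterns below are illustrative assumptions about what counts as an 'exact-match' query, not a published recipe.

```python
import re

# Hypothetical heuristic: queries containing exact-looking tokens go sparse,
# everything else goes dense.
EXACT_PATTERNS = [
    re.compile(r"#\d+"),            # order/ticket numbers like "order #4471"
    re.compile(r'"[^"]+"'),         # quoted literals
    re.compile(r"\b\w+[-_]\w+\b"),  # code-ish identifiers like stripe-api-key
]

def route(query: str) -> str:
    """Return 'sparse' or 'dense' for a query (never both, per this variant)."""
    if any(p.search(query) for p in EXACT_PATTERNS):
        return "sparse"
    return "dense"
```

The trade-off versus fusion: routing halves retrieval cost but bets everything on the classifier; a mixed query ('refund order #4471 policy') gets only one retriever's view.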
specialises → naive-rag
composes-with → cross-encoder-reranking
composes-with → contextual-retrieval