In modern Retrieval-Augmented Generation (RAG) systems, hybrid search strategies combine multiple retrieval methods to enhance accuracy. The architecture typically integrates dense vectors, sparse vectors, full-text search, and tensor-based re-ranking mechanisms.
Why Multi-Path Recall Matters

Traditional dense vector search alone often fails to deliver satisfactory results when query keywords don't precisely match stored data. This limitation arises because vector representations capture probabilistic semantic relationships rather than exact term matches. For instance, querying about a company's 2024 Q1 investment portfolio might incorrectly return marketing plans or operational data from the same period.
To address this, hybrid search combines full-text search for precise keyword matching with vector search for semantic understanding. Another approach introduces sparse vectors, which complement dense vectors by representing documents as weighted keyword sets. Unlike dense vectors, sparse vectors store only position-value pairs {position: value}, which makes it practical to represent vectors of 30,000+ dimensions in which most positions are zero.
Dense vector: [0.2, 0.3, 0.5, 0.7, ...]
Sparse vector: {331: 0.5, 14136: 0.7}
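The {position: value} form also pays off at scoring time: similarity only needs to touch the dimensions both vectors actually populate. A minimal sketch in plain Python (the vectors are toy values echoing the example above, not output of any real model):

```python
# Sparse vectors stored as {dimension_index: weight} dicts. Only
# dimensions present in BOTH vectors contribute to the dot product,
# so we never iterate over the tens of thousands of zero positions.

def sparse_dot(q: dict, d: dict) -> float:
    """Dot product of two sparse vectors stored as {index: weight}."""
    # Iterate over the smaller dict and probe the larger one.
    small, large = (q, d) if len(q) <= len(d) else (d, q)
    return sum(w * large[i] for i, w in small.items() if i in large)

query = {331: 0.5, 14136: 0.7}
doc_a = {331: 0.4, 902: 0.1}   # shares dimension 331 with the query
doc_b = {57: 0.9, 14136: 0.2}  # shares dimension 14136
doc_c = {7: 1.0}               # shares nothing: score is 0

print(sparse_dot(query, doc_a))  # 0.5 * 0.4 -> 0.2
print(sparse_dot(query, doc_c))  # -> 0.0
```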
SPLADE is a prominent framework that generates sparse vectors over a roughly 30,000-term vocabulary, learning to suppress uninformative terms such as stopwords while expanding documents with related terminology. This approach outperforms BM25 ranking on standard information retrieval tasks, and recent studies show that combining sparse and dense vectors significantly improves performance compared to BM25 alone.
Three-Way Retrieval Strategy

While two-path approaches work well, real-world scenarios often require three-way retrieval: full-text search + dense vectors + sparse vectors. IBM's research comparing combinations like BM25+dense, dense+sparse, and BM25+dense+sparse confirms that three-path retrieval yields optimal results. Each component serves a distinct purpose:
- Dense vectors capture semantic meaning
- Sparse vectors enable precise keyword recall
- Full-text search handles phrase queries and specialized terminology
This approach increases engineering complexity, requiring synchronization between multiple databases (e.g., vector DB + Elasticsearch). Infinity database simplifies this by supporting all three data types in a single system with ACID compliance.
Ranking Fusion Techniques

Infinity implements three fusion ranking algorithms:
- RRF (Reciprocal Rank Fusion): Scores each result as 1/(k + rank) within each retrieval path and sums the scores across paths; the constant k (commonly 60) damps the advantage of top-ranked results
- Weighted Sum: Adjusts weights for different retrieval paths
- ColBERT Re-ranking: Uses contextualized late interaction for final scoring
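To make the RRF scheme concrete, here is a minimal, database-independent sketch. The document IDs and ranked lists are hypothetical, and k=60 matches the value used in the query examples in this article:

```python
# Reciprocal Rank Fusion: each retrieval path returns an ordered list of
# document IDs; a document's fused score is the sum of 1 / (k + rank)
# over every path that returned it.

def rrf_fuse(rankings, k: int = 60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense    = ["doc3", "doc1", "doc7"]  # dense-vector path
sparse   = ["doc1", "doc3", "doc9"]  # sparse-vector path
fulltext = ["doc1", "doc5", "doc3"]  # full-text path

fused = rrf_fuse([dense, sparse, fulltext])
print(fused[0][0])  # -> doc1: near the top of all three paths
```

Note that RRF needs only ranks, never raw scores, which is why it fuses paths with incompatible scoring scales (inner product vs. BM25) without any normalization step.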
Example implementations:
results = (
    db.query()
    .vector_search('vec_col', [3.0, 2.8, 2.7, 3.1], 'ip')
    .sparse_search('sparse_col', {'indices': [0, 10, 20], 'values': [0.1, 0.2, 0.3]})
    .text_search('content', 'hello world')
    .fusion('rrf', k=60)
    .execute()
)
results = (
    db.query()
    .vector_search('vec_col', [3.0, 2.8, 2.7, 3.1], 'ip')
    .tensor_search('tensor_col', [[0.0, -10.0, 0.0, 0.7], [9.2, 45.6, -55.8, 3.5]], 'maxsim')
    .text_search('content', 'hello world')
    .fusion('weighted_sum', weights=[0.8, 0.2])
    .execute()
)
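Unlike RRF, weighted-sum fusion combines raw scores, and scores from different paths live on different scales (inner product vs. BM25). The sketch below min-max normalizes each path to [0, 1] before applying the weights; that normalization step is an illustrative assumption, not a description of Infinity's internal implementation, and the scores are made up:

```python
# Weighted-sum fusion sketch: normalize each path's scores to [0, 1],
# then sum them weighted by per-path importance.

def weighted_sum_fuse(path_scores, weights):
    fused = {}
    for scores, weight in zip(path_scores, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * (s - lo) / span
    return fused

dense_scores = {"doc1": 0.92, "doc2": 0.35, "doc3": 0.88}  # inner product
text_scores  = {"doc1": 7.10, "doc2": 9.40, "doc3": 2.00}  # BM25-like

fused = weighted_sum_fuse([dense_scores, text_scores], weights=[0.8, 0.2])
print(max(fused, key=fused.get))  # -> doc1 (dominates the 0.8-weighted path)
```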
Tensor Data Type Implementation

Infinity v0.2 introduces a tensor data type for ColBERT integration. Tensors enable efficient MaxSim calculations through:
- Binary quantization (cuts storage to 1/32 of the original size)
- EMVB index (SIMD acceleration)
- Segment-wise processing for long documents
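The MaxSim operation these optimizations accelerate is simple to state: query and document are each stored as a matrix of per-token embeddings, and the score sums, for every query token, its best dot product against all document tokens. A minimal NumPy sketch with toy 2-dimensional embeddings (not real ColBERT output):

```python
import numpy as np

# ColBERT-style late interaction: for each query token embedding, take
# the maximum similarity against all document token embeddings, then
# sum these per-token maxima into one document score.

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """query_emb: (num_query_tokens, dim); doc_emb: (num_doc_tokens, dim)."""
    sim = query_emb @ doc_emb.T          # pairwise token similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token

q = np.array([[1.0, 0.0],   # query token 1
              [0.0, 1.0]])  # query token 2
d = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0]])  # 3 document tokens

print(maxsim(q, d))  # token 1 best = 2.0, token 2 best = 3.0 -> 5.0
```

Because every query token is matched independently, the document matrix can be scored segment by segment and the per-token maxima merged, which is what makes the segment-wise processing of long documents listed above possible.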
Performance Evaluation

Evaluation on the MLDR dataset shows:
- Hybrid search (full-text + dense + sparse) improves nDCG by 23% over pure vector search
- Adding ColBERT re-ranking further enhances results by 15%
- Tensor-based indexing provides cost-effective high-quality retrieval
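For reference, the nDCG metric cited in these results can be sketched in a few lines. The relevance judgments below are toy values for illustration only:

```python
import math

# nDCG: DCG discounts each result's relevance by log2(rank + 1); nDCG
# divides by the DCG of the ideal (relevance-sorted) ordering, so a
# perfect ranking scores exactly 1.0.

def dcg(relevances):
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

print(ndcg([3, 2, 1, 0]))  # ideal ordering -> 1.0
print(round(ndcg([1, 3, 0, 2]), 3))  # most-relevant doc ranked second
```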
These advancements make Infinity a comprehensive solution for enterprise RAG systems requiring both scalability and precision.