Hybrid Retrieval for Exact and Semantic Recall
Why production memory systems need both semantic search and lexical retrieval instead of treating them as substitutes.
Semantic search is excellent until someone asks for a ticket number, a short product code, or a quoted phrase that must match exactly.
Lexical search is excellent until someone asks the same question with different wording.
Production memory systems need both.
The false choice
Teams often frame retrieval design as a choice:
- vector search for meaning
- keyword search for exactness
That framing is too narrow. Real-world memory queries regularly contain both semantic and exact-term intent.
Examples:
- "What did the user say about the Enterprise Pro plan?"
- "Show me the memory about ACME-4921."
- "When did we decide to replace Redis with Valkey?"
Each query mixes fuzzy meaning with exact anchors.
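One way to make the "exact anchor" idea concrete is to pull identifier-like tokens and quoted phrases out of a query before retrieval, so the lexical side knows what must match verbatim. The sketch below is illustrative only: the function name `extract_anchors` and both regular expressions are assumptions, not part of any particular system, and real data will need its own patterns.

```python
import re

# Hypothetical patterns for "exact anchor" tokens; tune them for your own data.
ID_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")  # ticket-style IDs such as ACME-4921
QUOTED_PATTERN = re.compile(r'"([^"]+)"')           # phrases the user put in double quotes


def extract_anchors(query: str) -> dict:
    """Return the parts of a query that lexical search should match verbatim."""
    return {
        "ids": ID_PATTERN.findall(query),
        "quoted": QUOTED_PATTERN.findall(query),
    }


anchors = extract_anchors("Show me the memory about ACME-4921.")
print(anchors)  # {'ids': ['ACME-4921'], 'quoted': []}
```

Patterns like these will not catch plain names such as "Valkey" or "Enterprise Pro". Those still depend on the lexical index matching the literal term, which is part of the argument for keeping that index around.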
What lexical retrieval still does better
Lexical systems remain strong at:
- identifiers
- dates
- quoted strings
- names with rare spelling
- acronyms and short tokens
Those are exactly the places where retrieval mistakes feel obviously wrong to the user.
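To see why lexical scoring handles identifiers so well, here is a minimal sketch of Okapi BM25 over a toy in-memory corpus. It assumes a simple tokenizer that keeps hyphenated IDs as one token. The function names, the sample memories, and the parameter defaults (`k1=1.5`, `b=0.75`) are illustrative choices, not a description of any specific engine.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Keep hyphenated identifiers such as "acme-4921" as a single token.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy memories, invented for illustration.
memories = [
    "Ticket ACME-4921: customer reported a billing error on renewal.",
    "Ticket ACME-4912: customer asked about renewal pricing.",
    "The user prefers email over phone calls.",
]
scores = bm25_scores("ACME-4921", memories)
# Only the first memory contains the exact token, so only it scores above zero.
```

The near-miss `ACME-4912` scores zero because it is a different token. An embedding model has no such guarantee for two IDs that differ by a transposed digit.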
What semantic retrieval still does better
Semantic systems remain strong at:
- paraphrased questions
- indirect references
- concept similarity
- sparse wording, where the query shares few exact terms with the stored text
That matters when the user remembers the idea but not the exact phrasing.
The production pattern
A robust retrieval stack usually does something like this:
semantic candidates + lexical candidates -> fusion -> reranking
The fusion step broadens recall by merging both candidate sets. The reranking step restores precision by rescoring the merged candidates against the query.
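The text does not name a fusion method. One common option is reciprocal rank fusion (RRF), which merges ranked lists using only each item's position, so the semantic and lexical scores never need to share a scale. The sketch below assumes each retriever returns an ordered list of memory IDs. The function name, the sample IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked highly by both lists rise.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate lists from the two retrievers.
semantic_hits = ["m2", "m7", "m1"]
lexical_hits = ["m1", "m2", "m9"]
fused = reciprocal_rank_fusion([semantic_hits, lexical_hits])
print(fused)  # ['m2', 'm1', 'm7', 'm9']
```

Items found by both retrievers (`m2`, `m1`) rise to the top, while items found by only one (`m7`, `m9`) stay in the candidate pool. A reranker, not shown here, would then rescore this fused list against the query.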
Why agent memory benefits even more
Agent memory queries are unusually diverse. They can contain:
- explicit user facts
- long-form conversation history
- internal IDs
- evolving preferences
- time references
No single scoring method handles all of those well.
The mistake to avoid
Do not treat lexical retrieval as a legacy fallback. In many systems, it is the reason exact facts remain discoverable at all.
The best retrieval stacks are not semantically pure. They are operationally correct.
Hybrid retrieval in Aletheia
Aletheia implements the full fusion and reranking pipeline described here. The engine combines HNSW vector search with BM25 lexical scoring, then applies cross-encoder reranking and temporal ranking before returning results. The architecture documentation explains how these layers fit together.
For agent memory specifically, hybrid retrieval pairs with fact supersession to ensure that exact identifiers and temporal truth are both preserved. The ingestion pipeline docs describe how raw text flows through neural extraction before reaching the hybrid indexes.
In practice, hybrid retrieval means combining multiple signals and letting the ranking system decide which evidence matters most for the current query.