Hybrid Retrieval for Exact and Semantic Recall

Why production memory systems need both semantic search and lexical retrieval instead of treating them as substitutes.

Hybrid Retrieval · Semantic Search · BM25

Semantic search is excellent until someone asks for a ticket number, a short product code, or a quoted phrase that must match exactly.

Lexical search is excellent until someone asks the same question with different wording.

Production memory systems need both.

The false choice

Teams often frame retrieval design as a choice:

vector search for meaning
keyword search for exactness

That framing is too narrow. Real-world memory queries regularly contain both semantic and exact-term intent.

Examples:

"What did the user say about the Enterprise Pro plan?"
"Show me the memory about ACME-4921."
"When did we decide to replace Redis with Valkey?"

Each query mixes fuzzy meaning with an exact anchor: a plan name, an identifier, or the names of specific products.
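To make the mix concrete, a query can be split into its exact anchors and its free-text remainder. This is a minimal sketch that assumes anchors are either double-quoted phrases or uppercase-prefix IDs like ACME-4921. Both regexes are hypothetical. Plain names such as Redis or Valkey slip straight through these patterns, which is one reason hybrid systems usually run both retrievers on every query instead of routing on detected anchors.

```python
import re

# Hypothetical anchor patterns; a real system would tune these to its own ID formats.
QUOTED = re.compile(r'"([^"]+)"')           # "quoted phrases"
ID_LIKE = re.compile(r"\b[A-Z]{2,}-\d+\b")  # ticket-style IDs such as ACME-4921


def split_query(query: str) -> tuple[list[str], str]:
    """Separate exact anchors from the free-text remainder of a query."""
    quoted = QUOTED.findall(query)
    unquoted = QUOTED.sub(" ", query)       # remove quoted phrases first to avoid double-counting
    ids = ID_LIKE.findall(unquoted)
    remainder = " ".join(ID_LIKE.sub(" ", unquoted).split())
    return quoted + ids, remainder


print(split_query('notes on "Enterprise Pro" pricing for ACME-4921'))
# (['Enterprise Pro', 'ACME-4921'], 'notes on pricing for')
```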

What lexical retrieval still does better

Lexical systems remain strong at:

identifiers
dates
quoted strings
names with rare spelling
acronyms and short tokens

Those are exactly the places where retrieval mistakes feel obviously wrong to the user.
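The standard lexical scorer is BM25. The following is a minimal sketch of Okapi BM25 that uses the non-negative IDF variant found in Lucene. Several details are assumptions rather than values from any particular engine: the tokenizer regex, which keeps hyphenated identifiers such as ACME-4921 and ISO dates as single tokens, and the common default parameters k1 = 1.5 and b = 0.75.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: keeps hyphenated IDs (ACME-4921) and dates (2024-05-01) whole.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN.findall(text)]


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # Lucene-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)), never negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact token match, no contribution
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def search(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        hits = [p for p in scored if p[1] > 0]
        return sorted(hits, key=lambda p: p[1], reverse=True)[:k]


index = BM25([
    "Ticket ACME-4921: customer reported a billing error",
    "Ticket ACME-4912: login timeout on mobile",
    "The user prefers dark mode",
])
print([i for i, _ in index.search("ACME-4921")])  # [0]
```

A query for ACME-4921 matches only the first ticket. ACME-4912 is a different token, so the near-miss identifier scores zero, whereas an embedding model may well place those two strings close together.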

What semantic retrieval still does better

Semantic systems remain strong at:

paraphrased questions
indirect references
concept similarity
short or sparsely worded queries that share few exact words with the stored memory

That matters when the user remembers the idea but not the exact phrasing.
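Semantic retrieval ranks memories by how similar their embeddings are to the query's embedding. The sketch below uses cosine similarity with a brute-force top-k scan. The three-dimensional vectors are hand-made, hypothetical stand-ins for real embedding-model output. Production systems replace the linear scan with an approximate nearest-neighbor index such as HNSW.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Brute-force nearest neighbors by cosine similarity."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]


memories = [
    "customer was charged twice this month",
    "user prefers dark mode",
    "we migrated the cache to Valkey",
]
# Hypothetical embeddings along rough axes (billing, UI, infrastructure).
memory_vecs = [[0.9, 0.1, 0.0], [0.0, 0.95, 0.1], [0.1, 0.0, 0.9]]
query_vec = [0.85, 0.05, 0.05]  # stand-in for "any payment problems?"

best, _ = top_k(query_vec, memory_vecs, k=1)[0]
print(memories[best])  # customer was charged twice this month
```

The query and the top memory share no words, yet they rank together. Lexical scoring misses exactly this case.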

The production pattern

A robust retrieval stack usually does something like this:

semantic candidates + lexical candidates -> fusion -> reranking

The fusion step broadens recall. The reranking step restores precision.
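The fusion step can be as simple as Reciprocal Rank Fusion (RRF). RRF merges ranked lists using only the ranks, so cosine similarities and BM25 scores never need to be put on a common scale. This is a minimal sketch. The constant k = 60 is the value commonly used with RRF, and `rerank_score` is a hypothetical stand-in for a cross-encoder or another reranker. For self-containment it receives document IDs rather than text.

```python
from collections import defaultdict
from typing import Callable


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def hybrid_search(
    query: str,
    semantic_ids: list[str],
    lexical_ids: list[str],
    rerank_score: Callable[[str, str], float],
    top_n: int = 10,
) -> list[str]:
    # Fusion broadens recall: a hit from either retriever becomes a candidate.
    candidates = rrf([semantic_ids, lexical_ids])[:top_n]
    # Reranking restores precision: score each (query, candidate) pair directly.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)


semantic = ["m1", "m2", "m3"]  # e.g. from vector search
lexical = ["m4", "m1"]         # e.g. from BM25
print(rrf([semantic, lexical]))  # ['m1', 'm4', 'm2', 'm3']

fake_scores = {"m4": 0.9, "m1": 0.5, "m2": 0.1, "m3": 0.0}  # hypothetical reranker output
print(hybrid_search("ACME-4921 billing", semantic, lexical, lambda q, d: fake_scores[d]))
# ['m4', 'm1', 'm2', 'm3']
```

Only the lexical retriever found m4. It survives fusion, and the stand-in reranker then promotes it to the top.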

Why agent memory benefits even more

Agent memory queries are unusually diverse. They can contain:

explicit user facts
long-form conversation history
internal IDs
evolving preferences
time references

No single scoring method handles all of those well.

The mistake to avoid

Do not treat lexical retrieval as a legacy fallback. In many systems, it is the reason exact facts remain discoverable at all.

The best retrieval stacks are not semantically pure. They are operationally correct.

Hybrid retrieval in Aletheia

Aletheia implements the full fusion and reranking pipeline described here. The engine combines HNSW vector search with BM25 lexical scoring, then applies cross-encoder reranking and temporal ranking before returning results. The architecture documentation explains how these layers fit together.
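The post does not say how temporal ranking is scored. The following is a hypothetical illustration only, not Aletheia's formula or API. One common approach blends a reranker's relevance score with an exponential recency decay. The `Candidate` type, the 30-day half-life, and the 0.3 weight are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    memory_id: str
    relevance: float  # e.g. a reranker score scaled to [0, 1]
    age_days: float   # time since the memory was written


def temporal_score(c: Candidate, half_life_days: float = 30.0, weight: float = 0.3) -> float:
    # Recency halves every half_life_days: 1.0 when new, 0.5 after one half-life.
    recency = 0.5 ** (c.age_days / half_life_days)
    return (1 - weight) * c.relevance + weight * recency


def temporal_rank(cands: list[Candidate], **kwargs) -> list[Candidate]:
    """Sort candidates by blended relevance + recency, highest first."""
    return sorted(cands, key=lambda c: temporal_score(c, **kwargs), reverse=True)


old = Candidate("old-preference", relevance=0.80, age_days=365)
new = Candidate("new-preference", relevance=0.75, age_days=2)
print([c.memory_id for c in temporal_rank([old, new])])              # ['new-preference', 'old-preference']
print([c.memory_id for c in temporal_rank([old, new], weight=0.0)])  # ['old-preference', 'new-preference']
```

With `weight=0.0` the ranking falls back to pure relevance. A larger weight lets a slightly less relevant but much newer memory win.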

For agent memory specifically, hybrid retrieval pairs with fact supersession to ensure that exact identifiers and temporal truth are both preserved. The ingestion pipeline docs describe how raw text flows through neural extraction before reaching the hybrid indexes.
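The post does not define fact supersession in detail. The following is a minimal sketch of the idea under a few assumptions. Each fact has a subject, an attribute, and an observation date, and a newer fact for the same (subject, attribute) pair supersedes the older one. The `Fact` type and the dates are hypothetical. The full log is kept, so a history question like "When did we decide to replace Redis with Valkey?" can still be answered, while current-state lookups see only the latest value.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: date


def current_facts(facts: list[Fact]) -> dict[tuple[str, str], Fact]:
    """Keep the newest fact per (subject, attribute); older ones are superseded, not deleted."""
    latest: dict[tuple[str, str], Fact] = {}
    for f in facts:
        key = (f.subject, f.attribute)
        if key not in latest or f.observed_at > latest[key].observed_at:
            latest[key] = f
    return latest


# Hypothetical fact log; the original list is left untouched for history queries.
log = [
    Fact("infra", "cache", "Redis", date(2023, 1, 10)),
    Fact("infra", "cache", "Valkey", date(2024, 6, 3)),
    Fact("user", "plan", "Enterprise Pro", date(2024, 2, 1)),
]
print(current_facts(log)[("infra", "cache")].value)  # Valkey
```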

Whatever the engine, the principle is the same: combine multiple signals and let the ranking system decide which evidence matters most for the current query.
