Cross-Encoder Reranking

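A cross-encoder reranker improves top-k relevance by scoring each query-passage pair jointly, instead of comparing embeddings that were computed independently for the query and the passage.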

Where reranking fits

Use semantic and lexical retrieval for broad candidate generation, then rerank only a small candidate set. This improves precision without paying the cost of cross-encoding the full corpus.

Reranking is most useful for ambiguous or compositional queries.
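
The two-stage flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not this project's implementation: `PairScorer`, `toy_overlap_scorer`, and `rerank` are hypothetical names, and the token-overlap scorer is only a stand-in for a real cross-encoder so that the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a scorer takes (query, passage) pairs and
# returns one relevance score per pair, in the same order.
PairScorer = Callable[[Sequence[Tuple[str, str]]], List[float]]


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in for a cross-encoder: fraction of query tokens found in the passage."""
    scores = []
    for query, passage in pairs:
        q_tokens = set(query.lower().split())
        p_tokens = set(passage.lower().split())
        scores.append(len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0)
    return scores


def rerank(
    query: str,
    candidates: Sequence[str],
    scorer: PairScorer,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k by descending score."""
    if not candidates:
        return []
    scores = scorer([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Candidates would normally come from the semantic + lexical retrieval stage.
candidates = [
    "billing and invoices",
    "how to reset a router",
    "reset your password in settings",
]
top = rerank("reset password", candidates, toy_overlap_scorer, top_k=2)
```

Any real model can be plugged in as long as it matches the scorer interface. For example, a `sentence_transformers.CrossEncoder` instance exposes a `predict` method that accepts a list of query-passage pairs and returns one score per pair, so wrapping it in a small function should fit here.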

Candidate budgeting

Retrieve 30-100 candidates from the fusion stage.
Rerank the top 20-40 for latency-sensitive workloads.
Expose a per-request override for evaluation runs.
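
One way to express these budgets with a per-request override is sketched below. The names `CandidateBudget` and `resolve_budget` are hypothetical, and the defaults (50 retrieved, 32 reranked) are assumptions picked from inside the ranges above, not values taken from this project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CandidateBudget:
    retrieve_k: int = 50  # assumed default, inside the 30-100 range
    rerank_k: int = 32    # assumed default, inside the 20-40 range


def resolve_budget(
    default: CandidateBudget,
    retrieve_override: Optional[int] = None,
    rerank_override: Optional[int] = None,
) -> CandidateBudget:
    """Apply optional per-request overrides (e.g. for evaluation runs)."""
    retrieve_k = default.retrieve_k if retrieve_override is None else retrieve_override
    rerank_k = default.rerank_k if rerank_override is None else rerank_override
    if retrieve_k < 1 or rerank_k < 1:
        raise ValueError("candidate budgets must be positive")
    # The reranker cannot see more candidates than the fusion stage returned.
    return CandidateBudget(retrieve_k=retrieve_k, rerank_k=min(rerank_k, retrieve_k))


# A latency-sensitive request uses the defaults; an evaluation run widens both.
serving = resolve_budget(CandidateBudget())
evaluation = resolve_budget(CandidateBudget(), retrieve_override=100, rerank_override=100)
```

Clamping `rerank_k` to `retrieve_k` is a design choice in this sketch; raising an error on an inconsistent override would be an equally reasonable alternative.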

Rerank config

reranking:
  enabled: true
  model: cross-encoder/ms-marco-MiniLM-L-6-v2
  max_candidates: 32
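
A config like the one above might be validated along these lines. This is a sketch under assumptions: `RerankConfig`, `parse_rerank_config`, and `truncate_candidates` are hypothetical names, the input is assumed to be an already-parsed mapping (for example the result of `yaml.safe_load`), and treating a missing `enabled` key as `False` is an assumed default rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Sequence


@dataclass(frozen=True)
class RerankConfig:
    enabled: bool
    model: str
    max_candidates: int


def parse_rerank_config(raw: Mapping[str, Any]) -> RerankConfig:
    """Read the `reranking` section and reject configs that cannot work."""
    section = raw.get("reranking", {})
    cfg = RerankConfig(
        enabled=bool(section.get("enabled", False)),  # assumed default: off
        model=str(section.get("model", "")),
        max_candidates=int(section.get("max_candidates", 0)),
    )
    if cfg.enabled and not cfg.model:
        raise ValueError("reranking.model is required when reranking is enabled")
    if cfg.enabled and cfg.max_candidates < 1:
        raise ValueError("reranking.max_candidates must be >= 1 when enabled")
    return cfg


def truncate_candidates(candidates: Sequence[str], cfg: RerankConfig) -> List[str]:
    """Cap the reranker's input at max_candidates; pass through when disabled."""
    if not cfg.enabled:
        return list(candidates)
    return list(candidates[: cfg.max_candidates])


# Equivalent to the YAML above after parsing.
raw_config = {
    "reranking": {
        "enabled": True,
        "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
        "max_candidates": 32,
    }
}
config = parse_rerank_config(raw_config)
```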

When to disable

Disable reranking on strict low-latency paths where exact lexical matches account for most of the query value, or when running small local benchmarks that check only ingestion correctness.
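
These conditions could be encoded as a small gate in front of the reranker. The sketch below is illustrative only: `RequestContext`, its two flags, and `should_rerank` are hypothetical names, and how a request gets classified as a low-latency exact-match path is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical flags set by the caller; not part of any existing API.
    low_latency_exact_match: bool = False
    ingestion_only_benchmark: bool = False


def should_rerank(config_enabled: bool, ctx: RequestContext) -> bool:
    """Return True only when reranking is enabled and no disable condition applies."""
    if not config_enabled:
        return False
    return not (ctx.low_latency_exact_match or ctx.ingestion_only_benchmark)
```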