Operations
Observability
Track recall quality and service health together; latency alone is not enough for memory systems.
Metrics that matter
Quality metrics belong on first-class dashboards, not hidden in offline scripts.
- Ingest accepted/deduplicated/invalid counts
- Semantic query latency p50/p95/p99
- Lexical-only hit share vs hybrid share
- Superseded facts filtered per query
- Index reconciliation backlog
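As a rough illustration, several of the per-query metrics above can be derived from a batch of query samples. This is a minimal sketch, not the service's actual implementation: the `QuerySample` type, its field names, and the `summarize` function are hypothetical. It also assumes that "lexical-only" means a query where only the lexical retriever returned candidates, and "hybrid" means both retrievers did.

```python
import math
from dataclasses import dataclass


@dataclass
class QuerySample:
    # Hypothetical per-query record; field names are illustrative.
    latency_ms: float
    semantic_candidates: int
    lexical_candidates: int
    superseded_filtered: int


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Aggregate a non-empty batch of samples into dashboard-ready numbers."""
    n = len(samples)
    latencies = [s.latency_ms for s in samples]
    # Assumed definition: only the lexical retriever produced candidates.
    lexical_only = sum(
        1 for s in samples
        if s.semantic_candidates == 0 and s.lexical_candidates > 0
    )
    # Assumed definition: both retrievers produced candidates.
    hybrid = sum(
        1 for s in samples
        if s.semantic_candidates > 0 and s.lexical_candidates > 0
    )
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_p99_ms": percentile(latencies, 99),
        "lexical_only_share": lexical_only / n,
        "hybrid_share": hybrid / n,
        "superseded_filtered_per_query": sum(s.superseded_filtered for s in samples) / n,
    }
```

Nearest-rank is used here because it always returns an observed latency. A production system would more likely rely on histogram buckets from its metrics backend.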
Structured logging
Include enough retrieval internals in logs to debug ranking anomalies without sampling full payload text.
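One way to produce such a log line is sketched below using Python's standard `logging` and `json` modules. The logger name `retrieval` and the helper names `format_query_log` and `log_query` are assumptions for illustration. The point is that only identifiers, candidate counts, and timing are serialized, never the payload text.

```python
import json
import logging

# Hypothetical logger name; use whatever the service already configures.
logger = logging.getLogger("retrieval")


def format_query_log(request_id, route, entity_id,
                     semantic_candidates, lexical_candidates, latency_ms):
    """Serialize retrieval internals as one JSON line, with no payload text."""
    record = {
        "request_id": request_id,
        "route": route,
        "entity_id": entity_id,
        "semantic_candidates": semantic_candidates,
        "lexical_candidates": lexical_candidates,
        "latency_ms": latency_ms,
    }
    return json.dumps(record)


def log_query(**fields):
    """Emit the structured record at INFO level."""
    logger.info(format_query_log(**fields))
```

A record of this shape looks like the following: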
{
  "request_id": "req_7f1d",
  "route": "/query/semantic",
  "entity_id": "user-123",
  "semantic_candidates": 40,
  "lexical_candidates": 15,
  "latency_ms": 32
}
Alerting
- Sustained p95 latency breach
- Spike in invalid ingest payload ratio
- Sharp drop in recall@k against canary eval set
- Index mismatch detected by repair scanner
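The alert conditions above could be evaluated along these lines. This is a sketch under stated assumptions: the `Thresholds` values, the alert names, and the `evaluate_alerts` signature are hypothetical. "Sustained" is taken to mean the last few consecutive p95 windows all exceed the limit, and a "sharp drop" in recall@k is taken to mean an absolute drop against a stored baseline.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Illustrative values only; tune against real traffic.
    p95_ms: float = 200.0
    sustained_windows: int = 3
    invalid_ratio: float = 0.05
    recall_drop: float = 0.10  # absolute drop versus baseline


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the top-k retrieved results."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def sustained_breach(p95_windows, threshold_ms, min_windows):
    """True if the most recent min_windows readings all exceed the threshold."""
    if len(p95_windows) < min_windows:
        return False
    return all(v > threshold_ms for v in p95_windows[-min_windows:])


def evaluate_alerts(p95_windows, invalid_count, ingest_total,
                    recall_now, recall_baseline, index_mismatches, t=None):
    """Return the names of alerts that should fire for the current readings."""
    t = t or Thresholds()
    alerts = []
    if sustained_breach(p95_windows, t.p95_ms, t.sustained_windows):
        alerts.append("p95_latency_sustained_breach")
    if ingest_total > 0 and invalid_count / ingest_total > t.invalid_ratio:
        alerts.append("invalid_ingest_ratio_spike")
    if recall_baseline - recall_now > t.recall_drop:
        alerts.append("recall_at_k_drop")
    if index_mismatches > 0:
        alerts.append("index_mismatch")
    return alerts
```

Requiring several consecutive windows before the latency alert fires keeps a single slow window from paging anyone. The recall check, by contrast, fires on a single canary run, since quality regressions rarely recover on their own.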