Operations

Observability

Track recall quality and service health together; latency alone is not enough for memory systems.

Metrics that matter

Quality metrics belong on first-class dashboards, not hidden in offline scripts.

  • Ingest accepted/deduplicated/invalid counts
  • Semantic query latency p50/p95/p99
  • Lexical-only hit share vs hybrid share
  • Superseded facts filtered per query
  • Index reconciliation backlog
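
As a rough illustration, the metrics listed above could be collected in-process along the following lines. This is a minimal sketch using only the standard library, not the service's actual instrumentation: the QueryMetrics class, its method names, the "lexical_only"/"hybrid" labels, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math
from collections import Counter


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


class QueryMetrics:
    """Hypothetical in-memory collector for the metrics listed above."""

    INGEST_OUTCOMES = ("accepted", "deduplicated", "invalid")
    HIT_SOURCES = ("lexical_only", "hybrid")  # assumed labels

    def __init__(self):
        self.ingest = Counter()
        self.hit_source = Counter()
        self.latencies_ms = []
        self.superseded_filtered = []
        self.reconciliation_backlog = 0  # gauge, set by the index reconciler

    def record_ingest(self, outcome):
        if outcome not in self.INGEST_OUTCOMES:
            raise ValueError(f"unknown ingest outcome: {outcome!r}")
        self.ingest[outcome] += 1

    def record_query(self, latency_ms, source, superseded_filtered=0):
        if source not in self.HIT_SOURCES:
            raise ValueError(f"unknown hit source: {source!r}")
        self.latencies_ms.append(latency_ms)
        self.hit_source[source] += 1
        self.superseded_filtered.append(superseded_filtered)

    def snapshot(self):
        """Return a dashboard-friendly dict of the current values."""
        queries = len(self.latencies_ms)
        has_data = queries > 0
        return {
            "ingest": {k: self.ingest[k] for k in self.INGEST_OUTCOMES},
            "latency_p50_ms": percentile(self.latencies_ms, 50) if has_data else None,
            "latency_p95_ms": percentile(self.latencies_ms, 95) if has_data else None,
            "latency_p99_ms": percentile(self.latencies_ms, 99) if has_data else None,
            "lexical_only_share": self.hit_source["lexical_only"] / queries if has_data else 0.0,
            "hybrid_share": self.hit_source["hybrid"] / queries if has_data else 0.0,
            "superseded_filtered_per_query": (
                sum(self.superseded_filtered) / queries if has_data else 0.0
            ),
            "reconciliation_backlog": self.reconciliation_backlog,
        }
```

In a real deployment these values would more likely be exported through whatever metrics library the service already uses; the sketch only shows which quantities need to be recorded per ingest and per query.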

Structured logging

Log enough retrieval internals, such as candidate counts per retriever and latency, to debug ranking anomalies without capturing full payload text.

Log fields
{
  "request_id": "req_7f1d",
  "route": "/query/semantic",
  "entity_id": "user-123",
  "semantic_candidates": 40,
  "lexical_candidates": 15,
  "latency_ms": 32
}
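
One way to emit a record like the one above while keeping payload text out of the logs is to build each line from an allowlist of fields. The following is a sketch under assumptions: the function name build_log_line, the ALLOWED_FIELDS tuple, and the choice to silently drop unlisted keys are illustrative, not the service's actual logging code.

```python
import json
import logging

# Assumed allowlist, taken from the example record above.
ALLOWED_FIELDS = (
    "request_id",
    "route",
    "entity_id",
    "semantic_candidates",
    "lexical_candidates",
    "latency_ms",
)

logger = logging.getLogger("retrieval")


def build_log_line(**fields):
    """Serialize only allowlisted fields as one JSON line.

    Any other key (for example raw query or payload text) is dropped,
    so it cannot reach the log by accident.
    """
    record = {key: fields[key] for key in ALLOWED_FIELDS if key in fields}
    return json.dumps(record, sort_keys=True)


def log_query(**fields):
    """Emit the structured line at INFO level."""
    logger.info(build_log_line(**fields))
```

Calling log_query(request_id="req_7f1d", route="/query/semantic", latency_ms=32, query_text="...") would log the first three fields and omit query_text.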

Alerting

  • Sustained p95 latency breach
  • Spike in invalid ingest payload ratio
  • Sharp drop in recall@k against canary eval set
  • Index mismatch detected by repair scanner
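
The latency and recall conditions above can be expressed as small checks. This is a hedged sketch: the function names, the default of three consecutive breaching samples, and the 10% relative drop are placeholder assumptions to be tuned, not documented defaults.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def recall_dropped(baseline, current, max_relative_drop=0.10):
    """True if current recall fell more than max_relative_drop below baseline."""
    return current < baseline * (1 - max_relative_drop)


def sustained_breach(p95_samples_ms, threshold_ms, min_consecutive=3):
    """True if the most recent min_consecutive p95 samples all exceed the threshold.

    Requiring several consecutive samples avoids paging on a single spike.
    """
    if len(p95_samples_ms) < min_consecutive:
        return False
    recent = p95_samples_ms[-min_consecutive:]
    return all(value > threshold_ms for value in recent)
```

For the canary eval set, recall_at_k would be averaged over all canary queries before being compared against the stored baseline with recall_dropped.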