Operations

Observability

Track recall quality and service health together; latency alone is not enough for memory systems.

Metrics that matter

Quality metrics belong on first-class dashboards, not buried in offline scripts.

Ingest accepted/deduplicated/invalid counts
Semantic query latency p50/p95/p99
Lexical-only hit share vs hybrid share
Superseded facts filtered per query
Index reconciliation backlog
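
As a rough illustration of how two of these metrics could be derived from raw samples, the sketch below computes nearest-rank latency percentiles and the lexical-only vs hybrid hit share. The function names and the "lexical_only" / "hybrid" labels are assumptions made for this example, not part of the service; a production setup would more likely rely on histogram metrics in its monitoring stack.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, hit_sources):
    """Summarize query latency and hit-source mix.

    hit_sources holds one label per query; the labels "lexical_only"
    and "hybrid" are hypothetical names chosen for this sketch.
    """
    counts = Counter(hit_sources)
    total = sum(counts.values())
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "lexical_only_share": counts["lexical_only"] / total if total else 0.0,
        "hybrid_share": counts["hybrid"] / total if total else 0.0,
    }
```

Nearest-rank is used here because it always returns an observed sample; interpolating estimators give slightly different values on small windows.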

Structured logging

Log enough retrieval internals to debug ranking anomalies without having to capture full payload text.

Log fields

{
  "request_id": "req_7f1d",
  "route": "/query/semantic",
  "entity_id": "user-123",
  "semantic_candidates": 40,
  "lexical_candidates": 15,
  "latency_ms": 32
}
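
One way to emit a record with these fields is a small JSON formatter on top of Python's standard logging module. This is a minimal sketch, not the service's actual logging setup: the JsonFormatter class, the build_query_fields helper, the "memory.query" logger name, and the convention of passing fields through extra={"fields": ...} are all assumptions made for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        # "fields" is a hypothetical attribute supplied via logging's `extra`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_query_fields(request_id, route, entity_id,
                       semantic_candidates, lexical_candidates, latency_ms):
    """Collect retrieval internals only; payload text is deliberately left out."""
    return {
        "request_id": request_id,
        "route": route,
        "entity_id": entity_id,
        "semantic_candidates": semantic_candidates,
        "lexical_candidates": lexical_candidates,
        "latency_ms": latency_ms,
    }


logger = logging.getLogger("memory.query")  # logger name is an assumption
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "query served",
    extra={"fields": build_query_fields(
        "req_7f1d", "/query/semantic", "user-123", 40, 15, 32)},
)
```

Keeping candidate counts next to latency in the same record makes it possible to correlate a ranking anomaly with the retrieval path that produced it.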

Alerting

Sustained p95 latency breach
Spike in invalid ingest payload ratio
Sharp drop in recall@k against canary eval set
Index mismatch detected by repair scanner
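
The first and third conditions above can be sketched as simple checks. The function names, the per-window p95 series, and the threshold parameters are hypothetical; real thresholds depend on the deployment and would normally be expressed in the alerting system's own rule language.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def sustained_breach(p95_windows_ms, threshold_ms, min_consecutive):
    """True when the most recent min_consecutive windows all exceed the
    threshold, so a single noisy window does not trigger an alert."""
    if len(p95_windows_ms) < min_consecutive:
        return False
    return all(v > threshold_ms for v in p95_windows_ms[-min_consecutive:])


def recall_dropped(baseline, current, max_relative_drop):
    """True when recall fell by more than max_relative_drop relative to
    the baseline measured on the canary eval set."""
    if baseline <= 0:
        return False
    return (baseline - current) / baseline > max_relative_drop
```

A relative drop is used rather than an absolute one so the same rule still applies if the baseline recall changes after an index or model update.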