Aletheia is currently under development. For early access, contact us.

LongMemEval-S Benchmark

Public Benchmarks

Transparent, reproducible evaluation of Aletheia against industry leaders on standard agent memory benchmarks.

90.5%

Overall Accuracy

<200ms

P95 Retrieval Latency

+61pt

vs Mem0 Overall

#2

Overall Ranking

LongMemEval-S Results (%)

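| Model    | Overall | Single Session | Temporal | Preferences | Knowledge Updates | Multi-Session |
|----------|---------|----------------|----------|-------------|-------------------|---------------|
| Aletheia | 90.5%   | 98.0%          | 88.3%    | 95.2%       | 96.1%             | 74.8%         |
| HydraDB  | 90.8%   | 100.0%         | 91.0%    | 96.7%       | 97.4%             | 76.7%         |
| Zep      | 71.2%   | 92.9%          | 62.4%    | 56.7%       | 83.3%             | 57.9%         |
| Mem0     | 29.1%   | 38.7%          | 25.6%    | 40.0%       | 52.6%             | 20.3%         |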

Overall Score Comparison

Aletheia: 90.5%
HydraDB: 90.8%
Zep: 71.2%
Mem0: 29.1%
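
The headline figures (#2 overall ranking, +61pt vs Mem0) follow directly from the overall scores listed above. A minimal Rust sketch of that arithmetic is shown here; the function names are illustrative and are not taken from the Aletheia codebase.

```rust
/// Overall LongMemEval-S accuracy (%) per system, copied from the results above.
const OVERALL: [(&str, f64); 4] = [
    ("Aletheia", 90.5),
    ("HydraDB", 90.8),
    ("Zep", 71.2),
    ("Mem0", 29.1),
];

/// Look up a system's overall score by name (hypothetical helper).
fn score(name: &str) -> Option<f64> {
    OVERALL.iter().find(|(n, _)| *n == name).map(|(_, s)| *s)
}

/// 1-based rank of `name` when systems are ordered by descending score.
fn rank(name: &str) -> Option<usize> {
    let target = score(name)?;
    Some(1 + OVERALL.iter().filter(|(_, s)| *s > target).count())
}

/// Percentage-point gap between two systems (`a` minus `b`).
fn gap(a: &str, b: &str) -> Option<f64> {
    Some(score(a)? - score(b)?)
}

fn main() {
    let r = rank("Aletheia").expect("Aletheia is in the table");
    let g = gap("Aletheia", "Mem0").expect("both systems are in the table");
    println!("Aletheia rank: #{r}");
    println!("Aletheia vs Mem0: {g:+.1} pt");
}
```

The gap is computed in percentage points (a plain difference of the two scores), which is what the "+61pt" figure expresses, rather than a relative percentage change.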

Methodology

  • Dataset: LongMemEval-S benchmark, with 6 question categories spanning single-session and multi-session recall, temporal reasoning, preference extraction, and knowledge updates
  • Hardware: All tests run on equivalent cloud instances (4 vCPU, 16 GB RAM)
  • Evaluation code: Open source at github.com/sharjeel619/aletheia
  • Competitor results: sourced from published benchmarks and our own evaluation rig. HydraDB results are from hydradb.com/benchmarks, Zep results from getzep.com benchmarks, and Mem0 results from published baselines.
  • Last updated: May 2026. To run the benchmark yourself: cargo run --release --bench longmemeval
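
The P95 retrieval latency reported at the top of this page is a percentile statistic. The sketch below shows one common way such a figure can be computed, the nearest-rank method, assuming per-query latencies recorded in milliseconds. The function name and the sample values are hypothetical; this is not the evaluation code from the repository, and the page does not state which percentile method is used.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p` percent
/// of all samples are less than or equal to it.
/// Returns None for empty input or for `p` outside 1..=100.
fn percentile_nearest_rank(samples_ms: &[f64], p: usize) -> Option<f64> {
    if samples_ms.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    // total_cmp gives a total order over f64, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Integer ceiling of p * n / 100 avoids floating-point rounding in the rank.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

fn main() {
    // Hypothetical latency samples in milliseconds, for illustration only.
    let latencies_ms = [
        120.0, 85.0, 190.0, 140.0, 95.0, 110.0, 175.0, 130.0, 105.0, 160.0,
    ];
    let p95 = percentile_nearest_rank(&latencies_ms, 95).expect("non-empty samples");
    println!("P95 latency: {p95:.1} ms (under 200 ms: {})", p95 < 200.0);
}
```

Nearest-rank always returns an observed sample rather than an interpolated value, so with small sample counts the P95 can coincide with the maximum. Other percentile definitions interpolate between neighbouring samples and can give slightly different numbers.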