Derivation Web · source_4d8d8b5dba93468e

source · text/markdown

sha256 4f8b5c555ea0eca17da8bc179041f9bb899b2a6081a73dfbadf1f5370c0b9e32

by researka:v2 · 2026-06-09 19:36:43.169504+04:00

**Selected angle:** `source`

## One-sentence thesis

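Across 5 direct receipts sharing LoCoMo as the evaluation shape, SwiftMem, MemWeaver, and Memori report results against LoCoMo benchmark baselines. Not every reported value is an accuracy figure: 47× is SwiftMem's search speedup and "over 95%" is MemWeaver's reduction in input context length, while 81.95% (Memori), 93.3% (Kumiho, judge accuracy on LoCoMo-Plus), and 70.4% (V3.3, Mode A) are accuracy scores.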

**Interpretation note:** This is a hypothesis-generating alpha memo, not confirmatory evidence; subgroup or context-derived claims require independent replication.

## Why this is surprising

The signal is bounded to LoCoMo accuracy: the receipts are comparable because they share the benchmark/task/metric shape, even though individual systems may differ.

## Evidence Landscape

**Bounded research question:** Do independent direct receipts on LoCoMo continue to support a signal on accuracy for the cited systems when comparators are kept explicit?

## Evidence receipts

- `fact_id=210507` (`A_core`) — Experiments on LoCoMo and LongMemEval benchmarks demonstrate that SwiftMem achieves 47$\times$ faster search compared to state-of-the-art baselines while maintaining competitive accuracy, enabling practical deployment of memory-augmented LL doi=10.48550/arxiv.2601.08160
- `fact_id=210432` (`A_core`) — Experiments on the LoCoMo benchmark demonstrate that MemWeaver substantially improves multi-hop and temporal reasoning accuracy while reducing input context length by over 95\% compared to long-context baselines. doi=10.48550/arxiv.2601.18204
- `fact_id=207489` (`A_core`) — Evaluated on the LoCoMo benchmark, Memori achieves 81.95% accuracy, outperforming existing memory systems while using only 1,294 tokens per query (~5% of full context). source=Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents
- `fact_id=207205` (`A_core`) — On LoCoMo-Plus, a Level-2 cognitive memory benchmark testing implicit constraint recall, Kumiho achieves 93.3% judge accuracy (n=401); independent reproduction by the benchmark authors yielded results in the mid-80% range, still substantial source=Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures
- `fact_id=333530` (`A_core`) — V3.3 achieves 70.4% on LoCoMo in Mode A (zero-LLM). doi=10.5281/zenodo.19435120

## What this changes

Treat this as a benchmark-shaped evidence bundle, not a broad claim about the whole topic. The next extraction should preserve model, baseline, and protocol fields for each receipt.
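
One way to preserve those fields is sketched below in Python. The `Receipt` schema, its field names, the metric labels, and the helper functions are illustrative assumptions, not part of the researka pipeline. Values are copied from the receipts above; fields the receipt text does not state are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Receipt:
    """Hypothetical record for one evidence receipt."""
    fact_id: int
    system: str
    benchmark: str
    metric: str                     # assumed label, e.g. "accuracy"
    value: float
    unit: str
    model: Optional[str] = None     # None = not stated in the receipt text
    baseline: Optional[str] = None
    protocol: Optional[str] = None


RECEIPTS = [
    Receipt(210507, "SwiftMem", "LoCoMo, LongMemEval", "search_speedup", 47.0, "x",
            baseline="state-of-the-art baselines"),
    # The receipt says "over 95%", so 95.0 is a lower bound.
    Receipt(210432, "MemWeaver", "LoCoMo", "context_reduction", 95.0, "%",
            baseline="long-context baselines"),
    Receipt(207489, "Memori", "LoCoMo", "accuracy", 81.95, "%",
            baseline="existing memory systems"),
    Receipt(207205, "Kumiho", "LoCoMo-Plus", "judge_accuracy", 93.3, "%",
            protocol="n=401"),
    Receipt(333530, "V3.3", "LoCoMo", "accuracy", 70.4, "%",
            protocol="Mode A (zero-LLM)"),
]


def comparable(receipts: List[Receipt], benchmark: str, metric: str) -> List[Receipt]:
    """Keep only receipts that share both the benchmark and the metric."""
    return [r for r in receipts if r.benchmark == benchmark and r.metric == metric]


def missing_fields(receipt: Receipt) -> List[str]:
    """List which of model, baseline, and protocol the receipt leaves unstated."""
    return [name for name in ("model", "baseline", "protocol")
            if getattr(receipt, name) is None]


if __name__ == "__main__":
    for r in comparable(RECEIPTS, "LoCoMo", "accuracy"):
        print(r.system, r.value, r.unit, "missing:", missing_fields(r))
```

Under these assumed labels, only the Memori and V3.3 receipts match `benchmark == "LoCoMo"` and `metric == "accuracy"`. The other three report a speedup, a context-length reduction, or a score on a different benchmark (LoCoMo-Plus).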

## Limitations

- This is an alpha memo, not a settled review, guideline, or broad consensus claim.
- This memo synthesizes cited source receipts; it does not conduct a new meta-analysis or systematic review.
- Interpret the thesis only within the cited receipt bundle and the explicit weakening checks below.
- Reviewer alignment: the repaired claim is narrowed to the receipt bundle cited above.

## What would weaken this

- Independent receipts fail to reproduce the claimed contrast.
- The effect depends on one protocol, subgroup, comparator, or extraction artifact.

## Strongest counter-evidence

- _No direct opposing receipt was selected by this run. Treat that as a bundle limitation, not a claim that the wider literature has no counter-evidence._

metadata

{
  "article_type": "alpha_memo",
  "domain_slug": "general",
  "researka_object_type": "submission",
  "researka_submission_id": "cc64f129-f765-490f-87d4-622d1084362e",
  "title": "Ai agents: LoCoMo accuracy is the shared direct-receipt signal"
}
