Preprint · Independent Research

State, Not Tokens: Repository-Scale Agent Reasoning Is Bound by State Architecture

DOI · License: CC BY 4.0 · Code: MIT

Abstract

The agent community has largely treated repository-scale forgetting as a context-window problem: larger windows (8k → 128k → 1M) are expected to yield better whole-repository reasoning. We argue that this is a misdiagnosis. Using a hard, machine-checkable task (strict JavaScript→TypeScript migration of real OSS repositories under an unforgeable oracle: strict tsc, immutable test suites, mandatory .js→.ts replacement, and no type escape hatches), we vary a single axis: how state flows between bounded workers. Three arms hold the model, tools, scaffold, and oracle constant: a single-context monolith; a durable arm that accumulates each completed dependency layer as a committed artifact on a shared, evolving tree; and a stateless-RAG arm whose per-file workers retrieve context but never see each other's results. On an independent third-party benchmark (NL2Repo-Bench), the same durable-state orchestration reaches a 91.1% mean test-pass rate, about 2.28× the published state of the art of roughly 40%. The contribution is a reframing (state is an asset, not a prompt), backed by controls that isolate which capability actually matters.
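To make the difference between the durable and stateless arms concrete, the following is a minimal sketch, not the paper's implementation. All names (`DepGraph`, `FileWorker`, `dependencyLayers`, `runDurable`, `runStateless`) are hypothetical, the "tree" is an in-memory map rather than a committed repository, and the retrieval step of the stateless-RAG arm is omitted. It assumes every dependency listed in the graph is itself a file in the graph.

```typescript
// Hypothetical sketch: names and types are illustrative, not from the paper's code.

// Maps each file to the in-repo files it imports.
type DepGraph = Record<string, string[]>;

// A bounded worker: migrates one file, given the already-migrated files it may see.
type FileWorker = (file: string, visible: ReadonlyMap<string, string>) => string;

// Group files into dependency layers: layer 0 imports nothing in the repo,
// and each later layer imports only files from earlier layers.
function dependencyLayers(graph: DepGraph): string[][] {
  const done = new Set<string>();
  const layers: string[][] = [];
  let remaining = Object.keys(graph);
  while (remaining.length > 0) {
    const layer = remaining.filter((f) => graph[f].every((d) => done.has(d)));
    if (layer.length === 0) throw new Error("dependency cycle or unknown file");
    layer.forEach((f) => done.add(f));
    layers.push(layer.sort());
    remaining = remaining.filter((f) => !done.has(f));
  }
  return layers;
}

// Durable arm (sketch): each completed layer is written to a shared tree
// before the next layer starts, so later workers see earlier results.
function runDurable(graph: DepGraph, worker: FileWorker): Map<string, string> {
  const tree = new Map<string, string>();
  for (const layer of dependencyLayers(graph)) {
    const snapshot = new Map(tree); // state visible to this layer's workers
    const results = layer.map((f) => [f, worker(f, snapshot)] as const);
    for (const [f, out] of results) tree.set(f, out); // "commit" the layer
  }
  return tree;
}

// Stateless arm (sketch): every worker starts from an empty view and
// never sees another worker's output.
function runStateless(graph: DepGraph, worker: FileWorker): Map<string, string> {
  const empty = new Map<string, string>();
  const out = new Map<string, string>();
  for (const f of Object.keys(graph)) out.set(f, worker(f, empty));
  return out;
}
```

With a worker that simply reports how many of its dependencies are already visible, a file importing two earlier files sees both under `runDurable` and none under `runStateless`. That difference in state flow is the single axis the arms are meant to isolate.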

Key findings

Paper