Active Memory

A context memory layer for AI agents built around one core observation: human memory works through selective attention, not total recall.

The Problem

Most AI agents either exhaust their context window mid-conversation or start every session completely blank. Active Memory sits between your code and your model call, quietly solving both problems without requiring changes to your model, stack, or prompts.

Architecture

The architecture is modeled directly on cognitive science. Four components, each mapped to a biological analogue.

The Bucket — Working Memory

Mirrors working memory: the roughly four to seven things a human holds in active attention at once. The agent only ever sees the Bucket, which holds a structured topic summary, a recent message window, and semantically retrieved associations. Bounded context isn't a limitation here; it's the design.
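
A minimal sketch of what a Bucket could look like, assuming a fixed-size recent window and plain-text rendering. The class name, field names, default window size, and section labels are all hypothetical and are not taken from the project's source.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bucket:
    """Hypothetical bounded view of memory that is handed to the model."""
    window_size: int = 6                                # assumed window length
    topic_summary: str = ""                             # structured topic summary
    associations: list = field(default_factory=list)    # retrieved memories
    recent: deque = field(init=False, default_factory=deque)

    def __post_init__(self) -> None:
        # A deque with maxlen drops its oldest entry once it is full.
        self.recent = deque(maxlen=self.window_size)

    def add_message(self, role: str, text: str) -> Optional[tuple]:
        """Append a message and return the one that aged out, if any."""
        full = len(self.recent) == self.recent.maxlen
        evicted = self.recent[0] if full and self.recent else None
        self.recent.append((role, text))
        return evicted

    def render(self) -> str:
        """Build the only context the agent would see."""
        lines = ["[topics]", self.topic_summary, "[associations]"]
        lines += [f"- {a}" for a in self.associations]
        lines.append("[recent]")
        lines += [f"{role}: {text}" for role, text in self.recent]
        return "\n".join(lines)
```

Returning the evicted message from `add_message` is one way to hand aged-out content to a later evaluation step without the Bucket itself growing.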

The Observer — Reticular Activating System

Functions like the reticular activating system — the brain's pre-conscious filter for what reaches awareness. It watches the conversation and maintains a living topic tree, merging new information into existing nodes or creating new ones, and annotating each with staleness so the model gets a natural sense of temporal distance without parsing raw timestamps.
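
The topic tree and its staleness annotations could be sketched as follows. This is an illustration only: `TopicNode`, `merge`, `render`, and the age thresholds are assumptions, and the real Observer presumably decides where new information belongs rather than being handed an explicit path.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    title: str
    notes: list = field(default_factory=list)
    last_touched: float = 0.0
    children: dict = field(default_factory=dict)   # title -> TopicNode


def staleness_label(age_seconds: float) -> str:
    """Turn an age into a coarse label; the thresholds are illustrative."""
    if age_seconds < 300:
        return "just now"
    if age_seconds < 3600:
        return "earlier this session"
    if age_seconds < 86400:
        return "earlier today"
    return "a while ago"


def merge(root: TopicNode, path: list, note: str, now: float) -> TopicNode:
    """Walk `path`, creating missing nodes, and record `note` on the leaf."""
    node = root
    node.last_touched = now
    for title in path:
        if title not in node.children:
            node.children[title] = TopicNode(title)
        node = node.children[title]
        node.last_touched = now   # every node on the path counts as fresh
    node.notes.append(note)
    return node


def render(node: TopicNode, now: float, depth: int = 0) -> list:
    """Render the tree with staleness labels instead of raw timestamps."""
    label = staleness_label(now - node.last_touched)
    lines = [f"{'  ' * depth}{node.title} ({label})"]
    for child in node.children.values():
        lines += render(child, now, depth + 1)
    return lines
```

Rendering labels such as "earlier today" in place of timestamps is what would give the model a sense of temporal distance without any date arithmetic.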

The Curator — Hippocampus

Maps to the hippocampus — the structure responsible for deciding what gets encoded into long-term memory. It runs silently after each response, invisible to the user, evaluating what just aged out of the recent window. It never participates in conversation. It only judges.
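
One way to picture the Curator's post-response pass is shown below. The judge here is a keyword heuristic standing in for whatever evaluation the project actually performs (quite possibly a model call). `Verdict`, `heuristic_judge`, `curate`, and the marker list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    keep: bool
    reason: str


def heuristic_judge(text: str) -> Verdict:
    """Stand-in judge: keep text that looks like a durable fact or preference."""
    markers = ("i prefer", "my name is", "always", "never", "deadline")
    lowered = text.lower()
    for marker in markers:
        if marker in lowered:
            return Verdict(True, f"matched marker '{marker}'")
    return Verdict(False, "no durable signal")


def curate(
    aged_out: list,
    store: list,
    judge: Callable[[str], Verdict] = heuristic_judge,
) -> list:
    """Judge each aged-out message; encode the keepers into `store`."""
    verdicts = []
    for text in aged_out:
        verdict = judge(text)
        if verdict.keep:
            store.append(text)   # encoded into long-term memory
        verdicts.append(verdict)
    return verdicts
```

Because `curate` only reads aged-out messages and writes to the store, it can run after the response has been sent and never needs to produce user-facing text.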

The Librarian — Sleep Consolidation

Models sleep consolidation. During downtime it promotes frequently retrieved memories back to warm storage, demotes ones that have gone cold, and prunes those that haven't been accessed in a long time. Forgetting is an explicit feature, not a failure mode: it's the mechanism that keeps the system sharp.
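
The promote, demote, and prune cycle might look roughly like this. The two tier names, the hit counter, and every threshold are assumptions chosen for illustration; the project's actual policy may differ.

```python
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass
class Memory:
    text: str
    tier: str            # "warm" or "cold" (assumed tier names)
    last_access: float   # timestamp of the most recent retrieval
    hits: int = 0        # retrievals since the last consolidation pass


def consolidate(
    memories: list,
    now: float,
    promote_hits: int = 3,
    cold_after: float = 7 * DAY,
    prune_after: float = 90 * DAY,
) -> list:
    """Run one downtime pass and return the memories that survive it."""
    kept = []
    for memory in memories:
        idle = now - memory.last_access
        if idle >= prune_after:
            continue                      # forgotten on purpose
        if memory.hits >= promote_hits:
            memory.tier = "warm"          # keeps getting retrieved
        elif idle >= cold_after:
            memory.tier = "cold"          # has gone quiet
        memory.hits = 0                   # start a fresh counting period
        kept.append(memory)
    return kept
```

Resetting `hits` on every pass means a memory has to keep earning its place in warm storage rather than living off old popularity.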

Retrieval

Retrieval is associative rather than keyword-based, using vector similarity to surface semantically related memories automatically when relevant — the same way a smell or a word fires related memories in humans without a deliberate search.
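
Associative recall by vector similarity can be sketched with plain cosine similarity over precomputed embeddings. The function names, the `k` and `threshold` parameters, and the toy two-dimensional vectors are illustrative assumptions; a real deployment would obtain its vectors from an embedding model.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query_vec: list, memories: list, k: int = 3, threshold: float = 0.5) -> list:
    """Return up to k memory texts whose similarity clears the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy vectors standing in for real embeddings.
memories = [
    ("likes hiking", [1.0, 0.0]),
    ("allergic to nuts", [0.0, 1.0]),
    ("owns trail shoes", [0.9, 0.1]),
]
related = recall([1.0, 0.0], memories, k=2)
```

The threshold is what makes recall fire only "when relevant": a query that resembles nothing in storage surfaces nothing instead of the least-bad match.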

Backends & Tooling

Provider-agnostic — supports Anthropic, Ollama, and Hugging Face backends. Ships with an Inspector dashboard that runs the full pipeline against a long-conversation benchmark and streams every memory operation live into a browser, useful both for debugging and for demonstrating what each layer is doing in real time.
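
Provider-agnosticism is commonly achieved by coding against a small interface. The sketch below assumes a single `complete` method; the `Backend` protocol, `EchoBackend`, and `respond` are hypothetical names and do not describe the project's real API or any provider SDK.

```python
from typing import Protocol


class Backend(Protocol):
    """Assumed minimal interface each provider adapter would implement."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Offline stand-in so the sketch runs without network access or keys."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt.splitlines()[-1]}"


def respond(backend: Backend, context: str, user_message: str) -> str:
    """Send the bounded context plus the new message to any backend."""
    prompt = f"{context}\nuser: {user_message}"
    return backend.complete(prompt)
```

Under this assumption, an adapter for each supported provider would wrap that provider's client behind `complete`, and the memory layer would not need to know which one is in use.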
