Search and Retrieval
Your agent has a library of thousands of notes, messages, and profiles. Search is how it finds the right books — using exact words, fuzzy spelling, or "things that feel related." Retrieval ranking is how it decides which books to actually read — the ones you've used recently and often glow brighter than the dusty ones in the back. Together, they keep the agent focused on what matters without drowning in old noise.
The search infrastructure is designed around two complementary problems: search (finding relevant content across all layers) and retrieval ranking (surfacing what matters most from the candidates). Search finds matches; retrieval ranking decides which matches deserve context window space.
Search: Three Tiers
Postgres handles all search through three complementary approaches, each covering gaps the others leave.
Full-Text Search (tsvector)
Postgres's built-in full-text search is fast, exact, and great at matching specific terms. When the agent searches for "Colima Docker Desktop," tsvector finds documents containing those words. It supports:
- Relevance ranking via `ts_rank` (term frequency, proximity)
- Phrase matching (`phraseto_tsquery` for exact sequences)
- Prefix queries (`to_tsquery('deploy:*')` matches "deployment", "deploying")
- Weight classes (A/B/C/D) for boosting title matches over body matches
The weakness is rigidity — tsvector performs lexical matching after stemming, so it won't find "container runtime" when you search for "Docker." That's where semantic search fills the gap.
Fuzzy Matching (pg_trgm)
The pg_trgm extension breaks strings into three-character substrings (trigrams) and computes similarity scores between them. If the agent encounters "Spantreee" in a transcript or a user types "Cedrci," trigram similarity still finds the right records.
This is particularly valuable for name matching in the identity layer, where meeting transcripts often contain misspellings, informal names, or phonetic variations. A trigram similarity of 0.4+ is typically sufficient for name matching; exact identifier lookups handle the high-confidence cases.
```sql
-- Find people with names similar to a transcript speaker label
SELECT id, full_name, similarity(full_name, 'Cedrci Hurst') AS sim
FROM people
WHERE similarity(full_name, 'Cedrci Hurst') > 0.3
ORDER BY sim DESC;
```
Semantic Search (pgvector)
Semantic search uses embedding vectors to find conceptually similar content. A search for "who handles infrastructure at Acme" finds relevant person profiles even if they mention "DevOps" or "platform engineering" instead of "infrastructure." This is the most flexible search mode and the one agents use most often.
Embeddings are generated by language models (currently Gemini) and stored as vectors in Postgres via the pgvector extension. Similarity is computed using cosine distance (<=> operator), which measures the angle between vectors regardless of magnitude.
```sql
-- Semantic search with distance threshold
-- (query_embedding is a bound parameter supplied by the application)
SELECT title, 1 - (embedding <=> query_embedding) AS similarity
FROM knowledge_entities
WHERE embedding <=> query_embedding < 0.3
ORDER BY similarity DESC
LIMIT 10;
```
Embeddings are generated incrementally. When a knowledge file is updated or a new activity record is ingested, its embedding is computed and stored. There's no batch reindexing step — updates are processed as they arrive.
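A minimal sketch of this incremental flow, assuming a hypothetical `embed()` stand-in for the Gemini call and an in-memory dict in place of the Postgres table:

```python
import hashlib

def embed(text: str) -> list[float]:
    # Stand-in for the real embedding call (hypothetical); returns a tiny
    # deterministic vector so the sketch is self-contained.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

class EmbeddingStore:
    """Recomputes an embedding only when the content hash changes.
    There is no batch reindex step: updates are processed as they arrive."""

    def __init__(self):
        self.rows = {}  # doc_id -> (content_hash, vector)

    def upsert(self, doc_id: str, text: str) -> bool:
        h = hashlib.sha256(text.encode()).hexdigest()
        cached = self.rows.get(doc_id)
        if cached and cached[0] == h:
            return False  # unchanged: skip the (expensive) embedding call
        self.rows[doc_id] = (h, embed(text))
        return True  # embedded (new or changed content)
```

Storing the content hash alongside the vector is what makes updates idempotent: re-ingesting an unchanged file is a no-op.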
Combined Search
The three tiers work best in combination. A single query can score candidates across all three and blend the results:
```sql
SELECT
  title,
  ts_rank(search_vector, query) AS text_rank,
  1 - (embedding <=> query_embedding) AS semantic_similarity,
  similarity(title, 'Sarah Chen') AS name_similarity
FROM knowledge_entities,
     plainto_tsquery('english', 'Acme infrastructure') AS query
WHERE search_vector @@ query
   OR embedding <=> query_embedding < 0.3
   OR similarity(title, 'Sarah Chen') > 0.4
-- Postgres does not allow output-column aliases inside ORDER BY
-- expressions, so the blend repeats the scoring expressions
ORDER BY ts_rank(search_vector, query) * 0.3
       + (1 - (embedding <=> query_embedding)) * 0.5
       + similarity(title, 'Sarah Chen') * 0.2 DESC
LIMIT 10;
```
The weights (0.3, 0.5, 0.2) are tunable per query context. Identity lookups weight name similarity higher; knowledge discovery weights semantic similarity higher.
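Per-context weighting can be expressed as a small scoring helper in application code. A sketch, with illustrative profile names; only the 0.3/0.5/0.2 split comes from the example query, and the identity-lookup weights are invented for contrast:

```python
# Per-context weight profiles for blending the three search tiers.
# "knowledge_discovery" mirrors the example query; "identity_lookup"
# weights are hypothetical, chosen to favor name similarity.
PROFILES = {
    "knowledge_discovery": {"text": 0.3, "semantic": 0.5, "name": 0.2},
    "identity_lookup":     {"text": 0.2, "semantic": 0.2, "name": 0.6},
}

def blended_score(scores: dict, profile: str) -> float:
    """Weighted sum of text_rank, semantic_similarity, name_similarity."""
    w = PROFILES[profile]
    return (scores["text"] * w["text"]
            + scores["semantic"] * w["semantic"]
            + scores["name"] * w["name"])
```

Each profile's weights sum to 1.0, so blended scores stay comparable across query contexts.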
The agent reformulates on miss. If a search returns poor results, it rephrases the query, tries different tiers, or broadens scope. This agent-in-the-loop approach compensates for the limitations of any single method — there's no search UI where a human expects perfect results on the first try. The agent formulates queries programmatically, tries multiple approaches, and interprets results with language understanding.
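A sketch of that retry loop, with hypothetical stand-ins for the tier search functions and the agent's reformulation step:

```python
def search_with_retries(query, tiers, min_results=3, reformulate=None):
    """Agent-in-the-loop retrieval: try each tier in turn, and if results
    are thin, rephrase the query and try again.

    `tiers` is a list of search callables (e.g. full-text, semantic);
    `reformulate` stands in for the agent rewriting the query and
    returns a list of alternate phrasings. Both are assumptions here,
    not Attaché's actual interfaces."""
    attempts = [query] + (reformulate(query) if reformulate else [])
    for attempt_query in attempts:
        for tier in tiers:
            results = tier(attempt_query)
            if len(results) >= min_results:
                return results
    return []  # caller may broaden scope further
```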
Search by Layer
Different layers lean on different tiers depending on the data and access pattern:
| Layer | Primary search | Typical query |
|---|---|---|
| Episodic Memory | Semantic (Gemini embeddings) | "When did we decide to use Colima?" |
| Knowledge Base | All three + graph traversal | "Find people related to AI coding tools" |
| Activity Logs | Full-text + time-range filters | "What was discussed in #acme-project last Tuesday?" |
| Identity Graph | pg_trgm fuzzy + exact lookups | Match "Jeff" from a transcript to a known person |
Retrieval Ranking: Vitality and Decay
"Attention Is All You Need" could have been dismissed as overengineered in 2017 — why not just use RNNs with a simple hidden state? The answer was that uniform sequential processing doesn't scale. The same argument applies to agent memory. Naive retrieval — search everything, rank by text similarity — works fine at 50 notes. At 50,000 notes across four layers, undifferentiated retrieval wastes the most expensive resource in the system: context window tokens. A 6-month-old note about a resolved bug shouldn't compete equally with yesterday's architecture decision.
Vitality scoring is to agent memory what attention is to sequence processing — a principled mechanism for focusing on what matters. Both solve the same fundamental problem (selective focus over a large space), just at different timescales: attention operates within a single forward pass; vitality operates across an agent's lifetime.
| | Transformers | Agent Memory |
|---|---|---|
| Naive approach | RNN hidden state | Search everything, sort by date |
| Problem at scale | Long sequences lose early context | Large knowledge bases waste context tokens |
| Solution | Selective attention (Q/K/V) | Selective retrieval (vitality scoring) |
| Vindicated at scale | GPT-3/4 | Agents running for months/years |
Attaché's vitality model draws from ACT-R (Adaptive Control of Thought — Rational), a cognitive architecture developed by John Anderson at Carnegie Mellon University since the 1990s, and Ori-Mnemos, an open-source agent memory system that extends ACT-R with graph-aware features.
Base-Level Activation (ACT-R)
In ACT-R, every memory chunk has a base-level activation determined by how often and how recently it's been accessed. The formula models the well-established power law of forgetting:
$$B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right)$$

Where:
- $n$ = number of times the chunk was accessed
- $t_j$ = time since the $j$-th access (in days)
- $d$ = decay parameter (default $0.5$)
The key insight: both frequency and recency matter. A note accessed 50 times decays much slower than one accessed once, even at the same age. This is fundamentally different from naive approaches like "sort by last modified date" or simple exponential decay — it produces the power law curve observed in human memory experiments.
Why logarithmic? The ln() wrapper means that doubling the number of accesses doesn't double the activation — it adds a constant. This matches human memory: the 100th exposure to a word adds much less memorability than the 2nd. The decay exponent $d$ ensures that recent accesses contribute far more than old ones, with the default $d = 0.5$ producing a square-root decay curve ($t^{-1/2}$).
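The exact formula is straightforward to compute when the full access history is available; a minimal sketch:

```python
import math

def base_activation(access_ages_days, d=0.5):
    """Exact ACT-R base-level activation: B = ln(sum over accesses of t^-d),
    where each t is the age of one access in days (must be > 0)."""
    return math.log(sum(t ** -d for t in access_ages_days))
```

With this in hand you can check the claim directly: a note accessed fifty times over fifty days carries far more activation than one touched once at the same age.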
For computational efficiency, ACT-R's own research provides an optimized O(1) approximation that avoids iterating over every access event:
$$B_i \approx \ln\left(\frac{n}{1-d}\right) - d \ln(L)$$

Where $L$ is the note's lifetime in days and $n$ is the total access count. This requires only three stored values (access count, first access, last access) regardless of history size.

The raw activation value is normalized to a 0–1 vitality score via a sigmoid:

$$v_i = \frac{1}{1 + e^{-B_i}}$$
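A sketch of the approximation plus sigmoid normalization (function names are illustrative):

```python
import math

def approx_activation(n, lifetime_days, d=0.5):
    """ACT-R optimized-learning approximation: B = ln(n / (1 - d)) - d * ln(L).
    O(1) in history size: needs only the access count and the note's lifetime."""
    return math.log(n / (1.0 - d)) - d * math.log(lifetime_days)

def vitality(activation):
    """Sigmoid normalization of raw activation to a 0-1 score."""
    return 1.0 / (1.0 + math.exp(-activation))
```

Activation rises with access count and falls with lifetime, so a heavily used note keeps a high vitality score long after a once-touched note of the same age has faded.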
Metabolic Rates
Not all memory should decay at the same speed. A person's identity doesn't fade like yesterday's standup notes. Attaché applies metabolic rate multipliers to the decay parameter, inspired by Ori-Mnemos's observation that different memory types have fundamentally different lifecycles:
| Layer | Metabolic Rate | Effective Decay | Behavior |
|---|---|---|---|
| Entity (people, orgs) | 0.1× | 0.05 | Identity barely fades — a person profile stays relevant for months |
| Knowledge (research, projects) | 1.0× | 0.5 | Standard relevance-driven lifecycle |
| Episodic (daily logs) | 2.0× | 1.0 | Recent context matters most — last week's notes outrank last month's |
| Activity (messages, transcripts) | 3.0× | 1.5 | Burns hot, clears quickly — yesterday's Slack messages fade fast |
The metabolic rate multiplies the base decay parameter: $d_{\text{eff}} = m \cdot d$. An entity with metabolic rate $m = 0.1$ has an effective decay of $d_{\text{eff}} = 0.05$ — it takes roughly 10× longer to fade than a knowledge note.
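The table above follows directly from the base parameter; a sketch assuming the default `d = 0.5` (the single-access helper is illustrative):

```python
import math

BASE_DECAY = 0.5
METABOLIC_RATES = {"entity": 0.1, "knowledge": 1.0,
                   "episodic": 2.0, "activity": 3.0}

def effective_decay(layer):
    """d_eff = metabolic_rate * base decay parameter."""
    return METABOLIC_RATES[layer] * BASE_DECAY

def single_access_activation(age_days, layer):
    """Activation of a once-accessed chunk: ln(t^-d) = -d * ln(t)."""
    return -effective_decay(layer) * math.log(age_days)
```

At 30 days of age, an entity chunk still sits near zero activation while an activity chunk has dropped far below it, which is exactly the "burns hot, clears quickly" behavior in the table.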
Spreading Activation
When a note is accessed, its neighbors in the knowledge graph receive a vitality boost. This models the cognitive science concept of spreading activation: thinking about one topic primes related topics.
The boost propagates along wiki-link edges using breadth-first search:
$$\text{boost}_h = U \cdot \gamma^h$$

Where $U$ is the source utility, $\gamma$ is the damping factor (default $0.6$), and $h$ is the hop count:
- Hop 1 neighbors: 60% of source utility
- Hop 2 neighbors: 36% of source utility
Boosts are stored in Postgres and decayed on read (half-life ~7 days). When you access a note about "Acme," its linked notes about Sarah Chen, the migration plan, and the monitoring dashboard all warm up — even if they haven't been directly accessed recently.
This creates emergent behavior: clusters of actively-used notes form warm neighborhoods in the knowledge graph, while isolated, unused notes cool down naturally. Active projects pull their entire constellation of related entities into higher vitality.
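A sketch of the hop-bounded propagation, assuming notes are stored as an adjacency dict of wiki links:

```python
from collections import deque

def spreading_boosts(graph, source, utility, gamma=0.6, max_hops=2):
    """BFS over wiki-link edges: each newly reached neighbor at hop h
    receives utility * gamma^h (0.6x at hop 1, 0.36x at hop 2 with
    the defaults). Nodes are boosted once, at their shortest hop."""
    boosts, seen = {}, {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't propagate past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                boosts[neighbor] = utility * gamma ** (hops + 1)
                queue.append((neighbor, hops + 1))
    return boosts
```

Because BFS visits each node at its shortest hop distance first, every neighbor gets the largest boost it is entitled to and nothing is double-counted.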
Structural Protection
Some notes are structurally important even if rarely accessed. A project overview that connects 15 sub-notes is a bridge node — archiving it would fragment the graph and orphan its dependents.
Two mechanisms protect structural integrity:
Structural boost: Notes with high in-degree (many incoming links) decay slower. Each incoming link adds ~10% to effective stability, capped at 2×:

$$\text{boost}_{\text{struct}} = \min\left(1 + 0.1 \cdot \text{in\_degree},\ 2.0\right)$$
Bridge protection floor: Tarjan's depth-first search identifies articulation points (cut vertices) — notes whose removal would disconnect the graph. These get a minimum vitality floor (default 0.5) regardless of access patterns, preventing them from being archived.
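Articulation points can be found with the standard low-link DFS. A self-contained sketch (not Attaché's implementation):

```python
def articulation_points(graph):
    """Cut vertices of an undirected graph (adjacency dict, symmetric edges),
    via the Tarjan/Hopcroft low-link DFS: removing any returned node
    disconnects its component."""
    disc, low, points = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in graph.get(u, []):
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u is a cut vertex if some subtree can't
                # reach above u without going through u.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # A root is a cut vertex iff it has 2+ DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return points
```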
Revival Spikes
Old notes that gain new connections get a revival spike — a boost that decays away over a 14-day window and prevents newly-relevant dormant notes from being immediately archived. The spike is a function of $t$, the number of days since the new connection was established. This handles the case where a 6-month-old research note suddenly becomes relevant because a new project links to it.
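A sketch of the spike, assuming a linear taper; the source fixes only the 14-day window, so the taper shape and the `s0` magnitude here are assumptions:

```python
def revival_spike(days_since_link, s0=0.5, window=14.0):
    """Revival boost for a dormant note that gained a new connection.
    Assumption: tapers linearly to zero over the window; only the
    14-day window itself comes from the vitality model."""
    return s0 * max(0.0, 1.0 - days_since_link / window)
```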
Zone Classification
Notes are classified into zones based on their composite vitality score:
| Zone | Vitality | Behavior |
|---|---|---|
| Active | ≥ 0.6 | Fully accessible, prioritized in search results |
| Stale | 0.3 – 0.6 | Accessible but deprioritized in rankings |
| Fading | 0.1 – 0.3 | Candidate for archival, still searchable |
| Archived | < 0.1 | Moved to archive, excluded from default search |
Zone transitions are automatic. The prune operation analyzes the full activation topology and identifies archive candidates, with dry-run as the default — no silent deletions.
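The thresholds in the table map directly to a classifier; a minimal sketch (function name illustrative):

```python
def classify_zone(vitality):
    """Map a composite vitality score to its lifecycle zone,
    using the thresholds from the zone table."""
    if vitality >= 0.6:
        return "active"
    if vitality >= 0.3:
        return "stale"
    if vitality >= 0.1:
        return "fading"
    return "archived"
```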
Integration with Search
Vitality scores are incorporated into search ranking as a multiplicative factor, applied after the three search tiers produce their scores:
```sql
SELECT title,
       (text_rank * 0.3 + semantic_similarity * 0.5) * vitality AS final_score
FROM search_results
JOIN chunk_activation_cache USING (chunk_id)
ORDER BY final_score DESC;
```
Active notes rank higher than stale notes with the same textual or semantic match. Archived notes are excluded from default search but remain queryable with an explicit include_archived flag.
Implementation
Access events are stored in an append-only table:
```sql
CREATE TABLE memory_access_events (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chunk_type  TEXT NOT NULL,   -- 'episodic', 'knowledge', 'entity', 'activity'
    chunk_id    TEXT NOT NULL,   -- permalink or entity ID
    access_type TEXT NOT NULL,   -- 'retrieval', 'read', 'reference', 'write'
    session_id  TEXT,
    accessed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
A materialized activation cache avoids recomputing on every query:
```sql
CREATE TABLE chunk_activation_cache (
    chunk_id         TEXT PRIMARY KEY,
    chunk_type       TEXT NOT NULL,
    access_count     INTEGER NOT NULL DEFAULT 0,
    first_accessed   TIMESTAMPTZ NOT NULL,
    last_accessed    TIMESTAMPTZ NOT NULL,
    base_activation  REAL,
    spreading_boost  REAL DEFAULT 0,
    structural_boost REAL DEFAULT 1.0,
    vitality         REAL,   -- composite score
    zone             TEXT,   -- 'active', 'stale', 'fading', 'archived'
    computed_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
At current scale (~200K chunks), the O(1) approximation computes activation per chunk trivially. Access events grow linearly (~50/day × 365 = ~18K rows/year). The cache approach means vitality can always be recomputed from the summary without scanning the full event log.
Attribution
- ACT-R (Anderson & Lebiere, Carnegie Mellon) — base-level activation, power law of forgetting, spreading activation, optimized learning approximation. LGPL v2.1.
- Ori-Mnemos (Aayo Awoyemi) — metabolic rates, structural protection via Tarjan's algorithm, zone classification, revival spikes. Apache-2.0.