Search and Retrieval

TL;DR

Your agent has a library of thousands of notes, messages, and profiles. Search is how it finds the right books — using exact words, fuzzy spelling, or "things that feel related." Retrieval ranking is how it decides which books to actually read — the ones you've used recently and often glow brighter than the dusty ones in the back. Together, they keep the agent focused on what matters without drowning in old noise.

The search infrastructure is designed around two complementary problems: search (finding relevant content across all layers) and retrieval ranking (surfacing what matters most from the candidates). Search finds matches; retrieval ranking decides which matches deserve context window space.

Search: Three Tiers

Postgres handles all search through three complementary approaches, each covering gaps the others leave.

Full-Text Search (tsvector)

Postgres's built-in full-text search is fast, exact, and great at matching specific terms. When the agent searches for "Colima Docker Desktop," tsvector finds documents containing those words. It supports:

  • Relevance ranking via ts_rank (term frequency, proximity)
  • Phrase matching (phraseto_tsquery for exact sequences)
  • Prefix queries (to_tsquery('deploy:*') matches "deployment", "deploying")
  • Weight classes (A/B/C/D) for boosting title matches over body matches

The weakness is rigidity — tsvector performs lexical matching after stemming, so it won't find "container runtime" when you search for "Docker." That's where semantic search fills the gap.

Fuzzy Matching (pg_trgm)

The pg_trgm extension breaks strings into three-character substrings (trigrams) and computes similarity scores between them. If the agent encounters "Spantreee" in a transcript or a user types "Cedrci," trigram similarity still finds the right records.

This is particularly valuable for name matching in the identity layer, where meeting transcripts often contain misspellings, informal names, or phonetic variations. A trigram similarity of 0.4+ is typically sufficient for name matching; exact identifier lookups handle the high-confidence cases.

-- Find people with names similar to a transcript speaker label
SELECT id, full_name, similarity(full_name, 'Cedrci Hurst') AS sim
FROM people
WHERE similarity(full_name, 'Cedrci Hurst') > 0.3
ORDER BY sim DESC;
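The trigram mechanics are easy to see outside the database. Here is a minimal Python sketch of trigram extraction and set-overlap similarity that approximates pg_trgm's behavior (pg_trgm lowercases and pads words similarly, though its exact normalization rules differ in edge cases):

```python
def trigrams(s: str) -> set[str]:
    """Approximate pg_trgm extraction: lowercase each word, pad it with
    two leading spaces and one trailing space, take 3-char windows."""
    grams = set()
    for word in s.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (as pg_trgm does)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# The misspelled speaker label still scores far above an unrelated name.
print(trigram_similarity("Cedrci Hurst", "Cedric Hurst"))  # ≈ 0.62
print(trigram_similarity("Cedrci Hurst", "Sarah Chen"))    # ≈ 0.04
```

The transposed letters in "Cedrci" cost only a few trigrams, which is why a 0.3–0.4 threshold catches most transcript misspellings.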

Semantic Search (pgvector)

Semantic search uses embedding vectors to find conceptually similar content. A search for "who handles infrastructure at Acme" finds relevant person profiles even if they mention "DevOps" or "platform engineering" instead of "infrastructure." This is the most flexible search mode and the one agents use most often.

Embeddings are generated by language models (currently Gemini) and stored as vectors in Postgres via the pgvector extension. Similarity is computed using cosine distance (<=> operator), which measures the angle between vectors regardless of magnitude.

-- Semantic search with distance threshold
SELECT title, 1 - (embedding <=> query_embedding) AS similarity
FROM knowledge_entities
WHERE embedding <=> query_embedding < 0.3
ORDER BY similarity DESC
LIMIT 10;

Embeddings are generated incrementally. When a knowledge file is updated or a new activity record is ingested, its embedding is computed and stored. There's no batch reindexing step — updates are processed as they arrive.

The three tiers work best in combination. A single query can score candidates across all three and blend the results:

SELECT * FROM (
  SELECT
    title,
    ts_rank(search_vector, query) AS text_rank,
    1 - (embedding <=> query_embedding) AS semantic_similarity,
    similarity(title, 'Sarah Chen') AS name_similarity
  FROM knowledge_entities,
    plainto_tsquery('english', 'Acme infrastructure') AS query
  WHERE search_vector @@ query
    OR embedding <=> query_embedding < 0.3
    OR similarity(title, 'Sarah Chen') > 0.4
) scored
ORDER BY text_rank * 0.3 + semantic_similarity * 0.5 + name_similarity * 0.2 DESC
LIMIT 10;

(The subquery is required: Postgres allows an output alias in ORDER BY only as a bare name, not inside an expression.)

The weights (0.3, 0.5, 0.2) are tunable per query context. Identity lookups weight name similarity higher; knowledge discovery weights semantic similarity higher.
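As a sketch, the per-context weighting reduces to a small lookup table. The context names and the identity-lookup numbers here are illustrative, not a fixed API; only the knowledge-discovery split mirrors the query above:

```python
# Hypothetical per-context weight profiles.
WEIGHTS = {
    "knowledge_discovery": {"text": 0.3, "semantic": 0.5, "name": 0.2},
    "identity_lookup":     {"text": 0.1, "semantic": 0.2, "name": 0.7},
}

def blended_score(text_rank: float, semantic_sim: float, name_sim: float,
                  context: str = "knowledge_discovery") -> float:
    """Weighted blend of the three tier scores for one candidate."""
    w = WEIGHTS[context]
    return (w["text"] * text_rank
            + w["semantic"] * semantic_sim
            + w["name"] * name_sim)
```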

The agent reformulates on miss. If a search returns poor results, it rephrases the query, tries different tiers, or broadens scope. This agent-in-the-loop approach compensates for the limitations of any single method — there's no search UI where a human expects perfect results on the first try. The agent formulates queries programmatically, tries multiple approaches, and interprets results with language understanding.

Search by Layer

Different layers lean on different tiers depending on the data and access pattern:

| Layer | Primary search | Typical query |
|---|---|---|
| Episodic Memory | Semantic (Gemini embeddings) | "When did we decide to use Colima?" |
| Knowledge Base | All three + graph traversal | "Find people related to AI coding tools" |
| Activity Logs | Full-text + time-range filters | "What was discussed in #acme-project last Tuesday?" |
| Identity Graph | pg_trgm fuzzy + exact lookups | Match "Jeff" from a transcript to a known person |

Retrieval Ranking: Vitality and Decay

"Attention Is All You Need" could have been dismissed as overengineered in 2017 — why not just use RNNs with a simple hidden state? The answer was that uniform sequential processing doesn't scale. The same argument applies to agent memory. Naive retrieval — search everything, rank by text similarity — works fine at 50 notes. At 50,000 notes across four layers, undifferentiated retrieval wastes the most expensive resource in the system: context window tokens. A 6-month-old note about a resolved bug shouldn't compete equally with yesterday's architecture decision.

Vitality scoring is to agent memory what attention is to sequence processing — a principled mechanism for focusing on what matters. Both solve the same fundamental problem (selective focus over a large space), just at different timescales: attention operates within a single forward pass; vitality operates across an agent's lifetime.

|  | Transformers | Agent Memory |
|---|---|---|
| Naive approach | RNN hidden state | Search everything, sort by date |
| Problem at scale | Long sequences lose early context | Large knowledge bases waste context tokens |
| Solution | Selective attention (Q/K/V) | Selective retrieval (vitality scoring) |
| Vindicated at scale | GPT-3/4 | Agents running for months/years |

Attaché's vitality model draws from ACT-R (Adaptive Control of Thought — Rational), a cognitive architecture developed by John Anderson at Carnegie Mellon University since the 1990s, and Ori-Mnemos, an open-source agent memory system that extends ACT-R with graph-aware features.

Base-Level Activation (ACT-R)

In ACT-R, every memory chunk has a base-level activation determined by how often and how recently it's been accessed. The formula models the well-established power law of forgetting:

B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right)

Where:

  • n = number of times the chunk was accessed
  • t_j = time since the j-th access (in days)
  • d = decay parameter (default 0.5)

The key insight: both frequency and recency matter. A note accessed 50 times decays much slower than one accessed once, even at the same age. This is fundamentally different from naive approaches like "sort by last modified date" or simple exponential decay — it produces the power law curve observed in human memory experiments.

Why logarithmic? The ln() wrapper means that doubling the number of accesses doesn't double the activation — it adds a constant. This matches human memory: the 100th exposure to a word adds much less memorability than the 2nd. The decay exponent t^{-d} ensures that recent accesses contribute far more than old ones, with d = 0.5 producing a square-root decay curve.
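With the full access history in hand, the exact formula is a few lines of Python (a sketch; the names are illustrative, not the production code):

```python
import math

def base_level_activation(access_ages_days: list[float], d: float = 0.5) -> float:
    """ACT-R base-level activation: B_i = ln(sum over accesses of t_j^-d),
    where t_j is the age of the j-th access in days (must be > 0)."""
    return math.log(sum(t ** -d for t in access_ages_days))

# Frequency and recency both matter: five old accesses still beat one.
print(base_level_activation([30.0]))                          # single access
print(base_level_activation([30.0, 35.0, 40.0, 45.0, 50.0]))  # frequent note
```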

For computational efficiency, ACT-R's own research provides an optimized O(1) approximation that avoids iterating over every access event:

B_i \approx \ln\left(\frac{n}{1-d}\right) - d \cdot \ln(L)

Where L is the note's lifetime in days and n is the total access count. This requires only three stored values (access count, first access, last access) regardless of history size.

The raw activation value is normalized to a 0–1 vitality score via sigmoid:

\text{vitality} = \frac{1}{1 + e^{-B_i}}
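Together, the approximation and the sigmoid reduce scoring to a few arithmetic operations per chunk. A minimal sketch under the same assumptions (illustrative names; d must stay below 1 for the first term to be defined):

```python
import math

def activation_approx(n: int, lifetime_days: float, d: float = 0.5) -> float:
    """O(1) approximation: B_i ≈ ln(n / (1 - d)) - d * ln(L).
    Needs only the access count and the note's lifetime in days."""
    return math.log(n / (1 - d)) - d * math.log(lifetime_days)

def vitality(activation: float) -> float:
    """Sigmoid squashes raw activation into a 0-1 vitality score."""
    return 1 / (1 + math.exp(-activation))

# 50 accesses vs. 1 access over the same 30-day lifetime
hot = vitality(activation_approx(50, 30))
cold = vitality(activation_approx(1, 30))
```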

Metabolic Rates

Not all memory should decay at the same speed. A person's identity doesn't fade like yesterday's standup notes. Attaché applies metabolic rate multipliers to the decay parameter, inspired by Ori-Mnemos's observation that different memory types have fundamentally different lifecycles:

| Layer | Metabolic rate | Effective decay | Behavior |
|---|---|---|---|
| Entity (people, orgs) | 0.1× | d = 0.05 | Identity barely fades — a person profile stays relevant for months |
| Knowledge (research, projects) | 1.0× | d = 0.5 | Standard relevance-driven lifecycle |
| Episodic (daily logs) | 2.0× | d = 1.0 | Recent context matters most — last week's notes outrank last month's |
| Activity (messages, transcripts) | 3.0× | d = 1.5 | Burns hot, clears quickly — yesterday's Slack messages fade fast |

The metabolic rate multiplies the base decay parameter: d_{\text{eff}} = d \times m. An entity with metabolic rate m = 0.1 has an effective decay of 0.05 — it takes roughly 10× longer to fade than a knowledge note.
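As a sketch, the layer multipliers reduce to a lookup (the dictionary keys are illustrative, not the production schema):

```python
BASE_DECAY = 0.5  # the standard ACT-R decay parameter d

METABOLIC_RATES = {
    "entity": 0.1,     # people, orgs: identity barely fades
    "knowledge": 1.0,  # research, projects: standard lifecycle
    "episodic": 2.0,   # daily logs: recency dominates
    "activity": 3.0,   # messages, transcripts: burns hot
}

def effective_decay(layer: str) -> float:
    """d_eff = d * m, the decay parameter actually used for this layer."""
    return BASE_DECAY * METABOLIC_RATES[layer]
```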

Spreading Activation

When a note is accessed, its neighbors in the knowledge graph receive a vitality boost. This models the cognitive science concept of spreading activation: thinking about one topic primes related topics.

The boost propagates along wiki-link edges using breadth-first search:

\text{boost}(k) = u \cdot \alpha^k

Where u is the source utility, α is the damping factor (default 0.6), and k is the hop count:

  • Hop 1 neighbors: 60% of source utility
  • Hop 2 neighbors: 36% of source utility

Boosts are stored in Postgres and decayed on read (half-life ~7 days). When you access a note about "Acme," its linked notes about Sarah Chen, the migration plan, and the monitoring dashboard all warm up — even if they haven't been directly accessed recently.

This creates emergent behavior: clusters of actively-used notes form warm neighborhoods in the knowledge graph, while isolated, unused notes cool down naturally. Active projects pull their entire constellation of related entities into higher vitality.
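A minimal breadth-first sketch of this propagation, using an adjacency-list graph and illustrative note IDs:

```python
from collections import deque

def spread_boost(graph: dict[str, list[str]], source: str, utility: float,
                 alpha: float = 0.6, max_hops: int = 2) -> dict[str, float]:
    """Propagate boost(k) = utility * alpha**k to neighbors at hop k,
    following wiki-link edges breadth-first from the accessed note."""
    boosts: dict[str, float] = {}
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hop = queue.popleft()
        if hop >= max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                boosts[neighbor] = utility * alpha ** (hop + 1)
                queue.append((neighbor, hop + 1))
    return boosts

graph = {
    "acme": ["sarah-chen", "migration-plan"],
    "migration-plan": ["monitoring-dashboard"],
}
boosts = spread_boost(graph, "acme", utility=1.0)
# hop 1 notes warm to 0.6, the hop 2 note to 0.36; "acme" itself is excluded
```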

Structural Protection

Some notes are structurally important even if rarely accessed. A project overview that connects 15 sub-notes is a bridge node — archiving it would fragment the graph and orphan its dependents.

Two mechanisms protect structural integrity:

Structural boost: Notes with high in-degree (many incoming links) decay slower. Each incoming link adds ~10% to effective stability, capped at 2×:

s = 1 + 0.1 \cdot \min(\text{in\_degree}, 10)

Bridge protection floor: Tarjan's linear-time DFS identifies articulation points — notes whose removal would disconnect the graph. These get a minimum vitality floor (default 0.5) regardless of access patterns, preventing them from being archived.
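Both mechanisms are simple to state in code. This sketch uses illustrative names and treats bridge detection as a precomputed flag rather than running the DFS itself:

```python
def structural_boost(in_degree: int) -> float:
    """Each incoming link adds 10% stability, capped at 2x:
    s = 1 + 0.1 * min(in_degree, 10)."""
    return 1 + 0.1 * min(in_degree, 10)

def apply_bridge_floor(vitality: float, is_bridge: bool,
                       floor: float = 0.5) -> float:
    """Articulation-point notes never drop below the vitality floor."""
    return max(vitality, floor) if is_bridge else vitality
```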

Revival Spikes

Old notes that gain new connections get a revival spike — a 14-day boost that prevents newly-relevant dormant notes from being immediately archived:

r = e^{-0.2 \cdot \Delta t}

Where Δt is days since the new connection was established. This handles the case where a 6-month-old research note suddenly becomes relevant because a new project links to it.
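The spike itself is a one-liner (a sketch; the 0.2 rate constant comes from the formula above):

```python
import math

def revival_spike(days_since_link: float) -> float:
    """r = exp(-0.2 * Δt): full strength at the moment of linking,
    decaying to roughly 6% of that by day 14."""
    return math.exp(-0.2 * days_since_link)
```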

Zone Classification

Notes are classified into zones based on their composite vitality score:

| Zone | Vitality | Behavior |
|---|---|---|
| Active | ≥ 0.6 | Fully accessible, prioritized in search results |
| Stale | 0.3 – 0.6 | Accessible but deprioritized in rankings |
| Fading | 0.1 – 0.3 | Candidate for archival, still searchable |
| Archived | < 0.1 | Moved to archive, excluded from default search |

Zone transitions are automatic. The prune operation analyzes the full activation topology and identifies archive candidates, with dry-run as the default — no silent deletions.
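Classification is a straightforward threshold ladder over the composite score (a sketch matching the zone table):

```python
def classify_zone(vitality: float) -> str:
    """Map a composite vitality score to its zone."""
    if vitality >= 0.6:
        return "active"
    if vitality >= 0.3:
        return "stale"
    if vitality >= 0.1:
        return "fading"
    return "archived"
```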

Vitality scores are incorporated into search ranking as a multiplicative factor, applied after the three search tiers produce their scores:

SELECT
  title,
  (text_rank * 0.3 + semantic_similarity * 0.5) * vitality AS final_score
FROM search_results
JOIN chunk_activation_cache USING (chunk_id)
ORDER BY final_score DESC;

Active notes rank higher than stale notes with the same textual or semantic match. Archived notes are excluded from default search but remain queryable with an explicit include_archived flag.

Implementation

Access events are stored in an append-only table:

CREATE TABLE memory_access_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  chunk_type TEXT NOT NULL,   -- 'episodic', 'knowledge', 'entity', 'activity'
  chunk_id TEXT NOT NULL,     -- permalink or entity ID
  access_type TEXT NOT NULL,  -- 'retrieval', 'read', 'reference', 'write'
  session_id TEXT,
  accessed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

A materialized activation cache avoids recomputing on every query:

CREATE TABLE chunk_activation_cache (
  chunk_id TEXT PRIMARY KEY,
  chunk_type TEXT NOT NULL,
  access_count INTEGER NOT NULL DEFAULT 0,
  first_accessed TIMESTAMPTZ NOT NULL,
  last_accessed TIMESTAMPTZ NOT NULL,
  base_activation REAL,
  spreading_boost REAL DEFAULT 0,
  structural_boost REAL DEFAULT 1.0,
  vitality REAL,  -- composite score
  zone TEXT,      -- 'active', 'stale', 'fading', 'archived'
  computed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

At current scale (~200K chunks), the O(1) approximation computes activation per chunk trivially. Access events grow linearly (~50/day × 365 = ~18K rows/year). The cache approach means vitality can always be recomputed from the summary without scanning the full event log.

Attribution

  • ACT-R (Anderson & Lebiere, Carnegie Mellon) — base-level activation, power law of forgetting, spreading activation, optimized learning approximation. LGPL v2.1.
  • Ori-Mnemos (Aayo Awoyemi) — metabolic rates, structural protection via Tarjan's algorithm, zone classification, revival spikes. Apache-2.0.