Search and Retrieval

TL;DR

Your agent has a library of thousands of notes, messages, and profiles. Search is how it finds the right books — using exact words, fuzzy spelling, or "things that feel related." Retrieval ranking is how it decides which books to actually read — the ones you've used recently and often glow brighter than the dusty ones in the back. Together, they keep the agent focused on what matters without drowning in old noise.

The search infrastructure is designed around two complementary problems: search (finding relevant content across all layers) and retrieval ranking (surfacing what matters most from the candidates). Search finds matches; retrieval ranking decides which matches deserve context window space.

Search: Three Tiers

Postgres handles all search through three complementary approaches, each covering gaps the others leave.

Full-Text Search (tsvector)

Postgres's built-in full-text search is fast, exact, and great at matching specific terms. When the agent searches for "Colima Docker Desktop," tsvector finds documents containing those words. It supports:

  • Relevance ranking via ts_rank (term frequency, proximity)
  • Phrase matching (phraseto_tsquery for exact sequences)
  • Prefix queries (to_tsquery('deploy:*') matches "deployment", "deploying")
  • Weight classes (A/B/C/D) for boosting title matches over body matches

The weakness is rigidity — tsvector performs lexical matching after stemming, so it won't find "container runtime" when you search for "Docker." That's where semantic search fills the gap.

Fuzzy Matching (pg_trgm)

The pg_trgm extension breaks strings into three-character substrings (trigrams) and computes similarity scores between them. If the agent encounters "Spantreee" in a transcript or a user types "Cedrci," trigram similarity still finds the right records.

This is particularly valuable for name matching in the identity layer, where meeting transcripts often contain misspellings, informal names, or phonetic variations. A trigram similarity of 0.4+ is typically sufficient for name matching; exact identifier lookups handle the high-confidence cases.

-- Find people with names similar to a transcript speaker label
SELECT id, full_name, similarity(full_name, 'Cedrci Hurst') AS sim
FROM people
WHERE similarity(full_name, 'Cedrci Hurst') > 0.3
ORDER BY sim DESC;
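The trigram mechanics are easy to see outside the database. Here is a minimal Python sketch of trigram extraction and set-overlap similarity that approximates pg_trgm's behavior (pg_trgm lowercases and pads words similarly, though its exact normalization rules differ in edge cases):

```python
def trigrams(s: str) -> set[str]:
    """Approximate pg_trgm extraction: lowercase each word, pad it with
    two leading spaces and one trailing space, take 3-char windows."""
    grams = set()
    for word in s.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (as pg_trgm does)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# The misspelled speaker label still scores far above an unrelated name.
print(trigram_similarity("Cedrci Hurst", "Cedric Hurst"))  # ≈ 0.62
print(trigram_similarity("Cedrci Hurst", "Sarah Chen"))    # ≈ 0.04
```

The transposed letters in "Cedrci" cost only a few trigrams, which is why a 0.3–0.4 threshold catches most transcript misspellings.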

Semantic Search (pgvector)

Semantic search uses embedding vectors to find conceptually similar content. A search for "who handles infrastructure at Acme" finds relevant person profiles even if they mention "DevOps" or "platform engineering" instead of "infrastructure." This is the most flexible search mode and the one agents use most often.

Embeddings are generated by language models (currently Gemini) and stored as vectors in Postgres via the pgvector extension. Similarity is computed using cosine distance (<=> operator), which measures the angle between vectors regardless of magnitude.

-- Semantic search with distance threshold
SELECT title, 1 - (embedding <=> query_embedding) AS similarity
FROM knowledge_entities
WHERE embedding <=> query_embedding < 0.3
ORDER BY similarity DESC
LIMIT 10;

Embeddings are generated incrementally. When a knowledge file is updated or a new activity record is ingested, its embedding is computed and stored. There's no batch reindexing step — updates are processed as they arrive.

The three tiers work best in combination. A single query can score candidates across all three and blend the results:

SELECT * FROM (
  SELECT
    title,
    ts_rank(search_vector, query) AS text_rank,
    1 - (embedding <=> query_embedding) AS semantic_similarity,
    similarity(title, 'Sarah Chen') AS name_similarity
  FROM knowledge_entities,
    plainto_tsquery('english', 'Acme infrastructure') AS query
  WHERE search_vector @@ query
    OR embedding <=> query_embedding < 0.3
    OR similarity(title, 'Sarah Chen') > 0.4
) scored
ORDER BY text_rank * 0.3 + semantic_similarity * 0.5 + name_similarity * 0.2 DESC
LIMIT 10;

(The subquery is required: Postgres allows an output alias in ORDER BY only as a bare name, not inside an expression.)

The weights (0.3, 0.5, 0.2) are tunable per query context. Identity lookups weight name similarity higher; knowledge discovery weights semantic similarity higher.
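As a sketch, the per-context weighting reduces to a small lookup table. The context names and the identity-lookup numbers here are illustrative, not a fixed API; only the knowledge-discovery split mirrors the query above:

```python
# Hypothetical per-context weight profiles.
WEIGHTS = {
    "knowledge_discovery": {"text": 0.3, "semantic": 0.5, "name": 0.2},
    "identity_lookup":     {"text": 0.1, "semantic": 0.2, "name": 0.7},
}

def blended_score(text_rank: float, semantic_sim: float, name_sim: float,
                  context: str = "knowledge_discovery") -> float:
    """Weighted blend of the three tier scores for one candidate."""
    w = WEIGHTS[context]
    return (w["text"] * text_rank
            + w["semantic"] * semantic_sim
            + w["name"] * name_sim)
```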

The agent reformulates on miss. If a search returns poor results, it rephrases the query, tries different tiers, or broadens scope. This agent-in-the-loop approach compensates for the limitations of any single method — there's no search UI where a human expects perfect results on the first try. The agent formulates queries programmatically, tries multiple approaches, and interprets results with language understanding.

Search by Layer

Different layers lean on different tiers depending on the data and access pattern:

| Layer | Primary search | Typical query |
|---|---|---|
| Episodic Memory | Semantic (Gemini embeddings) | "When did we decide to use Colima?" |
| Knowledge Base | All three + graph traversal | "Find people related to AI coding tools" |
| Activity Logs | Full-text + time-range filters | "What was discussed in #acme-project last Tuesday?" |
| Identity Graph | pg_trgm fuzzy + exact lookups | Match "Jeff" from a transcript to a known person |

Retrieval Ranking: Vitality and Decay

"Attention Is All You Need" could have been dismissed as overengineered in 2017 — why not just use RNNs with a simple hidden state? The answer was that uniform sequential processing doesn't scale. The same argument applies to agent memory. Naive retrieval — search everything, rank by text similarity — works fine at 50 notes. At 50,000 notes across four layers, undifferentiated retrieval wastes the most expensive resource in the system: context window tokens. A 6-month-old note about a resolved bug shouldn't compete equally with yesterday's architecture decision.

Vitality scoring is to agent memory what attention is to sequence processing — a principled mechanism for focusing on what matters. Both solve the same fundamental problem (selective focus over a large space), just at different timescales: attention operates within a single forward pass; vitality operates across an agent's lifetime.

|  | Transformers | Agent Memory |
|---|---|---|
| Naive approach | RNN hidden state | Search everything, sort by date |
| Problem at scale | Long sequences lose early context | Large knowledge bases waste context tokens |
| Solution | Selective attention (Q/K/V) | Selective retrieval (vitality scoring) |
| Vindicated at scale | GPT-3/4 | Agents running for months/years |

Attaché's vitality model draws from ACT-R (Adaptive Control of Thought — Rational), a cognitive architecture developed by John Anderson at Carnegie Mellon University since the 1990s, and Ori-Mnemos, an open-source agent memory system that extends ACT-R with graph-aware features.

Base-Level Activation (ACT-R)

In ACT-R, every memory chunk has a base-level activation determined by how often and how recently it's been accessed. The formula models the well-established power law of forgetting:

B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right)

Where:

  • n = number of times the chunk was accessed
  • t_j = time since the j-th access (in days)
  • d = decay parameter (default 0.5)

The key insight: both frequency and recency matter. A note accessed 50 times decays much slower than one accessed once, even at the same age. This is fundamentally different from naive approaches like "sort by last modified date" or simple exponential decay — it produces the power law curve observed in human memory experiments.

Why logarithmic? The ln() wrapper means that doubling the number of accesses doesn't double the activation — it adds a constant. This matches human memory: the 100th exposure to a word adds much less memorability than the 2nd. The decay exponent t^{-d} ensures that recent accesses contribute far more than old ones, with d = 0.5 producing a square-root decay curve.
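With the full access history in hand, the exact formula is a few lines of Python (a sketch; the names are illustrative, not the production code):

```python
import math

def base_level_activation(access_ages_days: list[float], d: float = 0.5) -> float:
    """ACT-R base-level activation: B_i = ln(sum over accesses of t_j^-d),
    where t_j is the age of the j-th access in days (must be > 0)."""
    return math.log(sum(t ** -d for t in access_ages_days))

# Frequency and recency both matter: five old accesses still beat one.
print(base_level_activation([30.0]))                          # single access
print(base_level_activation([30.0, 35.0, 40.0, 45.0, 50.0]))  # frequent note
```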

For computational efficiency, ACT-R's own research provides an optimized O(1) approximation that avoids iterating over every access event:

B_i \approx \ln\left(\frac{n}{1-d}\right) - d \cdot \ln(L)

Where L is the note's lifetime in days and n is the total access count. This requires only three stored values (access count, first access, last access) regardless of history size.

The raw activation value is normalized to a 0–1 vitality score via sigmoid:

\text{vitality} = \frac{1}{1 + e^{-B_i}}
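Together, the approximation and the sigmoid reduce scoring to a few arithmetic operations per chunk. A minimal sketch under the same assumptions (illustrative names; d must stay below 1 for the first term to be defined):

```python
import math

def activation_approx(n: int, lifetime_days: float, d: float = 0.5) -> float:
    """O(1) approximation: B_i ≈ ln(n / (1 - d)) - d * ln(L).
    Needs only the access count and the note's lifetime in days."""
    return math.log(n / (1 - d)) - d * math.log(lifetime_days)

def vitality(activation: float) -> float:
    """Sigmoid squashes raw activation into a 0-1 vitality score."""
    return 1 / (1 + math.exp(-activation))

# 50 accesses vs. 1 access over the same 30-day lifetime
hot = vitality(activation_approx(50, 30))
cold = vitality(activation_approx(1, 30))
```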

Metabolic Rates

Not all memory should decay at the same speed. A person's identity doesn't fade like yesterday's standup notes. Attaché applies metabolic rate multipliers to the decay parameter, inspired by Ori-Mnemos's observation that different memory types have fundamentally different lifecycles:

| Layer | Metabolic rate | Effective decay | Behavior |
|---|---|---|---|
| Entity (people, orgs) | 0.1× | d = 0.05 | Identity barely fades — a person profile stays relevant for months |
| Knowledge (research, projects) | 1.0× | d = 0.5 | Standard relevance-driven lifecycle |
| Episodic (daily logs) | 2.0× | d = 1.0 | Recent context matters most — last week's notes outrank last month's |
| Activity (messages, transcripts) | 3.0× | d = 1.5 | Burns hot, clears quickly — yesterday's Slack messages fade fast |

The metabolic rate multiplies the base decay parameter: d_{\text{eff}} = d \times m. An entity with metabolic rate m = 0.1 has an effective decay of 0.05 — it takes roughly 10× longer to fade than a knowledge note.
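As a sketch, the layer multipliers reduce to a lookup (the dictionary keys are illustrative, not the production schema):

```python
BASE_DECAY = 0.5  # the standard ACT-R decay parameter d

METABOLIC_RATES = {
    "entity": 0.1,     # people, orgs: identity barely fades
    "knowledge": 1.0,  # research, projects: standard lifecycle
    "episodic": 2.0,   # daily logs: recency dominates
    "activity": 3.0,   # messages, transcripts: burns hot
}

def effective_decay(layer: str) -> float:
    """d_eff = d * m, the decay parameter actually used for this layer."""
    return BASE_DECAY * METABOLIC_RATES[layer]
```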

Spreading Activation

When a note is accessed, its neighbors in the knowledge graph receive a vitality boost. This models the cognitive science concept of spreading activation: thinking about one topic primes related topics.

The boost propagates along wiki-link edges using breadth-first search:

\text{boost}(k) = u \cdot \alpha^k

Where u is the source utility, α is the damping factor (default 0.6), and k is the hop count:

  • Hop 1 neighbors: 60% of source utility
  • Hop 2 neighbors: 36% of source utility

Boosts are stored in Postgres and decayed on read (half-life ~7 days). When you access a note about "Acme," its linked notes about Sarah Chen, the migration plan, and the monitoring dashboard all warm up — even if they haven't been directly accessed recently.

This creates emergent behavior: clusters of actively-used notes form warm neighborhoods in the knowledge graph, while isolated, unused notes cool down naturally. Active projects pull their entire constellation of related entities into higher vitality.
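A minimal breadth-first sketch of this propagation, using an adjacency-list graph and illustrative note IDs:

```python
from collections import deque

def spread_boost(graph: dict[str, list[str]], source: str, utility: float,
                 alpha: float = 0.6, max_hops: int = 2) -> dict[str, float]:
    """Propagate boost(k) = utility * alpha**k to neighbors at hop k,
    following wiki-link edges breadth-first from the accessed note."""
    boosts: dict[str, float] = {}
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hop = queue.popleft()
        if hop >= max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                boosts[neighbor] = utility * alpha ** (hop + 1)
                queue.append((neighbor, hop + 1))
    return boosts

graph = {
    "acme": ["sarah-chen", "migration-plan"],
    "migration-plan": ["monitoring-dashboard"],
}
boosts = spread_boost(graph, "acme", utility=1.0)
# hop 1 notes warm to 0.6, the hop 2 note to 0.36; "acme" itself is excluded
```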

Structural Protection

Some notes are structurally important even if rarely accessed. A project overview that connects 15 sub-notes is a bridge node — archiving it would fragment the graph and orphan its dependents.

Two mechanisms protect structural integrity:

Structural boost: Notes with high in-degree (many incoming links) decay slower. Each incoming link adds ~10% to effective stability, capped at 2×:

s = 1 + 0.1 \cdot \min(\text{in\_degree}, 10)

Bridge protection floor: Tarjan's linear-time DFS identifies articulation points — notes whose removal would disconnect the graph. These get a minimum vitality floor (default 0.5) regardless of access patterns, preventing them from being archived.
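Both mechanisms are simple to state in code. This sketch uses illustrative names and treats bridge detection as a precomputed flag rather than running the DFS itself:

```python
def structural_boost(in_degree: int) -> float:
    """Each incoming link adds 10% stability, capped at 2x:
    s = 1 + 0.1 * min(in_degree, 10)."""
    return 1 + 0.1 * min(in_degree, 10)

def apply_bridge_floor(vitality: float, is_bridge: bool,
                       floor: float = 0.5) -> float:
    """Articulation-point notes never drop below the vitality floor."""
    return max(vitality, floor) if is_bridge else vitality
```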

Revival Spikes

Old notes that gain new connections get a revival spike — a 14-day boost that prevents newly-relevant dormant notes from being immediately archived:

r = e^{-0.2 \cdot \Delta t}

Where Δt is days since the new connection was established. This handles the case where a 6-month-old research note suddenly becomes relevant because a new project links to it.
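The spike itself is a one-liner (a sketch; the 0.2 rate constant comes from the formula above):

```python
import math

def revival_spike(days_since_link: float) -> float:
    """r = exp(-0.2 * Δt): full strength at the moment of linking,
    decaying to roughly 6% of that by day 14."""
    return math.exp(-0.2 * days_since_link)
```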

Zone Classification

Notes are classified into zones based on their composite vitality score:

| Zone | Vitality | Behavior |
|---|---|---|
| Active | ≥ 0.6 | Fully accessible, prioritized in search results |
| Stale | 0.3 – 0.6 | Accessible but deprioritized in rankings |
| Fading | 0.1 – 0.3 | Candidate for archival, still searchable |
| Archived | < 0.1 | Moved to archive, excluded from default search |

Zone transitions are automatic. The prune operation analyzes the full activation topology and identifies archive candidates, with dry-run as the default — no silent deletions.
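Classification is a straightforward threshold ladder over the composite score (a sketch matching the zone table):

```python
def classify_zone(vitality: float) -> str:
    """Map a composite vitality score to its zone."""
    if vitality >= 0.6:
        return "active"
    if vitality >= 0.3:
        return "stale"
    if vitality >= 0.1:
        return "fading"
    return "archived"
```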

Vitality scores are incorporated into search ranking as a multiplicative factor, applied after the three search tiers produce their scores:

SELECT
  title,
  (text_rank * 0.3 + semantic_similarity * 0.5) * vitality AS final_score
FROM search_results
JOIN chunk_activation_cache USING (chunk_id)
ORDER BY final_score DESC;

Active notes rank higher than stale notes with the same textual or semantic match. Archived notes are excluded from default search but remain queryable with an explicit include_archived flag.

Implementation

Access events are stored in an append-only table:

CREATE TABLE memory_access_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  chunk_type TEXT NOT NULL,   -- 'episodic', 'knowledge', 'entity', 'activity'
  chunk_id TEXT NOT NULL,     -- permalink or entity ID
  access_type TEXT NOT NULL,  -- 'retrieval', 'read', 'reference', 'write'
  session_id TEXT,
  accessed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

A materialized activation cache avoids recomputing on every query:

CREATE TABLE chunk_activation_cache (
  chunk_id TEXT PRIMARY KEY,
  chunk_type TEXT NOT NULL,
  access_count INTEGER NOT NULL DEFAULT 0,
  first_accessed TIMESTAMPTZ NOT NULL,
  last_accessed TIMESTAMPTZ NOT NULL,
  base_activation REAL,
  spreading_boost REAL DEFAULT 0,
  structural_boost REAL DEFAULT 1.0,
  vitality REAL,  -- composite score
  zone TEXT,      -- 'active', 'stale', 'fading', 'archived'
  computed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

At current scale (~200K chunks), the O(1) approximation computes activation per chunk trivially. Access events grow linearly (~50/day × 365 = ~18K rows/year). The cache approach means vitality can always be recomputed from the summary without scanning the full event log.

Attribution

  • ACT-R (Anderson & Lebiere, Carnegie Mellon) — base-level activation, power law of forgetting, spreading activation, optimized learning approximation. LGPL v2.1.
  • Ori-Mnemos (Aayo Awoyemi) — metabolic rates, structural protection via Tarjan's algorithm, zone classification, revival spikes. Apache-2.0.