Memory Plugin
@opencode-manager/memory is an optional OpenCode plugin that stores and recalls project knowledge across sessions using vector embeddings and semantic search.
Not Required
This plugin is entirely optional. OpenCode Manager works fully without it — install it only if you want persistent project knowledge and semantic search capabilities.
Works with Standalone OpenCode
This plugin can also be used with standalone OpenCode installations outside of OpenCode Manager. Simply install the package and add it to your opencode.json plugins array.
Installation
Install the @opencode-manager/memory package with your package manager. The local embedding model (all-MiniLM-L6-v2) is downloaded automatically by the package's postinstall script. For API-based embeddings (OpenAI or Voyage), skip the local model and set your provider and API key in the configuration instead.
Then register the plugin by adding the package to the plugins array in your opencode.json.
Configuration
On first run, the plugin copies a bundled config.json to the global data directory:
- ~/.local/share/opencode/memory/config.json
- Falls back to $XDG_DATA_HOME/opencode/memory/config.json
The file is only created if it does not already exist. The config is validated on load — if it fails validation, defaults are used automatically.
Full Default Config
{
  "embedding": {
    "provider": "local",
    "model": "all-MiniLM-L6-v2",
    "dimensions": 384,
    "baseUrl": "",
    "apiKey": ""
  },
  "dedupThreshold": 0.25,
  "logging": {
    "enabled": false,
    "debug": false,
    "file": ""
  },
  "compaction": {
    "customPrompt": true,
    "maxContextTokens": 4000
  },
  "memoryInjection": {
    "enabled": true,
    "debug": false,
    "maxTokens": 2000,
    "cacheTtlMs": 30000
  },
  "messagesTransform": {
    "enabled": true,
    "debug": false
  },
  "executionModel": ""
}
API-Based Embedding Example
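A minimal sketch of an API-based setup, written as a typed TypeScript object that serializes to the same shape as the embedding block of the default config. The provider, model, and dimension values come from the provider table in this document; the EmbeddingConfig type name and the API key value are placeholders for illustration and are not part of the plugin.

```typescript
// Hypothetical type mirroring the "embedding" block of config.json.
type EmbeddingProvider = "local" | "openai" | "voyage";

interface EmbeddingConfig {
  provider: EmbeddingProvider;
  model: string;
  dimensions: number;
  baseUrl: string;
  apiKey: string;
}

// Only the embedding block is shown here.
const apiEmbedding: EmbeddingConfig = {
  provider: "openai",
  model: "text-embedding-3-small",
  dimensions: 1536,
  baseUrl: "",
  apiKey: "YOUR_OPENAI_API_KEY", // placeholder, replace with a real key
};

// Serialize to the JSON that would sit under "embedding" in config.json.
const configFragment: string = JSON.stringify({ embedding: apiEmbedding }, null, 2);
console.log(configFragment);
```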
Embedding Providers
| Provider | Models | API Key Required |
|---|---|---|
| local | all-MiniLM-L6-v2 (384d) | No |
| openai | text-embedding-3-small (1536d), text-embedding-3-large (3072d), text-embedding-ada-002 (1536d) | Yes |
| voyage | voyage-code-3 (1024d), voyage-2 (1536d) | Yes |
Set baseUrl to point at any OpenAI-compatible self-hosted service (vLLM, Ollama, LocalAI, LiteLLM, text-embeddings-inference). The URL is normalized automatically: for example, http://localhost:11434 becomes http://localhost:11434/v1/embeddings.
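The normalization could look roughly like the sketch below. Only the bare-host case is documented here; the handling of a trailing slash, a base URL that already ends in /v1, and a URL that is already a full endpoint are assumptions, and normalizeEmbeddingUrl is a hypothetical name rather than the plugin's own function.

```typescript
// Sketch only: the plugin's real normalization rules may differ.
const EMBEDDINGS_PATH = "/v1/embeddings";

function normalizeEmbeddingUrl(baseUrl: string): string {
  // Drop trailing slashes so the suffix checks behave consistently.
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  if (trimmed.endsWith(EMBEDDINGS_PATH)) {
    return trimmed; // assumed: already a full endpoint, leave as-is
  }
  if (trimmed.endsWith("/v1")) {
    return `${trimmed}/embeddings`; // assumed: base already includes /v1
  }
  return `${trimmed}${EMBEDDINGS_PATH}`; // documented case: bare host gets /v1/embeddings
}

console.log(normalizeEmbeddingUrl("http://localhost:11434"));
// prints "http://localhost:11434/v1/embeddings"
```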
All Options
| Key | Description | Default |
|---|---|---|
| embedding.provider | local, openai, or voyage | local |
| embedding.model | Model name | all-MiniLM-L6-v2 |
| embedding.dimensions | Vector dimensions (auto-detected for known models) | - |
| embedding.apiKey | API key for OpenAI/Voyage | - |
| embedding.baseUrl | Custom endpoint for self-hosted services | - |
| embedding.serverGracePeriod | Time (ms) before idle embedding server shuts down | 30000 |
| dedupThreshold | Similarity threshold for deduplication (0.05–0.40) | 0.25 |
| logging.enabled | Write logs to file | false |
| logging.debug | Enable debug-level log output | false |
| logging.file | Log file path (resolves to ~/.local/share/opencode/memory/logs/memory.log when empty, 10MB limit, auto-rotated) | - |
| compaction.customPrompt | Use optimized compaction prompt for session continuity | true |
| compaction.maxContextTokens | Max tokens for injected memory context | 4000 |
| memoryInjection.enabled | Inject relevant memories into user messages via semantic search | true |
| memoryInjection.debug | Enable debug logging for memory injection | false |
| memoryInjection.maxResults | Max vector search results to retrieve | 5 |
| memoryInjection.distanceThreshold | Max vector distance for relevance filtering (lower = stricter) | 0.5 |
| memoryInjection.maxTokens | Token budget for injected <project-memory> block | 2000 |
| memoryInjection.cacheTtlMs | Cache TTL (ms) for identical query results | 30000 |
| messagesTransform.enabled | Enable the messages transform hook (memory injection + Architect enforcement) | true |
| messagesTransform.debug | Enable debug logging for messages transform | false |
| executionModel | Model override for plan execution sessions (provider/model). Falls back to OpenCode's default model. | - |
Architecture
The plugin is composed of several subsystems that work together:
┌───────────────────────────────────────────────────┐
│                   Memory Plugin                   │
├─────────┬──────────┬───────────┬──────────────────┤
│  Tools  │  Agents  │   Hooks   │    Compaction    │
├─────────┴──────────┴───────────┴──────────────────┤
│                  Memory Service                   │
├──────────────┬────────────────┬───────────────────┤
│  Embedding   │   Vec Search   │       Cache       │
│   Service    │  (sqlite-vec)  │    (In-Memory)    │
├──────────────┴────────────────┬───────────────────┤
│          KV Service           │   Auto-Cleanup    │
│    (ephemeral state + TTL)    │ (30min interval)  │
├───────────────────────────────┴───────────────────┤
│               SQLite Database (WAL)               │
│  memories | metadata | project_kv (TTL indexed)   │
└───────────────────────────────────────────────────┘
Storage Layer
The plugin uses a single SQLite database in WAL mode with the following tables:
| Table | Purpose |
|---|---|
| memories | Stores all memory records with scope, content, access tracking |
| plugin_metadata | Tracks the active embedding model and dimensions for drift detection |
| project_kv | Stores ephemeral key-value pairs with TTL expiration (auto-cleaned every 30 minutes) |
SQLite pragmas are tuned for concurrent access:
- journal_mode=WAL: concurrent reads during writes
- busy_timeout=5000: wait up to 5s on lock contention
- synchronous=NORMAL: balanced durability and performance
KV Store
The KV store provides ephemeral project state management with automatic TTL-based expiration:
- Key-Value Storage: Store arbitrary JSON values under string keys, scoped by project ID
- TTL Management: Each entry has a configurable expiration time (default 24 hours)
- Auto-Cleanup: Background cleanup runs every 30 minutes to remove expired entries
- Graceful Degradation: get() and list() methods handle malformed JSON gracefully
- Use Cases: Planning progress, code review patterns, session context, temporary state
The KV service is initialized on plugin startup and begins its cleanup interval automatically. Call kvService.destroy() during cleanup to stop the interval.
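The behaviour described above can be sketched with an in-memory map standing in for the project_kv table. KvStoreSketch, its method signatures, and the injectable clock are all hypothetical; the real service stores rows in SQLite and runs cleanup on a 30-minute interval.

```typescript
// In-memory stand-in for the project_kv table. All names are hypothetical.
interface KvEntry {
  value: string; // serialized JSON
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

class KvStoreSketch {
  private readonly entries = new Map<string, KvEntry>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  private entryKey(projectId: string, key: string): string {
    return JSON.stringify([projectId, key]); // scope every key by project ID
  }

  // Stores the value and returns its expiration timestamp.
  set(projectId: string, key: string, value: unknown, ttlMs: number = DEFAULT_TTL_MS): number {
    const expiresAt = this.now() + ttlMs;
    this.entries.set(this.entryKey(projectId, key), { value: JSON.stringify(value), expiresAt });
    return expiresAt;
  }

  // Returns undefined for missing or expired keys.
  get(projectId: string, key: string): unknown {
    const entry = this.entries.get(this.entryKey(projectId, key));
    if (entry === undefined || entry.expiresAt <= this.now()) return undefined;
    try {
      return JSON.parse(entry.value);
    } catch {
      return entry.value; // malformed JSON: hand back the raw string instead of throwing
    }
  }

  // One cleanup pass over expired entries; returns how many were removed.
  cleanup(): number {
    const cutoff = this.now();
    let removed = 0;
    this.entries.forEach((entry, k) => {
      if (entry.expiresAt <= cutoff) {
        this.entries.delete(k);
        removed++;
      }
    });
    return removed;
  }
}

let fakeNow = 0;
const kv = new KvStoreSketch(() => fakeNow);
kv.set("project-a", "plan-progress", { phase: 1 }, 1000);
fakeNow = 2000; // move past the 1s TTL
console.log(kv.get("project-a", "plan-progress")); // undefined
console.log(kv.cleanup()); // 1
```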
Vector Search
Vector similarity search is powered by sqlite-vec, a SQLite extension. The vec service:
- Initializes lazily after the database is ready
- Falls back to a no-op service if the extension is unavailable (search still works via exact match, just without semantic ranking)
- Supports insert, delete, search, and similarity-threshold queries
- Scoped by project ID for multi-project isolation
Embedding Subsystem
The embedding system has three provider types and a shared server architecture:
Local Provider
Uses @huggingface/transformers to run all-MiniLM-L6-v2 locally. The model is loaded lazily on first use with a warmup hint at plugin initialization.
Shared Embedding Server
When using the local provider, the plugin runs a shared Unix socket server (embedding.sock) that:
- Loads the model once into memory
- Serves embedding requests to multiple plugin instances via Unix domain socket
- Uses reference counting: clients send connect/disconnect messages
- Auto-shuts down after a configurable grace period (default 30s) when the last client disconnects
- Uses PID files and startup locks to prevent duplicate server instances
- Falls back to in-process embedding if the server fails to start
This architecture means the model is loaded once regardless of how many OpenCode sessions are running.
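The reference counting and grace period can be modelled without any sockets. The sketch below is a simplified state machine under stated assumptions: SharedServerLifecycle and its methods are hypothetical names, timestamps are passed in explicitly instead of using timers, and a reconnect during the grace period is assumed to cancel the pending shutdown.

```typescript
// Hypothetical model of the server's client bookkeeping; no sockets involved.
class SharedServerLifecycle {
  private clients = 0;
  private idleSince: number | null = null;
  private readonly gracePeriodMs: number;

  constructor(gracePeriodMs: number = 30000) {
    this.gracePeriodMs = gracePeriodMs;
  }

  // A client sent "connect": count it and cancel any pending shutdown.
  connect(): void {
    this.clients++;
    this.idleSince = null;
  }

  // A client sent "disconnect": start the grace period once nobody is left.
  disconnect(now: number): void {
    if (this.clients === 0) return;
    this.clients--;
    if (this.clients === 0) this.idleSince = now;
  }

  // True once the server has been idle for at least the grace period.
  shouldShutDown(now: number): boolean {
    return this.idleSince !== null && now - this.idleSince >= this.gracePeriodMs;
  }
}

const server = new SharedServerLifecycle();
server.connect();
server.connect();
server.disconnect(1000);
console.log(server.shouldShutDown(60000)); // false: one client is still connected
server.disconnect(2000);
console.log(server.shouldShutDown(10000)); // false: still inside the grace period
console.log(server.shouldShutDown(32000)); // true: idle for 30s
```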
API Provider
Supports OpenAI and Voyage embedding APIs:
- Batch processing in chunks of 100 texts
- Automatic URL normalization for self-hosted endpoints
- Bearer token authentication
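Batching in chunks of 100 might be implemented along these lines. The chunk helper is generic; embedBatch is a hypothetical callback standing in for a single request to the embedding API, and no real endpoint is called here.

```typescript
const BATCH_SIZE = 100;

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// embedBatch is a hypothetical stand-in for one API request.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts)) {
    const batchVectors = await embedBatch(batch); // one request per batch, in order
    for (const vector of batchVectors) vectors.push(vector);
  }
  return vectors;
}
```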
Embedding Cache
All embeddings are cached in memory using SHA-256 content hashes. Cache entries expire after 24 hours. This prevents redundant API calls or model inference for identical content.
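A content-hash cache of this kind can be sketched as follows, using Node's built-in crypto module for SHA-256. EmbeddingCacheSketch and its methods are hypothetical names, and the lazy eviction on read is an assumption about how expiry is enforced.

```typescript
import { createHash } from "node:crypto";

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

interface CachedEmbedding {
  vector: number[];
  expiresAt: number; // epoch milliseconds
}

class EmbeddingCacheSketch {
  private readonly entries = new Map<string, CachedEmbedding>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Identical content always maps to the same 64-character hex key.
  static keyFor(content: string): string {
    return createHash("sha256").update(content, "utf8").digest("hex");
  }

  get(content: string): number[] | undefined {
    const key = EmbeddingCacheSketch.keyFor(content);
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // assumed: expired entries are evicted on read
      return undefined;
    }
    return hit.vector;
  }

  set(content: string, vector: number[]): void {
    this.entries.set(EmbeddingCacheSketch.keyFor(content), {
      vector,
      expiresAt: this.now() + CACHE_TTL_MS,
    });
  }
}
```

A caller would check get() before invoking the provider and call set() with the fresh vector on a miss.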
Embedding Sync
On startup, the plugin checks for memories that lack embeddings (e.g., from a model change or failed previous embedding) and backfills them automatically:
- Processes in batches of 50
- Retries failed embeddings up to 3 times
- Stops early if an entire batch fails (prevents infinite loops)
- Caps at 100 iterations to bound startup time
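The backfill loop described above can be sketched as below. SyncDeps and its three callbacks are hypothetical stand-ins for the database and the embedding provider, and "retries up to 3 times" is interpreted here as three attempts in total, which is an assumption.

```typescript
const SYNC_BATCH_SIZE = 50;
const MAX_ATTEMPTS = 3; // assumption: three attempts in total per memory
const MAX_ITERATIONS = 100;

interface PendingMemory {
  id: number;
  content: string;
}

// Hypothetical hooks standing in for the database and the embedding provider.
interface SyncDeps {
  fetchMissing(limit: number): Promise<PendingMemory[]>;
  embed(content: string): Promise<number[]>;
  store(id: number, vector: number[]): Promise<void>;
}

async function embedWithRetry(deps: SyncDeps, content: string): Promise<number[] | null> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await deps.embed(content);
    } catch {
      // ignore the error and try again until attempts are exhausted
    }
  }
  return null;
}

// `failed` counts failed passes, so a memory that fails in two iterations counts twice.
async function syncMissingEmbeddings(deps: SyncDeps): Promise<{ synced: number; failed: number }> {
  let synced = 0;
  let failed = 0;
  for (let iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
    const batch = await deps.fetchMissing(SYNC_BATCH_SIZE);
    if (batch.length === 0) break; // nothing left to backfill
    let batchSuccesses = 0;
    for (const memory of batch) {
      const vector = await embedWithRetry(deps, memory.content);
      if (vector === null) {
        failed++;
        continue;
      }
      await deps.store(memory.id, vector);
      batchSuccesses++;
      synced++;
    }
    if (batchSuccesses === 0) break; // whole batch failed: stop instead of looping forever
  }
  return { synced, failed };
}
```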
Auto-Validation
After the vec service initializes, the plugin compares the configured embedding model/dimensions against what's stored in plugin_metadata. If there's a mismatch (model drift), it automatically triggers a reindex — no manual memory-health reindex needed.
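The drift check itself reduces to a comparison of two small records. In this sketch, EmbeddingIdentity and needsReindex are hypothetical names, and treating a missing plugin_metadata record as "nothing to reindex" is an assumption.

```typescript
interface EmbeddingIdentity {
  model: string;
  dimensions: number;
}

// `stored` is null when plugin_metadata has no record yet
// (assumed here to mean nothing has been indexed, so no reindex is needed).
function needsReindex(configured: EmbeddingIdentity, stored: EmbeddingIdentity | null): boolean {
  if (stored === null) return false;
  return configured.model !== stored.model || configured.dimensions !== stored.dimensions;
}

const configured: EmbeddingIdentity = { model: "text-embedding-3-small", dimensions: 1536 };
const indexed: EmbeddingIdentity = { model: "all-MiniLM-L6-v2", dimensions: 384 };
console.log(needsReindex(configured, indexed)); // true: model and dimensions both changed
```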
Memory Model
Scopes
Every memory belongs to exactly one scope:
| Scope | Purpose | Examples |
|---|---|---|
| convention | Rules and patterns to follow | "Use named imports only", "Tests use describe/it blocks" |
| decision | Architectural choices with rationale | "Chose SQLite over PostgreSQL for simplicity" |
| context | Reference information | "Entry point is src/index.ts", "Prices stored as integers" |
Fields
Each memory record contains:
| Field | Description |
|---|---|
| id | Auto-incrementing integer primary key |
| projectId | The OpenCode project this memory belongs to |
| scope | convention, decision, or context |
| content | The memory text |
| filePath | Optional file path reference |
| accessCount | How many times this memory has been read |
| lastAccessedAt | Timestamp of last access |
| createdAt | Creation timestamp |
| updatedAt | Last modification timestamp |
Deduplication
Before storing a new memory, the plugin:
- Checks for an exact content match in the same project
- Computes vector similarity against all existing project memories
- Skips the write if similarity exceeds dedupThreshold (default 0.25)
- Uses a transaction with double-check locking to prevent race conditions
When deduplication triggers, the existing memory's ID is returned instead of creating a duplicate.
Tools
The plugin registers the following tools that the AI agent can call:
memory-read
Search and retrieve project memories.
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | No | Semantic search query |
| scope | enum | No | Filter by convention, decision, or context |
| limit | number | No | Max results (default: 10) |
When query is provided, results are ranked by vector similarity. Without query, memories are listed in chronological order. Access counts are tracked for every read.
memory-write
Store a new project memory with automatic deduplication.
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The memory content to store |
| scope | enum | Yes | convention, decision, or context |
Returns the memory ID and whether deduplication matched an existing memory.
memory-edit
Update the content or scope of an existing memory. Re-embeds the content if changed.
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | number | Yes | Memory ID to update |
| content | string | Yes | New content |
| scope | enum | No | New scope (keeps existing if omitted) |
memory-delete
Soft-delete a memory by ID. The memory must exist or an error is returned.
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | number | Yes | Memory ID to delete |
memory-health
Check plugin health or trigger a reindex of all embeddings.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | enum | No | check (default) or reindex |
Check returns:
- Overall status: ok, degraded, or error
- Embedding provider status and operational state
- Shared embedding server status (running, client count, uptime)
- Database health and total memory count
- Configured vs. indexed model comparison
- Whether a reindex is needed
Reindex regenerates all embeddings with the configured model:
- Verifies the provider is operational before starting
- Processes memories in batches of 50
- Updates the plugin_metadata table on success
- Reports total, success, and failure counts
Model Changes Require Reindex
If you change embedding.model or embedding.dimensions, existing embeddings will have mismatched dimensions. Auto-validation handles this on startup, but you can also trigger it manually with memory-health reindex.
memory-plan-execute
Create a new Code session and send an implementation plan as the first prompt. Designed to be called by the Architect agent after the user approves a plan.
| Parameter | Type | Required | Description |
|---|---|---|---|
| plan | string | Yes | The full implementation plan to send to the Code agent |
| title | string | Yes | Short title for the session (shown in session list, max 60 chars) |
Creates a new session via the OpenCode API and sends the plan as the first message to the Code agent. Returns the session ID and title. Only the Architect agent has access to this tool — it is excluded from Code and Memory agents.
The model used for the new Code session is determined by executionModel in the plugin config (format: provider/model, e.g. anthropic/claude-sonnet-4-20250514). If not set, OpenCode's default model resolution is used — typically the model field from opencode.json.
memory-kv-set
Store a key-value pair for the current project. Values expire after 24 hours by default. Use for ephemeral project state like planning progress, code review patterns, or session context.
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | The key to store the value under |
| value | string | Yes | The value to store (JSON string) |
| ttlMs | number | No | Time-to-live in milliseconds (default: 24 hours) |
Returns confirmation with the key and expiration timestamp.
memory-kv-get
Retrieve a value by key for the current project.
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | The key to retrieve |
Returns the stored value (formatted as JSON if applicable) or a message indicating the key was not found.
memory-kv-delete
Delete a key-value pair for the current project.
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | The key to delete |
Returns confirmation that the key was deleted.
memory-kv-list
List all active key-value pairs for the current project.
No parameters required.
Returns a list of all stored keys with their values and expiration times. Useful for debugging or inspecting current project state.
KV Store vs Memory
The KV store is designed for ephemeral project state that expires automatically (default 24 hours). Use memory-write for durable knowledge that should persist across sessions, such as conventions, decisions, and context.
Workflows
Architect → Code
The Architect and Code agents work together in a plan-then-execute pattern. The Architect researches and designs; the Code agent implements.
Steps:
- Switch to the Architect agent using the agent selector in the chat header
- Describe your task: the Architect researches the codebase, checks memory for conventions and decisions, and designs a plan
- Review the plan: the Architect presents a structured plan with objectives, phases, and decisions for your approval
- Approve the plan: the Architect calls memory-plan-execute, which creates a new Code session and sends the full plan as context
- Switch to the new session: the Code agent executes the plan phase by phase
The Architect operates in read-only mode — it cannot edit files. This separation ensures planning is thorough before any code changes are made.
Recommended Model Strategy
Planning requires strong reasoning, so use a capable model (e.g., claude-opus-4-6) for the Architect session. Code execution is more mechanical, so set executionModel to a faster, cheaper model (e.g., anthropic/claude-3-5-haiku-20241022 or a MiniMax model).
This gives you the best of both worlds: high-quality plans at the reasoning tier, fast execution at a fraction of the cost.
Configure the execution model by setting executionModel (format: provider/model) in the memory plugin config (~/.local/share/opencode/memory/config.json), or set it from the UI: Settings > Memory Plugin > Execution Model.
Cost Optimization
With this setup, only the planning phase uses the expensive model. The Code session — which typically consumes far more tokens implementing the plan — runs on the cheaper model. The Architect's plan provides enough structure and detail that the Code agent doesn't need the same level of reasoning capability.
Agents
The plugin registers four agents that are configured into OpenCode:
Code Agent (primary)
- Display name: Code
- Mode: primary (replaces the default agent)
- Role: Primary coding agent with memory awareness
The Code agent's system prompt instructs it to:
- Check memory before modifying unfamiliar code areas or making architectural decisions
- Store durable knowledge with rationale (not just "we use X" but "we use X because Y")
- Use the @Memory subagent for complex memory operations (multi-query research, contradiction resolution, bulk curation)
- Check for duplicates with memory-read before writing new memories
- Update stale memories with memory-edit rather than creating duplicates
Memory Agent (subagent)
- Display name: Memory
- Mode: subagent
- Role: Institutional memory manager
The Memory agent handles:
- Strategic retrieval across scopes with prioritized results
- Storage with proper scope categorization and rationale
- Contradiction detection between overlapping memories
- Curation: merging duplicates, archiving outdated entries
- Post-compaction knowledge extraction (invoked automatically via SubtaskPart)
Architect Agent (primary)
- Display name: Architect
- Mode: primary (user-switchable agent, not a subagent)
- Temperature: 0.0 (deterministic)
- Permission: Read-only, cannot edit any files (edit: { '*': 'deny' })
- Role: Memory-aware planning agent
The Architect agent follows a Research → Design → Plan → Execute workflow:
- Research: Reads relevant files, searches the codebase, checks memory for conventions and decisions
- Design: Considers approaches, weighs tradeoffs, asks clarifying questions
- Plan: Presents a structured plan with objectives, phases, decisions, conventions, and key context
- Execute: When the user approves, calls memory-plan-execute with the plan and title.
The Architect is the only agent with access to the memory-plan-execute tool. Plans must be fully self-contained since the Code agent receiving them has no access to the Architect's conversation.
Code Review Agent (subagent)
- Display name: Code Review
- Mode: subagent
- Temperature: 0.0 (deterministic)
- Role: Convention-aware code reviewer with memory access
The Code Review agent is a read-only subagent invoked by other agents via the Task tool to review diffs, commits, branches, or PRs. It checks changes against stored project conventions and decisions, then returns a structured review summary with issues (bug/warning/suggestion) and observations.
The agent can read memory (memory-read) but cannot write, edit, or delete memories. It also cannot execute plans — memory-plan-execute, memory-write, memory-edit, and memory-delete are excluded.
The /review slash command triggers this agent as a subtask with the template: "Review the current code changes."
Built-in Agent Enhancements
The plugin also modifies built-in OpenCode agents:
| Agent | Enhancement |
|---|---|
| plan | Gets access to memory-read tool |
| build | Hidden (replaced by the Code agent) |
The default agent is set to Code.
Removed Features
The following features were removed in a recent refactor:
- Keyword activation (regex-based detection of "remember this", "recall", etc.)
- LLM parameter adjustment based on detected modes (temperature, thinking budget, maxSteps)
- resumeAfterCompaction config option
Hooks
The plugin registers several hooks into OpenCode's lifecycle:
chat.message
- Tracks session initialization (first message per session)
event
Listens for session.compacted events and triggers automatic knowledge extraction:
1. Fetches the last 4 messages from the session to get the compaction summary
2. Sends a synchronous prompt() call with a SubtaskPart to run the Memory agent
3. Extraction runs within the main session's prompt loop, keeping the session busy
experimental.session.compacting
The core compaction hook that fires when a session is about to be compacted. It injects context to preserve knowledge across context window resets:
- Project memories: Fetches up to 10 conventions and 10 decisions for the project and formats them under ### Conventions and ### Decisions headings
- Token budgeting: All sections are trimmed to fit within maxContextTokens (default 4000). Lower-priority sections are truncated first.
- Custom prompt: If customPrompt is enabled, replaces the default compaction prompt with one optimized for continuation context that preserves active tasks, file paths, decisions, and todo state
- Diagnostics: Appends a summary line showing how many conventions, decisions, and tokens were injected
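The token budgeting step could be sketched as follows. This is an illustration under stated assumptions, not the plugin's algorithm: the 4-characters-per-token estimate, the numeric priority field, and the names ContextSection and fitToBudget are all hypothetical.

```typescript
interface ContextSection {
  title: string;
  body: string;
  priority: number; // assumed convention: higher number = more important
}

// Rough heuristic of about 4 characters per token; the real estimator is not documented here.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitToBudget(sections: ContextSection[], maxContextTokens: number = 4000): ContextSection[] {
  const result = sections.map((s) => ({ ...s })); // copy so the input is not mutated
  let total = result.reduce((sum, s) => sum + estimateTokens(s.body), 0);
  // Visit sections from lowest to highest priority and cut until the total fits.
  const byPriority = [...result].sort((a, b) => a.priority - b.priority);
  for (const section of byPriority) {
    if (total <= maxContextTokens) break;
    const sectionTokens = estimateTokens(section.body);
    const excess = total - maxContextTokens;
    const keepTokens = Math.max(0, sectionTokens - excess);
    section.body = section.body.slice(0, keepTokens * 4);
    total -= sectionTokens - estimateTokens(section.body);
  }
  return result; // original order is preserved
}
```

Truncating the lowest-priority section first means conventions or decisions marked as more important survive intact whenever the budget allows.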
experimental.chat.messages.transform
Performs two functions on the message array before each LLM inference call:
Memory Injection (all agents):
- Finds the last user message in the message array
- Extracts all text parts and runs a semantic vector search against stored project memories
- Filters results by distanceThreshold: only memories with distance below the threshold are kept
- Formats matching memories into a <project-memory> block with scope labels (e.g., [convention], [decision])
- Trims the block to fit within maxTokens and appends it as a synthetic text part to the user message
- Uses SHA-256 content-hash caching (cacheTtlMs, default 30s) to avoid redundant vector searches across inference steps
The hook fires on every LLM inference step (including tool-use follow-ups), but since OpenCode re-reads messages from the database each iteration, synthetic parts are ephemeral. The cache ensures the vector search only runs once per unique user message within the TTL window.
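The filtering and formatting steps can be sketched as below. SearchHit, approxTokens, and buildMemoryBlock are hypothetical names; the 4-characters-per-token estimate and the choice to trim by dropping whole entries (least relevant first) are assumptions about details the text does not specify.

```typescript
type Scope = "convention" | "decision" | "context";

interface SearchHit {
  scope: Scope;
  content: string;
  distance: number; // lower = closer match
}

// Rough heuristic of about 4 characters per token; not the plugin's tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildMemoryBlock(
  hits: SearchHit[],
  distanceThreshold: number = 0.5,
  maxTokens: number = 2000,
): string | null {
  // Keep only hits below the threshold, closest first.
  const relevant = hits
    .filter((hit) => hit.distance < distanceThreshold)
    .sort((a, b) => a.distance - b.distance);
  const lines: string[] = [];
  let used = 0;
  for (const hit of relevant) {
    const line = `[${hit.scope}] ${hit.content}`;
    const cost = approxTokens(line);
    if (used + cost > maxTokens) break; // budget exhausted: drop the remaining hits
    lines.push(line);
    used += cost;
  }
  if (lines.length === 0) return null; // nothing relevant, so nothing is injected
  return `<project-memory>\n${lines.join("\n")}\n</project-memory>`;
}

console.log(
  buildMemoryBlock([
    { scope: "convention", content: "Use named imports only", distance: 0.2 },
    { scope: "decision", content: "Chose SQLite over PostgreSQL for simplicity", distance: 0.7 },
  ]),
);
```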
Memory injection is controlled independently by memoryInjection.enabled (default true). Architect read-only enforcement is controlled by messagesTransform.enabled (default true).
Architect Read-Only Enforcement (Architect agent only):
- Checks if the last user message is addressed to the Architect agent
- If so, appends a synthetic <system-reminder> part enforcing read-only mode
- This provides message-level enforcement on top of the agent's edit: { '*': 'deny' } permission config
Data Lifecycle
Startup Sequence
- Load and validate config from global data directory
- Create embedding provider (local/API)
- Warmup embedding provider (non-blocking)
- Initialize SQLite database with WAL mode
- Create memory service with no-op vec service
- Initialize KV service and start auto-cleanup interval (30 minutes)
- Initialize vec service asynchronously:
  - If available: sync missing embeddings, auto-validate model drift
  - If unavailable: continue with no-op (semantic search degraded)
Cleanup
On process exit, SIGINT, or SIGTERM:
- Stop KV cleanup interval
- Dispose vec service
- Destroy in-memory cache
- Dispose embedding provider (disconnect from shared server or release model)
- Close SQLite database
The cleanup function is idempotent — calling it multiple times is safe.
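An idempotent cleanup in the order listed above can be sketched with a simple guard flag. CleanupSteps and createCleanup are hypothetical names, and each callback stands in for the corresponding service method.

```typescript
// Each step is a hypothetical callback standing in for the real service method.
interface CleanupSteps {
  stopKvCleanup(): void;
  disposeVec(): void;
  destroyCache(): void;
  disposeEmbeddingProvider(): void;
  closeDatabase(): void;
}

function createCleanup(steps: CleanupSteps): () => void {
  let done = false;
  return () => {
    if (done) return; // second and later calls are no-ops
    done = true;
    steps.stopKvCleanup();
    steps.disposeVec();
    steps.destroyCache();
    steps.disposeEmbeddingProvider();
    steps.closeDatabase(); // the database closes last, after everything that uses it
  };
}
```

The returned function can then be registered for process exit, SIGINT, and SIGTERM without worrying about which handler fires first.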
Data Locations
| File | Location | Purpose |
|---|---|---|
| memory.db | {dataDir}/ | SQLite database with all memories |
| config.json | {dataDir}/ | Plugin configuration |
| embedding.sock | {dataDir}/ | Unix socket for shared embedding server |
| embedding.pid | {dataDir}/ | PID file for the embedding server process |
| embedding.startup.lock | {dataDir}/ | Directory-based lock to prevent duplicate server starts |
| memory.log | {dataDir}/logs/ | Debug log (when logging is enabled) |
| models/ | {dataDir}/ | Hugging Face model cache for local embeddings |
CLI
The plugin includes the ocm-mem CLI for managing memories outside of OpenCode sessions. The CLI auto-detects the project ID from git and resolves the database path automatically.
Global Options
| Flag | Description |
|---|---|
| --db-path <path> | Path to memory database |
| --project, -p <name> | Project name or SHA (auto-detected from git) |
| --dir, -d <path> | Git repo path for project detection |
| --help, -h | Show help |
Commands
| Command | Description |
|---|---|
| export | Export memories to file (JSON or Markdown) |
| import | Import memories from file |
| list | List projects with memory counts |
| stats | Show memory statistics for a project |
| cleanup | Delete memories by criteria |
Usage Examples
# Export all memories as markdown
ocm-mem export --format markdown --output memories.md
# Export filtered by scope
ocm-mem export --project my-project --scope convention
# Import from JSON
ocm-mem import memories.json --project my-project
# Import from Markdown, skip duplicate detection
ocm-mem import memories.md --project my-project --force
# List all projects
ocm-mem list
# Show stats for current project
ocm-mem stats
# Preview cleanup of old memories (dry run)
ocm-mem cleanup --older-than 90 --dry-run
# Delete specific memories
ocm-mem cleanup --ids 1,2,3 --force
Run ocm-mem <command> --help for full options on each command.
Troubleshooting
Plugin shows "degraded" status
The embedding provider is not operational. For local embeddings, the model may not have downloaded. For API providers, check your API key and network connectivity. Run memory-health with action: check for details.
Search returns no results
- Verify memories exist with memory-read (no query, no scope)
- Check if a reindex is needed: memory-health check (look for "Reindex required")
- If using a new model, run memory-health reindex
Embedding server won't start
- Check if another process holds the startup lock: look for the embedding.startup.lock directory in the data dir
- If stale, delete it manually: rm -rf ~/.local/share/opencode/memory/embedding.startup.lock
- Check if the socket file exists but the process is dead: rm ~/.local/share/opencode/memory/embedding.sock
- Verify Bun is installed and available on PATH
Memory not injected during compaction
Check that compaction.customPrompt is true in your config. Verify that memories exist for the project by running memory-read without filters.