Skip to content

Memory Plugin

@opencode-manager/memory is an optional OpenCode plugin that stores and recalls project knowledge across sessions using vector embeddings and semantic search.

npm

Not Required

This plugin is entirely optional. OpenCode Manager works fully without it — install it only if you want persistent project knowledge and semantic search capabilities.

Works with Standalone OpenCode

This plugin can also be used with standalone OpenCode installations outside of OpenCode Manager. Simply install the package and add it to your opencode.json plugins array.


Installation

pnpm add @opencode-manager/memory

The local embedding model (all-MiniLM-L6-v2) is downloaded automatically via the postinstall script. For API-based embeddings (OpenAI or Voyage), skip the local model and set your provider and API key in the configuration instead.

Then register the plugin in your opencode.json:

{
  "plugin": ["@opencode-manager/memory"]
}

Configuration

On first run, the plugin copies a bundled config.json to the global data directory:

  • ~/.local/share/opencode/memory/config.json
  • Falls back to $XDG_DATA_HOME/opencode/memory/config.json

The file is only created if it does not already exist. The config is validated on load — if it fails validation, defaults are used automatically.

Full Default Config

{
  "embedding": {
    "provider": "local",
    "model": "all-MiniLM-L6-v2",
    "dimensions": 384,
    "baseUrl": "",
    "apiKey": ""
  },
  "dedupThreshold": 0.25,
  "logging": {
    "enabled": false,
    "debug": false,
    "file": ""
  },
  "compaction": {
    "customPrompt": true,
    "maxContextTokens": 4000
  },
  "memoryInjection": {
    "enabled": true,
    "debug": false,
    "maxTokens": 2000,
    "cacheTtlMs": 30000
  },
  "messagesTransform": {
    "enabled": true,
    "debug": false
  },
  "executionModel": ""
}

API-Based Embedding Example

{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "apiKey": "sk-..."
  }
}

Embedding Providers

Provider Models API Key Required
local all-MiniLM-L6-v2 (384d) No
openai text-embedding-3-small (1536d), text-embedding-3-large (3072d), text-embedding-ada-002 (1536d) Yes
voyage voyage-code-3 (1024d), voyage-2 (1536d) Yes

Set baseUrl to point at any OpenAI-compatible self-hosted service (vLLM, Ollama, LocalAI, LiteLLM, text-embeddings-inference). The URL is automatically normalized — providing http://localhost:11434 appends /v1/embeddings.

All Options

Key Description Default
embedding.provider local, openai, or voyage local
embedding.model Model name all-MiniLM-L6-v2
embedding.dimensions Vector dimensions (auto-detected for known models)
embedding.apiKey API key for OpenAI/Voyage
embedding.baseUrl Custom endpoint for self-hosted services
embedding.serverGracePeriod Time (ms) before idle embedding server shuts down 30000
dedupThreshold Similarity threshold for deduplication (0.05–0.40) 0.25
logging.enabled Write logs to file false
logging.debug Enable debug-level log output false
logging.file Log file path (resolves to ~/.local/share/opencode/memory/logs/memory.log when empty, 10MB limit, auto-rotated)
compaction.customPrompt Use optimized compaction prompt for session continuity true
compaction.maxContextTokens Max tokens for injected memory context 4000
memoryInjection.enabled Inject relevant memories into user messages via semantic search true
memoryInjection.debug Enable debug logging for memory injection false
memoryInjection.maxResults Max vector search results to retrieve 5
memoryInjection.distanceThreshold Max vector distance for relevance filtering (lower = stricter) 0.5
memoryInjection.maxTokens Token budget for injected <project-memory> block 2000
memoryInjection.cacheTtlMs Cache TTL (ms) for identical query results 30000
messagesTransform.enabled Enable the messages transform hook (memory injection + Architect enforcement) true
messagesTransform.debug Enable debug logging for messages transform false
executionModel Model override for plan execution sessions (provider/model). Falls back to OpenCode's default model.

Architecture

The plugin is composed of several subsystems that work together:

┌──────────────────────────────────────────────────┐
│                  Memory Plugin                    │
├─────────┬──────────┬───────────┬─────────────────┤
│  Tools  │  Agents  │   Hooks   │   Compaction    │
├─────────┴──────────┴───────────┴─────────────────┤
│               Memory Service                      │
├──────────────┬────────────────┬───────────────────┤
│  Embedding   │   Vec Search   │   Cache          │
│  Service     │   (sqlite-vec) │   (In-Memory)    │
├──────────────┴────────────────┬───────────────────┤
│         KV Service            │   Auto-Cleanup    │
│   (ephemeral state + TTL)     │   (30min interval)│
├──────────────┴────────────────┴───────────────────┤
│              SQLite Database (WAL)                 │
│   memories | metadata | project_kv (TTL indexed)  │
└──────────────────────────────────────────────────┘

Storage Layer

The plugin uses a single SQLite database in WAL mode with four tables:

Table Purpose
memories Stores all memory records with scope, content, access tracking
plugin_metadata Tracks the active embedding model and dimensions for drift detection
project_kv Stores ephemeral key-value pairs with TTL expiration (auto-cleaned every 30 minutes)

SQLite pragmas are tuned for concurrent access:

  • journal_mode=WAL — concurrent reads during writes
  • busy_timeout=5000 — wait up to 5s on lock contention
  • synchronous=NORMAL — balanced durability and performance

KV Store

The KV store provides ephemeral project state management with automatic TTL-based expiration:

  • Key-Value Storage: Store arbitrary JSON values under string keys, scoped by project ID
  • TTL Management: Each entry has a configurable expiration time (default 24 hours)
  • Auto-Cleanup: Background cleanup runs every 30 minutes to remove expired entries
  • Graceful Degradation: get() and list() methods handle malformed JSON gracefully
  • Use Cases: Planning progress, code review patterns, session context, temporary state

The KV service is initialized on plugin startup and begins its cleanup interval automatically. Call kvService.destroy() during cleanup to stop the interval.

Vector similarity search is powered by sqlite-vec, a SQLite extension. The vec service:

  • Initializes lazily after the database is ready
  • Falls back to a no-op service if the extension is unavailable (search still works via exact match, just without semantic ranking)
  • Supports insert, delete, search, and similarity-threshold queries
  • Scoped by project ID for multi-project isolation

Embedding Subsystem

The embedding system has three provider types and a shared server architecture:

Local Provider

Uses @huggingface/transformers to run all-MiniLM-L6-v2 locally. The model is loaded lazily on first use with a warmup hint at plugin initialization.

Shared Embedding Server

When using the local provider, the plugin runs a shared Unix socket server (embedding.sock) that:

  1. Loads the model once into memory
  2. Serves embedding requests to multiple plugin instances via Unix domain socket
  3. Uses reference counting — clients send connect/disconnect messages
  4. Auto-shuts down after a configurable grace period (default 30s) when the last client disconnects
  5. Uses PID files and startup locks to prevent duplicate server instances
  6. Falls back to in-process embedding if the server fails to start

This architecture means the model is loaded once regardless of how many OpenCode sessions are running.

API Provider

Supports OpenAI and Voyage embedding APIs:

  • Batch processing in chunks of 100 texts
  • Automatic URL normalization for self-hosted endpoints
  • Bearer token authentication

Embedding Cache

All embeddings are cached in memory using SHA-256 content hashes. Cache entries expire after 24 hours. This prevents redundant API calls or model inference for identical content.

Embedding Sync

On startup, the plugin checks for memories that lack embeddings (e.g., from a model change or failed previous embedding) and backfills them automatically:

  • Processes in batches of 50
  • Retries failed embeddings up to 3 times
  • Stops early if an entire batch fails (prevents infinite loops)
  • Caps at 100 iterations to bound startup time

Auto-Validation

After the vec service initializes, the plugin compares the configured embedding model/dimensions against what's stored in plugin_metadata. If there's a mismatch (model drift), it automatically triggers a reindex — no manual memory-health reindex needed.


Memory Model

Scopes

Every memory belongs to exactly one scope:

Scope Purpose Examples
convention Rules and patterns to follow "Use named imports only", "Tests use describe/it blocks"
decision Architectural choices with rationale "Chose SQLite over PostgreSQL for simplicity"
context Reference information "Entry point is src/index.ts", "Prices stored as integers"

Fields

Each memory record contains:

Field Description
id Auto-incrementing integer primary key
projectId The OpenCode project this memory belongs to
scope convention, decision, or context
content The memory text
filePath Optional file path reference
accessCount How many times this memory has been read
lastAccessedAt Timestamp of last access
createdAt Creation timestamp
updatedAt Last modification timestamp

Deduplication

Before storing a new memory, the plugin:

  1. Checks for an exact content match in the same project
  2. Computes vector similarity against all existing project memories
  3. Skips the write if similarity exceeds dedupThreshold (default 0.25)
  4. Uses a transaction with double-check locking to prevent race conditions

When deduplication triggers, the existing memory's ID is returned instead of creating a duplicate.


Tools

The plugin registers thirteen tools that the AI agent can call:

memory-read

Search and retrieve project memories.

Parameter Type Required Description
query string No Semantic search query
scope enum No Filter by convention, decision, or context
limit number No Max results (default: 10)

When query is provided, results are ranked by vector similarity. Without query, memories are listed in chronological order. Access counts are tracked for every read.

memory-write

Store a new project memory with automatic deduplication.

Parameter Type Required Description
content string Yes The memory content to store
scope enum Yes convention, decision, or context

Returns the memory ID and whether deduplication matched an existing memory.

memory-edit

Update the content or scope of an existing memory. Re-embeds the content if changed.

Parameter Type Required Description
id number Yes Memory ID to update
content string Yes New content
scope enum No New scope (keeps existing if omitted)

memory-delete

Soft-delete a memory by ID. The memory must exist or an error is returned.

Parameter Type Required Description
id number Yes Memory ID to delete

memory-health

Check plugin health or trigger a reindex of all embeddings.

Parameter Type Required Description
action enum No check (default) or reindex

Check returns:

  • Overall status: ok, degraded, or error
  • Embedding provider status and operational state
  • Shared embedding server status (running, client count, uptime)
  • Database health and total memory count
  • Configured vs. indexed model comparison
  • Whether a reindex is needed

Reindex regenerates all embeddings with the configured model:

  • Verifies the provider is operational before starting
  • Processes memories in batches of 50
  • Updates the plugin_metadata table on success
  • Reports total, success, and failure counts

Model Changes Require Reindex

If you change embedding.model or embedding.dimensions, existing embeddings will have mismatched dimensions. Auto-validation handles this on startup, but you can also trigger it manually with memory-health reindex.

memory-plan-execute

Create a new Code session and send an implementation plan as the first prompt. Designed to be called by the Architect agent after the user approves a plan.

Parameter Type Required Description
plan string Yes The full implementation plan to send to the Code agent
title string Yes Short title for the session (shown in session list, max 60 chars)

Creates a new session via the OpenCode API and sends the plan as the first message to the Code agent. Returns the session ID and title. Only the Architect agent has access to this tool — it is excluded from Code and Memory agents.

The model used for the new Code session is determined by executionModel in the plugin config (format: provider/model, e.g. anthropic/claude-sonnet-4-20250514). If not set, OpenCode's default model resolution is used — typically the model field from opencode.json.

memory-kv-set

Store a key-value pair for the current project. Values expire after 24 hours by default. Use for ephemeral project state like planning progress, code review patterns, or session context.

Parameter Type Required Description
key string Yes The key to store the value under
value string Yes The value to store (JSON string)
ttlMs number No Time-to-live in milliseconds (default: 24 hours)

Returns confirmation with the key and expiration timestamp.

memory-kv-get

Retrieve a value by key for the current project.

Parameter Type Required Description
key string Yes The key to retrieve

Returns the stored value (formatted as JSON if applicable) or a message indicating the key was not found.

memory-kv-delete

Delete a key-value pair for the current project.

Parameter Type Required Description
key string Yes The key to delete

Returns confirmation that the key was deleted.

memory-kv-list

List all active key-value pairs for the current project.

No parameters required.

Returns a list of all stored keys with their values and expiration times. Useful for debugging or inspecting current project state.

KV Store vs Memory

The KV store is designed for ephemeral project state that expires automatically (default 24 hours). Use memory-write for durable knowledge that should persist across sessions, such as conventions, decisions, and context.


Workflows

Architect → Code

The Architect and Code agents work together in a plan-then-execute pattern. The Architect researches and designs; the Code agent implements.

Steps:

  1. Switch to the Architect agent using the agent selector in the chat header
  2. Describe your task — the Architect researches the codebase, checks memory for conventions and decisions, and designs a plan
  3. Review the plan — the Architect presents a structured plan with objectives, phases, and decisions for your approval
  4. Approve the plan — the Architect calls memory-plan-execute, which creates a new Code session and sends the full plan as context
  5. Switch to the new session — the Code agent executes the plan phase by phase

The Architect operates in read-only mode — it cannot edit files. This separation ensures planning is thorough before any code changes are made.

Planning requires strong reasoning — use a smart model (e.g., claude-opus-4-6) for the Architect session. Code execution is more mechanical — set executionModel to a faster, cheaper model (e.g., claude-haiku-3-5-20241022 or a MiniMax model).

This gives you the best of both worlds: high-quality plans at the reasoning tier, fast execution at a fraction of the cost.

Configure the execution model in the memory plugin config (~/.local/share/opencode/memory/config.json):

{
  "executionModel": "anthropic/claude-haiku-3-5-20241022"
}

Or set it from the UI: Settings > Memory Plugin > Execution Model.

Cost Optimization

With this setup, only the planning phase uses the expensive model. The Code session — which typically consumes far more tokens implementing the plan — runs on the cheaper model. The Architect's plan provides enough structure and detail that the Code agent doesn't need the same level of reasoning capability.


Agents

The plugin registers four agents that are configured into OpenCode:

Code Agent (primary)

  • Display name: Code
  • Mode: primary (replaces the default agent)
  • Role: Primary coding agent with memory awareness

The Code agent's system prompt instructs it to:

  • Check memory before modifying unfamiliar code areas or making architectural decisions
  • Store durable knowledge with rationale (not just "we use X" but "we use X because Y")
  • Use the @Memory subagent for complex memory operations (multi-query research, contradiction resolution, bulk curation)
  • Check for duplicates with memory-read before writing new memories
  • Update stale memories with memory-edit rather than creating duplicates

Memory Agent (subagent)

  • Display name: Memory
  • Mode: subagent
  • Role: Institutional memory manager

The Memory agent handles:

  • Strategic retrieval across scopes with prioritized results
  • Storage with proper scope categorization and rationale
  • Contradiction detection between overlapping memories
  • Curation: merging duplicates, archiving outdated entries
  • Post-compaction knowledge extraction (invoked automatically via SubtaskPart)

Architect Agent (primary)

  • Display name: Architect
  • Mode: primary (user-switchable agent, not a subagent)
  • Temperature: 0.0 (deterministic)
  • Permission: Read-only — cannot edit any files (edit: { '*': 'deny' })
  • Role: Memory-aware planning agent

The Architect agent follows a Research → Design → Plan → Execute workflow:

  1. Research — Reads relevant files, searches the codebase, checks memory for conventions and decisions
  2. Design — Considers approaches, weighs tradeoffs, asks clarifying questions
  3. Plan — Presents a structured plan with objectives, phases, decisions, conventions, and key context
  4. Execute — When the user approves, calls memory-plan-execute with the plan and title.

The Architect is the only agent with access to the memory-plan-execute tool. Plans must be fully self-contained since the Code agent receiving them has no access to the Architect's conversation.

Code Review Agent (subagent)

  • Display name: Code Review
  • Mode: subagent
  • Temperature: 0.0 (deterministic)
  • Role: Convention-aware code reviewer with memory access

The Code Review agent is a read-only subagent invoked by other agents via the Task tool to review diffs, commits, branches, or PRs. It checks changes against stored project conventions and decisions, then returns a structured review summary with issues (bug/warning/suggestion) and observations.

The agent can read memory (memory-read) but cannot write, edit, or delete memories. It also cannot execute plans — memory-plan-execute, memory-write, memory-edit, and memory-delete are excluded.

The /review slash command triggers this agent as a subtask with the template: "Review the current code changes."

Built-in Agent Enhancements

The plugin also modifies built-in OpenCode agents:

Agent Enhancement
plan Gets access to memory-read tool
build Hidden (replaced by the Code agent)

The default agent is set to Code.

Removed Features

The following features were removed in a recent refactor: - Keyword activation (regex-based detection of "remember this", "recall", etc.) - LLM parameter adjustment based on detected modes (temperature, thinking budget, maxSteps) - resumeAfterCompaction config option


Hooks

The plugin registers several hooks into OpenCode's lifecycle:

chat.message

  • Tracks session initialization (first message per session)

event

Listens for session.compacted events and triggers automatic knowledge extraction: 1. Fetches the last 4 messages from the session to get the compaction summary 2. Sends a synchronous prompt() call with a SubtaskPart to run the Memory agent 3. Extraction runs within the main session's prompt loop, keeping session busy

experimental.session.compacting

The core compaction hook that fires when a session is about to be compacted. It injects context to preserve knowledge across context window resets:

  1. Project memories — Fetches up to 10 conventions and 10 decisions for the project and formats them under ### Conventions and ### Decisions headings

  2. Token budgeting — All sections are trimmed to fit within maxContextTokens (default 4000). Lower-priority sections are truncated first.

  3. Custom prompt — If customPrompt is enabled, replaces the default compaction prompt with one optimized for continuation context that preserves active tasks, file paths, decisions, and todo state

  4. Diagnostics — Appends a summary line showing how many conventions, decisions, and tokens were injected

experimental.chat.messages.transform

Performs two functions on the message array before each LLM inference call:

Memory Injection (all agents):

  1. Finds the last user message in the message array
  2. Extracts all text parts and runs a semantic vector search against stored project memories
  3. Filters results by distanceThreshold — only memories with distance below the threshold are kept
  4. Formats matching memories into a <project-memory> block with scope labels (e.g., [convention], [decision])
  5. Trims the block to fit within maxTokens and appends it as a synthetic text part to the user message
  6. Uses SHA-256 content-hash caching (cacheTtlMs, default 30s) to avoid redundant vector searches across inference steps

The hook fires on every LLM inference step (including tool-use follow-ups), but since OpenCode re-reads messages from the database each iteration, synthetic parts are ephemeral. The cache ensures the vector search only runs once per unique user message within the TTL window.

Memory injection is controlled independently by memoryInjection.enabled (default true). Architect read-only enforcement is controlled by messagesTransform.enabled (default true).

Architect Read-Only Enforcement (Architect agent only):

  1. Checks if the last user message is addressed to the Architect agent
  2. If so, appends a synthetic <system-reminder> part enforcing read-only mode
  3. This provides message-level enforcement on top of the agent's edit: { '*': 'deny' } permission config

Data Lifecycle

Startup Sequence

  1. Load and validate config from global data directory
  2. Create embedding provider (local/API)
  3. Warmup embedding provider (non-blocking)
  4. Initialize SQLite database with WAL mode
  5. Create memory service with no-op vec service
  6. Initialize KV service and start auto-cleanup interval (30 minutes)
  7. Initialize vec service asynchronously:
    • If available: sync missing embeddings, auto-validate model drift
    • If unavailable: continue with no-op (semantic search degraded)

Cleanup

On process exit, SIGINT, or SIGTERM:

  1. Stop KV cleanup interval
  2. Dispose vec service
  3. Destroy in-memory cache
  4. Dispose embedding provider (disconnect from shared server or release model)
  5. Close SQLite database

The cleanup function is idempotent — calling it multiple times is safe.

Data Locations

File Location Purpose
memory.db {dataDir}/ SQLite database with all memories
config.json {dataDir}/ Plugin configuration
embedding.sock {dataDir}/ Unix socket for shared embedding server
embedding.pid {dataDir}/ PID file for the embedding server process
embedding.startup.lock {dataDir}/ Directory-based lock to prevent duplicate server starts
memory.log {dataDir}/logs/ Debug log (when logging is enabled)
models/ {dataDir}/ Hugging Face model cache for local embeddings

CLI

The plugin includes the ocm-mem CLI for managing memories outside of OpenCode sessions. The CLI auto-detects the project ID from git and resolves the database path automatically.

ocm-mem <command> [options]

Global Options

Flag Description
--db-path <path> Path to memory database
--project, -p <name> Project name or SHA (auto-detected from git)
--dir, -d <path> Git repo path for project detection
--help, -h Show help

Commands

Command Description
export Export memories to file (JSON or Markdown)
import Import memories from file
list List projects with memory counts
stats Show memory statistics for a project
cleanup Delete memories by criteria

Usage Examples

# Export all memories as markdown
ocm-mem export --format markdown --output memories.md

# Export filtered by scope
ocm-mem export --project my-project --scope convention

# Import from JSON
ocm-mem import memories.json --project my-project

# Import from Markdown, skip duplicate detection
ocm-mem import memories.md --project my-project --force

# List all projects
ocm-mem list

# Show stats for current project
ocm-mem stats

# Preview cleanup of old memories (dry run)
ocm-mem cleanup --older-than 90 --dry-run

# Delete specific memories
ocm-mem cleanup --ids 1,2,3 --force

Run ocm-mem <command> --help for full options on each command.


Troubleshooting

Plugin shows "degraded" status

The embedding provider is not operational. For local embeddings, the model may not have downloaded. For API providers, check your API key and network connectivity. Run memory-health with action: check for details.

Search returns no results

  • Verify memories exist with memory-read (no query, no scope)
  • Check if a reindex is needed: memory-health check — look for "Reindex required"
  • If using a new model, run memory-health reindex

Embedding server won't start

  • Check if another process holds the startup lock: look for embedding.startup.lock directory in the data dir
  • If stale, delete it manually: rm -rf ~/.local/share/opencode/memory/embedding.startup.lock
  • Check if the socket file exists but the process is dead: rm ~/.local/share/opencode/memory/embedding.sock
  • Verify Bun is installed and available on PATH

Memory not injected during compaction

Check that compaction.customPrompt is true in your config. Verify that memories exist for the project by running memory-read without filters.