Memory Overview
Conversational AI agents often need to remember past interactions to maintain context, understand user preferences, and provide more coherent and personalized responses. Without memory, each interaction would be treated in isolation, leading to repetitive questions and unnatural conversations.
VoltAgent provides a unified Memory class with pluggable storage adapters. It stores and retrieves conversation history, and optionally supports embedding-powered semantic search and structured working memory.
Why Use Memory?
- Context Preservation: Enables agents to recall previous messages in a conversation, understanding follow-up questions and references.
- Personalization: Allows agents to remember user-specific details (like name, preferences, past requests) for a tailored experience.
- Coherence: Ensures conversations flow naturally without the agent constantly losing track of the topic.
- Long-Term State: Can be used to store summaries or key information extracted from conversations over extended periods.
Default Memory Behavior
By default, agents use in-memory storage (no persistence) with zero configuration. If you don't provide a memory option, VoltAgent automatically uses an in-memory adapter that:
- Stores conversation history in application memory.
- Maintains context during the application runtime.
- Loses data when the application restarts (suitable for development and stateless deployments).
For persistent storage across restarts, configure Memory with a storage adapter such as LibSQLMemoryAdapter, PostgreSQLMemoryAdapter, or SupabaseMemoryAdapter. See the specific adapter docs for details.
Disabling Memory
You can completely disable memory persistence and retrieval by setting the memory property to false in the Agent constructor:
const agent = new Agent({
name: "Stateless Assistant",
instructions: "This agent has no memory.",
model: openai("gpt-4o"),
memory: false, // disable memory entirely
});
When memory is disabled, the agent won't store or retrieve any conversation history, making it stateless for each interaction.
Separate Conversation and History Memory
VoltAgent manages conversation memory via the memory option. Observability (execution logs) is handled via OpenTelemetry and VoltOps integrations, and is not tied to conversation storage.
Working Memory
Working memory lets the agent persist concise, important context across turns (conversation-scoped by default, optionally user-scoped). Configuration is part of the Memory constructor via workingMemory.
Supported modes:
- Template (Markdown):
workingMemory: { enabled: true, template: string } - JSON schema (Zod):
workingMemory: { enabled: true, schema: z.object({...}) } - Free-form:
workingMemory: { enabled: true }
Scope:
scope?: 'conversation' | 'user'(defaults toconversation)
Example (template-based, conversation-scoped):
import { Agent, Memory } from "@voltagent/core";
import { LibSQLMemoryAdapter } from "@voltagent/libsql";
import { openai } from "@ai-sdk/openai";
const memory = new Memory({
storage: new LibSQLMemoryAdapter({ url: "file:./.voltagent/memory.db" }),
workingMemory: {
enabled: true,
template: `
# Profile
- Name:
- Role:
# Goals
-
# Preferences
-
`,
// scope: 'conversation' // default
},
});
const agent = new Agent({
name: "Assistant",
instructions: "Use working memory to maintain key facts.",
model: openai("gpt-4o-mini"),
memory,
});
// When you call the agent with user/conversation IDs, the agent injects
// working memory instructions into the system prompt automatically
const res = await agent.generateText("Let's plan this week", {
userId: "u1",
conversationId: "c1",
});
Example (JSON schema, user-scoped):
import { z } from "zod";
import { Agent, Memory } from "@voltagent/core";
import { LibSQLMemoryAdapter } from "@voltagent/libsql";
import { openai } from "@ai-sdk/openai";
const workingSchema = z.object({
userProfile: z
.object({
name: z.string().optional(),
timezone: z.string().optional(),
})
.optional(),
tasks: z.array(z.string()).optional(),
});
const memory = new Memory({
storage: new LibSQLMemoryAdapter({ url: "file:./.voltagent/memory.db" }),
workingMemory: {
enabled: true,
scope: "user",
schema: workingSchema,
},
});
const agent = new Agent({ name: "Planner", model: openai("gpt-4o-mini"), memory });
Programmatic API:
memory.getWorkingMemory({ conversationId?, userId? }) → Promise<string | null>memory.updateWorkingMemory({ conversationId?, userId?, content })wherecontentis a string or an object matching the schema when configured (validated internally). Stores as string (Markdown or JSON) under the hood.memory.clearWorkingMemory({ conversationId?, userId? })memory.getWorkingMemoryFormat() → 'markdown' | 'json' | nullmemory.getWorkingMemoryTemplate() → string | nullmemory.getWorkingMemorySchema() → z.ZodObject | null
Built-in tools (added automatically when working memory is enabled):
get_working_memory()→ returns the current content stringupdate_working_memory(content)→ updates content (typed to schema if configured)clear_working_memory()→ clears content
Agent prompt integration:
- On each call with
userIdandconversationId, the agent appends a working-memory instruction block to the system prompt (including template/schema and current content if present).
Semantic Search (Embeddings + Vectors)
To enable semantic retrieval of past messages, configure both an embedding adapter and a vector adapter. Memory will automatically embed text parts of messages and store vectors with metadata.
Adapters:
AiSdkEmbeddingAdapter(wraps ai-sdk embedding models)InMemoryVectorAdapter(lightweight dev vector store)LibSQLVectorAdapterfrom@voltagent/libsql(persistent vectors via LibSQL/Turso/SQLite)
Example (dev vector store):
import { Agent, Memory, AiSdkEmbeddingAdapter, InMemoryVectorAdapter } from "@voltagent/core";
import { LibSQLMemoryAdapter } from "@voltagent/libsql";
import { openai } from "@ai-sdk/openai";
const memory = new Memory({
storage: new LibSQLMemoryAdapter({ url: "file:./.voltagent/memory.db" }),
embedding: new AiSdkEmbeddingAdapter(openai.embedding("text-embedding-3-small")),
vector: new InMemoryVectorAdapter(),
enableCache: true, // optional embedding cache
});
const agent = new Agent({ name: "Helper", model: openai("gpt-4o-mini"), memory });
// Enable semantic search per-call (defaults shown; enabled auto when vectors present)
const out = await agent.generateText("What did I say about pricing last week?", {
userId: "u1",
conversationId: "c1",
semanticMemory: {
enabled: true,
semanticLimit: 5,
semanticThreshold: 0.7,
mergeStrategy: "append", // default ('prepend' | 'append' | 'interleave')
},
});
Example (persistent vectors with LibSQL):
import { Agent, Memory, AiSdkEmbeddingAdapter } from "@voltagent/core";
import { LibSQLMemoryAdapter, LibSQLVectorAdapter } from "@voltagent/libsql";
import { openai } from "@ai-sdk/openai";
const memory = new Memory({
storage: new LibSQLMemoryAdapter({ url: "file:./.voltagent/memory.db" }),
embedding: new AiSdkEmbeddingAdapter(openai.embedding("text-embedding-3-small")),
vector: new LibSQLVectorAdapter({ url: "file:./.voltagent/memory.db" }),
});
// For ephemeral tests, use in‑memory DB:
// new LibSQLVectorAdapter({ url: ":memory:" }) // or "file::memory:"
How it works:
- On save, Memory embeds text parts of messages and stores vectors with metadata
{ messageId, conversationId, userId, role, createdAt }and ID patternmsg_${conversationId}_${message.id}. - On read with semantic search enabled, Memory searches similar messages and merges them with recent messages using the configured strategy.
Programmatic search:
memory.hasVectorSupport()→ booleanmemory.searchSimilar(query, { limit?, threshold?, filter? }) → Promise<SearchResult[]>
Memory Providers
VoltAgent achieves persistence via swappable storage adapters you pass to new Memory({ storage: ... }):
LibSQLMemoryAdapter: From@voltagent/libsql(LibSQL/Turso/SQLite)PostgreSQLMemoryAdapter: From@voltagent/postgresSupabaseMemoryAdapter: From@voltagent/supabaseInMemoryStorageAdapter: Default in-memory adapter (no persistence)
Optional components:
- Embeddings via
AiSdkEmbeddingAdapter(choose any ai-sdk embedding model) - Vector store via
InMemoryVectorAdapter(or custom)
How Memory Works with Agents
When you configure an Agent with a memory provider instance (or use the default), VoltAgent's internal MemoryManager automatically handles:
- Retrieval: Before generating a response (e.g., during
agent.generateText()), the manager fetches relevant conversation history or state from the memory provider based on the provideduserIdandconversationId. - Injection: This retrieved context is typically formatted and added to the prompt sent to the LLM, giving it the necessary background information.
- Saving: After an interaction completes, the new messages (user input and agent response) are saved back to the memory provider, associated with the same
userIdandconversationId.
This process happens seamlessly behind the scenes when using the agent's core interaction methods (generateText, streamText, generateObject, streamObject).
User and Conversation Identification
To separate conversations for different users or different chat sessions within the same application, you must provide userId and conversationId in the options when calling agent methods directly in your code. If you are interacting with the agent via the Core API, you can pass these same identifiers within the options object in your request body. See the API examples for details on the API usage.
When calling agent methods directly:
const response = await agent.generateText("Hello, how can you help me?", {
userId: "user-123", // Identifies the specific user
conversationId: "chat-session-xyz", // Identifies this specific conversation thread
});
These identifiers work consistently across all agent generation methods (generateText, streamText, generateObject, streamObject).
Examples
Default (in-memory)
import { Agent } from "@voltagent/core";
import { openai } from "@ai-sdk/openai";
const agent = new Agent({
name: "My Assistant",
instructions: "Uses default in-memory storage.",
model: openai("gpt-4o-mini"),
});
Persistent (LibSQL)
import { Agent, Memory } from "@voltagent/core";
import { LibSQLMemoryAdapter } from "@voltagent/libsql";
import { openai } from "@ai-sdk/openai";
const agent = new Agent({
name: "Persistent Assistant",
instructions: "Uses LibSQL for memory.",
model: openai("gpt-4o-mini"),
memory: new Memory({
storage: new LibSQLMemoryAdapter({ url: "file:./.voltagent/memory.db" }),
}),
});
Semantic Search + Working Memory
import { Agent, Memory, AiSdkEmbeddingAdapter, InMemoryVectorAdapter } from "@voltagent/core";
import { LibSQLMemoryAdapter } from "@voltagent/libsql";
import { openai } from "@ai-sdk/openai";
const memory = new Memory({
storage: new LibSQLMemoryAdapter({ url: "file:./.voltagent/memory.db" }),
embedding: new AiSdkEmbeddingAdapter(openai.embedding("text-embedding-3-small")),
vector: new InMemoryVectorAdapter(),
workingMemory: { enabled: true },
});
const agent = new Agent({
name: "Smart Memory Assistant",
instructions: "Retrieves with semantic search and tracks working memory.",
model: openai("gpt-4o-mini"),
memory,
});
How User and Conversation IDs Work
userId: A unique string identifying the end-user. This ensures memory isolation between different users. If omitted, it defaults to the string"default".conversationId: A unique string identifying a specific conversation thread for a user. This allows a single user to have multiple parallel conversations.- If provided: The agent retrieves and saves messages associated with this specific thread.
- If omitted: A new random UUID is generated for each request, effectively starting a new, separate conversation every time. This is useful for one-off tasks or ensuring a clean slate for each interaction when context isn't needed.
Key Behaviors:
- Context Retrieval: Before calling the LLM, the
MemoryManagerretrieves previous messages associated with the givenuserIdandconversationIdfrom the memory provider. - Message Storage: After the interaction, new user input and agent responses are stored using the same
userIdandconversationId. - Continuity: Providing the same
userIdandconversationIdacross multiple requests ensures the agent remembers the context of that specific thread. - New Conversations: Omitting
conversationIdguarantees a fresh conversation context for each request.
// To start a NEW conversation each time (or for single-turn interactions):
// Omit conversationId; a new one is generated automatically.
const response1 = await agent.generateText("Help with account setup", { userId: "user-123" });
const response2 = await agent.generateText("Question about billing issue", { userId: "user-123" }); // Starts another new conversation
// To MAINTAIN a continuous conversation across requests:
// Always provide the SAME conversationId.
const SUPPORT_THREAD_ID = "case-987-abc";
const responseA = await agent.generateText("My router is not working.", {
userId: "user-456",
conversationId: SUPPORT_THREAD_ID,
});
// Agent remembers the router issue for the next call with the same ID
const responseB = await agent.generateText("I tried restarting it, still no luck.", {
userId: "user-456",
conversationId: SUPPORT_THREAD_ID,
});
Context Management
When interacting with an agent that has memory enabled, the MemoryManager automatically retrieves recent messages for the given userId and conversationId and includes them as context in the prompt sent to the LLM.
// The agent automatically retrieves history for user-123/chat-session-xyz
// and includes up to N recent messages (determined by the provider/manager) in the LLM prompt.
const response = await agent.generateText("What was the first thing I asked you?", {
userId: "user-123",
conversationId: "chat-session-xyz",
// contextLimit: 10, // Note: contextLimit is typically managed by MemoryOptions now
});
How many messages are retrieved is often determined by the storageLimit configured on the Memory Provider or internal logic within the MemoryManager. This is crucial for:
- Coherence: Providing the LLM with enough history to understand the ongoing conversation.
- Cost/Performance: Limiting the context size to manage LLM token usage (cost) and potentially reduce latency.
- Relevance: Ensuring the context is relevant without overwhelming the LLM with excessive or old information.
Implementing Custom Memory Providers
To use a custom database or storage system, implement the StorageAdapter interface (@voltagent/core → memory/types). Observability is separate. The adapter only persists conversation messages, working memory, and workflow state for suspension/resume. Embedding and vector search are separate adapters.
Required methods (summary):
- Messages:
addMessage,addMessages,getMessages,clearMessages - Conversations:
createConversation,getConversation,getConversations,getConversationsByUserId,queryConversations,updateConversation,deleteConversation - Working memory:
getWorkingMemory,setWorkingMemory,deleteWorkingMemory - Workflow state:
getWorkflowState,setWorkflowState,updateWorkflowState,getSuspendedWorkflowStates
Implementation notes:
- Store
UIMessagevalues as data. Return messages in chronological order (oldest first). - Support
storageLimit. When the limit is exceeded, prune the oldest messages. - Working memory content is a string. When a schema is configured,
Memoryconverts the provided object to a JSON string before calling the adapter.
Skeleton:
import type {
StorageAdapter,
UIMessage,
Conversation,
CreateConversationInput,
ConversationQueryOptions,
WorkflowStateEntry,
WorkingMemoryScope,
} from "@voltagent/core";
export class MyStorageAdapter implements StorageAdapter {
// Messages
async addMessage(msg: UIMessage, userId: string, conversationId: string): Promise<void> {}
async addMessages(msgs: UIMessage[], userId: string, conversationId: string): Promise<void> {}
async getMessages(
userId: string,
conversationId: string,
options?: { limit?: number; before?: Date; after?: Date; roles?: string[] }
): Promise<UIMessage[]> {
return [];
}
async clearMessages(userId: string, conversationId?: string): Promise<void> {}
// Conversations
async createConversation(input: CreateConversationInput): Promise<Conversation> {
throw new Error("Not implemented");
}
async getConversation(id: string): Promise<Conversation | null> {
return null;
}
async getConversations(resourceId: string): Promise<Conversation[]> {
return [];
}
async getConversationsByUserId(
userId: string,
options?: Omit<ConversationQueryOptions, "userId">
): Promise<Conversation[]> {
return [];
}
async queryConversations(options: ConversationQueryOptions): Promise<Conversation[]> {
return [];
}
async updateConversation(
id: string,
updates: Partial<Omit<Conversation, "id" | "createdAt" | "updatedAt">>
): Promise<Conversation> {
throw new Error("Not implemented");
}
async deleteConversation(id: string): Promise<void> {}
// Working memory
async getWorkingMemory(params: {
conversationId?: string;
userId?: string;
scope: WorkingMemoryScope;
}): Promise<string | null> {
return null;
}
async setWorkingMemory(params: {
conversationId?: string;
userId?: string;
content: string;
scope: WorkingMemoryScope;
}): Promise<void> {}
async deleteWorkingMemory(params: {
conversationId?: string;
userId?: string;
scope: WorkingMemoryScope;
}): Promise<void> {}
// Workflow state
async getWorkflowState(id: string): Promise<WorkflowStateEntry | null> {
return null;
}
async setWorkflowState(id: string, state: WorkflowStateEntry): Promise<void> {}
async updateWorkflowState(id: string, updates: Partial<WorkflowStateEntry>): Promise<void> {}
async getSuspendedWorkflowStates(workflowId: string): Promise<WorkflowStateEntry[]> {
return [];
}
}
// Usage example
import { Agent, Memory } from "@voltagent/core";
import { openai } from "@ai-sdk/openai";
const memory = new Memory({
storage: new MyStorageAdapter(),
});
const agent = new Agent({
name: "Helper",
model: openai("gpt-4o-mini"),
memory,
});
Best Practices
- Choose the Right Adapter: Use
InMemoryStorageAdapterfor development/testing or stateless deployments. UseLibSQLMemoryAdapterfrom@voltagent/libsql(local/Turso) or a database-backed adapter (likePostgreSQLMemoryAdapterin@voltagent/postgresorSupabaseMemoryAdapterin@voltagent/supabase) for production persistence. - User Privacy: Be mindful of storing conversation data. Implement clear data retention policies and provide mechanisms for users to manage or delete their history (e.g., using
deleteConversationor custom logic) if required by privacy regulations. - Context Management: While
contextLimitis less directly used now, be aware of thestorageLimiton your memory provider, as this often dictates the maximum history retrieved. - Memory Efficiency: For high-volume applications using persistent storage, monitor database size and performance. Consider setting appropriate
storageLimitvalues on your memory provider to prevent unbounded growth and ensure efficient retrieval. - Error Handling: Wrap agent interactions in
try...catchblocks, as memory operations (especially with external databases) can potentially fail. - Use
userIdandconversationId: Always provide these identifiers in production applications to correctly scope memory and maintain context for individual users and conversation threads.
Explore the specific documentation for each provider to learn more: