A comprehensive, step-by-step guide to adding persistent memory to AI agents. Build agents that remember users, maintain context, and improve over time.
Every developer who has built AI agents has experienced this: you create a sophisticated agent with tools, reasoning, and multiple capabilities. It works perfectly in testing. But the moment a user returns the next day, the agent has no idea who they are, what they discussed previously, or any context from past interactions.
This is the "stateless" problem. By default, LLMs and AI agents have no memory between requests. Every conversation starts from scratch. This leads to:

- Users re-explaining who they are and what they need on every visit
- Preferences, decisions, and project context that vanish between sessions
- Agents that never improve, no matter how often they are used
Persistent memory solves these problems by giving agents the ability to remember, learn, and maintain context across sessions.
Not all memory is the same. Understanding the different types of memory helps you design the right architecture for your agents.
**Core (system) memory:** The foundational knowledge about who the agent is, its capabilities, boundaries, and behavioral guidelines. This rarely changes and is injected into every prompt.

**Working memory:** The immediate conversation history and context currently being processed. Limited by the LLM's context window size.

**Long-term memory:** Facts, preferences, and knowledge that persist across sessions. Stored in a database and retrieved as needed.

**External knowledge:** Information from documents, codebases, and data sources that the agent can reference.
There are several proven patterns for implementing agent memory. Choose based on your use case and requirements.
**Sliding-window buffer:** Store the last N messages in a buffer and include them in every prompt. Works for short conversations but hits context limits quickly.
```ts
// Sliding window: keep only the most recent messages in the prompt
const buffer = conversationHistory.slice(-10)
const prompt = `${systemPrompt}

${buffer.join('\n')}

User: ${currentInput}`
```

**Summarization:** Periodically summarize conversation history to compress it. Maintains context over longer sessions.
```ts
// Every N messages, summarize and compress the raw history
const summary = await llm.summarize(buffer)
store({ type: 'summary', content: summary })
// Use the summary instead of the full history on later turns
```

**Vector memory:** Store memories as embeddings and retrieve relevant ones based on semantic similarity. Scales to large knowledge bases.
```ts
// Store a memory alongside its embedding
const embedding = await embed(userMemory)
vectorStore.add({ embedding, text: userMemory })

// Retrieve relevant memories by embedding the query the same way
const query = "What does the user prefer?"
const results = await vectorStore.search(await embed(query))
```

**Hybrid:** Combine multiple approaches: a buffer for recent context, vector search for long-term memory, and knowledge bases for external information.
```ts
// Assemble the full context from all memory layers
const context = {
  system: systemPrompt,
  recent: buffer.slice(-5),
  memories: await vectorSearch(userQuery),
  knowledge: await knowledgeBase(userQuery)
}
const response = await llm.complete(context)
```

Here's how to implement persistent memory in your AI agents using RetainDB.
First, install the RetainDB SDK in your project:
```bash
npm install @whisper/sdk
```

Or use npx for quick setup:

```bash
npx whisper-wizard
```

Set up the RetainDB client with your project credentials:
```ts
import { RetainDBClient } from '@whisper/sdk'

// Read credentials from environment variables
const client = RetainDBClient.fromEnv()

// Or construct with explicit config instead:
// const client = new RetainDBClient({
//   apiKey: process.env.RETAINDB_API_KEY || process.env.WHISPER_API_KEY,
//   projectId: 'my-agent-project'
// })
```

After important interactions, store the memory:
```ts
// Store user preferences
await client.memory.add({
  project: 'my-agent',
  user_id: user.id,
  content: 'User prefers detailed technical explanations',
  memory_type: 'preference'
})

// Store important facts
await client.memory.add({
  project: 'my-agent',
  user_id: user.id,
  content: 'Working on e-commerce checkout flow',
  memory_type: 'context'
})

// Store decisions
await client.memory.add({
  project: 'my-agent',
  user_id: user.id,
  content: 'Chose Stripe over PayPal for payments',
  memory_type: 'decision'
})
```

Before each LLM call, retrieve relevant memories:
```ts
const memories = await client.memory.search({
  project: 'my-agent',
  user_id: user.id,
  query: 'How should I answer this user question?',
  top_k: 5,
  include_pending: true
})

// Use memories in your prompt
const context = memories
  .map(m => `- ${m.content}`)
  .join('\n')

const prompt = `User preferences:
${context}

User question: ${userQuestion}

Answer:`
```

Bind user/session context once and use it throughout:
```ts
// Create a bound client with run context
const run = client.withRunContext({
  project: 'my-agent',
  userId: user.id,
  sessionId: session.id
})

// All operations now include that context automatically
const memories = await run.memory.search({
  query: 'user preferences'
})
```

Choose the right storage backend based on your needs for speed, scale, and query flexibility.
- **Vector databases:** Best for semantic search, finding similar memories, knowledge retrieval.
- **Key-value stores:** Best for fast lookups, exact matches, user profiles.
- **Graph databases:** Best for relationship-heavy memory, knowledge graphs.
- **Full-text search engines:** Best for keyword search, document retrieval.
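Whichever backend you choose, it helps to know what a vector store is doing under the hood: scoring stored embeddings against a query embedding and returning the closest matches. A minimal sketch (the `cosine` and `topK` helpers are illustrative; real stores use approximate indexes to do this at scale):

```ts
// Cosine similarity: how vector stores score a query against a memory.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Rank stored memories against a query embedding, closest first.
function topK(
  query: number[],
  memories: { text: string; embedding: number[] }[],
  k: number
): string[] {
  return [...memories]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map(m => m.text)
}
```

Key-value, graph, and full-text backends trade this similarity scoring for exact keys, relationship traversal, and keyword matching respectively.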
How you retrieve memories is just as important as how you store them.
**Semantic search:** Search memories using a semantic query. Best for finding relevant context.
```ts
const memories = await client.memory.search({
  query: "What are the user's code style preferences?",
  top_k: 3
})
```

**Recency-based retrieval:** Retrieve memories from specific time periods. Useful for recent context.
```ts
const memories = await client.memory.search({
  user_id: user.id,
  time_range: {
    // Everything from the last 7 days
    start: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000)
  }
})
```

**Type filtering:** Filter memories by type (preference, context, decision, fact).
```ts
const preferences = await client.memory.search({
  user_id: user.id,
  memory_types: ['preference']
})
```

**Hybrid retrieval:** Combine multiple strategies for best results.
```ts
const memories = await client.memory.search({
  query: userQuery,
  user_id: user.id,
  time_range: { last: '30d' },
  memory_types: ['preference', 'context'],
  top_k: 5
})
```

RetainDB integrates with popular agent frameworks for easy memory implementation.
**Asynchronous writes:** Memory writes should be asynchronous to avoid blocking agent responses. RetainDB handles this automatically with write queues.
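The non-blocking pattern can be sketched as a tiny in-process queue that drains in the background (the `queueMemory` and `flush` names here are illustrative, not part of the RetainDB SDK):

```ts
// Fire-and-forget memory writes: the agent responds immediately
// while queued writes drain in the background.
type PendingWrite = { userId: string; content: string }

const queue: PendingWrite[] = []

function queueMemory(write: PendingWrite): void {
  queue.push(write) // returns immediately; never blocks the response path
}

// Drain the queue through a storage function; returns how many were written.
async function flush(store: (w: PendingWrite) => Promise<void>): Promise<number> {
  let flushed = 0
  while (queue.length > 0) {
    const next = queue.shift()!
    await store(next)
    flushed++
  }
  return flushed
}
```

A production queue would also need retries and backpressure, which is exactly the work a managed write queue takes off your hands.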
**Read-after-write consistency:** Ensure newly written memories are immediately readable. Use RetainDB's `include_pending` option for read-after-write visibility.
**Memory pruning:** Implement strategies to remove outdated or irrelevant memories. Too much accumulated memory hurts retrieval quality.
**Cost management:** Vector storage and embedding generation both cost money. Implement retention policies and batch operations to keep spend under control.
- **Be selective:** Only store memories that matter. Not every message needs to be remembered.
- **Categorize:** Tag memories by type (preference, context, decision) for better retrieval.
- **Expire:** Set TTLs on memories that become stale over time.
- **Evaluate:** Regularly check whether retrieved memories are actually relevant.
- **Budget:** Set budgets and alerts for storage and embedding generation.
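The "be selective" rule can be enforced with a gate in front of every write. Here is a naive keyword heuristic as a sketch (the signal list is invented; a production gate might use an LLM classifier instead):

```ts
// Naive gate: only persist messages that look like durable preferences,
// facts, or decisions rather than chit-chat.
const SIGNALS = ['prefer', 'always', 'never', 'decided', 'working on', 'my name is']

function worthRemembering(message: string): boolean {
  const lower = message.toLowerCase()
  return SIGNALS.some(s => lower.includes(s))
}
```

Even a crude filter like this keeps throwaway messages out of storage, which protects both retrieval quality and embedding costs.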
Start with 7 days free. No credit card required.