RAG vs. Agent Memory: what RAG is, how to add it to Claude Code, and how it differs from Pam Memory

Fair question. RAG is everywhere right now, the word gets applied to basically everything, and honestly the confusion is understandable — both RAG and memory are about giving agents context they don't have by default. But the difference matters more than most people realize, usually around the time they discover their agent has been confidently answering from a three-week-old index.

Here's the breakdown.

RAG

RAG retrieves passages from a knowledge store you built and pastes them into the prompt. It's only as current as your last re-index. It answers one question: "find me text relevant to this query."

Pam Memory

Pam Memory builds its own model of your world from real data, keeps it current as things happen, resolves contradictions, and learns from use. It answers a different question: "what do you know about this person, this deal, this company right now?"

They're not competitors. Strong agents use both: RAG for knowledge, memory for continuity.

What is RAG?

RAG gives a language model information it wasn't trained on, at the moment it answers.

Instead of hoping the model memorized your docs, you keep those docs in a searchable index. When a question comes in, you retrieve the most relevant chunks, add them to the prompt, and the model generates a reply from them. Retrieval. Augmented. Generation. That's the whole thing.

It exists because raw LLMs have two real limits: a training cutoff and a context window. RAG handles both by fetching the relevant slice of your knowledge on demand, rather than stuffing everything into the prompt or fine-tuning a new model every time a doc changes.

How RAG actually works

Five steps, in roughly this order in most implementations:

1. Chunk: Split documents into passages small enough to retrieve precisely, such as a paragraph or a section.
2. Embed: Run each chunk through an embedding model so that text with similar meaning lands at nearby coordinates in vector space.
3. Store: Keep those vectors in a vector index or database: FAISS, pgvector, Pinecone, whatever fits.
4. Retrieve: Embed the incoming query and pull the nearest chunks by similarity, often with a re-ranking pass.
5. Augment and generate: Paste those chunks into the prompt and let the model answer.

Most of the quality lives in steps 1 and 4: how you chunk, and whether retrieval surfaces the right passage instead of a plausible-looking wrong one. The rest is wiring.
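To make the five steps concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not a production pipeline: the bag-of-words embed function stands in for a real embedding model, a Python list stands in for the vector store, and the sample documents and function names are made up for illustration.

```python
import math
import re
from collections import Counter

def chunk(text: str) -> list[str]:
    """Step 1: split on blank lines so each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def embed(text: str) -> Counter:
    """Step 2 (toy stand-in): word counts, not a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(text: str) -> list[tuple[str, Counter]]:
    """Step 3: keep (chunk, vector) pairs; a list stands in for a vector store."""
    return [(c, embed(c)) for c in chunk(text)]

def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Step 4: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(query: str, passages: list[str]) -> str:
    """Step 5: paste the retrieved chunks into the prompt sent to the model."""
    context = "\n\n".join(passages)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"

# Made-up sample documents, separated by blank lines.
DOCS = """Refunds are issued within 14 days of a return request.

Shipping to Europe takes 5 to 7 business days.

Support is available by email on weekdays."""

index = build_index(DOCS)
question = "How long does shipping take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping the toy embed for a real model and the list for a real store changes the wiring, not the shape. The chunking rule and the ranking in retrieve are the parts you would tune.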

How to connect RAG to Claude Code

Claude Code already reads the files in your project. That's not RAG — that's file access. Real RAG gives Claude Code a retrieval tool over an index you control.

The cleanest way: Model Context Protocol (MCP). You run a small server that exposes a search tool over your vector store, and Claude Code calls it when it needs context.

Build the index: Chunk your docs, embed them, load the vectors into a store.
Expose retrieval as an MCP server: Wrap the query-embed-search-return flow in a tool — something like search_docs(query) returning matched passages and their sources.
Register it:

# register your retrieval MCP server
claude mcp add docs-rag -- node ./mcp-rag-server.js

# confirm it's wired up
claude mcp list

Done. When you ask Claude Code something that needs your knowledge base, it calls search_docs, gets the chunks, and answers from them.
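For a sense of what sits behind a tool like search_docs, here is a rough sketch of the function body only. The MCP server wiring is deliberately left out, keyword overlap stands in for a real embed-and-search step, and the INDEX contents, file paths, and return fields are all assumptions for illustration.

```python
import re

# Hypothetical in-memory index of (source, passage) pairs.
# A real server would query a vector store here instead.
INDEX = [
    ("handbook/refunds.md", "Refunds are issued within 14 days of a return request."),
    ("handbook/shipping.md", "Shipping to Europe takes 5 to 7 business days."),
]

def _tokens(text: str) -> set[str]:
    """Lowercased word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_docs(query: str, limit: int = 3) -> list[dict]:
    """Return matched passages with their sources, best match first."""
    q = _tokens(query)
    hits = []
    for source, passage in INDEX:
        overlap = len(q & _tokens(passage))
        if overlap:  # skip passages that share no words with the query
            hits.append({"source": source, "passage": passage, "score": overlap})
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:limit]

hits = search_docs("refunds after a return")
for hit in hits:
    print(hit["source"], "->", hit["passage"])
```

Returning the source alongside each passage matters more than it looks: it lets the agent cite where an answer came from, and it lets you check whether the index handed back the right file.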

One thing worth knowing before you start: if you only need the model to see your current repo, just let it read files. Reach for RAG when the knowledge is bigger than the context window, lives outside the project, or changes often enough that you want one indexed source of truth rather than hoping the agent reads the right file.

Want to see it in action? Two videos worth watching:

Claude Code + Agentic RAG + MCP in action

Edward Donner walks through the full setup end-to-end — probably the clearest walkthrough available right now.

Use NotebookLM as a RAG system in Claude Code with MCP

A lower-friction entry point if you want to query a large doc collection without building your own pipeline from scratch.

Where RAG stops being enough

RAG does one thing well: finding text relevant to a query. For a one-off factual question, that's fine. For an agent that's supposed to work alongside you for months, it starts to break down, and the failures are maddening precisely because they're so quiet.

It's only as current as your last re-index.

This is the one that actually costs people. You spend a week wiring up the pipeline, it works great, and three weeks later you realize the agent has been confidently citing information that was updated the day after you indexed it. Nobody noticed because it sounded right.
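One cheap guard against this is a staleness check that runs before you trust the index. The sketch below assumes you record when each source was indexed and can look up when it was last modified. The file names and dates are hypothetical.

```python
from datetime import datetime

def stale_sources(
    indexed_at: dict[str, datetime],
    modified_at: dict[str, datetime],
) -> list[str]:
    """Return sources that changed after indexing, or were never indexed."""
    stale = []
    for source, modified in modified_at.items():
        indexed = indexed_at.get(source)
        if indexed is None or modified > indexed:
            stale.append(source)
    return sorted(stale)

# Hypothetical timestamps: everything was indexed on January 10.
indexed_at = {
    "pricing.md": datetime(2024, 1, 10),
    "faq.md": datetime(2024, 1, 10),
}
modified_at = {
    "pricing.md": datetime(2024, 1, 11),  # edited the day after indexing
    "faq.md": datetime(2024, 1, 5),       # unchanged since indexing
    "new-policy.md": datetime(2024, 1, 12),  # added later, never indexed
}

stale = stale_sources(indexed_at, modified_at)
print(stale)
```

This only tells you that the index is behind, not what changed. It is still enough to trigger a re-index or to warn before the agent answers from an old passage.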

It returns matches, not truth.

Ask about a contract price and RAG may hand back both the old version and the new one. It ranks by similarity, not accuracy, so figuring out which one is right is your problem.
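A small illustration of the contract-price case, with invented hits and scores. Both passages match the query almost equally well, so the similarity ranking alone can put the superseded price first. Picking the right one takes metadata that retrieval does not use on its own, here an assumed effective date.

```python
from datetime import date

# Hypothetical retrieval results for the query "contract price".
hits = [
    {"passage": "Contract price is $40,000 per year.", "score": 0.91,
     "effective": date(2023, 3, 1)},
    {"passage": "Contract price is $52,000 per year.", "score": 0.89,
     "effective": date(2024, 3, 1)},
]

# What plain similarity ranking returns first.
by_similarity = max(hits, key=lambda h: h["score"])

# What you get if something downstream resolves by effective date instead.
by_recency = max(hits, key=lambda h: h["effective"])

print("top by similarity:", by_similarity["passage"])
print("top by recency:   ", by_recency["passage"])
```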

It doesn't decide what's worth remembering.

RAG retrieves what you put in. It won't notice that a deal slipped, a preference changed, or a key person left the account — unless you've explicitly built that logic into the pipeline. Most people haven't, because it turns out that's basically the whole hard part.

It's read-only.

Classic RAG never writes back. The agent can't get smarter about you over time. Every session starts cold.

None of this is a design flaw; RAG was never supposed to solve these problems. But people reach for it as if it will, and then wonder why their agent still feels amnesiac. This is the gap: retrieval is one component of a larger system, and that system is memory.

What is Pam Memory?

Pam is a self-onboarding memory for AI agents.

No profile to fill out, no index to hand-build. Pam reads the data you already produce — email, messages, meetings — and constructs a living model of your context: who the people are, what the deals are, what's true now and what changed since last week. It updates as new events arrive. When facts conflict, it acts proactively to resolve them rather than surfacing both and leaving the agent to guess.

The underlying principle is that memory is a system, not a document. A static file goes stale. A static index bloats and can't scope what's relevant. Real memory decides what to store, keeps it current, and writes back what it learns.
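To show what "decides what to store, keeps it current, and writes back" can mean in code, here is a deliberately tiny sketch. It is not how Pam works internally. The MemoryStore class, the "newest observation wins" rule, and the sample deal are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Fact:
    value: str
    observed_at: datetime

@dataclass
class MemoryStore:
    # Current belief per (entity, attribute), plus everything superseded.
    current: dict[tuple[str, str], Fact] = field(default_factory=dict)
    history: list[tuple[tuple[str, str], Fact]] = field(default_factory=list)

    def observe(self, entity: str, attribute: str, value: str,
                observed_at: datetime) -> bool:
        """Write back an observation. Returns True if it became current."""
        key = (entity, attribute)
        new = Fact(value, observed_at)
        old = self.current.get(key)
        if old is not None and old.observed_at >= observed_at:
            # Older than what we already know: keep it as history only.
            self.history.append((key, new))
            return False
        if old is not None:
            self.history.append((key, old))  # the old fact is superseded
        self.current[key] = new
        return True

    def recall(self, entity: str) -> dict[str, str]:
        """Scoped lookup: the current facts about one entity."""
        return {attr: fact.value
                for (ent, attr), fact in self.current.items() if ent == entity}

memory = MemoryStore()
memory.observe("Acme deal", "price", "$40,000", datetime(2024, 1, 5))
memory.observe("Acme deal", "price", "$52,000", datetime(2024, 2, 1))
memory.observe("Acme deal", "owner", "Dana", datetime(2024, 1, 20))
# A late-arriving but older email does not overwrite the newer price.
memory.observe("Acme deal", "price", "$45,000", datetime(2024, 1, 15))

known = memory.recall("Acme deal")
print(known)
```

The contrast with a plain index is that recall returns one current answer per attribute, scoped to an entity, while the superseded values stay available in history instead of competing in the results.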

RAG vs. Pam Memory, side by side

Dimension	RAG	Pam Memory
What it is	A retrieval layer over a vector store	A self-onboarding, living memory system
Source of truth	Documents you chunk and index	Your real working data, modeled continuously
Freshness	As fresh as the last re-index	Updates as new events happen
Conflicting facts	Returns all matches, stale included	Works to resolve what's true now
Scope	Global similarity search	Scoped to person, deal, context
Learns over time?	No — read-only retrieval	Yes — writes back what it learns
Setup	You own chunking, indexing, re-indexing	Onboards itself; no profile to fill
Best at	"Find passages about X"	"Know who or what this is, right now"

The analogy I keep coming back to (it's a bit reductive, but it holds up): RAG is a librarian who fetches matching pages from shelves you stocked. Memory is a colleague who's been in the room with you for months. They don't retrieve — they just know. And they remember what happened last Tuesday, including the part where the other version of that contract got superseded.

Do you need both?

Almost always, yes.

RAG for knowledge

Manuals, policies, docs, anything large and reference-like you want the agent to cite accurately.

Memory for continuity

The evolving, person-specific, contradiction-prone context of your actual work that no static index can keep current.

When they work together, it stops feeling like you're interrogating a search engine and starts feeling like working with someone who's actually been paying attention. That's the difference worth building toward.

Pam by Harmix — Memory is a system.

Frequently asked questions

Is Pam Memory the same as RAG?

No — though I understand why people ask. RAG searches a static document store and pastes matching passages into the prompt. Pam builds and continuously updates its own model of your context, resolves conflicting facts, and learns across sessions. One is a retrieval step. The other is the system around it.

Can RAG remember conversations?

Only if you store each conversation and re-index it, and even then it's still retrieval — it won't reconcile what changed or decide what mattered. Memory does that by design. The difference is whether the system is passive (fetch when asked) or active (notice things and update).

How do I add RAG to Claude Code?

Index your docs into a vector store, expose a retrieval tool via MCP, and register it with claude mcp add. Full steps are above, and the Edward Donner video linked in that section is worth watching if you prefer to see it first.

What's the difference between memory and a vector database?

A vector database is storage plus similarity search. Memory is the system that decides what to store, keeps it current, scopes it, and resolves conflicts. A vector DB can sit inside a memory system — it's just not one by itself.

Does memory replace RAG?

No. They're complementary: RAG for reference knowledge, memory for evolving context. Use both.