Why would you bloat the (already crowded) context window with 27 tools instead of the 2 simplest ones: Save Memory & Search Memory? Or even just search, handling the save process through a listener on a directory of markdown memory files that Claude Code can natively edit?
MCPs are toys for point-and-click devs that no self-respecting dev has any business using.
Case in point: I mostly use Claude, which has decent background process / BashOutput support for getting a long-running process's stdout.
I was using codex just now, and its process support is ass.
So I asked it, give me 5 options using cli tools to implement process support. After 3 min back and forth, I got this: https://github.com/offline-ant/shellagent-tools/blob/main/ba...
Add a single line to AGENTS.md:
> the `background` tool allows running programs in the background. Calling `background` outputs the help.
Now I can go "background ./server; try thing. investigate" and it has access to the stdout.
Stop pre-trashing your context with MCPs, people.
People are just ricing out AI like they rice out Linux, nvim or any other thing. It's pretty simple to get results from the tech. Use the CLI and know what you're doing.
That's a great point. The reality is that context, at least from personal experience, is brittle and loses precision over time. This is an always-there, persistent way for Claude to access "memories". I've been running with it for about a week now and haven't felt the context getting bloated.
While I totally agree with you, I also can see a world where we just throw a ton of calls in the MCP and then wrap it in a subagent that has a short description listing every verb it has access to.
How does Claude know when to try and remember?
Often memory works too well and crowds out new things, so how are you balancing that?
Some of the other similar tools just arbitrarily pick the 3, 5, or 10 most relevant memory results, which seems awkward.
Why would you not use context files in the form of .md files? E.g. how the SpecKit project does it.
I still do, but having this allows for strategies like memory decay for older information. It also allows for much more structured searching, instead of opening files, which are less structured.
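To make "memory decay" concrete, here is a minimal sketch of one way it could work: scale each memory's relevance by an exponential decay on its age. The 30-day half-life, the field names, and the `decayed_score` helper are all assumptions for illustration, not Recall's actual formula.

```python
import math

def decayed_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Scale a relevance score so it halves every `half_life_days` (assumed value)."""
    return similarity * math.pow(0.5, age_days / half_life_days)

# Hypothetical memories: the older one matches the query slightly better.
memories = [
    {"text": "We chose Redis for the session cache", "similarity": 0.80, "age_days": 0},
    {"text": "We considered Memcached for the session cache", "similarity": 0.90, "age_days": 90},
]

# Rank by decayed score, so fresh information can outrank a stale but closer match.
ranked = sorted(
    memories,
    key=lambda m: decayed_score(m["similarity"], m["age_days"]),
    reverse=True,
)
```

With a plain .md file there's no natural place to hang a score like this, which is roughly the point being made.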
.md files work great for small projects. But they hit limits:
1. Size - 100KB context.md won't fit in the window
2. No search - Claude reads the whole file every time
3. Manual - You decide what to save, not Claude
4. Static - Doesn't evolve or learn
Recall fixes this:
- Semantic search finds relevant memories only
- Auto-captures context during conversations
- Handles 10k+ memories, retrieves top 5
- Works across multiple projects
Real example: I have 2000 memories. That's 200KB in .md form. Recall retrieves 5 relevant ones = 2KB.
And of course, there's always the option to use both .md for docs, Recall for dynamic learning.
Does that help?
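For anyone unsure what "retrieves top 5" means in practice, here is a rough sketch of top-k retrieval by similarity. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `search` and the sample memories are made up for illustration; none of this is Recall's actual code.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, memories: list, k: int = 5) -> list:
    # Score every stored memory against the query and keep only the k best.
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

memories = [
    "redis is the session cache",
    "postgres stores user accounts",
    "deploys run from the main branch",
]
top = search("which cache do we use for session data", memories, k=1)
```

Only the `k` results go into the context window; the rest of the store stays out of it, which is where the 200KB vs 2KB difference comes from.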
I'm not sure. You don't use a single context.md file, you use multiple and add them when relevant in context. AIs adjust these as you need, so they do "evolve". So what you try to achieve is already solved.
These two videos on using Claude well explain what I mean:
1. Claude Code best practices: https://youtu.be/gv0WHhKelSE
2. Claude Code with Playwright MCP and subagents: https://youtu.be/xOO8Wt_i72s
Yeah that's a solid workflow and honestly simpler than what I built - I think Recall makes sense when you hit the scale where managing multiple .md files becomes tedious (like 50+ conversations across 10 projects), but you're right that for most people your approach works great and is way less complex.
Memory features are useful for the same reason that a human would use a database instead of a large .md file: it's more efficient to query for something and get exactly what you want than it is to read through a large, ultimately less structured document.
That said, Claude now has a native memory feature as of the 2.0 release recently: https://docs.claude.com/en/docs/claude-code/memory so the parent's tool may be too late, unless it offers some kind of advantage over that. I don't know how to make that comparison, personally.
Claude’s memory function adds a note to the file(s) that it reads on startup, whereas this tool pulls from a database of memories on demand.
So hilariously, I hadn't actually read those docs yet, I just knew they added the feature. It seems like the docs may not be up to date, as when I read them in response to your reply here, I was like "wait, I thought it was more sophisticated than that!"
The answer seems to be both yes and no: see their announcement on youtube yesterday: https://www.youtube.com/watch?v=Yct0MvNtdfU&t=181s
It's still ultimately file-based, but it can create non-Claude.md files in a directory it treats more specially. So it's less sophisticated than I expected, but more sophisticated than the previous "add this to claude.md" feature they've had for a while.
Thanks for the nudge to take the time to actually dig into the details :)
It's had native memory in the form of per-directory CLAUDE.md files for a while though. Not just 2.0
The memory feature I'd like to have would need built-in support from Anthropic.
It'd essentially be:
1. Language server support for lookups & keeping track of the code
2. Being able to "pin" memories to functions, classes, properties, etc. via the language server support, providing this context whenever changes are made to that function/class/property, but not keeping it, so all following changes outside of it no longer include this context (basically, changes that touch code with pinned memories would be done by agents with the additional context, and only the results are synced back, not the way they were achieved)
3. An IDE integration for this context, so you can easily keep track of what's available just by moving the cursor to the point the memory is pinned at
Sadly impossible to achieve via MCP.
I think everyone has concluded at this point that we need to improve models' memory capabilities, but different people take different approaches.
My experience is that ChatGPT can engage in a very thoughtful conversation, but if I ask for a summary it produces something very generic, useful to an outsider, that doesn't catch the salient points that were the most important outcomes.
Did you notice the same problem?
I’ve started asking Claude to write tutorials that live in a _docs folder alongside my code.
Then it can reference those tutorials for specific things.
Interested in giving this a shot but it feels like a lot of infrastructure.
Yeah, this is what I do. You want the knowledge in md files, but currently you don't want to stuff the context with everything you know every time. I may be wrong here, but my impression is that the way "context" is special and very limited in size vs "things the LLM is trained on" is still an unsolved problem in getting AI to act like an "assistant", AFAICT.
Imho you would have an easier sell if you separated knowledge into tiers: 1) overall design 2) coding standards 3) reasoning that led to the design 4) components and their individual structure 5) your current issue 6) etc.
Your project becomes progressively more valuable the further you go down the list. The overall design should be documented and curated to onboard new hires. Documenting current issues is a waste of time compared to capturing live discussion, so Recall is super useful here.
I'm surprised Anthropic doesn't offer something like this server-side, with an API to control it. Seems like it'd be a lot more efficient than having the client manually rework the context and upload the whole thing.
Who should own the context?
Imagine having 20 years of context / memories and relying on them. Wouldn't you want to own that? I can't imagine pay-per-query for my real memories and I think that allowing that for AI assisted memory is a mistake. A person's lifetime context will be irreplaceable if high quality interfaces / tools let us find and load context from any conversation / session we've ever had with an LLM.
On the flip side of that, something like a software project should own the context of every conversation / session used during development, right? Ideally, both parties get a copy of the context. I get a copy for my personal "lifetime context" and the project or business gets a copy for the project. However, I can't imagine businesses agreeing to that.
If LLMs become a useful tool for assisting memory recall there's going to be fighting over who owns the context / memories and I worry that normal people will lose out to businesses. Imagine changing jobs and they wipe a bunch of your memory before you leave.
We may even see LLM context ownership rules in employment agreements. It'll be the future version of a non-compete.
They do, new feature, not available in claude code but via API headers. https://docs.claude.com/en/docs/agents-and-tools/tool-use/me...
Memory is hard! I'm very curious how the version history approach is working for you. Have you considered factoring in age when retrieving? Is the model supposed to manage the version history on its own? Is the semantic search used to help with that?
Claude introduced its own memory API. Have you had a look?
Yes, I did. I worked on this a while back, before it was available, I believe. I'll have another check. Thanks for the heads up.
Wouldn't the cache over time also be filled up with irrelevant and redundant information?
I wish there was a way to send compressed context to LLMs instead of plain text. This will reduce token size, performance & operational costs.
> This will reduce token size, performance & operational costs.
How? The models aren't trained on compressed text tokens nor could they be if I understand it correctly. The models would have to uncompress before running the raw text through the model.
That is what I am looking for. a) LLMs are trained using compressed text tokens and b) use compressed prompts. Don't know how..but that is what I was hoping for.
The whole point of embeddings and tokens is that they are a compressed version of text, a lower dimensionality. Now, how low depends on performance: lower amount of vectors = more lossy (usually). https://huggingface.co/spaces/mteb/leaderboard
You can train your own with very heavy compression; I mean, you could even go down to each token being just 2 float numbers. It will train, but it will be terrible, because it can essentially only capture distance.
Prompting a good LLM to summarize the context is, funnily enough, probably the best way of actually "compressing" context.
The problem is that you need to explicitly prompt Claude to "Store" or "Remember"; if you don't, it will never call the MCP server. Ideally, Claude would have some mechanism to store memories without any explicit prompting, but I don't think that's possible today.
imo it would be better to carry the whole memory outside of the inference time where you could use an LLM as a judge to track the output of the chat and the prompts submitted
it would sort of work like grammarly itself and you can use it to metaprompt
i find all the memory tooling, even native ones on claude and chatgpt to be too intrusive
I've been building exactly this. Currently a beta feature in my existing product. Can I reach out to you for your feedback on metaprompting/grammarly aspect of it?
Totally get what you're saying! Having Claude manually call memory tools mid-conversation does feel intrusive, I agree with that, especially since you need to keep saying Yes to the tool access.
Your approach is actually really interesting, like a background process watching the conversation and deciding what's worth remembering. More passive, less in-your-face.
I thought about this too. The tradeoff I made:
Your approach (judge/watcher):
- Pro: Zero interruption to conversation flow
- Pro: Can use cheaper model for the judge
- Con: Claude doesn't know what's in memory when responding
- Con: Memory happens after the fact
Tool-based (current Recall):
- Pro: Claude actively uses memory while thinking
- Pro: Can retrieve relevant context mid-response
- Con: Yeah, it's intrusive sometimes
Honestly both have merit. You could even do both, background judge for auto-capture, tools when Claude needs to look something up.
The Grammarly analogy is spot on. Passive monitoring vs active participation.
Have you built something with the judge pattern? I'd be curious how well it works for deciding what's memorable vs noise.
Maybe Recall needs a "passive mode" option where it just watches and suggests memories instead of Claude actively storing them. That's a cool idea.
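To sketch what such a "passive mode" might look like: a watcher sees each finished turn, a judge decides whether it's worth keeping, and the result is only a suggestion. Everything here is hypothetical (`judge`, `PassiveWatcher`, the marker list), and the keyword check is just a stand-in for the cheaper LLM call a real judge would make.

```python
from dataclasses import dataclass, field

# Hypothetical markers; a real judge would be a (cheaper) LLM call, not keywords.
DECISION_MARKERS = ("we decided", "always", "never", "remember")

def judge(turn: str) -> bool:
    # Decide whether a finished turn looks memorable or is just noise.
    lowered = turn.lower()
    return any(marker in lowered for marker in DECISION_MARKERS)

@dataclass
class PassiveWatcher:
    suggestions: list = field(default_factory=list)

    def observe(self, turn: str) -> None:
        # Called after the turn is complete, so the conversation is never interrupted.
        if judge(turn):
            self.suggestions.append(turn)

watcher = PassiveWatcher()
for turn in [
    "Can you rename this variable?",
    "We decided to use Redis for the session cache.",
    "Thanks, looks good.",
]:
    watcher.observe(turn)
```

This keeps the "Con: memory happens after the fact" tradeoff visible: nothing in `suggestions` is available to the model until something later stores and retrieves it.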
Is this the/a agent model routing problem? Which agent or subagent has context precedence?
jj autocommits when the working copy changes, and you can manually stage against @-: https://news.ycombinator.com/item?id=44644820
OpenCog differentiates between Experiential and Episodic memory; and various processes rewrite a hypergraph stored in RAM in AtomSpace. I don't remember how the STM/LTM limit is handled in OpenCog.
So the MRU/MFU knapsack problem and more predictable primacy/recency bias because context length limits and context compaction?
OpenCogPrime:EconomicAttentionAllocation: https://wiki.opencog.org/w/OpenCogPrime:EconomicAttentionAll... :
> Economic Attention Allocation (ECAN) was an OpenCog subsystem intended to control attentional focus during reasoning. The idea was to allocate attention as a scarce resource (thus, "economic") which would then be used to "fund" some specific train of thought. This system is no longer maintained; it is one of the OpenCog Fossils.
(Smart contracts require funds to execute (redundantly and with consensus), and there, too, resources are scarce).
Now there's ProxyNode and there are StorageNode implementations, but Agent is not yet reimplemented in OpenCog?
ProxyNode implementers: ReadThruProxy, WriteThruProxy, SequentialReadProxy, ReadWriteProxy, CachingProxy
StorageNode > Implementations: https://wiki.opencog.org/w/StorageNode#Implementations
I'm not super familiar with context and "memory", but doesn't adding context, whether manually or via memory, end up consuming context length either way?
Yeah, it still uses context, but way more efficiently: instead of injecting a 50KB context.md every time, Recall searches 10k memories and only injects the top 5 relevant ones (maybe 2KB), so you can store way more total knowledge.
Every single persistent memory feature is a persistence vector for prompt injection.
Why not just ask CC to write a prompt or Markdown file to re-start the conversation in a new chat?
Yeah, people do that, but it doesn't scale. After a while your "restart prompt" is 50KB and won't fit, plus you're stuck copying stuff manually instead of just asking "what did we say about Redis" and getting the relevant bits automatically.
If this delivers, it could be a 100% game changer. I will try it out and give some feedback.
I've been using it for a while now, personally. I've found that I have fewer issues with context, I can easily recall (pun intended) after a context compact, etc.
This is excellent for those of us who are building local AIs.
That's a great point! It also works really well for shared context between Claude instances. For example, we use it for our company's business model: all business rules and the model itself are stored as memories in a central Redis that the MCP connects to. Memories are stored either per folder or globally (similar to the CLAUDE.md in the home directory), but with this approach you can have an external Redis that multiple Claudes read from and write to, as a shared, almost hive-like memory.
Does it work with Valkey as well?
Yep! Valkey should work fine.
Recall just uses basic Redis commands - HSET, SADD, ZADD, etc. Nothing fancy.
Valkey is Redis-compatible so all those commands work the same.
I haven't tested it personally but there's no reason it wouldn't work. The Redis client library (ioredis) should connect to Valkey without issues.
If you try it and hit any problems let me know! Would be good to officially support it.
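For a sense of how those three commands can fit together, here is a sketch of one possible layout: a hash per memory, a set per tag, and a sorted set ordered by timestamp. The key names and `store_memory` are invented for illustration and are not Recall's actual schema. `FakeRedis` is a tiny in-memory stand-in so the example runs without a server, and ZREVRANGE is added only to read the sorted set back.

```python
from collections import defaultdict

class FakeRedis:
    """In-memory stand-in for a few Redis commands (not a real client)."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)
        self.zsets = defaultdict(dict)

    def hset(self, key, mapping):
        # HSET: set fields on a hash.
        self.hashes[key].update(mapping)

    def sadd(self, key, *members):
        # SADD: add members to a set.
        self.sets[key].update(members)

    def zadd(self, key, mapping):
        # ZADD: add members with scores to a sorted set.
        self.zsets[key].update(mapping)

    def zrevrange(self, key, start, stop):
        # ZREVRANGE: members by descending score, with an inclusive stop index.
        ordered = sorted(self.zsets[key], key=self.zsets[key].get, reverse=True)
        return ordered[start:] if stop == -1 else ordered[start:stop + 1]

def store_memory(r, memory_id, text, tags, created_at):
    # Hypothetical layout: one hash per memory, one set per tag, one time index.
    r.hset(f"memory:{memory_id}", mapping={"text": text, "created_at": created_at})
    for tag in tags:
        r.sadd(f"tag:{tag}", memory_id)
    r.zadd("memories:by_time", {memory_id: created_at})

r = FakeRedis()
store_memory(r, "m1", "Session cache lives in Redis", ["redis", "cache"], 1_700_000_000)
store_memory(r, "m2", "Deploys run from the main branch", ["deploy"], 1_700_000_500)

newest_id = r.zrevrange("memories:by_time", 0, 0)[0]
newest_text = r.hashes[f"memory:{newest_id}"]["text"]
```

Since only basic commands like these are involved, the same calls against a Redis-compatible server such as Valkey should behave the same way, though as noted above that hasn't been tested.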
With redis? Why?
how did you benchmark this against much less convoluted solutions, like "a text file"?
how much better was this to justify all that extra complexity?
I'm not seeing how this is any different than a standard vector database MCP tool. It's not like Claude is going to know about any of the things you told it to "remember" unless you explicitly tell it to use its memory tool like shown in the demo, to remember something you've stored.
Heh, I'm building the same thing this week (albeit with postgres rather than redis). I bet like 15% of the people here are.
Yep, me too. I've taken the reference memory MCP that Anthropic released and bolted on pgsql, along with a bunch of other features that are specific to the app I'm building, like user segmentation/isolation with RLS (the app is multiuser) and some other entity relationship tracking things.