I’ve been thinking about a problem that keeps resurfacing as AI systems become more autonomous and long-running, but doesn’t seem to be discussed much outside of implementation details.
Most AI systems today treat memory as ephemeral. Context is fetched from a remote store, used briefly, and discarded. Persistence is something you layer on later, usually via a network call to a database that sits outside the reasoning loop. This model works reasonably well when interactions are short-lived and connectivity can be assumed.
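To make that default concrete, here is roughly what the loop looks like in a typical stack today. This is an illustrative Python sketch, not any particular product; the service URL and endpoints are made up:

    import requests  # hosted memory / vector store over HTTP

    MEMORY_URL = "https://memory.example.com"  # hypothetical remote store

    def handle_turn(query: str) -> str:
        # Every turn starts with a network hop to fetch context...
        ctx = requests.post(f"{MEMORY_URL}/search",
                            json={"query": query, "top_k": 5},
                            timeout=2.0).json()["results"]
        # ...the context is used once inside the reasoning step...
        reply = call_model(query, ctx)
        # ...then written back over the network and dropped locally.
        requests.post(f"{MEMORY_URL}/append",
                      json={"turn": [query, reply]}, timeout=2.0)
        return reply

    def call_model(query, ctx):  # stand-in for the actual LLM call
        return f"answer({query!r}, given {len(ctx)} context items)"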
It starts to feel fragile when systems are expected to run continuously, survive restarts, operate offline, or reason repeatedly over long histories. In those cases, memory access becomes part of the critical path rather than a background concern.
What struck me while working on this is that many performance and cost problems people attribute to “scale” are really consequences of where memory lives. If every recall requires a network hop, then latency, reliability, and cost are inherently coupled to usage. You can hide that with caching and batching, but the constraint never goes away.
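The usual mitigation is a read-through cache, and writing it out shows why the constraint only moves rather than disappears. A minimal sketch, where fetch_remote stands in for whatever store you use:

    import time
    from functools import lru_cache

    def fetch_remote(key: str) -> str:
        time.sleep(0.05)  # stand-in for a ~50 ms network round trip
        return f"value-for-{key}"

    @lru_cache(maxsize=4096)
    def recall(key: str) -> str:
        # Hits are free, but every miss still pays the round trip:
        # cold starts, invalidation, and tail latency stay coupled
        # to the remote store no matter how big the cache gets.
        return fetch_remote(key)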
We’ve been exploring an alternative approach where persistence is treated as part of the hot path instead of something bolted on. Memory lives locally alongside the application, survives restarts by default, and is accessed at hardware speed. Once retrieval stops leaving the machine, a few second-order effects emerge that surprised us. Cost stops scaling with traffic. Recovery stops being an operational event. Systems behave the same whether they are online, offline, or at the edge.
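To be clear about the shape (a deliberately simplified sketch, not our actual implementation), the hot path can be as small as an embedded store that lives alongside the process:

    import sqlite3

    # Embedded store: no network hop, durable across restarts by default.
    db = sqlite3.connect("agent_memory.db")
    db.execute("CREATE TABLE IF NOT EXISTS memory"
               " (key TEXT PRIMARY KEY, value TEXT)")

    def remember(key: str, value: str) -> None:
        db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)",
                   (key, value))
        db.commit()  # durability is the default, not a layer on top

    def recall(key: str) -> str | None:
        row = db.execute("SELECT value FROM memory WHERE key = ?",
                         (key,)).fetchone()
        return row[0] if row else None

Nothing in that path depends on a connection, which is where the online/offline/edge equivalence comes from.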
I’m very early in this commercially and building it with a co-founder, but before locking in assumptions I wanted to sanity-check the architectural framing with people here. Does this line up with how others see AI systems evolving, or do you think the current model of ephemeral memory plus remote persistence is still the right long-term abstraction?
I’ve documented the architecture and tradeoffs of what we’ve built so far here for anyone who wants concrete details.
I’m much more interested in the discussion than the implementation itself.
I'm approaching this problem via recursive task decomposition [0], mostly for coding, but once that is polished I can imagine it generalizing to other workloads. One idea I've been meaning to explore for a while is taking the principles from Erlang/OTP [1] and mapping them onto multi-agent systems, sketched below: some nodes are supervisor trees that coordinate, handle restarts, degrade gracefully, and are easy to observe and steer.

[0] https://github.com/adagradschool/scope

[1] https://erlang.org/download/armstrong_thesis_2003.pdf
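A minimal sketch of the mapping I have in mind (Python rather than Erlang, one-for-one restarts only, and the worker names are placeholders):

    import threading
    import time

    def supervise(children, max_restarts=5):
        # One-for-one strategy: restart only the child that died,
        # and escalate once a child blows through its restart budget
        # (roughly OTP's restart intensity limit).
        threads = {}
        restarts = {name: 0 for name in children}
        for name, fn in children.items():
            threads[name] = threading.Thread(target=fn, daemon=True)
            threads[name].start()
        while True:
            for name, fn in children.items():
                if not threads[name].is_alive():
                    restarts[name] += 1
                    if restarts[name] > max_restarts:
                        raise RuntimeError(f"{name}: restart limit hit")
                    threads[name] = threading.Thread(target=fn, daemon=True)
                    threads[name].start()
            time.sleep(0.5)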
What customer pain point are you trying to solve? It’s not super clear to me. Agentic memory, just like an mmap region, isn’t ephemeral unless you want it to be.
In my head, I substitute the word “storage” for “agentic memory”, because that’s really what the latter is, once you strip away all the flourish. So what kind of new storage innovation is this?
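For instance, a plain file-backed mmap already gives you agent "memory" that outlives the process; persistence is a property you opt into, not a new primitive. Quick illustration in Python:

    import mmap
    import os

    path = "agent_state.bin"
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b"\x00" * 4096)  # pre-size the backing file

    with open(path, "r+b") as f:
        region = mmap.mmap(f.fileno(), 4096)
        region[:5] = b"hello"  # written through to the file
        region.flush()         # durable from here on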
This is an ad