Hi everyone, we're the maintainers.
We're rebooting the model-runner community and wanted to share what we've been up to and where we're headed.
When we first built this, the idea was simple: make running local models as easy as running containers. You get a consistent interface to download and run models from different backends (llama.cpp being a key one) and can even transport them using familiar OCI registries like Docker Hub.
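If you haven't tried it yet, the basic flow looks roughly like this (the model name is just an illustrative example from Docker Hub's `ai/` namespace; check the docs for the exact commands available in your version):

```sh
# Pull a model artifact from an OCI registry (Docker Hub in this example)
docker model pull ai/smollm2

# Run a one-off prompt against it via the local backend (llama.cpp by default)
docker model run ai/smollm2 "Write a haiku about containers"

# Models are OCI artifacts, so you can tag and push them like images
docker model tag ai/smollm2 myorg/smollm2:v1
docker model push myorg/smollm2:v1
```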
Recently, we've invested a lot of effort into making it a true community project. A few highlights:
- The project is now a monorepo, making it much easier for new contributors to find their way around.
- We've added Vulkan support to open things up for AMD and other non-NVIDIA GPUs.
- We made sure we have day-0 support for the latest NVIDIA DGX hardware.
`docker model run` is now part of my demos when deploying ML stacks. I'm pretty sure this removes the pain of juggling multiple tools just to do inference. This is great!
Glad you find it useful! Any new features you think we should add to further enhance your usage?
Nice, I really like the recent Vulkan support.
Thanks very much. Did it work well for you? Which hardware? :) Any other feedback, keep it coming!
Really glad to see DMR getting "new life". I've been experimenting with it for local agentic workloads (MAF, Google's ADK, cagent, Docker MCP, etc.) and it's such a clean foundation.
A few things that could make it even more powerful (maybe some are out of your scope):
- Persistent model settings (context size, temperature, etc.) across restarts; right now it always resets to 4k, which breaks multi-turn agents.
- HTTP/gRPC interface to let tools and frameworks talk to DMR directly, not only through the CLI. (Here the issue is on the Docker MCP side, right?)
- Simple config management (`docker model set` or `docker model config`) so we can tweak GPU, threads, precision, etc. predictably. There are at least a couple of issues on this topic already; I've sketched what I mean below.
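To make that last point concrete, here's the kind of interface I have in mind. This is purely hypothetical syntax; none of these commands or flags exist today, and the model name is just a placeholder:

```sh
# Hypothetical: persist per-model settings so they survive restarts
docker model config set ai/smollm2 --context-size 32768 --temperature 0.2

# Hypothetical: pin runtime resources instead of re-specifying them every run
docker model config set ai/smollm2 --gpu 1 --threads 8

# Hypothetical: inspect what is currently applied
docker model config show ai/smollm2
```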
TBH, I love how fast the discussion evolved today.
Congrats and good luck with this. I'll try to help, I promise!
Keep opening pull requests and issues; you're right, we need these things!
Awesome!
What did you like? Anything stand out?