I get Kimi through OpenCode Zen (kind of like OpenRouter for the OpenCode harness). I periodically top up $20 and laugh every time I see my balance go down by 3 cents for something I would have happily paid someone $30 to do.
I liked the GLM coding plan before they raised their prices. Now their rate limits are stricter because they are compute constrained. It is still a good deal at 1/3 the price of Claude for the same quality.
I like Chutes. I think I get about 5K prompts per day for $20/month, though they may have stricter limits for new customers.
This gives you practically unlimited usage of frontier models like Kimi, DeepSeek, and GLM. Their models are always full-size, never quantised, except where the lab itself provides a 4-bit or 8-bit model. You can see from the model config exactly which Hugging Face model it pulls and the serving configuration used.
Prompts are encrypted using a Trusted Execution Environment (TEE), so neither the model host nor a neighbour can view your prompts. That's as close as you can get to local-level privacy in the cloud.
I tried looking into Chutes just now. Seems like there is no easy way to just pay & start using it with OpenCode or Claude Code, right? Their docs don’t seem to mention it. Do I really have to execute code with their API in order to use the models?
No, it's super easy. I think the confusion is due to the serving and hosting APIs that let you add your own GPUs to a pool and earn money. But for regular inference they have an OpenAI-style responses API and a basic chat app. You can sign up for a $3 subscription, or deposit $5 and use your API key.
https://chutes.ai/app/chute/2ff25e81-4586-5ec8-b892-3a6f3426...
curl -X POST \
  https://llm.chutes.ai/v1/chat/completions \
  -H "Authorization: Bearer $CHUTES_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5-TEE",
    "messages": [
      {"role": "user", "content": "Tell me a 250 word story."}
    ],
    "stream": true,
    "max_tokens": 1024,
    "temperature": 0.7
  }'
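If you'd rather call it from Python than curl, here is a rough sketch of the same request using only the standard library. It reuses the URL, model name, and parameters from the curl command. The helper names build_request and send are made up for illustration and are not part of any Chutes SDK. Two assumptions I have not verified: that the endpoint accepts "stream": false, and that the reply follows the OpenAI chat-completions shape.

```python
import json
import urllib.request

# Endpoint and model name are taken from the curl example.
CHUTES_URL = "https://llm.chutes.ai/v1/chat/completions"


def build_request(prompt, token, model="moonshotai/Kimi-K2.5-TEE"):
    """Build (but do not send) the same POST request as the curl example.

    Hypothetical helper name, not part of any official client.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: non-streaming responses are supported, which keeps parsing simple.
        "stream": False,
        "max_tokens": 1024,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        CHUTES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the reply text.

    Assumption: the response body looks like an OpenAI chat completion,
    i.e. the text lives at choices[0].message.content.
    """
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

To use it, read your key from the CHUTES_API_TOKEN environment variable and call send(build_request("Tell me a 250 word story.", token)).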
I've been using the GLM coding plan. During off-peak hours, GLM-5.1 usage counts as only 1x against your quota, which means you get Sonnet-like quality for a fraction of the cost.
Nous Portal or OpenRouter with a harness that uses intelligent multi-provider requests, a local memory system, and pre-sub context compaction on input. If you do similar stuff often, your token usage will drop quite a bit after a while of using a memory subsystem like Hindsight or Honcho, and even more if you're using your harness to build relevant skills for the repeated tasks.
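For anyone wondering what compacting context before submission can look like, here is a deliberately naive sketch. It is not how any particular harness does it: compact_context is a hypothetical function, the character budget stands in for a real token count, and it simply drops old turns instead of summarizing them the way a real memory system presumably would.

```python
def compact_context(messages, keep_last=4, max_chars=2000):
    """Shrink a chat history before sending it to the model.

    Hypothetical helper: keeps system messages plus the last `keep_last`
    turns and replaces everything older with a one-line marker.
    """
    total = sum(len(m["content"]) for m in messages)
    if total <= max_chars:
        return messages  # already small enough, send unchanged

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Index where the kept tail starts (never negative).
    split = max(len(rest) - keep_last, 0)
    dropped, kept = rest[:split], rest[split:]
    if not dropped:
        return messages  # nothing old enough to drop

    note = {
        "role": "system",
        "content": f"[{len(dropped)} earlier messages omitted to save tokens]",
    }
    return system + [note] + kept
```

The idea is only that fewer old turns go over the wire on each request. Swapping the marker for a stored summary is where something like a memory subsystem would plug in.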
Do you have a harness recommendation? Sounds like maybe you’re into Hermes?
Not good. I use DeepSeek's plan, Kimi AI, and OpenRouter, and they seemingly consume more tokens than Claude does.
On Claude Max 20x, I consume ~30% of the weekly limit per day. On the equivalent Kimi AI plan, I consume 60% of the weekly limit in one day.
With DeepSeek (latest), at a 95% discount and with caching, I racked up ~$60/day before I stopped.
I don't know how Claude computes its daily limits, but it seems much cheaper.
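To put those two percentages side by side, here is a quick back-of-the-envelope sketch. It assumes the figures mean "share of the weekly limit used per day" and that usage is the same every day, which is one reading of the numbers above rather than anything the providers publish.

```python
def days_until_exhausted(daily_share_of_weekly_limit):
    """Days of steady use before a weekly quota runs out."""
    return 1.0 / daily_share_of_weekly_limit


claude_days = days_until_exhausted(0.30)  # ~30% of the weekly limit per day
kimi_days = days_until_exhausted(0.60)    # ~60% of the weekly limit per day

print(f"Claude Max 20x lasts about {claude_days:.1f} days")  # about 3.3 days
print(f"Kimi AI lasts about {kimi_days:.1f} days")           # about 1.7 days
```

On that reading, neither plan lasts a full week at this pace, and the Kimi quota runs out roughly twice as fast.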
Which DeepSeek plan did you use? I've been trying to find a DeepSeek plan for a while, with no success. I tried the Claude $20 plan before, and it burns tokens like air. It's quite hard to believe anything else would burn so fast.
I'm using the deepseek-v4-pro model, which is currently offered at a 75% discount. My bad, it's a 75% discount, via OpenRouter.
I use the Claude-Max-20 ($200) plan. I manage to max it out in 2 weeks. I'm planning to move to maybe multiple accounts.
I use C++ and Claude on a big codebase.
Why not Cursor? Cursor seems pretty decent at what it does, and I have had very few issues since I started using it a couple of months back. But yes, frontend work burns through tokens. Otherwise, no issues with backend and devops.
Antigravity?