Absolutely. CC can be tuned to not do too much crap on its own, but even with the new extension its IDE integration and multi thread management are still significantly worse, as is its status reporting, which I find to be very important.
Also, somehow magically, I’ve found Cursor’s Auto mode to be significantly faster than the specific models I’ve tried, Claude being among them.
If a few weeks is months I would agree. I think the change to Auto was 2-3+ months ago, when they moved to charging for named models and higher limits on Auto.
Auto is only good for trivial stuff at this point. It is quite subpar at everything else. This is probably because it almost always defaults to Claude Sonnet 3.5 (which you can tell if you ask the agent to identify itself and tell you its version), and that is pretty outdated.
Again it goes back to what your workflow is. I don’t think trivial is the right word. I use auto to write fairly advanced code but I do it in bite size chunks or relatively bite size. So thinking function level or a couple of interdependent functions ruins being written.
I would agree it is not as good on doing lengthy work where it’s taking design all the way through implementing a feature in a single shot but trivial is not a good description.
I also don’t think you’re right. 3.5 was recently deprecated and even before then, Cursor has been hitting rate limits with Anthropic. Auto is as much a token cost optimization as it is a rate limit optimization.
Absolutely. I actually don’t understand the preference folks have for Claude code. I don’t find it that powerful. That said, I think some of it comes down to preference and work context.
There are lots of good models we like here. But we agree that getting the right point on the smart+fast graph can make agentic coding feel really good.
I love Cursor. I've tried Copilot/Claude/etc. but keep coming back to Cursor. I just want to work, and Cursor tab complete is dang accurate, esp. for refactoring tasks.
I tried going back to VS Code + Copilot a month ago. I only lasted 4 days because it was too bad. It was super slow and gave poor suggestions, but mostly it just flat out did not suggest anything. Cursor feels snappy in comparison and the suggestions are more often than not useful. The most annoying thing about Cursor tab complete is that it is so fast that when I am doing something unusual it will keep on jumping in with useless suggestions. They have a snooze function for this though.
I love cursor, the tab completion and agent mode. But I really dislike vscode after using intellij for so many years. I really wish the underlying editor was better, or I could get cursor features in intellij instead. The editing of the files is mostly fine, but it's everything else around it that a full IDE provides that's just so much better. Right now it's intellij + claude code for me, and it's fine, but I wish I could get the AI power of cursor in a better package.
Intellij's tab-complete is coming along; it's hit and miss if it will work but for similar edits I'm finding it picks up the pattern quickly and I can tab - tab - tab to make them happen.
Building off of VSCode was probably Cursor's silver bullet and the best decision they could have ever made.
It made migrating for everyone using VSCode (probably the single most popular editor) or another vscode forked editor (but at the time it was basically all VSCode) as simple as install and import settings.
I do not think Cursor would have done nearly as well as it has if it didn't. So even though it can be subpar in some areas due to VSCode's baggage, it's probably staying that way for a while.
I don't disagree with anything you said. If I was in their shoes, I would have done exactly the same thing.
Maybe my complaint is that I wish vscode had more features like intellij, or that intellij was the open source baseline a lot of other things could be built on.
Intellij is not without its cruft and problems, don't get me wrong. But its git integration, search, navigation, database tools - I could go on - all of these features are just so much nicer than what vscode offers.
For anyone else who was wondering, it looks like the within-Cursor model pricing for Cursor Composer is identical to gemini-2.5-pro, gpt-5, and gpt-5-codex: https://cursor.com/docs/models#model-pricing
($1.25 input, $1.25 cache write, $0.13 cache read, and $10 output per million tokens)
I'm curious if their near-term expectation is that this will be better than these models, or is this a model they tend to use in Auto mode, or if the focus is really on speed...? I guess my question is why would I actively choose this over Auto?
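For anyone who wants to turn the per-million-token prices quoted above into a per-request number, here is a rough sketch. The four rates are the ones listed in this comment; the function name and the token counts in the example call are made up for illustration and are not taken from Cursor's docs.

```python
# USD per one million tokens, as quoted in the comment above.
PRICE_PER_MILLION = {
    "input": 1.25,
    "cache_write": 1.25,
    "cache_read": 0.13,
    "output": 10.00,
}


def request_cost_usd(input=0, cache_write=0, cache_read=0, output=0):
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    counts = {
        "input": input,
        "cache_write": cache_write,
        "cache_read": cache_read,
        "output": output,
    }
    # Each bucket is billed at its own rate; rates are per million tokens.
    return sum(counts[k] * PRICE_PER_MILLION[k] for k in counts) / 1_000_000


# Hypothetical agent turn: small fresh prompt, large cached context, short reply.
cost = request_cost_usd(input=20_000, cache_read=200_000, output=4_000)
print(f"${cost:.4f}")
```

With these rates, output tokens cost 8x what input tokens do, so a chatty agent turn can easily cost more than the context it was given.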
Hey - really sorry to hear this - could you email me andrew@cursor.com? Here are 3 suggestions to try-
1. Reset your settings.json - if shared with vscode, sometimes settings can cause perf regressions
2. Could you try cmd-shift-p -> "capture and send debugging data"? Will send us some profiling data to debug
3. Clear your user data (will delete chats) as a last resort - cmd-shift-p, "reveal user data," close the app, then delete this folder and restart the app
As a stealth model, it was priced at $1.25 in / $10 out per million tokens
Right now, it seems free when you are a Cursor Pro user, but I'd love more clarity on how much it will cost (I can't believe it'll be unlimited usage for subscribers)
Could anyone explain how to use multiple agents and subagents in Cursor, Claude Code, or others? It is already challenging to me taming one model doing work, let alone synchronizing multiple parallel workers.
Do you have to split the plan in parallelizable tasks that could be worked in parallel in one codebase without breaking and confusing the other agents?
you can use git worktrees and just have multiple Claude Code terminal instances working on each worktree. That way they don't clash, just delete the worktree when the task is done.
I have never leveraged git worktrees... That is such a crazy useful tool that I am almost ashamed of not having researched it before. Git is such a beautiful piece of software.
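The worktree approach described above could be sketched roughly like this. The helper only builds the git commands and does not run them; the `agent/<task>` branch naming and the sibling-directory layout are assumptions for illustration, not something the commenter specified. `git worktree add -b` and `git worktree remove` are standard git subcommands.

```python
import shlex
from pathlib import Path


def worktree_commands(repo: Path, task: str) -> dict[str, list[str]]:
    """Build (but do not run) the git commands for one isolated task checkout."""
    branch = f"agent/{task}"                    # hypothetical branch naming scheme
    path = repo.parent / f"{repo.name}-{task}"  # hypothetical sibling directory
    return {
        # New branch plus a separate working directory that shares the same repository.
        "create": ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)],
        # Run once the task is merged or abandoned.
        "remove": ["git", "-C", str(repo), "worktree", "remove", str(path)],
    }


# One worktree per parallel agent; each terminal session starts in its own directory.
for task in ["fix-login", "add-export"]:
    cmds = worktree_commands(Path("/work/app"), task)
    print(shlex.join(cmds["create"]))
```

Each agent then only touches files in its own checkout, so the parallel sessions cannot overwrite each other's uncommitted changes; conflicts only surface at merge time.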
Unfortunately not, as we used our own internal code for the benchmark. We would also like to see more benchmarks that reflect the day-to-day agentic coding use.
Roughly, we had Cursor software engineers record real questions they were asking models, and then had them record the PR that they made that contained the result. We then cleaned these up. That is the benchmark.
I wonder if this custom model is trained on cursor users. There’s a lot of potential on how much better a custom model could be the closer it is integrated with the product. Having the model learn to adapt to different user preferences would make it stand out compared to memoryless frontier models.
The fact that you are wondering this is bad. You definitely should know this. _ALL_ the online ai providers are training on your data. They have more expensive enterprise plans if you want to opt out.
I’ve generally seen providers allow you to opt in or out. What may vary is what the default is and what they may offer in exchange for using your data (perhaps they could offer higher rate limits).
>> "Best Frontier" includes GPT-5 and Sonnet 4.5, which both outperform Composer.
Looking at the graph, it would appear there's an implicit "today" in that statement, as they do appear poised to equal or surpass Sonnet 4.5 on that same benchmark in the near future.
Please keep the naming of your models sane, I'd like to know that composer 1 is the first model and composer 2 is second but composer 1o is not yet another 1 variant that's actually newer and better than 2, that's just dumb. Not that you're doing that, some other companies do that.
Cursor has the best Tab model, and I feel like their lead there has kept growing - they're doing some really cool things there. https://cursor.com/blog/tab-rl
I wonder how much the methods/systems/data transfer, if they can pull off the same with their agentic coding model that would be exciting.
We also are big Tab users here at Cursor. In the blog we talk about how the motivation for this project came from thinking about a Tab-like agent.
I feel like that's like having a lead in producing better buggy whips.
I run Claude Code in the background near constantly for a variety of projects, with --dangerously-skip-permissions, and review progress periodically. Tabbing is only relevant when it's totally failing to make progress and I have to manually intervene, and that to me is a failure scenario that is happening less and less often.
What are you building with this workflow? Is it an application live in production with users? It is such a foreign way of working to me.
This is just a completely different use of LLMs and has little to do with working at a real business with a live site and users. Cursor is great when you want to gain understanding of an issue quickly, or resolve something clear and specific quickly.
I'm not against YOLO vibe coding, but being against tab completion is just insane to me. At the end of the day, LLMs help you achieve goals quicker. You still need to know what goal you want to achieve, and tab completion basically lets me complete a focused goal nearly as soon as I determine what my goal is.
Some of these projects are at a "real business with a live site and users".
And it's not remotely "YOLO vibe coding". All the code gets reviewed, and tested thoroughly.
What I don't do is babysit the LLM until its code passes both the test suite and automated review stages, because it's a waste of time.
Others of these projects are research tasks. While I wrote this comment, Claude unilaterally fixed a number of bugs in a compiler.
Tab model is fantastic but I wish it was somehow aware of the conversation happening in the currently active AI chat session.
It's great. BUT: Wish they had selected another shortcut like shift+tab.
Every time I write code myself I find myself racing the AI to get an indentation in before the AI is done... gets annoying
You can change the key bind, I personally set it to ctrl+tab
The lack of transparency here is wild. They aggregate the scores of the models they test against, which obscures the performance. They only release results on their own internal benchmark that they won't release. They talk about RL training but they don't discuss anything else about how the model was trained, including if they did their own pre-training or fine-tuned an existing model. I'm skeptical of basically everything claimed here until either they share more details or someone is able to independently benchmark this.
Hi everyone,
I am an ML researcher at Cursor, and worked on this project. Would love to hear any feedback you may have on the model, and can answer questions about the blog post.
Impressive systems write-up. A question: if Composer is an RL finetune on an open model, why keep weights closed? The edge from a slightly better checkpoint erodes quickly in this market, it's not a durable advantage. Composer protects Cursor's margins from being squeezed by the big AI labs, but that is true whether the weights are open or closed, and I think Cursor would have more lasting benefit by generating developer goodwill than from a narrow, short-lived advantage. But, that's just my opinion. I personally find it hard to get excited about yet-another proprietary model. GPT-5 and Sonnet 4.5 are around when I need one of those, but I think the future is open.
Do you have any graphs handy that kind of replicates the one used first in the blog post but a bit less ambiguous, maybe without model grouping? I feel like it would have been a bit more fair to include proper names, and individualize them rather than group everything together by something, and then present your own model on its own.
Why did you stop training shy of the frontier models? From the log plot it seems like you would only need ~50% more compute to reach frontier capability
We did a lot of internal testing and thought this model was already quite useful for release.
Makes sense! I like that you guys are more open about it. The other labs just drop stuff from the ivory tower. I think your style matches better with engineers who are used to datasheets etc. and usually don't like poking a black box
Thanks! I do like the labs blog posts as well though, OpenAI and Anthropic have some classics.
Can you please tell us more about how you used Ray for setting up the RL infrastructure?
Oh good question. Actually speaking at the Ray Summit next week in SF so we will talk more about it. We used Ray throughout the pipeline for running evals, for the RL controller, for data collation, and for visualizations. One tool we found helpful was Ray Data which let us easily scale over data and run logs.
Please share more about Ray Data use case.
We use Ray Data for our map-style processing jobs. For example, one tool we have runs over all the rollouts from the RL system and collects qualitative statistics to understand which types of agent trajectories are being rewarded, and what types of searches and terminal commands are being made.
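To make the "map over rollouts, then collect statistics" idea concrete, here is a small stand-in in plain Python. It does not use Ray and is not Cursor's tool: the rollout record fields (`reward`, `tool_calls`) and the sample data are invented for illustration. With Ray Data, the per-rollout step would presumably be applied row by row across a cluster (for example via `Dataset.map`) rather than with the built-in `map`.

```python
from collections import Counter

# Hypothetical rollout records; a real system would load these from RL run logs.
rollouts = [
    {"reward": 1.0, "tool_calls": ["grep", "edit", "pytest"]},
    {"reward": 0.0, "tool_calls": ["edit", "edit"]},
    {"reward": 1.0, "tool_calls": ["grep", "pytest"]},
]


def summarize(rollout):
    """Map step: reduce one rollout to a small summary row."""
    return {
        "rewarded": rollout["reward"] > 0,
        "tools": Counter(rollout["tool_calls"]),
    }


def aggregate(summaries):
    """Reduce step: tally tool usage separately for rewarded and unrewarded rollouts."""
    rewarded, unrewarded = Counter(), Counter()
    for s in summaries:
        (rewarded if s["rewarded"] else unrewarded).update(s["tools"])
    return rewarded, unrewarded


rewarded_tools, unrewarded_tools = aggregate(map(summarize, rollouts))
print("rewarded:", dict(rewarded_tools))
print("unrewarded:", dict(unrewarded_tools))
```

Because `summarize` looks at one rollout at a time and keeps no shared state, it is the kind of function that parallelizes trivially, which is what makes this a map-style job.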
Is it true that Cheetah is Grok Code Fast 2? Does this mean that the new Cursor model is also based on Grok?
Cheetah was an earlier (and dumber) version of this model that we used to test production speed. They are both developed in-house. If you liked Cheetah, give this model a try.
This is nice. I liked Cheetah for grunt work that I want to get out quickly and is not too hard. The speed is really awesome. A model that would run at even higher speeds like the OSS models at groq/cerebras would really be workflow changing, because the slowness of SOTA models really breaks the flow. I find myself taking a ton of breaks and getting distracted while I wait for a model to complete a task (e.g. just now).
Let us know how you like it.
Awesome, thanks for the clarification. So are the rumors around Cheetah being based on a Grok model just straight up untrue? I want to try Composer but have a pretty strict no X/Grok policy.
Straight up untrue.
There is a youtube livestreamer building with it now, if you are looking for direct feedback: https://www.youtube.com/watch?v=1bDPMVq69ac
neat!
Is the new model trained from scratch? What training data went into it?
Which model did you distill it from? Great work! PS getting a few scenarios where it doesn't follow rules as well as sonnet 4.5
The blog talks about the training process. Specifically we trained with RL post-training on coding examples.
Makes sense, but what model was used for the base? Is it some open-source model, and you're not at liberty to disclose?
that's cool thanks!
I prefer the approach of focusing on faster models despite their lower intelligence because I want my IDE to fly when I can see the code. I find this useful when I need to manually debug something that any model is able to do, so I know it's going to fail but at least it will fail fast. On the other hand, if I need more intelligence I have my other CLI that doesn't allow me to see the code but gets the planning and difficult code done.
Our view is that there is now a minimal amount of intelligence that is necessary to be productive, and that if you can pair that with speed that is awesome.
How many times have you needed to reset the optimizer during the RL training cycles?
is Composer a fine tune of an existing open source base model?
Our primary focus is on RL post-training. We think that is the best way to get the model to be a strong interactive agent.
So, yes, but you won’t say what the base model is? :)
How do you work with multiple agents?
We train with a single agent. is that the question?
Maybe I'm an outlier but Sonnet 4.5 quality is about as low as I'm willing to go.
Its generation speed is not the problem or the time sink.
It's wrestling with it to get the right output.
---
And just to clarify as maybe I misunderstood again but people are comparing cursor to Claude Code and codex etc here- isn't this whole article all cursor just using different models?
> Sonnet 4.5 quality is about as low as I'm willing to go.
literally a 30 day old model and you've moved the "low" goalpost all the way there haha. funny how humans work
Yup - just like sibling comment said - my "low bar" is going to be whatever the best model is that isn't unreasonably costly/expensive.
Speed of model just isn't the bottleneck for me.
Before it I used Opus 4.1, and before that Opus 4.0 and before that Sonnet 4.0 - which each have been getting slightly better. It's not like Sonnet 4.5 is some crazy step function improvement (but the speed over Opus is definitely nice)
Yes? Because why should we settle for less now that it is available?
because engineering is the art of "good enough" and composer is clearly "good enough but a lot faster" which makes up for intelligence gaps in interesting ways
It's not good enough for a lot of us, though, clearly.
Same... I've found that using a non-Claude model just ends up being more expensive and not worth it. "Auto" tokens are hardly free, and I've had plenty of experiences putting "Auto" to work on a "simple" seeming task to have it consume like 1 USD of tokens quite quickly while producing nothing of value, when I'd replay with Claude 4.5 Sonnet non-thinking and it would provide a solid solution for 0.5 USD.
Agree that Sonnet 4.5 is an excellent model. Would be curious to hear your experience using Composer though, it's quite good.
I'll try it out! I haven't yet - just generally conveying my opinion that I personally weigh "better model" much more important than speed, assuming some "fast enough"
Also, didn't realize you worked at Cursor - I'm a fan of your work - they're lucky to have you!
Thanks! Yeah, been working here for 9 months now. Fascinated by agentic coding both as a researcher and user.
Totally agree that "smart model" is the table stakes for usefulness these days.
There’s two different kinds of users, on one side people are more hands off and want the model to autonomously handle longer tasks on its own with minimal guidance, and on the other side is users who want to interactively collaborate with the model to produce desired results. Speed matters much more for the second case, where you know what you want and just want the model to implement whatever you had in mind as quick as possible. Intelligence/ability matters more for the first case when you don’t have full understanding of all the code. I think it’s context dependent for me where more serious work tends to be more interactive. The intelligence of a model doesn’t make up for issues due to lack of context to me.
I'm very solidly in the second group - but I review all the code. If it writes faster than I can read, that's fast enough.
The reason I pulled out the comparison is to highlight how serious they are about all the important parts that make or break the AI coding experience - speed being very important to me. I’d rather catch my model doing the wrong thing quickly than having a higher chance of one-shotting it at the cost of having to do a lot of specification upfront.
gpt-5-high is as low as i can go :]
Here's the Composer 1 pelican riding a bicycle: https://static.simonwillison.net/static/2025/cursor-1-pelica...
honestly better than I expected
While I am excited to see a new model, I am skeptical when there is so much vagueness - charts with "frontier models" without actually spelling out which ones, charts with no numbers (time axis, or in one chart - entirely).
There is a footnote that should help with the models. Training is a harder thing to report on, but roughly our finding here is that RL scales.
People on here love to be contrarian about Cursor, but I’ve tried all the popular alternatives (Copilot, Claude Code, Codex, Gemini CLI, Cline) and found Cursor’s overall experience to just be unmatched. A big part of that is its speed, another its reliability.
It’s the only coding agent I’m actually really motivated to use out of the box because it really does make me feel more productive while the others keep messing up the project, from way too large changes I didn’t ask for all the way to constant syntax and request errors.
It’s the only coding agent I’ve used that feels serious about being a product rather than a prototype. Their effort in improving their stack is totally paying off.
I dropped cursor for the precise reason you mention: reliability.
Countless times my requests in the AI chat just hang there for 30+ seconds more until I can retry them.
When I decided to give Claude Code a try (I thought I didn't need it because I used Claude in Cursor) I couldn't believe how much faster it was, and literally 100% reliable.
EDIT: given today's release, decided to give it a go. The Composer1 model _is_ fast, but right at the second new agent I started I got this:
> Connection failed. If the problem persists, please check your internet connection or VPN
Sounds like you have a network problem. Did you try checking the network diagnostic in settings? They default to http2 which can throw a wrench in some corporate networks.
I would be willing to bet money your issue is on your side. I am a daily user since the beginning and cannot recall when I have had issues like you describe unless it was related to my corp network.
This is the exact reason I left Cursor for Claude Code. Night and day difference in reliability. The Windows experience might be especially bad, but it would get constantly hung or otherwise fail when trying to run commands. I also had to babysit Cursor and tell it to continue for mid sized tasks.
They've improved performance dramatically in the last few weeks, might have fixed your issues.
It's clear they've been shipping a lot of Windows updates.
I use cursor daily, my business partner uses CC. Without a doubt, CC is certainly better, I'm just not willing to let go of the flow I spent the last year fine tuning. I'll probably make the leap after we finish the latest release.
A lot of progress is being made here on the Cursor side. I encourage you to try it again.
(Cursor dev)
I too have tried them all and have settled on Cursor being the best. That said, I see the current space split between folks like me, who know generally what they want built and appreciate a tool that helps them get to the goal quicker, and, on the other side of the spectrum, folks who want the tool to orchestrate most of the engineering. I have no opinion on which is better, but I sit in the first camp. In that camp Cursor is by far the best tool.
Yep, it just works seamlessly. Sure, it hangs sometimes, but their UI allows you to retry or undo changes to an earlier point in the conversation easily. The autocompletion is nice as well and pretty satisfying to tab through the small and menial things when refactoring.
> I’ve tried all the popular alternatives (Copilot, Claude Code, Codex, Gemini CLI, Cline)
Can't help but notice you haven't tried Zed!
You tried Claude and still prefer cursor?
Absolutely. CC can be tuned to not do too much crap on its own, but even with the new extension its IDE integration and multi thread management are still significantly worse, as is its status reporting, which I find to be very important.
Also, somehow magically, I’ve found Cursor’s Auto mode to be significantly faster than the specific models I’ve tried, Claude being among them.
Auto is pretty amazing and I think most folks that have issues or complain about cost are simply not using Auto.
Auto had a big improvement a few weeks ago (around when pricing changed)
If a few weeks is months, I would agree. I think the change to Auto was 2-3+ months ago, when they moved to charging for named models and higher limits on Auto.
Auto is only good for trivial stuff at this point. It is quite subpar at everything else. This is probably because it almost always defaults to Claude Sonnet 3.5 (which you can tell if you ask the agent to identify itself and tell you its version), and that is pretty outdated.
Again it goes back to what your workflow is. I don’t think trivial is the right word. I use Auto to write fairly advanced code, but I do it in bite-size chunks, or relatively bite-size ones. Think function level, or a couple of interdependent functions being written.
I would agree it is not as good at doing lengthy work where it’s taking a design all the way through implementing a feature in a single shot, but trivial is not a good description.
I also don’t think you’re right. 3.5 was recently deprecated and even before then, Cursor has been hitting rate limits with Anthropic. Auto is as much a token cost optimization as it is a rate limit optimization.
Absolutely. I actually don’t understand the preference folks have for Claude Code. I don’t find it that powerful. That said, I think some of it comes down to preference and work context.
One thing no competitor is serious about is average response completion time. Cursor lapped everyone there.
There are lots of good models we like here. But we agree that getting the right point on the smart+fast graph can make agentic coding feel really good.
(Cursor researcher)
I love Cursor. I've tried Copilot/Claude/etc. but keep coming back to Cursor. I just want to work, and Cursor tab complete is dang accurate, esp. for refactoring tasks.
I tried going back to VS Code + Copilot a month ago. I only lasted 4 days because it was too bad. It was super slow and gave poor suggestions, but mostly it just flat out did not suggest anything. Cursor feels snappy in comparison and the suggestions are more often than not useful. The most annoying thing about Cursor tab complete is that it is so fast that when I am doing something unusual it will keep jumping in with useless suggestions. They have a snooze function for this though.
Damn, TIL. I always used > Cursor: disable completions and forgot to turn it on again. I need to try snooze then!
I love Cursor, the tab completion and agent mode. But I really dislike vscode after using intellij for so many years. I really wish the underlying editor was better, or I could get Cursor features in intellij instead. The editing of the files is mostly fine, but it's everything else around it that a full IDE provides that's just so much better. Right now it's intellij + Claude Code for me, and it's fine, but I wish I could get the AI power of Cursor in a better package.
Intellij's tab-complete is coming along; it's hit and miss if it will work but for similar edits I'm finding it picks up the pattern quickly and I can tab - tab - tab to make them happen.
Still not up to Cursor standards though :)
Building off of VSCode was probably Cursor's silver bullet and the best decision they could have ever made.
It made migrating for everyone using VSCode (probably the single most popular editor) or another vscode forked editor (but at the time it was basically all VSCode) as simple as install and import settings.
I do not think Cursor would have done nearly as well as it has if it hadn't. So even though it can be subpar in some areas due to VSCode's baggage, it's probably staying that way for a while.
I don't disagree with anything you said. If I was in their shoes, I would have done exactly the same thing.
Maybe my complaint is that I wish vscode had more features like intellij, or that intellij was the open source baseline a lot of other things could be built on.
Intellij is not without its cruft and problems, don't get me wrong. But its git integration, search, navigation, database tools - I could go on - all of these features are just so much nicer than what vscode offers.
This looks like a model RLed on top of Qwen3-Coder or GLM 4.6 as per their graph and footnote.
For anyone else who was wondering, it looks like the within-Cursor model pricing for Cursor Composer is identical to gemini-2.5-pro, gpt-5, and gpt-5-codex: https://cursor.com/docs/models#model-pricing
($1.25 input, $1.25 cache write, $0.13 cache read, and $10 output per million tokens)
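For a rough sense of what those rates mean per session, here is a back-of-the-envelope sketch. The per-million prices are the ones quoted above; the token counts in the example session are made up for illustration, not measured, and `estimate_cost` is just a hypothetical helper name.

```python
# Per-million-token prices quoted above (USD).
PRICES_PER_MILLION = {
    "input": 1.25,
    "cache_write": 1.25,
    "cache_read": 0.13,
    "output": 10.00,
}


def estimate_cost(tokens: dict[str, int]) -> float:
    """Return the USD cost for token counts keyed like PRICES_PER_MILLION."""
    return sum(
        PRICES_PER_MILLION[kind] * count / 1_000_000
        for kind, count in tokens.items()
    )


# Hypothetical agent session: these token counts are assumptions.
example_session = {
    "input": 200_000,
    "cache_write": 50_000,
    "cache_read": 1_000_000,
    "output": 30_000,
}

# 0.25 + 0.0625 + 0.13 + 0.30 = 0.7425
print(f"${estimate_cost(example_session):.2f}")  # prints $0.74
```

Even with cached reads dominating the token count, output tokens end up as the largest line item in this made-up session, which is why output-heavy agent runs cost more than the input price alone suggests.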
I'm curious whether their near-term expectation is that this will be better than these models, or whether this is a model they intend to use in Auto mode, or if the focus is really on speed...? I guess my question is: why would I actively choose this over Auto?
Insane velocity from the Cursor team. I wonder how they move so fast?
We don't wear shoes [1].
[1] https://www.businessinsider.com/no-shoes-policy-in-office-cu...
I would have thought it's because you use Cursor...
Cursor 2.0 keeps crashing on me while having an agent running and opening the IDE part of the application. I might have to roll back.
Hey - really sorry to hear this - could you email me andrew@cursor.com? Here are 3 suggestions to try:
1. Reset your settings.json - if shared with vscode, sometimes settings can cause perf regressions
2. Could you try cmd-shift-p -> "capture and send debugging data"? Will send us some profiling data to debug
3. Clear your user data (will delete chats) as a last resort - cmd-shift-p, "reveal user data," close the app, then delete this folder and restart the app
I wish it was easy to find out how much it costs relative to Claude :)
As a stealth model, it was priced at $1.25/M in / $10/M out
Right now, it seems free when you are a Cursor Pro user, but I'd love more clarity on how much it will cost (I can't believe it'll be unlimited usage for subscribers)
Facts. They really need to make pricing more clear across the entire product.
Could anyone explain how to use multiple agents and subagents in Cursor, Claude Code, or others? It is already challenging for me to tame one model doing work, let alone synchronize multiple parallel workers.
Do you have to split the plan into parallelizable tasks that can be worked on at the same time in one codebase without breaking and confusing the other agents?
you can use git worktrees and just have multiple Claude Code terminal instances working on each worktree. That way they don't clash, just delete the worktree when the task is done.
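A minimal sketch of that worktree-per-agent setup, in case it helps. It only builds and prints the git commands by default (dry run). The `agent/<task>` branch naming, the sibling-directory layout, the repo path, and the task names are all assumptions for illustration. `git worktree add -b` and `git worktree remove` are the real subcommands.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, task: str) -> Path:
    """Sibling directory for a task, e.g. /src/app -> /src/app-login (assumed layout)."""
    return repo.parent / f"{repo.name}-{task}"


def worktree_add_cmd(repo: Path, task: str) -> list[str]:
    """Build `git worktree add -b <branch> <path>` for one task."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"agent/{task}", str(worktree_path(repo, task))]


def worktree_remove_cmd(repo: Path, task: str) -> list[str]:
    """Build `git worktree remove <path>` to clean up once the task is merged."""
    return ["git", "-C", str(repo), "worktree", "remove",
            str(worktree_path(repo, task))]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical repo and tasks; start one agent terminal inside each worktree.
repo = Path("/src/app")
for task in ["login", "billing"]:
    run(worktree_add_cmd(repo, task))
```

Each worktree is a separate checkout on its own branch, so the agents never touch each other's files. You merge the branches as usual and then run the remove command for each finished task.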
I have never leveraged git worktrees... That is such a crazy useful tool that I am almost ashamed of not having researched it before. Git is such a beautiful piece of software.
I built an open source project to make the whole workflow easier: https://github.com/built-by-as/FleetCode
see also https://cursor.com/changelog/2-0 and https://cursor.com/blog/2-0
other links across the web:
https://x.com/amanrsanger/status/1983581288755032320?s=46
https://x.com/cursor_ai/status/1983567619946147967?s=46
my very small nit is... why is the model called Composer?? of all things?? when there was already a Cursor Composer from 2024.
Cursor Cheetah would've been amazing. reusing the Composer name feels like the reverse OpenAI Codex move haha
We like the name Composer and were sad to see it go. Excited to bring it back. (Agree Cheetah is a cool name too.)
What I can't stand about cursor is the constantly changing and confusing billing and usage.
I think competition in the space is a good thing, but I'm very skeptical their model will outperform Claude.
is Cursor Bench open? Would like to see an open benchmark for agentic coding
Unfortunately not, as we used our own internal code for the benchmark. We would also like to see more benchmarks that reflect the day-to-day agentic coding use.
Is there any information at all available, anywhere, on what Cursor Bench is testing and how?
It's the most prominent part of the release post - but it's really hard to understand what exactly it's saying.
Roughly, we had Cursor software engineers record real questions they were asking models, and then had them record the PR that they made that contained the result. We then cleaned these up. That is the benchmark.
Which programming languages/tools/libraries did the team's questions/code involve?
I wonder if this custom model is trained on Cursor users. There’s a lot of potential in how much better a custom model could be the more closely it is integrated with the product. Having the model learn to adapt to different user preferences would make it stand out compared to memoryless frontier models.
The fact that you are wondering this is bad. You definitely should know this. _ALL_ the online AI providers are training on your data. They have more expensive enterprise plans if you want to opt out.
I’ve generally seen providers allow you to opt in or out. What may vary is what the default is and what they may offer in exchange for using your data (perhaps they could offer higher rate limits).
Very cool, congrats!
Where is the comparison with Sonnet 4.5? That would be the only thing that matters, really.
> "Best Frontier" includes GPT-5 and Sonnet 4.5, which both outperform Composer.
>> "Best Frontier" includes GPT-5 and Sonnet 4.5, which both outperform Composer.
Looking at the graph, it would appear there's an implicit "today" in that statement, as they do appear poised to equal or surpass Sonnet 4.5 on that same benchmark in the near future.
Does anyone code with GPT-5? I've never had it work in Cursor. I mean, like, at all.
A lot of people use it! It scores very well on our benchmarks, significantly better than Composer-1.
Please keep the naming of your models sane, I'd like to know that composer 1 is the first model and composer 2 is second but composer 1o is not yet another 1 variant that's actually newer and better than 2, that's just dumb. Not that you're doing that, some other companies do that.
We will do our best. Luckily I don't think there are major telecom companies called Composer-2.