The prompt caching change is awesome for any agent. Claude is far behind, with higher costs for caching and manual cache checkpoints. It certainly depends on your application, but prompt caching is also ignored in a lot of cost comparisons.
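To make the caching point concrete, here's a rough back-of-envelope sketch of why it dominates agent costs. The rates below are hypothetical placeholders (not any provider's published pricing), and the model assumes each turn resends the full prior transcript:

```python
# Back-of-envelope cost model for an agent loop that resends the whole
# transcript every turn. Prices are HYPOTHETICAL placeholders, not real
# published rates -- plug in your provider's numbers.
FULL_RATE = 1.00    # $ per 1M input tokens, uncached (assumed)
CACHED_RATE = 0.10  # $ per 1M cached input tokens (assumed 90% discount)

def agent_input_cost(turns, tokens_per_turn, cache_hit=True):
    """Total input-token cost when each turn resends all prior context."""
    total = 0.0
    context = 0  # tokens accumulated from previous turns
    for _ in range(turns):
        if cache_hit:
            # previously-seen prefix is billed at the cheaper cached rate
            total += (context * CACHED_RATE + tokens_per_turn * FULL_RATE) / 1e6
        else:
            total += (context + tokens_per_turn) * FULL_RATE / 1e6
        context += tokens_per_turn
    return total

uncached = agent_input_cost(50, 2_000, cache_hit=False)
cached = agent_input_cost(50, 2_000, cache_hit=True)
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f}")
```

With these assumed numbers the cached run comes out several times cheaper, which is why leaving caching out of a cost comparison badly skews it for agentic workloads.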
Though to be fair, thinking tokens are also ignored in a lot of cost comparisons, and in my experience Claude generally uses fewer thinking tokens for the same intelligence.
A few hours of playing around and I'm suitably impressed.
Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally, Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU at 100%, and consuming vast amounts of RAM. Codex CLI was woefully behind a few months ago and, despite overly conservative out-of-the-box sandbox settings, has largely caught up to Claude Code. (Gemini CLI is an altogether embarrassing experience, but Google did just put a solid PM behind it, and 3.0 Pro should be out this month if we're lucky.)
Codex with 5.1 high thoughtfully pawed through the documentation and source code and, with a little help pulling down parts of the Swift Book, correctly resolved the issue.
I remember getting the thread manager right being one of the harder parts of my operating systems course during my computer science undergrad; testing threaded programs has always been a challenge. It's a strange circle-of-life moment to realize that what was hard for undergrads also serves as a benchmark for coding agents!
"including rapidly re-scrolling the terminal buffer" Yes this bug is brutal.
"consuming vast amounts of RAM" Also this. Claude will leave hanging instances all the time. If you check your task manager after a few days of using it without doing a full reset, you'll see a number of hanging Claude processes, each using up 400 MB of RAM.
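For what it's worth, a quick way to spot those lingering processes is to scan `ps` output for large resident sets. This is a rough sketch for Unix-like systems; the process name `claude` and the 100 MB threshold are assumptions you'd adjust:

```python
import subprocess

def lingering(ps_output, name="claude", min_rss_kb=100_000):
    """Pick out processes whose command mentions `name` and whose
    resident set size (RSS, in kB) exceeds a threshold.
    Expects lines shaped like `ps -eo pid,rss,comm` output."""
    hits = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, rss, comm = line.split(None, 2)
        if name in comm and int(rss) >= min_rss_kb:
            hits.append((int(pid), int(rss)))
    return hits

if __name__ == "__main__":
    ps = subprocess.run(["ps", "-eo", "pid,rss,comm"],
                        capture_output=True, text=True).stdout
    for pid, rss in lingering(ps):
        print(f"pid {pid}: {rss // 1024} MB")
```

Killing whatever this surfaces (or just restarting) is a workaround, not a fix; the leak itself is on the CLI to resolve.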
Claude actually has a huge number of very painful bugs. I'm aware of at least a dozen.
> Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly.
What solved that for me was to leverage the for-LLM docs Apple ships with Xcode, and then build a swift6-concurrency skill. Here's an example script to copy the Xcode docs into your repo: https://gist.github.com/CharlesWiltgen/75583f53114d1f2f5bae3...
Lovely find!
/Applications/Xcode.app/Contents/PlugIns/IDEIntelligenceChat.framework/Versions/A/Resources/AdditionalDocumentation/Swift-Concurrency-Updates.md
is exactly the primer to give an agent.
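In case the gist is unavailable, here's a minimal sketch of the same idea: copy the Markdown docs out of Xcode's bundle into your repo. The destination folder name is made up, and the Xcode path (quoted above) only exists on machines with that Xcode version installed:

```python
import shutil
from pathlib import Path

# Path quoted above; present only on machines with Xcode installed.
XCODE_DOCS = Path(
    "/Applications/Xcode.app/Contents/PlugIns/IDEIntelligenceChat.framework"
    "/Versions/A/Resources/AdditionalDocumentation"
)

def copy_llm_docs(src: Path, dest: Path) -> list:
    """Copy every Markdown doc from `src` into `dest`, returning the
    filenames copied. Returns [] if the source directory doesn't exist."""
    if not src.is_dir():
        return []
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for md in sorted(src.glob("*.md")):
        shutil.copy2(md, dest / md.name)  # copy2 preserves mtimes
        copied.append(md.name)
    return copied

if __name__ == "__main__":
    # "docs/apple-llm-docs" is an arbitrary example destination.
    print(copy_llm_docs(XCODE_DOCS, Path("docs/apple-llm-docs")))
```

From there a skill or CLAUDE.md/AGENTS.md note can point the agent at the copied primer instead of hoping it recalls Swift 6 semantics from training data.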
> On coding, we’ve worked closely with startups like Cursor, Cognition, Augment Code, Factory, and Warp to improve GPT‑5.1’s coding personality, steerability, and code quality.
Why no GitHub?
Microsoft isn't a startup, and I suspect OpenAI is working closely with Microsoft already.
Already live in Cursor btw
This got only a single comment and 34 points in 3 hours. Crazy how the dynamics have changed around model releases in just a single year.
There was already an announcement post for 5.1 yesterday: https://news.ycombinator.com/item?id=45904551
Thanks! Macroexpanded:
GPT-5.1: A smarter, more conversational ChatGPT - https://news.ycombinator.com/item?id=45904551 - Nov 2025 (672 comments)
More of the same, I suppose.
You have to be called Apple to get raving reviews for that.
This is the first low-key, silent feature rollout, treated like "just another software update," with no hype or buzz beforehand. Prior to this point, every other feature release was pumped for weeks or even months, with "leaks" from insiders deliberately getting people amped. I don't know if OpenAI changed marketing tactics, or if they're on a new chapter of some playbook, but this is a radical shift from what they were doing before.
I feel like the rollout was a bit rushed. Benchmarks for 5.1 came out a day after the launch. New models weren't immediately available through the API. And then there's 5-Codex-Mini which was deprecated only six days later by 5.1-Codex-Mini. Wondering if Gemini 3 forced their hand here?