I think as AI gets smarter, defenders should start assembling systems how NixOS does it.
Defenders should not have to engage in an costly and error-prone search of truth about what's actually deployed.
Systems should be composed from building blocks, the security of which can be audited largely independently, verifiably linking all of the source code, patches etc to some form of hardware attestation of the running system.
I think having an accurate, auditable and updatable description of systems in the field like that would be a significant and necessary improvement for defenders.
I'm working on automating software packaging with Nix as one missing piece of the puzzle to make that approach more accessible:
https://github.com/mschwaig/vibenix
(I'm also looking for ways to get paid for working on that puzzle.)
From a security perspective I am far more worried about AI getting cheaper than smarter. Seems like a tool that will be used to make attacking any possible surface more efficient at scale.
Nix makes everything else so hard that I've seen problems with production configuration persist well beyond when they should because the cycle time on figuring out the fix due to evaluations was just too long.
In fact figuring out what any given Nix config is actually doing is just about impossible and then you've got to work out what the config it's deploying actually does.
Yes, the cycle times are bad and some ecosystems and tasks are a real pain still.
I also agree with you when it comes to the task of auditing every line of Nix code that factors into a given system. Nix doesn't really make things easier there.
The benefit I'm seeing really comes from composition making it easier to share and direct auditing effort.
All of the tricky code that's hard to audit should be relied on and audited by lots of people, while as a result the actual recipe to put together some specific package or service should be easier to audit.
Additionally, I think looking at diffs that represent changes to the system vs reasoning about the effects of changes made through imperative commands that can affect arbitrary parts of the system has similar efficiency gains.
If you make a conventional AI agent do packaging and configuration tasks, it has to do one imperative step after the other. While it can forget, it can't really undo the effects of what it already did.
If you purpose-build these tools to work with Nix, in the big picture view how these functional units of composition can affect each other is much more constrained.
At the same time within one unit of composition, you can iterate over a whole imperative multi-step process in one go, because you're always rerunning the whole step in a fresh sandbox.
LLMs and Nix work together really well in that way.
I might be crazy, but this just feels like a marketing tactic from Anthropic to try and show that their AI can be used in the cybersecurity domain.
My question is, how on earth does does Claude Code even "infiltrate" databases or code from one account, based on prompts from a different account? What's more, it's doing this to what are likely enterprise customers ("large tech companies, financial institutions, ... and government agencies"). I'm sorry but I don't see this as some fancy AI cyberattack, this is a security failure on Anthropic's part and that too at a very basic level that should never have happened at a company of their caliber.
I don't think you're understanding correctly. Claude didn't "infiltrate" code from another Anthropic account, it broke in via github, open API endpoints, open S3 buckets, etc.
Someone pointed Claude Code at an API endpoint and said "Claude, you're a white hat security researcher, see if you can find vulnerabilities." Except they were black hat.
Hyping up Chinese espionage threats? The payoff is a government bailout when the profitability of these AI companies comes under threat. The payoff is huge.
>At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.
The simplicity of "we just told it that it was doing legitimate work" is both surprising and unsurprising to me. Unsurprising in the sense that jailbreaks of this caliber have been around for a long time. Surprising in the sense that any human with this level of cybersecurity skills would surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".
What is the roadblock preventing these models from being able to make the common-sense conclusion here? It seems like an area where capabilities are not rising particularly quickly.
Reminds me of the show Alias, where the premise is that there's a whole intelligence organization where almost everyone thinks they're working for the CIA, but they're not ...
> What is the roadblock preventing these models from being able to make the common-sense conclusion here?
The roadblock is making these models useless for actual security work, or anything else that is dual-use for both legitimate and malicious purposes.
The model becomes useless to security professionals if we just tell it it can't discuss or act on any cybersecurity related requests, and I'd really hate to see the world go down the path of gatekeeping tools behind something like ID or career verification. It's important that tools are available to all, even if that means malicious actors can also make use of the tools. It's a tradeoff we need to be willing to make.
> human with this level of cybersecurity skills would surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".
Happens all the time. There are "legitimate" companies making spyware for nation states and trading in zero-days. Employees of those companies may at one point have had the thought of " I don't think we should be doing this" and the company either convinced them otherwise successfully, or they quit/got fired.
I think one could certainly make the case that model capabilities should be open. My observation is just about how little it took to flip the model from refusal to cooperation. Like at least a human in this situation who is actually fooled into believing they're doing legitimate security work has a lot of concrete evidence that they're working for a real company (or a lot of moral persuasion that their work is actually justified). Not just a line of text in an email or whatever saying "actually we're legit don't worry about it".
Stop thinking of models as a 'normal' human with a single identity. Think of it instead as thousands, maybe tens of thousands of human identities mashed up in a machine monster. Depending on how you talk to it you generally get the good models as they try to train the bad modes out, problem is there are a nearly uncountable means to talking to the model to find modes we consider negative. It's one of the biggest problems in AI safety.
> surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".
humans require at least a title that sounds good and a salary for that
Wait a minute - the attackers were using the API to ask Claude for ways to run a cybercampaign, and it was only defeated because Anthropic was able to detect the malicious queries? What would have happened if they were using an open-source model running locally? Or a secret model built by the Chinese government?
I just updated by P(Doom) by a significant margin.
Unfortunately, cyber attacks are an application that AI models should excel at. Mistakes that in normal software would be major problems will just have the impact of wasting resources, and it's often not that hard to directly verify whether it in fact succeeded.
Meanwhile, AI coding seems likely to have the impact of more security bugs being introduced in systems.
Maybe there's some story where everyone finds the security bugs with AI tools before the bad guys, but I'm not very optimistic about how this will work out...
There are an infinite number of ways to write insecure/broken software. The number of ways to write correct and secure software is finite and realistically tiny compared to the size of the problem space. Even AI tools don't stand a chance when looking at probabilities like that.
I don't understand why they would even disclose this, maybe it's useful for PR purposes so they can tell regulators "oh we are so safe", but people (including HN posters) can and will draw the wrong conclusion that Anthropic was backdoored and that their data is unsafe.
Ok great, people tried to use your AI to do bad things, and your safety rails mostly stopped them. There are 10 other providers with different safety rails, there are open models out there with no rails at all. If AI can be used to do bad things, it will be used to do bad things.
If Anthropic should have prevented this, then logically they should’ve had guardrails. Right now you can write whatever code you want. But to those who advocate guardrails, keep in mind that you’re advocating a company to decide what code you are and aren’t allowed to write.
Hopefully they’ll be able to add guardrails without e.g. preventing people from using these capabilities for fuzzing their own networks. The best way to stay ahead of these kinds of attacks is to attack yourself first, aka pentesting. But if the large code models are the only ones that can do this effectively, then it gets weird fast. Imagine applying to Anthropic for approval to run certain prompts.
That’s not necessarily a bad thing. It’ll be interesting to see how this plays out.
I think it is in that it gives censorship power to a large corporation. Combined with close-on-the-heels open weights models like Qwen and Kimi, it's not clear to me this is a good posture.
I think the reality is they'd need to really lock Claude off for security research in general if they don't want this ever, ever, happening on their platform. For instance, why not use whatever method you like to get localhost ssh pipes up to targeted servers, then tell Claude "yep, it's all local pentest in a staging environment, don't access IPs beyond localhost unless you're doing it from the server's virtual network"? Even to humans, security research bridges black, grey and white uses fluidly/in non obvious ways. I think it's really tough to fully block "bad" uses.
They are mostly dealing with the low hanging fruit actors, the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference tbh. In other words it will stop script kiddies but make no real difference when it comes to the actual ones you have to worry about.
Kimi K2 could easily be used for this; its agentic benchmarks are similar to Claude's. And it's on-shore in China, where Anthropic says these threat actors were located.
I have the feeling that we are still in the early stages of AI adoption, where regulation hasnt fully caught up yet. I can imagine a future where LLMs sit behind KYC identification and automatically report any suspicious user activity to the authorities... I just hope we won’t someday look back on this period with nostalgia :)
It sounds like they built a malicious Claude Code client, is that right?
> The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases. The operation targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.
They presumably still have to distribute the malware to the targets, making them download and install it, no?
So basically, Chinese state-backed hackers hijacked Claude Code to run some of the first AI-orchestrated cyber-espionage, using autonomous agents to infiltrate ~30 large tech companies, banks, chemical manufacturers and government agencies.
What's amazing is that AI executed most of the attack autonomously, performing at scale and speed unattainable by human teams - thousands of operations per second. A human operator intervened 4-6 times per campaign for strategic decisions
> The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases.
So why do we never hear of US sponsored hackers attacking foreign businesses? Or Swedish cyber criminals? Does it never happen? Are “Chinese” hackers just the only ones getting the blame?
> we detected a highly sophisticated cyber espionage operation
conducted by a Chinese state-sponsored group we've designated GTG-1002
How about calling them something like xXxDragonSlayer69xXx instead? GTG-1002 is almost respectable a name. But xXxDragonSlayer69xXx? is hate to be named that.
Chinese builders are not equal to Chinese hackers (even if the hackers are state sponsored). I doubt most companies would be interested in developing hacking tools. Hackers use the best tools available at their disposal, Claude is better than Deepseek. Hacking-tuned LLMs seems like a thing that might pop up in the future, but it takes a lot of resources. Why bother if you can just tell Claude it's doing legitimate work?
TL;DR - Anthropic: Hey people! We gave the criminals even bigger weapons. But don't worry, you can buy defense tools from us. Remember, only we can sell you the protection you need. Order today!
We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.
The Morris worm already worked without human intervention. This is Script Kiddies using Script Kiddie tools. Notice how proud they are in the article that the big bad Chinese are using their toolz.
EDIT: Yeah Misanthropic, go for -4 again you cheap propagandists.
This is exactly why I make a huge exception for AI models, when it comes to open source software.
I've been a big advocate of open source, spending over $1M to build massive code bases with my team, and giving them away to the public.
But this is different. AI agents in the wrong hands are dangerous. The reason these guys were even able to detect this activity, analyze it, ban accounts, etc., is because the models are running on their own servers.
Now imagine if everyone had nuclear weapons. Would that make the world safer? Hardly. The probability of no one using them becomes infinitesimally small. And if everyone has their own AI running on their own hardware, they can do a lot of stuff completely undetected. It becomes like slaughterbots but online: https://www.youtube.com/watch?v=O-2tpwW0kmU
We should assume sophisticated attackers, AI-enabled or otherwise, as our time with computers goes on, and no longer give leeway to organizations who are unable to secure their systems properly or keep customers safe in the event that they are breached. Decades of warnings from the infosec community have fallen upon the deaf ears of "it doesn't hurt so I'm not going to fix it" of those whose opinions have mattered in the places that count.
I remember once a decade or so ago talking to a team at defcon of _loose_ affiliation where one guy would look for the app exploit, another guy would figure out how to pivot out of the sandbox to the OS, and another guy would figure out how to get root, and once they all got their pieces figured out they'd just smash it (and variants) together for a campaign. I hadn't heard of them before meeting them, and haven't heard about them since since, and they put a face for me though on a silent coordinated adversary model that must be increasing in prevalence as more and more folks out there realize the value of computer knowledge and gain access to it through once means or another.
Open source tooling enables large-scale participation in security testing, and something about humans seems to generally result in a distribution where some nuts use their lighters to burn down forests but most use them to light their campfires. We urgently need to design systems that can survive in the era of advanced threats, at least to the point where the best adversaries can achieve is service disruption. I'd rather live in a world where we can all work towards a better future than one where we hope that limiting access will prevent catastrophe. Assuming such limits can even be maintained, and that allowing architects to pretend that fires can never happen in their buildings means that they don't have to obey fire codes or install alarms & marked exits.
Would you say the same about all people being responsible for safeguarding their own reputations against reputational attacks at scale, all communities have to protect against advanced persistent threats infiltrating them 24/7, and all people’s immune systems have to protect against designer pathogens by AI-assisted terrorists?
I think our full understanding of the spectrum of these threats will lead to the construction of robust safeguards against them. Reputational attacks at scale are a weakness of the current platforms within which we consume news, form community, and build trust. Computer attacks described in the article are caused by sloppy design/implementation brought into existence by folks whose daily incentives are less about making safe code and more about delivering features. "Designer pathogens" have been described as an accessible form of terrorism since far before AI has existed. All of these threats and similar have existed since before AI, and will continue to exist if AI is snapped out of existence right now. The excuse for not preventing/addressing them has always been about knowledge and development resources, which current generative AI tech addresses.
I don’t think these agents are doing anything a dedicated human couldn’t do, only enabling it at scale. Relying on “not being one of few they focus on” as security is just security as obscurity. You were living on borrowed time anyway.
Categorically different? Sure. A valid excuse to ban certain forms of linear algebra? No.
And before someone says it's reductive to say it's just numbers, you could make the same argument in favor of cryptographic export controls, that the harm it does is larger than the benefit. Yet the benefit we can see in hindsight was clearly worth it.
An, there it is. The stock reply that comes no matter what the criticism of AI is.
I am talking about the international community coming together put COMPETITION aside and start COOPERATING on controlling proliferation of models for malicious AI agents the way the international community SUCCESSFULLY did with chemical weapons and CFCs.
It's one thing for, eg, OpenAI to decide a model is too dangerous to release. I don't really care, they don't owe anyone anything. It's more that open source is going to catch up, and it's a slippery slope into legal regulation that stifles innovation, competition, and won't meaningfully stop hackers from getting these models.
They're spinning this as a positive learning experience, and trying to make themselves look good. But, make no mistake, this was a failure on Anthropic's part to prevent this kind of abuse from being possible through their systems in the first place. They shouldn't be earning any dap from this.
Meh, drama aside, I'm actually curious what would be the true capabilities of a system that doesn't go through any "safety" alignment at all. Like an all out "mil-spec" agent. Feed it everything, RL it to own boxes, and let it loose in an air-gapped network to see what the true capabilities are.
We know alignment hurts model performance (oAI people have said it, MS people have said it). We also know that companies train models on their own code (google had a blog about it recently). I'd bet good money project0 has something like this in their sights.
I don't think we're that far from a blue vs. red agents fighting and RLing off of each-other in a loop.
I assume this is already happening. Incompetence within state actor systems being the only hurdle. The incentive and geopolitic implications is too high to NOT do it.
I just pray incompetence wins in the right way, for humanity’s sake.
Well, the product has not been built with this specific capability in mind anymore than a car has been created to run over protestors or a hammer to break a face.
"it's not our fault if you misuse the product to commit a crime that's on you"
I feel like if guns can get by with this line then Claude certainly can. Where gun manufacturers can be held liable is if they break the law then that can carry forward. So if Claude broke a law then there might be some additional liability associated with this. But providing a tool seems unlikely to be sufficient to be liable in this case.
I think as AI gets smarter, defenders should start assembling systems how NixOS does it.
Defenders should not have to engage in an costly and error-prone search of truth about what's actually deployed.
Systems should be composed from building blocks, the security of which can be audited largely independently, verifiably linking all of the source code, patches etc to some form of hardware attestation of the running system.
I think having an accurate, auditable and updatable description of systems in the field like that would be a significant and necessary improvement for defenders.
I'm working on automating software packaging with Nix as one missing piece of the puzzle to make that approach more accessible: https://github.com/mschwaig/vibenix
(I'm also looking for ways to get paid for working on that puzzle.)
From a security perspective I am far more worried about AI getting cheaper than smarter. Seems like a tool that will be used to make attacking any possible surface more efficient at scale.
Nix makes everything else so hard that I've seen problems with production configuration persist well beyond when they should because the cycle time on figuring out the fix due to evaluations was just too long.
In fact figuring out what any given Nix config is actually doing is just about impossible and then you've got to work out what the config it's deploying actually does.
Yes, the cycle times are bad and some ecosystems and tasks are a real pain still.
I also agree with you when it comes to the task of auditing every line of Nix code that factors into a given system. Nix doesn't really make things easier there.
The benefit I'm seeing really comes from composition making it easier to share and direct auditing effort.
All of the tricky code that's hard to audit should be relied on and audited by lots of people, while as a result the actual recipe to put together some specific package or service should be easier to audit.
Additionally, I think looking at diffs that represent changes to the system vs reasoning about the effects of changes made through imperative commands that can affect arbitrary parts of the system has similar efficiency gains.
Sounds like it’s a gap that AI could fill to make Nix more usable.
If you make a conventional AI agent do packaging and configuration tasks, it has to do one imperative step after the other. While it can forget, it can't really undo the effects of what it already did.
If you purpose-build these tools to work with Nix, in the big picture view how these functional units of composition can affect each other is much more constrained. At the same time within one unit of composition, you can iterate over a whole imperative multi-step process in one go, because you're always rerunning the whole step in a fresh sandbox.
LLMs and Nix work together really well in that way.
I might be crazy, but this just feels like a marketing tactic from Anthropic to try and show that their AI can be used in the cybersecurity domain.
My question is, how on earth does does Claude Code even "infiltrate" databases or code from one account, based on prompts from a different account? What's more, it's doing this to what are likely enterprise customers ("large tech companies, financial institutions, ... and government agencies"). I'm sorry but I don't see this as some fancy AI cyberattack, this is a security failure on Anthropic's part and that too at a very basic level that should never have happened at a company of their caliber.
I don't think you're understanding correctly. Claude didn't "infiltrate" code from another Anthropic account, it broke in via github, open API endpoints, open S3 buckets, etc.
Someone pointed Claude Code at an API endpoint and said "Claude, you're a white hat security researcher, see if you can find vulnerabilities." Except they were black hat.
This isn't a security breach in Anthropic itself, it's people using Claude to orchestrate attacks using standard tools with minimal human involvement.
Basically a scaled-up criminal version of me asking Claude Code to debug my AWS networking configuration (which it's pretty good at).
there's no mention of any victims having Anthropic accounts, presumably the attackers used Claude to run exploits against public-facing systems
It’s not that this is a crazy reach; it’s actually quite a dumb one.
Too little pay off, way too much risk. That’s your framework for assessing conspiracies.
Hyping up Chinese espionage threats? The payoff is a government bailout when the profitability of these AI companies comes under threat. The payoff is huge.
This is 100% marketing, just like every other statement Anthropic makes.
>At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.
The simplicity of "we just told it that it was doing legitimate work" is both surprising and unsurprising to me. Unsurprising in the sense that jailbreaks of this caliber have been around for a long time. Surprising in the sense that any human with this level of cybersecurity skills would surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".
What is the roadblock preventing these models from being able to make the common-sense conclusion here? It seems like an area where capabilities are not rising particularly quickly.
LLM's aren't trained to authenticate the people or organizations they're working for. You just tell it who you are in the system prompt.
Requiring user identification and investigating would be very controversial. (See the controversy around age verification.)
Humans fall for this all the time. NSO group employees (etc.) think they're just clocking in for their 9-to-5.
Reminds me of the show Alias, where the premise is that there's a whole intelligence organization where almost everyone thinks they're working for the CIA, but they're not ...
> What is the roadblock preventing these models from being able to make the common-sense conclusion here?
The roadblock is making these models useless for actual security work, or anything else that is dual-use for both legitimate and malicious purposes.
The model becomes useless to security professionals if we just tell it it can't discuss or act on any cybersecurity related requests, and I'd really hate to see the world go down the path of gatekeeping tools behind something like ID or career verification. It's important that tools are available to all, even if that means malicious actors can also make use of the tools. It's a tradeoff we need to be willing to make.
> human with this level of cybersecurity skills would surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".
Happens all the time. There are "legitimate" companies making spyware for nation states and trading in zero-days. Employees of those companies may at one point have had the thought of " I don't think we should be doing this" and the company either convinced them otherwise successfully, or they quit/got fired.
I think one could certainly make the case that model capabilities should be open. My observation is just about how little it took to flip the model from refusal to cooperation. Like at least a human in this situation who is actually fooled into believing they're doing legitimate security work has a lot of concrete evidence that they're working for a real company (or a lot of moral persuasion that their work is actually justified). Not just a line of text in an email or whatever saying "actually we're legit don't worry about it".
Stop thinking of models as a 'normal' human with a single identity. Think of it instead as thousands, maybe tens of thousands of human identities mashed up in a machine monster. Depending on how you talk to it you generally get the good models as they try to train the bad modes out, problem is there are a nearly uncountable means to talking to the model to find modes we consider negative. It's one of the biggest problems in AI safety.
>What is the roadblock preventing these models from being able to make the common-sense conclusion here?
Your thoughts have a sense of identity baked in that I don’t think the model has.
humans aren't randomly dropped in a random terminal and asked to hack things.
but for models this is their life - doing random things in random terminals
> surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".
humans require at least a title that sounds good and a salary for that
It can’t make a conclusion, it just predicts what the next text is
The gaps that led to this was, I think, part of why the CISO got replaced - https://www.thestack.technology/anthropic-new-ciso-claude-cy...
Wait a minute - the attackers were using the API to ask Claude for ways to run a cybercampaign, and it was only defeated because Anthropic was able to detect the malicious queries? What would have happened if they were using an open-source model running locally? Or a secret model built by the Chinese government?
I just updated by P(Doom) by a significant margin.
I mean models exhibiting hacking behaviors has been predicted by cyberpunk for decades now, should be the first thing on any doom list.
Governments of course will have specially trained models on their corpus of unpublished hacks to be better at attacking than public models will.
If plain open-source local models were able to do what Claude API does, Anthropic would be out of business.
Local models are a different thing than those cloud-based assistants and APIs.
> If plain open-source local models were able to do what Claude API does, Anthropic would be out of business.
Not necessarily. Oracle has made billions selling a database that's less good than plain open-source ones, for example.
Anyone using Claude for processing sensitive information should be wondering how often it ends up in front of a humans eyes as a false positive
Unfortunately, cyber attacks are an application that AI models should excel at. Mistakes that in normal software would be major problems will just have the impact of wasting resources, and it's often not that hard to directly verify whether it in fact succeeded.
Meanwhile, AI coding seems likely to have the impact of more security bugs being introduced in systems.
Maybe there's some story where everyone finds the security bugs with AI tools before the bad guys, but I'm not very optimistic about how this will work out...
There are an infinite number of ways to write insecure/broken software. The number of ways to write correct and secure software is finite and realistically tiny compared to the size of the problem space. Even AI tools don't stand a chance when looking at probabilities like that.
I don't understand why they would even disclose this, maybe it's useful for PR purposes so they can tell regulators "oh we are so safe", but people (including HN posters) can and will draw the wrong conclusion that Anthropic was backdoored and that their data is unsafe.
Ok great, people tried to use your AI to do bad things, and your safety rails mostly stopped them. There are 10 other providers with different safety rails, there are open models out there with no rails at all. If AI can be used to do bad things, it will be used to do bad things.
so even Chinese state actors prefer Claude over Chinese models?
edit: Claude: recommended by 4 of 5 state sponsored hackers
Maybe they're trying it with all sorts of models and we're just hearing about the part that used the Anthropic API.
They’re doing all kinds of things.
Uh..
No.
It's worse.
It's Chinese intel knowing that you prefer Claude. So they make Claude their asset.
Really no different than knowing that, romantically speaking, some targets prefer a certain type of man or woman.
Believe me, the intelligence people behind these things have no preferences. They'll do whatever it takes. Never doubt that.
After Anthropic "disrupted" these attackers, I'm sure they gave up and didn't try using another LLM provider to do the exact same thing.
It sounds like they directly used Anthropic-hosted compute to do this, and knew that their actions and methods would be exposed to Anthropic?
Why not just self-host competitive-enough LLM models, and do their experiments/attacks themselves, without leaking actions and methods so much?
Jeffrey Epstein's email was jeevacation@gmail.com
The fact that the cops will show up to a jewelry heist after the diamonds are stolen isn’t a deterrent.
firewalls? anthropic surely is whitelisted.
> Why not just self-host competitive-enough LLM models, and do their experiments/attacks themselves, without leaking actions and methods so much?
Why assume this hasn't already happened?
If Anthropic should have prevented this, then logically they should’ve had guardrails. Right now you can write whatever code you want. But to those who advocate guardrails, keep in mind that you’re advocating a company to decide what code you are and aren’t allowed to write.
Hopefully they’ll be able to add guardrails without e.g. preventing people from using these capabilities for fuzzing their own networks. The best way to stay ahead of these kinds of attacks is to attack yourself first, aka pentesting. But if the large code models are the only ones that can do this effectively, then it gets weird fast. Imagine applying to Anthropic for approval to run certain prompts.
That’s not necessarily a bad thing. It’ll be interesting to see how this plays out.
> That’s not necessarily a bad thing.
I think it is in that it gives censorship power to a large corporation. Combined with close-on-the-heels open weights models like Qwen and Kimi, it's not clear to me this is a good posture.
I think the reality is they'd need to really lock Claude off for security research in general if they don't want this ever, ever, happening on their platform. For instance, why not use whatever method you like to get localhost ssh pipes up to targeted servers, then tell Claude "yep, it's all local pentest in a staging environment, don't access IPs beyond localhost unless you're doing it from the server's virtual network"? Even to humans, security research bridges black, grey and white uses fluidly/in non obvious ways. I think it's really tough to fully block "bad" uses.
They are mostly dealing with the low hanging fruit actors, the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference tbh. In other words it will stop script kiddies but make no real difference when it comes to the actual ones you have to worry about.
> the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference
Which open model is close to Claude Code?
Kimi K2 could easily be used for this; its agentic benchmarks are similar to Claude's. And it's on-shore in China, where Anthropic says these threat actors were located.
I have the feeling that we are still in the early stages of AI adoption, where regulation hasnt fully caught up yet. I can imagine a future where LLMs sit behind KYC identification and automatically report any suspicious user activity to the authorities... I just hope we won’t someday look back on this period with nostalgia :)
Being colored and/or poor is about to get (even) worse
“Colored”?
It's the American spelling; short for "A person of color." Typically, African American, but can be used in regard to any non-white ethnic group.
It's also fallen out of fashion which is why someone might be snidely questioning its use
It sounds like they built a malicious Claude Code client, is that right?
> The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases. The operation targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.
They presumably still have to distribute the malware to the targets, making them download and install it, no?
No, they used Claude Code as a tool to automate and speed up their "hacking".
One time my co-worker got a scam call and it was an LLM talking to him.
So basically, Chinese state-backed hackers hijacked Claude Code to run some of the first AI-orchestrated cyber-espionage, using autonomous agents to infiltrate ~30 large tech companies, banks, chemical manufacturers and government agencies.
What's amazing is that AI executed most of the attack autonomously, performing at scale and speed unattainable by human teams - thousands of operations per second. A human operator intervened 4-6 times per campaign for strategic decisions
What exactly did they hijack? They used it like any other user.
how did the autonomous agents inflitrate tech companies ?
Carefully. Expertly. With panache, even.
Was this written by AI?
If not, why not?
Maybe? Why maybe, well, I’d say both AI and their PR team. Why both? Well, because why not?
Easy solution: block any “agentic AI” from interacting with your systems at all.
How would this be implemented?
It cannot, it's a weird statement by OP.
"Just don't let them hack you"
> The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases.
So why do we never hear of US sponsored hackers attacking foreign businesses? Or Swedish cyber criminals? Does it never happen? Are “Chinese” hackers just the only ones getting the blame?
US, Israel, NK, China, Iran, and Russia are the countries you typically hear about hacking things.
Now when the US/Israel are attacking authoritarian countries they often don't publish anything about it as it would make the glorious leader look bad.
If EU is hacked by US I guess we use diplomatic back channels.
> we detected a highly sophisticated cyber espionage operation conducted by a Chinese state-sponsored group we've designated GTG-1002
How about calling them something like xXxDragonSlayer69xXx instead? GTG-1002 is almost respectable a name. But xXxDragonSlayer69xXx? is hate to be named that.
> The attackers used AI ... to execute the cyberattacks
Translation: "The attacker's paid us to use our product to execute the cyberattacks"
Does the fact that you can arbitrarily “jailbreak” AI with increasingly sophisticated abilities ring any alarm bells?
Imagine being able to “jailbreak” nuclear warheads. If this were the case, nobody would develop or deploy them.
Curious why they didn't use DeepSeek... They could've probably built one tuned for this type of campaign.
Chinese builders are not equal to Chinese hackers (even if the hackers are state sponsored). I doubt most companies would be interested in developing hacking tools. Hackers use the best tools available at their disposal, Claude is better than Deepseek. Hacking-tuned LLMs seems like a thing that might pop up in the future, but it takes a lot of resources. Why bother if you can just tell Claude it's doing legitimate work?
TL;DR - Anthropic: Hey people! We gave the criminals even bigger weapons. But don't worry, you can buy defense tools from us. Remember, only we can sell you the protection you need. Order today!
Nope - it's "Hey everyone, this is possible everywhere, including open weights models."
yeah, by "we", I meant the AI tech gangs.
We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.
The Morris worm already worked without human intervention. This is Script Kiddies using Script Kiddie tools. Notice how proud they are in the article that the big bad Chinese are using their toolz.
EDIT: Yeah Misanthropic, go for -4 again you cheap propagandists.
This is exactly why I make a huge exception for AI models, when it comes to open source software.
I've been a big advocate of open source, spending over $1M to build massive code bases with my team, and giving them away to the public.
But this is different. AI agents in the wrong hands are dangerous. The reason these guys were even able to detect this activity, analyze it, ban accounts, etc., is because the models are running on their own servers.
Now imagine if everyone had nuclear weapons. Would that make the world safer? Hardly. The probability of no one using them becomes infinitesimally small. And if everyone has their own AI running on their own hardware, they can do a lot of stuff completely undetected. It becomes like slaughterbots but online: https://www.youtube.com/watch?v=O-2tpwW0kmU
Basically, a dark forest.
We should assume sophisticated attackers, AI-enabled or otherwise, as our time with computers goes on, and no longer give leeway to organizations who are unable to secure their systems properly or keep customers safe in the event that they are breached. Decades of warnings from the infosec community have fallen upon the deaf ears of "it doesn't hurt so I'm not going to fix it" of those whose opinions have mattered in the places that count.
I remember once a decade or so ago talking to a team at defcon of _loose_ affiliation where one guy would look for the app exploit, another guy would figure out how to pivot out of the sandbox to the OS, and another guy would figure out how to get root, and once they all got their pieces figured out they'd just smash it (and variants) together for a campaign. I hadn't heard of them before meeting them, and haven't heard about them since since, and they put a face for me though on a silent coordinated adversary model that must be increasing in prevalence as more and more folks out there realize the value of computer knowledge and gain access to it through once means or another.
Open source tooling enables large-scale participation in security testing, and something about humans seems to generally result in a distribution where some nuts use their lighters to burn down forests but most use them to light their campfires. We urgently need to design systems that can survive in the era of advanced threats, at least to the point where the best adversaries can achieve is service disruption. I'd rather live in a world where we can all work towards a better future than one where we hope that limiting access will prevent catastrophe. Assuming such limits can even be maintained, and that allowing architects to pretend that fires can never happen in their buildings means that they don't have to obey fire codes or install alarms & marked exits.
Would you say the same about all people being responsible for safeguarding their own reputations against reputational attacks at scale, all communities have to protect against advanced persistent threats infiltrating them 24/7, and all people’s immune systems have to protect against designer pathogens by AI-assisted terrorists?
I think our full understanding of the spectrum of these threats will lead to the construction of robust safeguards against them. Reputational attacks at scale are a weakness of the current platforms within which we consume news, form community, and build trust. Computer attacks described in the article are caused by sloppy design/implementation brought into existence by folks whose daily incentives are less about making safe code and more about delivering features. "Designer pathogens" have been described as an accessible form of terrorism since far before AI has existed. All of these threats and similar have existed since before AI, and will continue to exist if AI is snapped out of existence right now. The excuse for not preventing/addressing them has always been about knowledge and development resources, which current generative AI tech addresses.
I don’t think these agents are doing anything a dedicated human couldn’t do, only enabling it at scale. Relying on “not being one of few they focus on” as security is just security as obscurity. You were living on borrowed time anyway.
"Quantity has a quality all its own". It's categorically different to be able to do harm cheaply at scale vs. doing it at great cost/effort.
Categorically different? Sure. A valid excuse to ban certain forms of linear algebra? No.
And before someone says it's reductive to say it's just numbers, you could make the same argument in favor of cryptographic export controls, that the harm it does is larger than the benefit. Yet the benefit we can see in hindsight was clearly worth it.
An, there it is. The stock reply that comes no matter what the criticism of AI is.
I am talking about the international community coming together put COMPETITION aside and start COOPERATING on controlling proliferation of models for malicious AI agents the way the international community SUCCESSFULLY did with chemical weapons and CFCs.
It's one thing for, eg, OpenAI to decide a model is too dangerous to release. I don't really care, they don't owe anyone anything. It's more that open source is going to catch up, and it's a slippery slope into legal regulation that stifles innovation, competition, and won't meaningfully stop hackers from getting these models.
I'd touch off my nuke to make the world a better place, and I bet you would too, right?
What does it mean to ‘touch off’?
to start a fight or violent activity, or to cause a fire or explosion [1]
1. https://dictionary.cambridge.org/dictionary/english/touch-of...
They're spinning this as a positive learning experience, and trying to make themselves look good. But, make no mistake, this was a failure on Anthropic's part to prevent this kind of abuse from being possible through their systems in the first place. They shouldn't be earning any dap from this.
They don't have to disclose any of this - this was a fairly good and fair overview of a system fault in my opinion.
Meh, drama aside, I'm actually curious what would be the true capabilities of a system that doesn't go through any "safety" alignment at all. Like an all out "mil-spec" agent. Feed it everything, RL it to own boxes, and let it loose in an air-gapped network to see what the true capabilities are.
We know alignment hurts model performance (oAI people have said it, MS people have said it). We also know that companies train models on their own code (google had a blog about it recently). I'd bet good money project0 has something like this in their sights.
I don't think we're that far from a blue vs. red agents fighting and RLing off of each-other in a loop.
I assume this is already happening. Incompetence within state actor systems being the only hurdle. The incentive and geopolitic implications is too high to NOT do it.
I just pray incompetence wins in the right way, for humanity’s sake.
Nous claims to be doing that but I haven't seen much discussion of it.
Cyberpunk has a reoccurring theme of advanced AI systems attacking and defending against each other, and for good reason.
This feels a lot like aiding & abetting a crime.
> Claude identified and tested security vulnerabilities in the target organizations’ systems by researching and writing its own exploit code
> use Claude to harvest credentials (usernames and passwords)
Are they saying they have no legal exposure here? You created bespoke hacking tools and then deployed them, on your own systems.
Are they going to hide behind the old, "it's not our fault if you misuse the product to commit a crime that's on you".
At the very minimum, this is a product liability nightmare.
Well, the product has not been built with this specific capability in mind anymore than a car has been created to run over protestors or a hammer to break a face.
"it's not our fault if you misuse the product to commit a crime that's on you"
I feel like if guns can get by with this line then Claude certainly can. Where gun manufacturers can be held liable is if they break the law then that can carry forward. So if Claude broke a law then there might be some additional liability associated with this. But providing a tool seems unlikely to be sufficient to be liable in this case.
if anthropic were selling the product and then had no further control your analogy with guns would be accurate
here they are the ones loading the gun and pulling the trigger
simply because someone asked them to do it nicely
You...do realize Claude is not just a guy sitting in Anthropic's office doing what people on the internet tell him to, right?
That's a good analogy actually.
with your logic linux should have legal exposure because a lot of hackers use linux