This is a trendy article, rehashing themes that were prevalent over the last year, and, like those themes, will age like milk.
If you look at the past 3 years and plot capabilities in 3 key areas, you'll reach vastly different conclusions.
Code completion was "awww, how cute, this almost looks like python" in early 2023. It's now at the level of "oh my, this actually looks decent".
Then there's e2e "agentic" stuff, where you needed tons of glue 2 years ago to have a decent workflow working 50% of the time. Now you have agents taking a spec, working for 2h uninterrupted, and delivering working, tested, linted code. Unattended.
Lastly, these capabilities have led to CTF challenges going from ~0% to ~80% solve rates since RL was used to train these things. The first one was ~2y ago when a popular CTF site saw the first <10s capture on a new task. Now, several companies are selling CTF as a service, with more and more competitions being dominated by said agents.
So yeah, rehashing all the old "arguments" is a futile attempt. This thing is getting better and better. RL does something really interesting: it unlocks a fixation with task completion. Give it a verifiable reward (e.g. capture a flag) and it will bang its head against the wall until it gets that flag. And what's more important, in security stuff you don't need perfect accuracy, nor maj@n. What you're looking for is pass@n, which usually gives 20-30% more on any benchmark. So, yeah, all your flags are belong to AI.
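To put rough numbers on the pass@n point: if attempts are independent, pass@n is just 1 - (1 - p)^n. A back-of-the-envelope sketch (the per-attempt rates here are made-up numbers for illustration, not from any benchmark):

    # Hypothetical per-attempt solve rates; illustration only, not benchmark data.
    def pass_at_n(p: float, n: int) -> float:
        """Chance that at least one of n independent attempts succeeds."""
        return 1.0 - (1.0 - p) ** n

    for p in (0.3, 0.5):
        for n in (1, 5, 10):
            print(f"p={p:.1f}  n={n:2d}  pass@n={pass_at_n(p, n):.2f}")

Even a model that only lands 30% of single attempts clears ~83% at pass@5 under that (strong) independence assumption, which is exactly the regime an attacker cares about.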
----
AI will compromise your cybersecurity posture, but that's because our postures have been bad all along. It will find more and more exploits, and the value in red-blue teams will be much more than the "bugs" and "exploits" LLM-assisted coding will "bring". Those will get automatically caught as well. But there are vastly more grass-fed, guaranteed human-written, good old-fashioned bugs out there.
Some citations would help your case a lot.
Has anyone noticed how poorly tools like Claude Code (the main one I tried) themselves are working? You'd expect software from a company with an infinite AI allowance to be unattainably excellent; instead it lags, hangs, flickers, and feels like an unpleasant MVP mess.
I hear people at every corner telling me how they can 100x now, and that if my AI use isn't laying down prime code, it's my skill issue. But where is this excellent AI-generated software? Do you maybe have some examples you can share?
> Do you maybe have some examples you can share?
Microsoft 365 Copilot /s
A lot of good information for infra teams to internalise, although I worry that it gets a bit lost in the structure of the piece (there are roughly 3-5 separate essays here, but nothing a good edit couldn't fix). One thing I'll add (or at least crystallise, because I think the pieces are there) is that attack surface management is critical. A lot of the issues here are relevant in exactly the same scenario as exposing web applications. I have reported vulnerabilities in a lot of AI applications in prod and the issues aren't magic or even novel. They're typically the same authorisation and injection issues people have been talking about for decades. The methods of securing them are the same. Unfortunately it's not uncommon for companies to get compromised via a good old fashioned REST API on an exposed dev domain, but I probably wouldn't go so far as to say "REST APIs will compromise your cybersecurity posture." I would just say companies have found another tool to flex their indifference towards protecting user and company data.
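To make "the same issues" concrete: whether the untrusted string came from a form field or an LLM tool call, the fix is the same parameterized query and the same server-side ownership check. A minimal sketch (hypothetical handler, made-up table name):

    import sqlite3

    def get_document(db: sqlite3.Connection, user_id: int, doc_id: str):
        # Injection: never splice untrusted input (user- or LLM-supplied) into SQL;
        # bind it as a parameter, same as you would for a web form.
        row = db.execute(
            "SELECT id, owner_id, body FROM documents WHERE id = ?",
            (doc_id,),
        ).fetchone()
        # Authorisation: enforce ownership server-side; don't trust the caller
        # (human, REST client, or agent) to only ask for what it may see.
        if row is None or row[1] != user_id:
            raise PermissionError("not found or not yours")
        return row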
Properly securing LLMs goes against branding, I guess. "This tool is like getting a new intern every 15 minutes! They read and write fast and know a lot of stuff, but can accidentally attack or sabotage you if they get distracted! Oh, and they work remotely only!" doesn't sound like a good pitch.
Haha, yes.
I have been asking if the business would be happy to employ an extremely gullible insider with a short memory, who sometimes just makes things up, with no fear of any legal repercussions or being fired, to work on important stuff.
Strangely this is not a compelling proposal.
It seemed obvious from the outset that AI would be or become a security risk.
*lights cigarette* Not mine, directly, but I'm sure I'll be part of the next 150-million-strong data breach because some suit shouted, red-faced, "WE NEED AI" into a Teams meeting, and several people with mortgages and children made it happen.
I'm sorry you have to use Teams, but at least they let you smoke
> and several people with mortgages and children made it happen.
Solution seems to be don’t have kids.
Then the employees are less scared of losing their jobs and can push back against management’s idiotic AI requests.
When you put it that way, I will gladly live in fear. Some things in life are just more important than getting to have your way all the time at work.
No shit Sherlock
We're just waking up to a flood of browser extensions with "AI integration" that are harvesting content and exfiltrating it.
The only solution is to tighten the nuts down and block all but approved extensions. Not a super popular position.
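For what it's worth, the managed-browser tooling for that already exists. A rough sketch of an allowlist-only policy for Chrome on Linux (ExtensionInstallBlocklist / ExtensionInstallAllowlist are standard Chrome enterprise policy names; the path and extension ID below are placeholders, adapt to however policy gets pushed to your fleet):

    import json
    from pathlib import Path

    # Allowlist-only extension policy for managed Chrome on Linux.
    # The extension ID is a placeholder; real IDs are 32-char strings from the store.
    POLICY_DIR = Path("/etc/opt/chrome/policies/managed")
    policy = {
        "ExtensionInstallBlocklist": ["*"],      # block everything by default...
        "ExtensionInstallAllowlist": [
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",  # ...then allow vetted IDs only
        ],
    }
    POLICY_DIR.mkdir(parents=True, exist_ok=True)
    (POLICY_DIR / "extensions.json").write_text(json.dumps(policy, indent=2))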
Or to do things offline instead of rushing to put all info online.