> If you're tired of hearing about memory safety, this article is for you.
Tell me more about memory safety, any time; just hold the Rust.
Rust skeptics are not memory safety skeptics. Hopefully, there are no memory safety skeptics, other than rhetorical strawmen.
I've spoken with quite a few C++ developers who swear that they don't need memory safety tooling because they're good enough to achieve it on their own. More than a few also expect that the worst that can happen from breaking memory safety is a SEGFAULT, which also suggests that they don't understand memory safety.
So, I'd say that there is still some outreach to do on the topic.
On the other hand, you're absolutely right that Rust is only one of the many ways to get there.
OK, so those are skeptics about tooling, not about memory safety per se.
And not even about tooling per se, since achieving safety on their own doesn't literally mean on their own; they are relying on tooling.
The true Scotsman's "on your own" means working in assembly language, in which you have to carefully manage even just calling a function and returning from it: not leaving stray arguments on the stack, saving and restoring all callee-saved registers that are used in the function, and so on.
Someone who thinks that their job is not to have segfaults is pretty green, obviously.
I wish memory safety skepticism were nothing more than a rhetorical strawman. It's not hard to find prominent people who think differently, though. Take Herb Sutter for example, who argues that "memory safety" as defined in this article is an extreme goal and we should instead focus on a more achievable 95% safety instead to spend the remaining effort on other types of safety.
I can also point to more extreme skeptics like Dan O'Dowd, who argue that memory safety is just about getting gud and you don't actually need language affordances.
Discussions about this topic would be a lot less heated if everyone was on the same page to start. They're not. It's taken advocates years of effort to get to the point where we can start talking about memory safety without immediate negative reactions and that process is still ongoing.
It's hard to imagine that, if a memory problem were reported to Sutter about one of his own programs, he would not prioritize fixing it over most other work.
However, I imagine he would probably take into consideration the context. Who and what is the program for? And does the issue only reproduce if the program is misused? Does the program handle untrusted inputs? Or are there conceivable situations in which a user of the program could be duped by a bad actor into feeding the program a malicious input?
Imagine Sutter wrote a C compiler, and someone found a way to crash it. But the only way to reproduce that crash is via code that invokes undefined behavior. Why would Herb prioritize fixing that over other work?
Suppose the user insists that he's running the compiler as a CGI script, allowing unauthenticated visitors to their site to compile programs, making it a security issue.
How should Herb reasonably reply to that?
Yeah, what kind of Crazy Person would make a web site where unauthenticated visitors can write programs and it just compiles them?
What would you even call such a thing? "Compiler Explorer"?
I guess maybe if Herb had helped the guy who owned that web site, say, Matt Godbolt, to enable his "Syntax 2 for C++" compiler cppfront on that site, it would feel like Herb ought to take some responsibility, right?
Or maybe I am being unreasonable?
It's worth differentiating the case of a specific program from the more general case of memory safety as a language feature. A specific program might take additional measures appropriate to the problem domain like static analysis or using a restricted subset of the language. Memory safety at the language level has to work for most or all code written using that language.
Herb is usually talking about the latter because of the nature of his role, like he does here [0]. I'm willing to give him the benefit of the doubt on his opinions about specific programs, because I disagree with his language opinions.
[0] https://herbsutter.com/2024/03/11/safety-in-context/
> Take Herb Sutter for example, who argues that "memory safety" as defined in this article is an extreme goal and we should instead focus on a more achievable 95% safety
I wonder how you figure out when your codebase has reached 95% safety? Or is it OK to stop looking for memory unsafety when you hit, say, 92% safe?
> Take Herb Sutter for example, who argues that "memory safety" as defined in this article is an extreme goal and we should instead focus on a more achievable 95% safety instead to spend the remaining effort on other types of safety.
I don't really see a) how that's skepticism of memory safety or b) how it isn't a reasonable position. Just because someone doesn't think X is the most important thing ever doesn't mean they are skeptical of it, but rather that the person holding the 100% viewpoint is probably the one with the extreme position.
Look at the definition quoted in the article:
"95% memory safety" is not a meaningful concept under this definition! That's very much skepticism of memory safety as defined in this article, to highlight the key phrase in the comment you're quoting.It's also not a meaningful concept within the C++ language standard written by the committee Herb Sutter chairs. Memory unsafety is undefined behavior (UB). C++ code containing UB has no defined semantics and is inherently incorrect, whether that's 1 violation or 1000.
Now, we can certainly discuss the practical ramifications of 95% vs 100%, but even here Herb's arguments have fallen notoriously flat. I'll link Sean Baxter's piece on why Herb's actual proposals fail to achieve even these more modest goals as an entry point [0]. No need to rehash the volumes of digital ink already spilled on this subject in this particular comment thread.
[0] https://www.circle-lang.org/draft-profiles.html
Skepticism of an absolutist binary take on memory safety is not the same as skepticism of memory safety in general and it's important to distinguish the two.
It's like saying that people skeptical of formal verification are actually skeptical of eliminating bugs. Most people are not skeptical of eliminating bugs, but they might be skeptical of extreme approaches to do so.
We simply don't treat "gcc segfaults on my example.c" the same way as "libssl has an exploitable buffer overflow". That's a synopsis of the nuance.
Materials to be consumed by engineers are often unsafe when misused. Not just programs like toolchains with undefined behaviors, but in general. Steel beams buckle if overloaded. Transistors overheat and explode outside of their SOA (safe operating area).
When engineers make something for the public, their job is to combine the unsafe bits into something which is safe, even against casual misuse.
When engineers make something for other engineers, that is less so; engineers are expected to read the data sheet.
I prefer to treat testing like insurance. You purchase enough insurance to get the coverage you need, and not a penny more. Anything beyond that could be invested better.
Same thing with tests: get the coverage you need to build confidence in your codebase, but don't tie yourself in knots trying to get that last 10%. It's not worth it. Create some manual and integration tests and move on.
I feel like type safety, memory safety, thread safety, etc. are all similar. Building a physics core to simulate the stability of your nuclear stockpile? The typing should be second to none. Building yet another CSV exporter? Who gives a damn.
Context is so damn important.
This is a perfectly reasonable argument if memory safety issues are essentially similar to logic bugs, but memory unsafety isn't like a logic bug.
A logic bug in a library doesn't break unrelated code. It's meaningful to talk about the continued execution of a program in the presence of logic bugs. Logic bugs don't time travel. There are ways to exhaustively prove the absence of logic bugs, e.g. MC/DC or state space exploration, even if they're expensive.
None of these properties are necessarily true of memory safety. A single memory safety violation in a library can smash your stack, or allow your code to be exploited. You can't exhaustively defend against this with error handling either. In C and C++, it's not meaningful to even talk about continued execution in the presence of memory safety violations. In C++, memory safety violations can time travel. You typically can't prove the absence of memory safety violations, except in languages designed to allow that.
With appropriate caveats noted (Fil-C, etc), we don't have good ways to retrofit memory safety onto languages and programs built without it or good ways to exhaustively diagnose violations. All we can do is structurally eliminate the possibility of memory unsafety in any code that might ever be used in a context where it's an important property. That's most code.
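To make "structurally eliminate" concrete, here is a minimal Rust sketch of my own (not from the article, and the function name is made up): the unavoidable unsafe building blocks get contained behind a safe interface, so no caller input can reach the memory-unsafe path.

    // A safe wrapper over an unsafe primitive: the unsafety is localized to one
    // audited line, and the public function cannot be misused to violate memory safety.
    fn first_byte(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            None
        } else {
            // SAFETY: index 0 is in bounds because the slice is non-empty.
            Some(unsafe { *bytes.get_unchecked(0) })
        }
    }

    fn main() {
        assert_eq!(first_byte(b"hi"), Some(b'h'));
        assert_eq!(first_byte(b""), None);
    }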
If your attacker controls the data you're exporting to a CSV file, they can take advantage of a memory safety issue in your CSV exporter to execute arbitrary code on your machine.
https://georgemauer.net/2017/10/07/csv-injection.html
> Building yet another CSV exporter? Who gives a damn.
The problem with memory unsafe code is that it can have unexpected and unpredictable side effects, such as subtly altering the critical data you're exporting, or letting an attacker take control of your CSV exporter.
In other words, you need quite a lot of context to figure out that a memory bug in your CSV exporter won't be used for escalation. Figuring out that context, documenting it, and making sure that the context never changes for the lifetime of your code? That sounds like a much more complex proposition than using memory-safe tools in the first place.
I’m curious, what memory safe alternative is there for a C/C++ codebase that doesn’t give up performance?
Also, for what it's worth, Rust ports tend to perform faster according to Russinovich. Part of that may be second system syndrome, although the more likely explanation is that the default std library is just better optimized (e.g. hash tables in Rust are significantly better than unordered_map).
A lot of Rust versus C or C++ comparisons be like: "Yo, check this Rust rewrite of Foo, which runs 2.5× faster¹ than the C original²".
---
1. Using 8 cores.
2. Single-threaded
Ada has been around for years. The approach to memory safety isn't as strong as Rust's, but it is a lot stronger than C or C++. C++ is also adding a lot of memory safety; it is a lot easier to bypass than it is in Rust (though I've seen Rust code where everything is marked unsafe), but you still get some memory safety if you try.
All benchmarks between Ada, C, C++, and Rust (and others) should come down to a wash. A skilled programmer can find a difference but it won't be significant. A skilled C++ programmer wouldn't be using unordered_map so it is unfair to point out you can use something bad.
It has, but you need SPARK too to avoid the runtime overhead. And I haven't seen adoption of Ada in the broader industry, so I wouldn't pick it based on that. I would need to understand why it remains limited to industries that mandate government certification.
> A skilled C++ programmer wouldn't be using unordered_map so it is unfair to point out you can use something bad.
Pretending defaults don't matter is naive, especially in a language that is so hostile to adding third-party dependencies (and even without that, defaults matter).
> A skilled C++ programmer wouldn't be using unordered_map so it is unfair to point out you can use something bad.
C++ isn't my primary language. Pray tell - what's wrong with unordered_map, and what's the alternative?
std::unordered_map basically specifies a bucket-based hashtable implementation (read: lots of extra pointer chasing). Most high-performance hashtables are based on probing.
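To illustrate the difference, here is a toy open-addressing (linear-probing) map in Rust, purely a sketch of the layout idea (the ProbeMap type and everything about it is made up for illustration; the real Rust std HashMap is a much more sophisticated probing design based on hashbrown/SwissTable): entries live in one flat array, and a collision walks forward through that array instead of chasing a pointer to a per-bucket node.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Toy open-addressing map: all entries in one flat Vec, linear probing on
    // collision. No growth, no deletion; for illustration only.
    struct ProbeMap<K, V> {
        slots: Vec<Option<(K, V)>>,
    }

    impl<K: Hash + Eq, V> ProbeMap<K, V> {
        fn with_capacity(cap: usize) -> Self {
            let n = cap.next_power_of_two().max(8);
            Self { slots: (0..n).map(|_| None).collect() }
        }

        fn home_slot(&self, key: &K) -> usize {
            let mut h = DefaultHasher::new();
            key.hash(&mut h);
            (h.finish() as usize) & (self.slots.len() - 1)
        }

        fn insert(&mut self, key: K, value: V) {
            let mask = self.slots.len() - 1;
            let mut i = self.home_slot(&key);
            loop {
                let taken_by_other = matches!(&self.slots[i], Some((k, _)) if *k != key);
                if taken_by_other {
                    i = (i + 1) & mask; // probe the next slot: sequential memory, no pointer chase
                } else {
                    self.slots[i] = Some((key, value));
                    return;
                }
            }
        }

        fn get(&self, key: &K) -> Option<&V> {
            let mask = self.slots.len() - 1;
            let mut i = self.home_slot(key);
            loop {
                match &self.slots[i] {
                    Some((k, v)) if k == key => return Some(v),
                    Some(_) => i = (i + 1) & mask, // someone else's key: keep probing
                    None => return None,           // empty slot: key is absent
                }
            }
        }
    }

    fn main() {
        let mut m = ProbeMap::with_capacity(16);
        m.insert("answer", 42);
        assert_eq!(m.get(&"answer"), Some(&42));
    }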
Bluntly: exactly why does Ada matter, at all? The code footprint of software (1) written in Ada and (2) of concern when we talk about memory safety has measure zero. Is Ada better than C++? It turns out, I don't have to care: to go from C++ to Ada, one needs to rewrite, and if one is going to rewrite for memory safety, they're not going to rewrite to Ada.
> Part of that may be second system syndrome
It may be that they've implemented it differently in a way that is more performant but has fewer features. A "rust port" is not automatically or apparently a 1:1 comparison.
It could be, but it's often just that the things you got in the box were higher quality and so your results are higher quality by default.
Better types like VecDeque<T>, better implementations of common ideas like sorting, even better fundamental concepts like providing the destructive move, or the owning Mutex by default.
Even the unassuming growable array type, Rust's Vec<T>, is just plain better than C++ std::vector<T>. It's not a huge difference and for many applications it won't matter, but that's the sort of pervasive quality difference I'm talking about and so I can well believe that in practice this ends up showing through.
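As a small illustration of the "owning Mutex" point (my own sketch, not code from any of the ports being discussed): the mutex owns the data it protects, so the only way to reach the data is through lock(), and the lock is released when the guard goes out of scope.

    use std::sync::Mutex;

    fn main() {
        // The Mutex owns the counter; there is no way to touch the data without locking.
        let counter = Mutex::new(0u64);
        {
            let mut guard = counter.lock().unwrap(); // the guard derefs to the data
            *guard += 1;
        } // guard dropped here, lock released automatically
        println!("{}", counter.lock().unwrap());
    }

Compare that with a bare mutex sitting next to the data it is supposed to guard, where nothing stops you from touching the data without taking the lock.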
There was an article about Zig on the front page just a few hours ago that attracted many "Why do I need memory safety?" comments. The fact that new languages like Zig aren't taking memory safety as a foundational design goal should be evidence enough that many people are still skeptical about its value.
Zig's approach to memory safety is an open question. I don't like it (obviously, a very subjective statement), but as more software is written in it, we'll get empirical data about whether Zig's bet pays off. It very well might.
> Hopefully, there are no memory safety skeptics, other than rhetorical strawmen.
There are plenty of such skeptics. It's why Google, Microsoft, etc all needed to publish things like "70% of our vulnerabilities are memory-safety linked".
Even today, the increasing popularity of Zig indicates that memory-safety is not taken as a baseline.
Good point. There are even two posts about Zig on the front page alongside this post.
> Rust skeptics are not memory safety skeptics
Definitely not all of them, yes.
> Hopefully, there are no memory safety skeptics, other than rhetorical strawmen.
You'll find the reality disappointing then…
1 The numbers on memory safety should nowadays distinguish between spatial issues (bounds-checked in most languages with sane flags) and temporal ones. Temporal ones will be lower than 70%.
2 The article does not mention the (compilation) time costs of static checks and their influence on growing code bases, which is a more fundamental system trade-off on scalability and development velocity/efficiency.
> Wrap unsafe code with safe interfaces
3 That sounds like an equivalent of permission-based separation logic, hopefully soon available for LLVM.
> "Get good" is not a strategy
4 Being good is all about knowing the exact techniques, processes, and tools with their exact trade-offs and applying them; so what I would expect here is teaching process knowledge about static and dynamic analysis strategies, tooling, and tactics to eliminate bug classes.
However, we have neither sane overviews of bug classes, nor can we generate sane probabilities/statistics of occurrence based on source code and the available static and dynamic analysis, even when ignoring functionality requirements.
This reads somewhat like developers should "stay mediocre" and "trust the tools", even though "Get good and improve processes for groups" is probably the strategy intended here.
What do you mean by "temporal ones will be lower than 70%"? Are you suggesting --- referring to the article's cite --- that subcomponent ports to Rust reduce spatial vulnerabilities more than temporal ones? If so, why do you believe that?
I would expect more of the opposite for languages offering easy-to-enable bounds checks. Only observing the general trend for when bounds checks are available (and used) would be helpful information.
Temporal ones will remain tricky to debug until at least static + dynamic analysis via scheduling in kernel, hypervisor/simulation and/or run-time becomes commonplace. Probably even longer, because analysis is cumbersome for bigger programs without separation logic (like Rust has).
If you're talking about the impact of replacing a subcomponent in a larger C/C++ codebase with a memory safe language and saying you'd expect that to make less of an impact on the temporal memory safety issues latent in the remaining C/C++ code, I guess I get that.
If you're saying that you think memory safe languages are less successful at dealing with their own temporal memory safety concerns than spatial memory safety concerns, that doesn't make sense to me and I would push back on it.
I do agree with point 1 and with most of point 2, besides some more arcane things like intentionally racy memory access and some write-cache eviction instructions not being properly modeled (in, for example, Rust).
Rust certainly was not the first "systems programming" language that was memory safe; Ada was aiming for the same title and I think achieved it way before Rust.
While Ada is a great and sadly underused language, if my memory serves, it's not out-of-the-box memory-safe by today's definitions. I seem to recall that it takes Spark to make it memory-safe.
> by today's definitions.
What are today's definitions? If Ada simply had more thorough rules but introduced an "unsafe {}" construct, then what would the practical difference actually be? Compiler defaults?
This is a good article.
Small nit: As someone curious about a definition of memory safety, I had come across Michael Hicks' post. He does not use the list of errors as a definition, and argues that such a definition lacks rigor, and he is right. He says:
> Ideally, the fact that these errors are ruled out by memory safety is a consequence of its definition, rather than the substance of it. What is the idea that unifies these errors?
He then offers a technical definition (model) involving pointers that come with the capability of accessing memory (as if carrying the bounds), which seems like one way to be precise about it.
I have come to the conclusion that language safety is about avoiding untrapped errors, also known as "undefined behavior". This is not at all new, it just seems to have been forgotten or was never widely known somehow. If interested, find the argument here https://burakemir.ch/post/memory-safety-the-missing-def/
What's important is the context in which the term is used today: it's specifically about security and software vulnerabilities, not about a broader notion of correctness and program reliability. Attempts to push past that have the effect of declaring languages like Java and Python memory-unsafe, which is not a defensible claim.
What is the justification for calling `null pointer dereference` a memory safety issue?
1) Null pointer derefs can sometimes lead to privilege escalation (look up "mapping the zero page", for instance). 2) As I understand it (could be off base), if you're already doing static checking for other memory bugs, eliminating null derefs comes "cheap". In other words, it follows pretty naturally from the systems that provide other memory safety guarantees (such as the famous "borrow checker" employed by Rust).
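A minimal sketch of how point 2 plays out in Rust (my own example; the function is made up): there is no null reference to dereference in safe code, "might be absent" is spelled Option, and the compiler refuses to let you use the value without handling the absent case.

    fn first_word(s: &str) -> Option<&str> {
        // No null: absence is a value of type Option<&str>.
        s.split_whitespace().next()
    }

    fn main() {
        match first_word("   ") {
            Some(w) => println!("first word: {w}"),
            None => println!("no words"), // the "null" case must be handled to compile
        }
    }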
It's worse than a memory safety issue, it's undefined behaviour (at least in C, C++, and Rust)
UB is in fact not worse than a memory safety issue, and the original question is a good one: NULL pointer dereferences are almost never exploitable, and preventing exploitation is the goal of "memory safety" as conceived of by this post and the articles it references.
UB can lead to memory safety issues[0], among other terrible outcomes. Hence it’s worse than memory safety issues.
0: https://lwn.net/Articles/342330/
No, that doesn't hold logically.
Doesn't null-pointer-dereference always crash the application?
Is it only undefined behavior because program-must-crash is not explicitly required by these languages' specs?
This article comes up with yet another definition of memory safety. Thankfully, it does not conflate thread safety with memory safety. But it does a thing that makes it both inaccurate (I think) and also not helpful for having a good discussion:
TFA hints at memory safety requiring static checking, in the sense that it's written in a way that would satisfy folks who think that way, by saying things like "never occur" and including null pointer safety.
Is it necessary for the checking to be static? No. I think reasonable folks would agree that Java is memory safe, yet it does so much dynamic checking (null and bounds). Even Rust does dynamic checking (for bounds).
But even setting that aside, I don't like that the way the definition is written in TFA doesn't even make it unambiguous whether the author thinks it should be static or dynamic, so it's hard to debate with what they're saying.
EDIT: The definition in TFA has another problem: it enumerates things that should not happen from a language standpoint, but I don't think that definition is adequate for avoiding weird execution. For example, it says nothing about bad casts, or misuses of esoteric language features (like misusing longjmp). We need a better definition of memory safety.
I want to be there with you, but the definition this piece uses is, I think, objectively the correct one --- "memory safety", at least as used in things like "The Case For Memory Safe Roadmaps" government guidance, is simply the property of not admitting to memory corruption vulnerabilities.
I don't see where you're seeing the article drawing a line between static and dynamic defenses. The article opens by noticing Rust isn't the first memory safe language. It is by implication referring to things like Java, which have dynamic, runtime-based protections against memory corruption.
> I want to be there with you, but the definition this piece uses is, I think, objectively the correct one --- "memory safety", at least as used in things like "The Case For Memory Safe Roadmaps" government guidance, is simply the property of not admitting to memory corruption vulnerabilities.
This piece does not define memory safety as "not admitting memory corruption vulnerabilities". If it was using that definition, then:
- You and I would be on the same page.
- I would have a different complaint, which is that now we have to define "memory corruption vulnerability". (Admittedly, that's maybe not too hard, but it does get a bit weird when you get into the details.)
The definition in TFA is quoted from Hicks, and it enumerates a set of things that should never happen. It's not defining memory safety the way you want.
From this, I think we probably just mostly read the piece differently, and you probably read it more carefully than I did. (I know your background in this space).
I'm always a little guarded about message board definitions of "memory safety", because they tend to be axiomatically derived from the words "memory" and "safety", and they tend to have an objective of saying that there's only one mainstream language that provides memory safety.
> and they tend to have an objective of saying that there's only one mainstream language that provides memory safety.
Yeah!
I agree it's hard to do it without some kind of axioms or circularity, but it's also important to get the definition right, because at some point, we'll need to be able to objectively say whether some piece of critical software is memory safe, or not.
So we have to keep trying to find a good definition. I think that means rejecting the bad definitions.
TFA is too long, like all articles since the arrival of you know what. So the definitions are scattered. Here it claims:
Rust's big step function was to offer memory safety at compile time through the use of static analysis borrowed and grown out of prior efforts such as Cyclone, a research programming language formulated as a safe subset of C.
In other words, Rust has solved the halting problem since the static checking of array bounds is undecidable in the general case!
Nope. You don't need to "solve the halting problem" and I guess I'll explain why here.
First, let's attack the direct claim. The reason you'd reduce to the halting problem is via Rice's theorem. But Rice only matters if we want to allow exactly all the programs with the desired semantics. In practice what you do is either allow some incorrect programs too (like C++ does; that's what IFNDR is about: now some programs have no defined behaviour at all, but oh well, at least they compiled) or you reject some correct programs too (as Rust does). Now what we're doing is merely difficult rather than provably impossible, and people do difficult things all the time.
This is an important choice (and indeed there's a third option but it's silly, you could do both, rejecting some correct programs while allowing some incorrect ones, the worst of both worlds) but in neither case do we need to solve an impossible problem.
Now, a brief aside on what Rust is actually doing here, because it'll be useful in a moment. Rust's compiler does not need to perform a static bounds check on array indexing; what Rust's compiler needs to statically check is only that somebody wrote the runtime check. This satisfies the requirement.
But now back to laughing at you. While in Rust it's common to have fallible bounds checks at runtime (only their presence being tested at compile time) in WUFFS it's common for all the checks to be at compile time and so to have your code rejected if the tools can't see why your indexing is always in bounds.
When WUFFS sees you've written arr[k], it considers this a claim that you've proved 0 <= k < arr.len(); if it can't see how you've proved that, then your code is wrong and you get an error. The result is that you're going to write a bunch of math when you write software, but the good news is that instead of going unread because nobody reviewing the code bothered to read it, the machine reads your math and runs a proof checker, so if you were wrong it won't compile.
Edited: Fix off-by-one error
I'm glad that I provide amusement, but rustc-1.63.0 still compiles this program that panics at runtime. My experience with Rust is 5 min, so the style may provide further amusement:

    fn f(n: usize) -> usize {
        if n == 0 {
            10
        } else if n % 32327 == 0 {
            1
        } else {
            f(n - 1)
        }
    }

    fn main() {
        let a = [1, 2, 3, 4, 5];
        let n = f(12312);
        println!("{:?}", a[n]);
    }
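(For reference, and as my own aside rather than a fix to the snippet above: the runtime bounds check that the compiler guarantees is present can also be surfaced as a value instead of a panic.)

    fn main() {
        let a = [1, 2, 3, 4, 5];
        let n = 10; // any runtime value, e.g. the result of f(12312) above
        match a.get(n) {
            Some(x) => println!("{x}"),
            None => println!("{n} is out of bounds"), // checked, no panic, no UB
        }
    }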
> In other words, Rust has solved the halting problem
No one is making that claim.
"offer memory safety at compile time through the use of static analysis"
For arrays, this problem is not computable at compile time, hence the sarcastic remark that, IF THE ABOVE DEFINITION IS TAKEN AT FACE VALUE, Rust must have solved the halting problem. Downvoters are so dumb here.
> IF THE ABOVE DEFINITION IS TAKEN AT FACE VALUE
Why are you shouting? That's what twits do. You don't want to be a twit, do you? Read the site guidelines: emphasis is done with italicized text, marked up with an * at the beginning and another * at the end.
But to respond to the topic at hand: Are you familiar with the distinction between sound (what Rust aims for) and complete analyses?
Are you familiar with the fact that Rust does array bounds checking at runtime, contrary to the cited claim? And that this was kind of the topic of this subthread?
https://stackoverflow.com/questions/28389371/why-does-rust-c...
> a roughly 70 percent reduction in memory-safety vulnerabilities.
Couldn't find this in the reference text. Is it my interpretation? https://www.memorysafety.org/docs/memory-safety/#how-common-...
If the 5 eye agencies recommend memory safety, can we conclude that they already get all their information via "AI" data harvesting in Office 365 and similar? What will happen to Pegasus? Or do they have yet unknown backdoors in Rust?
It seems obvious that with hardware-level memory safety on the way[1], just gradually modernizing existing C and C++ codebases to take advantage of safer constructs (like smart pointers or checked arithmetic) makes much more sense than rewriting everything in Rust. Even better, thanks to GCC you don't need to sacrifice any portability to take advantage of even bleeding-edge features, due to its front-end/back-end separation and multitude of supported platform back-ends. Fish shell had to drop support for some platforms when it was rewritten in Rust[2].
[1] https://community.intel.com/t5/Blogs/Tech-Innovation/open-in...
[2] http://fishshell.com/blog/rustport/
Ugh. I know all that and I am still sick of hearing about memory safety. My teammates spend way more time fixing security issues in “safe” languages than C/C++/whatever. It simply doesn’t matter…
It's hard to compare the two. A low-level memory safety issue can intersect with security. So can a flaw in logic that touches on security, but is reproducible and not undefined in any way.
The latter can often be more easily exploited than the former, but the former can remain undetected longer, affect more components and installations, and be harder to reproduce in order to identify and resolve.
As an example of "more easily exploited": say that you have a web application that generates session cookies that are easy to forge, leading to session hijack. Not much skill is needed to do that, compared to exploiting a memory safety problem (particularly if the platform has some layered defenses against it: scrambled address space, non-executable stacks, and whatnot).
What security issues are biting you in safe languages that wouldn't also appear in C/C++ ?
A large number of real world security issues are attacks on humans not software. No programming language can solve social engineering problems.