The performance observation is real but the two approaches are not equivalent, and the article doesn't mention what you're actually trading away, which is the part that matters.
The C++11 thread-safety guarantee on static initialization is explicitly scoped to block-local statics. That's not an implementation detail, that's the guarantee.
The __cxa_guard_acquire/release machinery in the assembly is the standard fulfilling that contract. Move to a private static data member and you're outside that guarantee entirely. You've quietly handed that responsibility back to yourself.
Then there's the static initialization order fiasco, which is the whole reason the Meyers singleton with a local static became canonical. A block-local static initializes on first use: lazily, deterministically, thread-safely. A static data member initializes at startup in an order that is undefined across translation units. If anything touches Instance() during its own static initialization from a different TU, you're in UB territory. The article doesn't mention this.
Real-world singleton designs also need: deferred/configuration-driven initialization, optional instantiation, state recycling, controlled teardown. A block-local static keeps those doors open. A static data member initializes unconditionally at startup: you've lost lazy init, you've lost the option not to initialize it at all, and configuration-based instantiation becomes awkward by design.
Honestly, if you're bottlenecking on singleton access, that's design smell worth addressing, not the guard variable.
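A minimal sketch of the two shapes being compared (Logger is a hypothetical type, not from the article). The local static gets the guarded, lazy, thread-safe initialization; the namespace-scope object trades the guard away for startup-time initialization with undefined cross-TU ordering:

```cpp
struct Logger {
    int events = 0;
    void log() { ++events; }
};

// Meyers singleton: the compiler emits the __cxa_guard_* check here.
Logger& Instance() {
    static Logger logger;   // constructed on first call, thread-safely
    return logger;
}

// Guard-free variant: dynamic-initialized during static init,
// before main(), in an order unspecified across translation units.
namespace fast {
    inline Logger g_logger;
    inline Logger& Instance() { return g_logger; }
}
```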
> Honestly, if you're bottlenecking on singleton access, that's design smell worth addressing, not the guard variable.
There's a large group of engineers who are totally unaware of Amdahl's law, and they are consequently obsessed with the performance implications of what are usually the least important parts of the codebase.
I learned that being in the opposite group became (or maybe has always been) somewhat unpopular, because it breaks many of the myths we have been taught for years, and on top of which many people have built their careers. This article may or may not be an example of that. I am not reading too much into it, but profiling and identifying the actual bottlenecks seems like a scarce skill nowadays.
You leveled up past a point a surprising number of people get stuck on, essentially.
I feel like the mindset you are describing is kind of an intermediate senior level. Sadly, a lot of programmers can get stuck there for their whole career. Even worse when they get promoted to staff/principal level and start spreading dogma.
I 100 percent agree. If you can't show me a real world performance difference you are just spinning your wheels and wasting time.
Yes, I agree, and my experience is the same - there are just too many folks getting stuck in that mindset and never leaving it. Looking at the history, I think the software engineering domain has a lot of cargo cult, which is somewhat surprising given that people who are naturally attracted to this domain are supposed to be critical thinkers. It turns out this may not be true most of the time. I know I also ran afoul of that, but I learned my lesson.
On the flip side, it’s easy to get a bit stuck down the road by the mere fact that you have a singleton. Maybe you have amazing performance and very carefully managed safety, but you still have a single object that is inherently shared by all users in the same process, and it’s very very easy to end up regretting the semantic results. Been there, done that.
Worse, while shipping Electron crap is the other extreme, not everything needs to be written to fit into 64 KB or a 16 ms rendering frame.
Many times taking a few extra ms, or God forbid 1s, is more than acceptable when there are humans in the loop.
Agreed. Strong emphasis on "profiling and identifying the actual bottleneck". Every benchmark will show a nested stack of performance offenders, but a solid interpretation requires a much deeper understanding of systems in general. My biggest aha moment years ago was realizing that removing the function I was trying to optimize would still produce a benchmark output showing top offenders. Without going into too many details, that minor perspective shift ended up paying dividends, as it helped me rebuild my understanding of what benchmarks actually tell us.
Yeah ... and so it happens that this particular function in the profile is just a symptom, merely a single observed data point of system behavior under a given workload, and not the root cause of, say, a load instruction burning 90% of the CPU cycles waiting on data from memory, consequently giving you the wrong clue about the actual code creating that memory bus contention.
I have to say that until I grasped a pretty good understanding of CPU internals, the memory subsystem, the kernel, and generally the hardware, reading perf profiles was just a fun exercise that gave me almost no meaningful results.
>Then there's the static initialization order fiasco
One of the reasons I hate constructors and destructors.
Explicit init()/deinit() functions are much better.
The fact that he calls the generated code good/bad without discussing the semantic differences suggests the original author doesn't really know what he is talking about. That seems problematic to me, as he is selling a C++ online course.
Yes definitely not dismissing the lock overhead, but I wanted to bring attention to the implicit false equivalence made in the post. That said, I am surprised the lock check was showing up and not the logging/formatting functions.
A real human. Threads can exist before main() starts. For example, you can link in another TU that happens to launch a thread and call Instance(). Singletons used to be a headache before C++11, and it was common (maybe still is) to see macros in projects that expand to a singleton class definition to avoid the common pitfalls.
In fact, Windows 10+ now uses a thread pool during process init well before main is reached.
https://web.archive.org/web/20200920132133/https://blogs.bla...
It's a bit contrived, but a global with a nontrivial constructor can spawn a thread that uses another global, and without synchronization the thread can see an uninitialized or partially initialized value.
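A sketch of that scenario (all names hypothetical). `g_config` stands in for a global defined in another TU; whether its dynamic initializer has run when the constructor-spawned thread reads it depends on cross-TU initialization order:

```cpp
#include <atomic>
#include <thread>

std::atomic<int> g_config{0};   // zero-initialized, dynamic init happens later

// Stands in for the other TU's dynamic initializer.
void dynamic_init_config() { g_config.store(42); }

struct SpawnsThreadInCtor {
    int observed = -1;
    SpawnsThreadInCtor() {
        std::thread t([this] { observed = g_config.load(); });
        t.join();   // joined here only to keep the sketch deterministic
    }
};

// If an object like this is constructed before dynamic_init_config()
// runs, the spawned thread sees 0: the "uninitialized value" above.
int observe_before_init() { return SpawnsThreadInCtor{}.observed; }
int observe_after_init()  { dynamic_init_config(); return SpawnsThreadInCtor{}.observed; }
```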
I haven't written C++ in a long time, but isn't the issue here that the initialization order of globals in different translation units is unspecified? Lazy initialization avoids that problem at very modest cost.
Yeah, that is part of it.
I liked using singletons back in the day, but now I simply make a struct with static members which serves the same purpose with less verbose code. Initialization order doesn't matter if you add one explicit (and also static) init function, or a lazy initialization check.
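A sketch of that static-members struct (AppState and its fields are hypothetical). Zero-initialized statics plus an explicit Init() sidestep cross-TU initialization order entirely:

```cpp
struct AppState {
    static int  width;
    static int  height;
    static bool ready;
    static void Init(int w, int h) { width = w; height = h; ready = true; }
};

// Zero-initialization happens before any dynamic initialization,
// so there is no ordering dependence between translation units.
int  AppState::width  = 0;
int  AppState::height = 0;
bool AppState::ready  = false;
```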
Yeah, I feel singletons are mostly a result of people learning globals are bad and wanting to pretend their global isn't a global.
A bit like how java people insisted on making naive getFoo() and setFoo() to pretend that was different from making foo public
> A bit like how java people insisted on making naive getFoo() and setFoo() to pretend that was different from making foo public
But it's absolutely different and sometimes it really matters.
I primarily work with C# which has the "property" member type which is essentially a first-class language feature for having a get and set method for a field on a type. What's nice about C# properties is that you don't have to manually create the backing field and implement the logic to get/set it, but you still have the option to do it at a later time if you want.
When you compile C# code (I expect Java is essentially the same) which accesses the member of another class, the generated IL/bytecode is different depending on whether you're accessing a field, property or method.
This means that if you later find it would be useful to intercept gets or updates to a field and add some additional logic for some reason (e.g. you want to now do lazy initialization), if you naively change the field to a method/property (even with the same name), existing code compiled against your original class will now fail at runtime with something like a "member not found" exception. Consumers of your library will be forced to recompile their code against your latest version for things to work again.
By having getters and setters, you have the option of changing things without breaking existing consumers of your code. For certain libraries or platforms, this is the practical difference between being stuck with certain (now undesirable) behaviour forever or trivially being able to change it.
Adding lots of code for the common case to support consumers of the code not recompiling for some uncommon potential future corner-cases seems like a bad deal.
Recompiling isn't that hard usually.
In a product world where customers are building on your platform, requiring that they schedule time with their own developers to recompile everything in order to move to the latest version of your product is an opportunity to lose one or more of those paying customers.
These customers would also be quite rightfully annoyed when their devs report back to them that the extra work could have been entirely avoided if your own devs had done the industry norm of using setters/getters.
Maybe you're not a product, but there are various other teams at your organization that use your library; now, in order to go live, you need to coordinate with those teams so that they also update their code and things don't break. These teams will report to their PMs that this could all have been avoided if only you had used getters and setters, like the entire industry recommends.
Unless you're in a company with a single development team building a small system whose code would never be touched by anyone else, it's a good idea to do the setters/getters. And even then, what's true today might not be true years from now.
It's generally good practice for a reason.
This is... not an example of good optimization.
Focusing on micro-"optimizations" like this one does absolutely nothing for performance (how many times are you actually calling Instance() per frame?) and skips over the absolutely mandatory PROFILE BEFORE YOU OPTIMIZE rule.
If a coworker asked me to review this CL, my comment would be "Why are you wasting both my time and yours?"
Optimizing requires a (performance) problem, and often needs a benchmark.
In my view, the article is not about optimizing, but about understanding how things work under the hood. Which is interesting for some.
Among C++ programmers, "Best performance" isn't about actual optimization; it's a brag with about the same semantic value as on Broadway, and so it's not measurable. The woman who wins a Tony for "Best performance in a musical" isn't measurably faster or smaller, she's just "best" according to some panel. Did audiences like it? Did the musical make money? Doesn't matter, she won a "Best performance" Tony.
Only those who have C++ tattoos and whose whole career is bound to mastering a single language, selling that knowledge.
The rest of us use it as a tool required to integrate with existing products, language runtimes, and SDKs written in C++, which most likely won't be replaced anytime soon.
How is performance not measurable?
Getting into the weeds of what a compiler does with your code is fun.
People have been doing micro optimisations since computers became a thing, you benefit from them every day without realising - and seemingly not appreciating - it.
> If a coworker asked me to review this CL, my comment would be "Why are you wasting both my time and yours?"
If a coworker submitted a patch to existing code, I'd be right there with you. If they submitted new code, and it just so happened to be using this more optimal strategy, I wouldn't blink twice before accepting it.
Regarding the user-defined default constructor: making it constexpr makes it generate the same code as before:
https://compiler-explorer.com/z/Tsbz7nd44
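A sketch of that fix (Widget is a hypothetical type). With a constexpr default constructor the static is constant-initialized at compile time, so the compiler emits no guard at all:

```cpp
struct Widget {
    int id;
    constexpr Widget() : id(0) {}   // user-defined, but constexpr
};

Widget& Instance() {
    static Widget w;   // constant-initialized: no __cxa_guard_* check emitted
    return w;
}
```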
This is about constant vs dynamic initialization, not trivial vs nontrivial default construction. To be fair, the article doesn't claim this, but that's the comparison being made.
The standard allows compilers to optimize away dynamic initialization, but AFAIK there are ABI implications of doing that, so compilers tend not to.
If you absolutely want to guarantee that a global is constant initialized, use "constinit" on the variable declarations too. It can also have some positive codegen effects on declarations of thread_locals.
Honestly the guard overhead is a non-issue in practice — it's one atomic check after first init. The real problem with the static data member approach is initialization order across translation units. If singleton A touches singleton B during startup you get fun segfaults that only show up in release builds with a different link order.
I ended up using std::call_once for those cases. More boilerplate but at least you're not debugging init order at 2am.
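The std::call_once shape alluded to here could look like this (Registry is a hypothetical type). The once_flag and pointer are constant/zero-initialized, so there is no cross-TU ordering hazard, and construction is still lazy:

```cpp
#include <mutex>

class Registry {
public:
    static Registry& Get();
    int entries = 0;
private:
    Registry() = default;
};

namespace {
    std::once_flag g_registry_once;   // constant-initialized: no order hazard
    Registry* g_registry = nullptr;   // zero-initialized before any dynamic init
}

Registry& Registry::Get() {
    // The lambda lives inside a member function, so it can reach the
    // private constructor.
    std::call_once(g_registry_once, [] { g_registry = new Registry(); });
    return *g_registry;
}
```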
"it's one atomic check after first init" And that's slow :P [0] If you don't need to access it from multiple threads, cutting that out can mean a huge difference in a hot path.
[0] https://stackoverflow.com/questions/51846894/what-is-the-per...
Came here to say the same thing. Static is OK as long as the object has no dependencies but as soon as it does you're asking for trouble. Second the call_once approach. Another approach is an explicit initialization order system that ensures dependencies are set up in the right order, but that's more complex and only works for binaries you control.
AI. Probably clawdbot.
Meyers' implementation (with a block-local static variable) is elegant and thread-safe, and retains lazy initialization, which can be important for initialization order. https://www.modernescpp.com/index.php/thread-safe-initializa...
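For reference, that pattern (Settings is a hypothetical example type): lazy, thread-safe since C++11, and immune to the cross-TU initialization order problem because construction happens on first call:

```cpp
class Settings {
public:
    static Settings& Instance() {
        static Settings instance;   // guarded, constructed on first use
        return instance;
    }
    Settings(const Settings&) = delete;
    Settings& operator=(const Settings&) = delete;
    int dpi = 96;
private:
    Settings() = default;
};
```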
I am not sure why this entire article is warranted :o) Just use `std::call_once` and you are all set.
It is strange to use lightdm and gdm as examples, which are both written in C (if nothing has changed recently).
Tbh, I never understood the point of singletons in C++ versus just having a namespace with functions in it, e.g. instead of ... just this: ...
Since it's a singleton, the state only exists once anyway, so there's no point in wrapping it in an object. E.g. what's the point of globally visible singletons except "everything is an object" cargo-culting?
It's brain damage from people coming from Java.
Just put your state in an anonymous namespace in the implementation file.
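A sketch of that style (display/SetResolution are hypothetical names echoing the elided example above): free functions in a namespace, state hidden in an anonymous namespace in the implementation file:

```cpp
namespace display {
    namespace {              // internal linkage: state invisible to other TUs
        int g_width  = 0;
        int g_height = 0;
    }
    void SetResolution(int w, int h) { g_width = w; g_height = h; }
    int  Width()  { return g_width; }
    int  Height() { return g_height; }
}
```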
A static class member function would be the same in your example. If you don't have any state to maintain, then that's fine. The reason people use singletons is to manage that state; it's easier to handle it all in a class instance. If you end up having to manage it in a function using globals or static instances somewhere you get the same issues. Also often the object exposed as a Singleton is not the only use of that object.
In your case, SetResolution could be a static method calling a private instance method SetResolutionImpl, for example, similar to what other people said.
> what's the point of globally visible singletons except "everything is an object" cargo-culting?
Having the singleton be an object becomes interesting when:
1) it contains attributes that themselves have non-trivial constructors and/or destructors. Order of initialization and destruction is guaranteed (init is in forward order of declaration, destruction in reverse order)
2) more rarely, inheritance (code reuse)
In the case of 1), you can just opt to construct the singleton on a sufficiently-aligned byte buffer in-place with `std::construct_at`. This gets rid of the static-init order fiasco, __cxa bloat (if applicable), and atexit bloat, and you can choose to just not call `std::destroy_at` if you don't need to.
In these two scenarios it's a lot more efficient to group many related objects into a bigger object.
Nice breakdown. I’m curious how often the guard check for a function-local static actually shows up in real profiles. In most codebases Instance() isn’t called in tight loops, so the safety of lazy initialization might matter more than a few extra instructions. Has anyone run into this being a real bottleneck in practice?
Singletons are global variables in a suit. Don't use them.
Globals are very useful in a lot of places - but they have significant downsides. Avoid them, but when they are the best answer, just require approval of that design from a lot of smart people, to make sure you are not overlooking something.