The bug in the program reveals a poor understanding of object lifecycles by whoever wrote it. The `obj` argument to `simple` is not globally unique and so it makes a poor location to store global state information (a count of how often `simple` is called, in this example).
Never tie global state information to ephemeral objects whose lifetime may be smaller than what you want to track. In this case, they want to know how many times `simple` is called across the program's lifetime. Unless you can guarantee the `obj` argument or its `counter` member exists from before the first call to `simple` and through the last call to `simple` and is the only `obj` to ever be passed to `simple`, it is the wrong place to put the count information. And with those guarantees, you may as well remove `obj` as a parameter to both `simple` and `complex` and just treat it as a global.
State information needs to exist in objects or locations that last as long as that state information is relevant, no more, no less. If the information is about the overall program lifecycle, then a global can make sense. If you only need to know how many times `simple` was invoked with a particular `obj` instance, then tie it to the object passed in as the `obj` argument.
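To make the two lifetimes concrete, here is a minimal sketch. The names `totalCalls` and `callsPerObject`, and this body of `simple`, are my own assumptions for illustration and not the article's code.

```javascript
// Program-lifetime state: lives at module scope and outlasts any single `obj`.
let totalCalls = 0;

// Per-instance state: keyed by the object itself. A WeakMap lets each count
// be garbage-collected together with the object it describes.
const callsPerObject = new WeakMap();

function simple(obj) {
  totalCalls += 1;
  callsPerObject.set(obj, (callsPerObject.get(obj) ?? 0) + 1);
}

const a = {};
const b = {};
simple(a);
simple(a);
simple(b);

console.log(totalCalls);            // 3
console.log(callsPerObject.get(a)); // 2
console.log(callsPerObject.get(b)); // 1
```

Which of the two you want depends entirely on the question being asked: "how often was `simple` called?" or "how often was `simple` called with this object?".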
> it is the wrong place to put the count information.
I'd argue this is the case regardless of lifetime. It's trying to squash two unrelated things into one object and should have been two different arguments.
Way more obvious if "obj" is replaced with some example object instead of an empty one:
let person = { name: "Foo Bar", age: 30, counter: counter };
My JS is terrible, but it seems like once you make the counter a global variable it is just better to change it to have an atomic dedicated count function. So instead of incrementing the counter in simple, a globalCount() function gets called that isolates the state. Something like
  {
    let i = 0;
    var counter = function () {
      console.log(++i);
    };
  }
Then call counter() to count & log and document that something horrible and stateful is happening. I wouldn't call that a global variable although the article author disagrees.
Because there is no intent, need or reason to vary it.
Nearly everything in RAM is technically a global variable to someone who is keen enough. Programming depends on a sort-of honour system not to treat things like variables if it is hinted that they shouldn't and it doesn't make sense to. Otherwise we can all start making calls to scanmem & friends.
Could you tell me where this was posted? I thought no one would see this after I got no comments the first day
No one I showed this to complained about the first example but online many people did. I wrote a completely different article which I think is much better that uses examples I would have used in the follow up. I'll post that article next week
Second chance pool. This post is, per your submission history, from 2 days ago. HN has a second chance pool that lets articles continue collecting upvotes (more easily than digging through the full history). Some of those articles will get their timestamp updated to artificially inflate their ranking. This brings them to the front page again and gives them their "second chance". After a few hours or whatever time, the timestamp is reverted and they'll start falling back into their natural place in the rankings.
In this case, yes. Its scope should be the lowest necessary scope. Does JS provide static variables in functions? If not, then that forces you to lift it to some other scope like file or module scope or the surrounding function or class if that's viable.
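JavaScript has no function-level `static` variables, so the usual answer is a closure. A minimal sketch of lifting the state to the smallest enclosing scope (the name `nextId` is just an example):

```javascript
// The state is lifted into the smallest enclosing scope available:
// an immediately invoked function that wraps the one function using it.
const nextId = (() => {
  let last = 0;            // visible only to the returned function
  return () => ++last;
})();

const first = nextId();
const second = nextId();
console.log(first, second); // 1 2
```

Nothing outside the closure can read or reset `last`, which is about as close to a C function-static as the language gets.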
I think this article broadens the definition of global variable and then says "Look, the things I added to the definition aren't bad, so global variables aren't always bad."
If you just look at what people normally mean by global variable, then I don't think the article changes minds on that.
From the article> "Static Function Variable: In C inside a function, you can declare a static variable. You could consider this as a local variable but I don't since it's on the heap, and can be returned and modified outside of the functions. I absolutely avoid this except as a counter whose address I never return."
These variables are not on the heap. They are statically allocated. Their visibility is the only thing that differentiates them from global variables and static variables defined outside of functions.
I think such variables can be useful, if you need a simple way of keeping some persistent state within a function. Of course it's more state you are carrying around, so it's hard to defend in a code review as best practice.
Amusingly, you can modify such variables from outside the function, simply by getting your function to provide the modifying code with a pointer to the variable, eg by returning the variable's address. If you do that though you're probably creating a monster. In contrast I think returning the value of the static variable (which the author casts shade on in the quote above) seems absolutely fine.
Edit: I should have stated that the number one problem with this technique is that it's absolutely not thread safe for obvious reasons.
> These variables are not on the heap. They are statically allocated. Their visibility is the only thing that differentiates them from global variables and static variables defined outside of functions.
In C++, there is a difference: function-static variables get initialized when control first passes into the function. That difference matters in cases where the initial value (partly) comes from a function call or from another global. For example, in C++ one can do:
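  #include <cstdlib>

  int bar = -1;
  void foo() {
    // Initialized the first time control passes through this declaration.
    static char * baz = static_cast<char *>(std::malloc(bar));
    …
  }
  void quux() {
    bar = 13;
  }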
That’s fine as long as the code calls quux before ever calling foo.
They added that feature not to support such monstrosities, but because they wanted to support having static variables of class type.
If they were to design that today, I think they would require the initialization expression to be constexpr.
In case anyone is still unsure what it means to be statically allocated: these variables are allocated in the data segment, which is a separate area of virtual memory from the stack and the heap.
static variable addresses are an extremely important tool in static analysis, for proving certain program invariants hold. In C, a const static often will be able to tell you more about the structure of your code at runtime than a #define macro ever could!
though unless you're programming extremely defensively (eg trying to thwart a nation state), I see no reason why you would use them at runtime
Global variables (in languages where they otherwise make sense and don't have footguns at initialization and whatnot) have two main problems:
1. They work against local reasoning as you analyze the code
2. The semantic lifetime for a bundle of data is rarely actually the lifetime of the program
The second of those is easy to guard against. Just give the bundle of data a name associated with its desired lifetime. If you really only need one of those lifetimes then globally allocate one of them (in most languages this is as cheap as independently handling a bunch of globals, baked into the binary in a low-level language). If necessary, give it a `reset()` method.
The first is a more interesting problem. Even if you bundle data into some sort of `HTTPRequest` lifetime or whatever, the fact that it's bundled still works against local reasoning as you try to use your various counters and loggers and what have you. It's the same battle between implicit and explicit parameters we've argued about for decades. I don't have any concrete advice, but anecdotally I see more bugs from biggish collections of heterogeneous data types than I do from passing everything around manually (just the subsets people actually need).
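A rough sketch of the "name the lifetime and give it a `reset()`" idea for the second problem. `RequestStats`, its fields, and `handleRequest` are hypothetical names chosen for this example.

```javascript
// State grouped under the lifetime it actually belongs to: one HTTP request.
class RequestStats {
  constructor() {
    this.reset();
  }
  reset() {
    this.dbQueries = 0;
    this.cacheHits = 0;
  }
}

// If the program only ever needs one at a time, allocate a single instance
// up front and reset it at each lifetime boundary.
const requestStats = new RequestStats();

function handleRequest(queryCount) {
  requestStats.reset();               // a new request lifetime starts here
  for (let i = 0; i < queryCount; i++) {
    requestStats.dbQueries += 1;
  }
  return requestStats.dbQueries;
}

const firstRequest = handleRequest(3);
const secondRequest = handleRequest(1);
console.log(firstRequest, secondRequest); // 3 1
```

This addresses the lifetime problem only. Every function that touches `requestStats` still depends on it implicitly, which is the local-reasoning problem.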
I don't think #1 is necessarily true. Take a common case for a global variable, a metric your app is exposing via prometheus. You have some global variable representing its value. Libraries like to hide the global variable sometimes with cuteness like MetricsRegistry.get_metric("foo") but they're globally readable and writable state. And in your function you do a little metric.observe_event() to increment your counter. I think having this value global helps reasoning because the alternative is going to be a really clunky plumbing of the variable down the stack.
Of course #1 is not necessarily true, it depends on one's coding style, and using globals for application-scoped services like logging/metrics is tentatively fine... although I also think that if we're going to dedicate globals almost exclusively to this use, they probably should have dynamic scoping.
On the other hand, I have seen quite a lot of parsing/compilers' code from the seventies and eighties and let me tell you: for some reason, having interfaces between lexer and parser, or lexer and buffered reader, or whatever else to be "call void NextToken(void), it updates global variables TokenKind, TokenNumber, TokenText to store the info about the freshly produced token" was immensely popular. This has gotten slightly less popular but even today e.g. Golang's scanner has method next() that updates scanner's "current token"-related fields and returns nothing. I don't know why: I've written several LL(1) recursive-descent parsers that explicitly pass the current token around the parsing functions and it works perfectly fine.
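For comparison, a toy sketch of the explicit style: a parser for sums like `1+2+30` where no function touches shared "current token" state. Passing a position into the token list, rather than the token itself, is a simplification I chose for this example, and all names are made up.

```javascript
// Tokenizer: turns "1+2" into [{kind:"num",value:1},{kind:"plus"},{kind:"num",value:2},{kind:"eof"}].
function tokenize(text) {
  const tokens = [];
  for (const match of text.matchAll(/\d+|\+/g)) {
    tokens.push(match[0] === "+"
      ? { kind: "plus" }
      : { kind: "num", value: Number(match[0]) });
  }
  tokens.push({ kind: "eof" });
  return tokens;
}

// Each parsing function receives the tokens and a position, and returns the
// parsed value together with the next position. Nothing is stored globally.
function parseNumber(tokens, pos) {
  const token = tokens[pos];
  if (token.kind !== "num") {
    throw new SyntaxError(`expected a number at token ${pos}`);
  }
  return { value: token.value, pos: pos + 1 };
}

function parseSum(tokens, pos) {
  let { value, pos: next } = parseNumber(tokens, pos);
  while (tokens[next].kind === "plus") {
    const rhs = parseNumber(tokens, next + 1);
    value += rhs.value;
    next = rhs.pos;
  }
  return { value, pos: next };
}

const parsed = parseSum(tokenize("1+2+30"), 0);
console.log(parsed.value); // 33
```

Because every function's input and output are in its signature, each one can be tested on a hand-built token array without setting anything up first.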
It helps with reasoning in some sense, and the net balance might be positive in how much reasoning it enables, but it definitely hurts local reasoning. You need broader context to be able to analyze whether the function is correct (or even what it's supposed to be doing if you don't actively work to prevent entropic decay as the codebase changes). You can't test that function without bringing in that outer context. It (often) doesn't naturally compose well with other functions.
I find the concept of a context structure passed as the first parameter to all your functions with all your "globals" to be very compelling for this sort of stuff.
This is very similar to dependency injection. Separating state and construction from function or method implementation makes things a lot easier to test. In my opinion it's also easier to comprehend what the code actually does.
And do you always know beyond any reasonable doubt that your code will be single-threaded for all time? Because the moment this changes, you're in for a world of pain.
Wrapping the globals into a context struct under #ifdef MULTI_THREADED and adding this ctx as the first argument to each call is a matter of minutes. I've done this multiple times.
Much worse is protecting concurrent writes, eg to an object or hash table
If the discussion here is meant to be solely about JavaScript, then I'll happily consider all my comments in this thread to be obsolete, since I don't have particularly strong opinions about that language since I don't use it a lot.
I was under the impression that many people here were discussing the usage of global variables more generally, though.
That's exactly why I used this specific example. I've seen many codebases that use clone to avoid mutation problems, so I wrote this specifically to show it can become a problem too.
I wrote a better article on globals. I plan on posting it next week
This seems more an issue with not understanding structuredClone, than one of understanding globals or lack thereof. There’s nothing wrong with the example, it does exactly what the code says it should — if you want counter to be “global” then structuredClone isn’t the function you want to call. The bug isn’t in how counter was in obj, the bug is in calling structuredClone when its behaviour wasn’t wanted.
With that said, it seems obvious that if you want to globally count the calls, then that count shouldn’t live in an argument where you (the function) don’t control its lifetime or how global it actually is. Simple has no say over what object obj.counter points to, it could trivially be a value type passed into that particular call, so if you know you want a global count then of course storing it in the argument is the wrong choice.
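For anyone who hasn't read the article, the failure mode looks roughly like this. This is my own hypothetical reconstruction, assuming `counter` is an object with a `count` field; it is not the article's exact code.

```javascript
// Hypothetical reconstruction: the count lives inside the argument.
function simple(obj) {
  obj.counter.count += 1;
}

function complex(obj) {
  // structuredClone makes a deep copy, so `copy.counter` is a different object.
  const copy = structuredClone(obj);
  simple(copy);   // increments the copy's counter, which is then discarded
  simple(obj);    // increments the caller's counter
}

const state = { counter: { count: 0 } };
complex(state);
console.log(state.counter.count); // 1, although simple() ran twice
```

Both readings above fit: the clone does exactly what it promises, and the count was stored somewhere `simple` has no control over.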
Global has two conflated meanings: global lifetime (ie lifetime of the whole program) and global access (which the article states). Simple needs global lifetime but not global access.
You rarely ”need” global access, although for things like a logger it can be convenient. Often you do need global lifetime.
That just seems like globals with extra steps. Suddenly if your context structure has a weird value in it, you’ll have to check every function to see who messed it up.
First, that's true for globals as well. Second, with "context structure" pattern, the modifications to it are usually done by copying this structure, modifying some fields in the copy and passing the copy downwards, which severely limits the impact radius and simplifies tracking down which function messed it up: it's either something above you in the call stack or one of the very few (hopefully) functions that changes this context by-reference, with intent to apply such changes globally.
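A small sketch of that copy-and-pass-down pattern. The field names (`tenant`, `logLevel`) and the `makeContext` helper are invented for the example; freezing the object is my addition to make accidental by-reference changes fail loudly in strict mode.

```javascript
// The context is frozen so that nobody can change it by reference by accident.
function makeContext(fields) {
  return Object.freeze({ ...fields });
}

function describe(ctx) {
  return `${ctx.tenant} (log level: ${ctx.logLevel})`;
}

function handleTenant(ctx, tenant) {
  // Copy the context, override one field, and pass the copy downward.
  const childCtx = makeContext({ ...ctx, tenant });
  return describe(childCtx);
}

const rootCtx = makeContext({ tenant: "none", logLevel: "info" });
const described = handleTenant(rootCtx, "acme");

console.log(described);      // "acme (log level: info)"
console.log(rootCtx.tenant); // "none": the caller's context is untouched
```

A change made in `handleTenant` can only affect code below it in the call stack, which is what narrows down the search when a weird value shows up.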
If I have 500 functions, I don't want to extrapolate out the overhead of passing a state object around to all of them. That's a waste of effort, and frankly makes me think you want to code using an FP paradigm even in imperative languages.
Module-level and thread-level "globals" are fine. You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation.
If that’s so useful, make your language support the concept of lexical environments instead. Otherwise it’s just manual sunsetting every day of week. Our craft is full of this “let’s pretend we’re in a lisp with good syntax” where half of it is missing, but fine, we’ll simulate it by hand. Dirt and sticks engineering.
(To be clear, I’m just tangentially ranting about the state of things in general, might as well post this under somewhere else.)
You can have it either way, it’s not for you but for people who disagree with what they deem a preference that is the only option when there’s no alternative.
I got into this argument with my former coworkers. Huge legacy codebase. Important information (such as the current tenant of our multi-tenant app) was hidden away in thread-local vars. This made code really hard to understand for newcomers because you just had to know that you'd have to set certain variables before calling certain other functions. Writing tests was also much more difficult and verbose. None of these preconditions were of course documented. We started getting into more trouble once we started using Kotlin coroutines which share threads between each other. You can solve this (by setting the correct coroutine context), but it made the code even harder to understand and more error-prone.
I said we should either abolish the thread-local variables or not use coroutines, but they said "we don't want to pass so many parameters around" and "coroutines are the modern paradigm in Kotlin", so no dice.
You know what helps manage all this complexity and keep the state internally and externally consistent?
Encapsulation. Provide methods for state manipulation that keep the application state in a known good configuration. App level, module level or thread level.
Use your test harness to control this state.
If you take a step back I think you’ll realize it’s six of one, half dozen of the other. Except this way doesn’t require manually passing an object into every function in your codebase.
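A sketch of what that encapsulation could look like, assuming the state in question is the current tenant. The `tenantState` object and its method names are hypothetical.

```javascript
// Module-level state, reachable only through the functions below.
const tenantState = (() => {
  let currentTenant = null;

  return {
    enter(tenant) {
      // Refuse to silently overwrite a tenant that is already active.
      if (currentTenant !== null) {
        throw new Error(`tenant already set to ${currentTenant}`);
      }
      currentTenant = tenant;
    },
    leave() {
      currentTenant = null;
    },
    current() {
      if (currentTenant === null) throw new Error("no tenant set");
      return currentTenant;
    },
  };
})();

tenantState.enter("acme");
const seen = tenantState.current();
tenantState.leave();
console.log(seen); // "acme"
```

The methods keep the state in a known configuration and fail loudly on misuse, but callers still have to know that `enter` must happen before anything that calls `current`.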
These methods existed. The problem was that when you added some code somewhere deep down in layers of layers of business code, you never knew whether the code you'd call would need to access that information or whether it had already previously been set.
Hiding state like that is IMHO just a recipe for disaster. Sure, if you just use global state for metrics or something, it may not be a big deal, but to use it for important business-critical code... no, please pass it around, so I can see at a glance (and with help from my compiler) which parts of the code need what kind of information.
If your global state contains something that runs in prod but should not run in a testing environment (e.g. a database connection), your global variable based code is now untestable.
Dependency Injection is popular for a very good reason.
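A minimal illustration of why, with a made-up `countActiveUsers` function and a stand-in database object. The `query` method and the SQL string are assumptions for the example, not any real driver's API.

```javascript
// The function states its dependency in its signature instead of reaching
// for a global connection.
function countActiveUsers(db) {
  return db.query("SELECT id FROM users WHERE active = 1").length;
}

// In a test, a stand-in object with the same shape replaces the real database.
const fakeDb = {
  queries: [],
  query(sql) {
    this.queries.push(sql);
    return [{ id: 1 }, { id: 2 }];
  },
};

const activeUsers = countActiveUsers(fakeDb);
console.log(activeUsers); // 2
```

With a global connection the same test would need the real database, or some way to swap the global out before the code under test runs.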
Just need to make sure your module doesn't get too big or unwieldy. I work in a codebase with some "module" C files with a litany of global statics and it's very difficult to understand the possible states it can be in or test it.
I agree that so long as overall complexity is limited these things can be OK. As soon as you're reading and writing a global in multiple locations though I would be extremely, extremely wary.
I did not say you are targeting OP. I meant that you are degrading your parent commenter.
This:
"You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation."
...is neither productive nor actually true. But I'll save the latter part for your other reply.
I could not initially reply to you. Your comment rubbed me the wrong way, because I had no intention of trying to degrade anyone, and frankly, I was offended. But I thought better of my hasty and emotional response. I would rather take a deep breath, re-focus, re-engage, and be educated in a thoughtful dialog than get into a mud slinging contest. I am always willing to be enlightened.
A tip, in your profile you can set a delay which is a number of minutes before your comments will become visible to other people. Mine is set to 2 right now. This gives you time to edit your comment (helpful for some longer ones) but also to write some garbage response and then think better and delete it before anyone's the wiser.
It's also helpful to give you time to re-read a mostly good, but maybe not polite, response and tone down your response.
Of course in a real life program, it may be lost in other code logic and, most importantly, the function performing the clone may not be so explicit about it (e.g. an "update" function that returns a different copy of the object).
Moving state into the global namespace and accessing it directly from that space makes it much more difficult to test, instrument and integrate.
Sure, if you're building disposable toy software, do whatever is easiest. But if you're building software for others to use, at least provide a context struct and pass that around when you can.
For those cases where this is challenging or impossible, please sequence your application or library initialization so that these globals are at least fungible/assignable at runtime.
Most programming languages written after 1990 let you initialize global variables lazily. The main problem is that the initialization order might be unexpected or you may run into conflicts. Singletons make the order slightly more predictable (based on first variable access), although it is still implicit (lazy).
But singletons are still a terrible idea. The issue with global variables is not just initialization. I would argue it's one of the more minor issues.
The major issues with global variables are:
1. Action at distance (if the variable is mutable)
2. Tight coupling (global dependencies are hard-coded, you cannot use dependency injection)
3. Hidden dependency. The dependency is not only tightly-coupled, it is also hidden from the interface. Client code that calls your function doesn't know that you rely on a global variable and that you can run into conflict with other code using that variable or that you may suddenly start accessing some database it didn't even know about.
Singleton does not solve any of the above. The only thing it ensures is lazy initialization on access.
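To illustrate, a sketch of a lazily initialized singleton in JavaScript. `getSettings`, `bumpRetries`, and `initCount` are names I made up for the example.

```javascript
let initCount = 0;

// A lazily initialized singleton: created on first access, reused afterwards.
const getSettings = (() => {
  let instance;
  return () => {
    if (instance === undefined) {
      initCount += 1;
      instance = { retries: 3 };
    }
    return instance;
  };
})();

// Nothing in this signature reveals that the function reads, and here also
// writes, shared state.
function bumpRetries() {
  getSettings().retries += 1;
}

bumpRetries();
bumpRetries();
console.log(getSettings().retries, initCount); // 5 1
```

Initialization happens exactly once, on first use, yet `bumpRetries` is still hard-wired to `getSettings`, still hides that from its callers, and still mutates state that other code may read.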
Because as a programmer I have responsibility for the technical soundness of the program, and I don't create threads haphazardly.
> when i thought i understood multi-threaded code, but didn't.
All the more reason to carefully plan and limit shared state among threads. It's hard enough to get right when you know where the problems are and impossible if you spray and pray with mutexes.
Yeah but just because you do the right thing doesn’t mean that others will. That one person that creates threads haphazardly will wreak havoc and at scale this will happen. It’s an unstable equilibrium to put the onus of program soundness on the contributors to a program.
> That one person that creates threads haphazardly will wreak havoc
What if someone comes along and starts adding threads and doesn't check what they are accessing? And doesn't read the documented invariants of the design?
Well I don't think any project can succeed if that's the level of disorganization.
Are there certain kinds of bugs that are easy to regress last minute? Yes. A brand new thread without limited state is not one of them.
> put the onus of program soundness on the contributors to a program.
Who then is responsible for making sure the program is correct?
> Who then is responsible for making sure the program is correct?
I’d say this is mostly a function of the language or framework.
After that, it’s up to the tech leads to provide access patterns/examples that align with the needs of a particular project.
My point is not so much that you shouldn’t ever think about the implications of your code, just that contributors are humans and any project with enough contributors will have a mix of contributors with different knowledge, experience and skill sets. If you are leading one of those, it would behoove you to make doing the right thing easy for later contributors.
The burden is on the programmer adding a new thread to know what they can safely access.
The conclusion of your argument looks like 2000s Java - throw a mutex on every property because you never know when it will need to be accessed on a thread.
Designs that spread complexity rather than encapsulate it are rarely a good idea.
I agree that dependent sequences of events coordinated through a global are bad. But there are other usages which are not error prone. For example an allocator, a logger, read-only settings, or a cache.
You may hate my article next week, it's meant to replace this article. If you want you can email me for early access and tell me how I can improve the article. Let's say you can guess my email if you're emailing the right domain
Any program that uses a database has a very similar problem to global variables.
As Gilad Bracha has pointed out, types are antimodular, and your database schema can be considered one giant type that pervades your program, just like globals can be.
I don't think we have tools to compositionally solve this, across different programming languages.
I think the author forgot the most useful use case for globals, and that is variables that have to do with the context the program is running under, such as command line arguments and environment variables (properly validated and, if needed, escaped).
Mutable global variables are intrinsically incompatible with a multithreaded environment. Having mutable shared state is never the right solution, unless you can live with data races. And that's before taking maintainability into consideration too.
Global variables are fine when you read them many times but write to them sparingly, and when you have mechanisms to handle errors and retry if a stale value is read or used after an update.
> The problem is data access. Nothing more, nothing less. There's a term for this that has nothing to do with global variables: "action at a distance."
I mean yes, using global variables is just one of the ways to cause action-at-a-distance and that is... apparently a big reveal?
Otherwise sure, there is no pattern that cannot be utilized 100% correctly and without introducing bugs. Theoretically. Now let's look at the practical aspects and how often indiscriminately using such tempting patterns like global variables -- and mutexes-when-we-are-not-sure and I-will-remember-not-to-mutate-through-this-pointer -- leads to bugs down the road.
The answer is: fairly often.
IMO the article would be better titled as "There is no pattern that a lazy or careless programmer cannot use to introduce a bug".
Globals are essential for tracking the state and values of variables at intermediate steps when debugging.
I create one unique object on the global scope and store intermediate variables in it, so I can inspect the values when something goes wrong. Since there is only one unique object to maintain, it will not override any existing variables and so does not affect existing programs.
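Something along these lines, as a sketch. The name `__debugTrace` and the `average` function are placeholders I picked for illustration.

```javascript
// One uniquely named object on the global scope collects intermediate values.
// Reusing an existing object avoids wiping a trace another file started.
globalThis.__debugTrace = globalThis.__debugTrace ?? {};

function average(values) {
  const sum = values.reduce((acc, v) => acc + v, 0);
  globalThis.__debugTrace.lastSum = sum;   // stash the intermediate value
  return sum / values.length;
}

const avg = average([2, 4, 6]);
console.log(avg);                            // 4
console.log(globalThis.__debugTrace.lastSum); // 12
```

The trace is write-only as far as the program logic is concerned, which is what keeps it from turning into a second communication channel.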
What do you suppose is the recommended way to use the encapsulation? ;) This is partly why there's a "defining global variables" section, I know people will consider some usage as not using a global
> What do you suppose is the recommended way to use the encapsulation? ;) This is partly why there's a "defining global variables" section, I know people will consider some usage as not using a global
What the post discusses can be distilled into the concept of Referential Transparency[0]:
A linguistic construction is called referentially
transparent when for any expression built from it,
replacing a subexpression with another one that denotes the
same value does not change the value of the expression.
Any construct which is not referentially transparent can be classified as potentially effectual, which is not a Bad Thing(TM) as if there were no observable effects of a program, it would be useless.
What makes programs easier to prove correct and reason about is when effects are localized to well-defined areas often "at the edges" of the architecture.
What makes programs impossible to prove correct and very difficult to reason about is when effectual logic is pervasive throughout, such as most of the "Global Variable Use Cases" the post advocates.
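A tiny illustration of the difference, with made-up function names:

```javascript
// Referentially transparent: any call can be replaced by its value.
function add(a, b) {
  return a + b;
}

// Not referentially transparent: the result depends on hidden, changing state.
let calls = 0;
function addAndCount(a, b) {
  calls += 1;
  return a + b + calls;
}

const pureTwice = [add(1, 2), add(1, 2)];
const impureTwice = [addAndCount(1, 2), addAndCount(1, 2)];
console.log(pureTwice);   // [ 3, 3 ]
console.log(impureTwice); // [ 4, 5 ]
```

Replacing `add(1, 2)` with `3` anywhere changes nothing, while `addAndCount(1, 2)` has no single value it could be replaced with.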
I've recently found it helpful to think of problems at the scope of an individual process, rather than a set of functions in a library or framework. This makes it much clearer when a global variable makes sense or not, and generally where to initialize things, and place state.
> The problem is data access. Nothing more, nothing less.
I agree with this, but the problem with global variables is precisely that they make bad data access patterns look easy and natural. Speaking from experience, it’s a lot easier to enforce a “no global variables” rule than explain to a new graduate why you won’t allow them to assign a variable in module X even though it’s OK in module Y.
You might like the article I wrote for next week. Could you tell me where this post is linked from? I didn't think anyone would see this when no one commented the first day
Agreed, global variables are fine up to a certain scale, especially if they're only used in the main file. They only really become a problem if you start modifying them from inside different files.
The real underlying problem is 'Spooky action at a distance'; it's not an issue that is specific to global variables. If you pass an instance between components by-reference and its properties get modified by multiple components, it can become difficult to track where the state changes originated and that can create very nasty, difficult-to-reproduce bugs. So this can happen even if your code is fully modularized; the issue is that passing instances by reference means that the properties of that instance behave similarly to global variables as they can be modified by multiple different components/files (without a single component being responsible for it).
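A minimal sketch of that effect without a single global "setting" in sight. The `uploader` and `downloader` components are hypothetical.

```javascript
const settings = { retries: 3 };

// Two unrelated components are handed the same reference.
const uploader = { settings };
const downloader = { settings };

// One of them tweaks what looks like its own configuration...
uploader.settings.retries = 0;

// ...and the other one changes behaviour without any code of its own running.
console.log(downloader.settings.retries); // 0

// Handing out a copy instead keeps the change local to the receiver.
const isolated = { settings: { ...settings, retries: 5 } };
console.log(settings.retries, isolated.settings.retries); // 0 5
```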
That's partly where the motivation for functional programming comes from; it forces pass-by-value all the time to avoid all possibility of mutations. The core value is not unique to FP though; it comes from designing components such that they have a simple interface which requires mostly primitive types as parameters. Passing objects is OK too, so long as these objects only represent structured information and their references aren't being held onto for future transformation.
So for example, you can let components fully encapsulate all the 'instances' which they manage and only give those parent components INFORMATION about what they have to do (without trying to micromanage their child instances); I avoid passing instances or modules to each other as it generally indicates a leaky abstraction.
Sometimes it takes some creativity to find a solution which doesn't require instance-passing but when you find such a solution, the benefits are usually significant and lasting. The focus should be on message-passing. Like when logging, the code will be easier to follow if all the errors from all the components bubble up to the main file (e.g. via events, streams, callbacks...) and are logged inside the main file because then any developer debugging the code can find the log inside the main file and then trace it down to its originating component.
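As a rough sketch of the logging case using a plain callback (`createParser` and the event shape are invented for this example):

```javascript
// The component reports errors upward as plain data; it holds no logger.
function createParser(onError) {
  return {
    parse(text) {
      try {
        return JSON.parse(text);
      } catch (err) {
        onError({ component: "parser", message: err.message });
        return null;
      }
    },
  };
}

// Main file: the only place where anything is actually logged.
const logged = [];
const parser = createParser((event) => {
  logged.push(event);
  console.error(`[${event.component}] ${event.message}`);
});

const ok = parser.parse('{"a": 1}');
const bad = parser.parse("{not json");
```

Every log line originates in one file and carries the name of the component that reported it, so tracing it back is a matter of following that name.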
Methods should be given information about what to do, they should not be given the tools to do their job... Like if you catch a taxi in real life, you don't bring a jerrycan of petrol and a steering wheel with you to give to the taxi driver. You just provide them with information; the address of your desired destination. You trust that the Taxi driver has all the tools they need to do the job.
If you do really want to pass an instance to another instance to manage, then the single-responsibility principle helps limit the complexity and possibility for spooky action. It should only be passed once to initialize and then the receiving component needs to have full control/responsibility for that child. I try to avoid it as much as possible though.
What does the ideal look like then? Data storage encapsulation in the app? Perhaps different DB users with granular access, accessed from different non-global parts of the program. Chuck in some views later when you need performance pragmatism!
Man, the bugs they prevent. Vs everyone rolling their own multithreaded file reader/writer code in C. How many programmers would think to journal transactions and ship them for backup for example.
SQL or 15000 lines of C++ or Go or Rust doing whatever with files.
Well personally I favour event sourcing and so in my systems the SQL database (if there is one) is only written to by the event rollup component. But even if you're not that extreme you probably want to have some clear structure to where your database writes happen rather than randomly writing it from any arbitrary line of code. The important thing is to have a structure that everyone working on the code understands so that you all know where any write to the database must have come from, not the specifics of what that structure is.
I have gotten shit before for using global variables, and sometimes that is justified, but I almost never see anyone given shit over treating Redis as a big ol’ global map.
The bug in the program reveals a poor understanding of object lifecycles by whoever wrote it. The `obj` argument to `simple` is not globally unique and so it makes a poor location to store global state information (a count of how often `simple` is called, in this example).
Never tie global state information to ephemeral objects whose lifetime may be smaller than what you want to track. In this case, they want to know how many times `simple` is called across the program's lifetime. Unless you can guarantee the `obj` argument or its `counter` member exists from before the first call to `simple` and through the last call to `simple` and is the only `obj` to ever be passed to `simple`, it is the wrong place to put the count information. And with those guarantees, you may as well remove `obj` as a parameter to both `simple` and `complex` and just treat it as a global.
State information needs to exist in objects or locations that last as long as that state information is relevant, no more, no less. If the information is about the overall program lifecycle, then a global can make sense. If you only need to know how many times `simple` was invoked with a particular `obj` instance, then tie it to the object passed in as the `obj` argument.
> it is the wrong place to put the count information.
I'd argue this is the case regardless of lifetime. It's trying to squash two unrelated things into one object and should have been two different arguments.
Way more obvious if "obj" is replaced with some example object instead of an empty one:
let person = { name: "Foo Bar", age: 30, counter: counter };
I like the diagnosis.
My JS is terrible, but it seems like once you make the counter a global variable it is just better to change it to have an atomic dedicated count function. So instead of incrementing the counter in simple, a globalCount() function gets called that isolates the state. Something like
{
  let i = 0;
  var counter = function () {
    console.log(++i);
  }
}
Then call counter() to count & log and document that something horrible and stateful is happening. I wouldn't call that a global variable although the article author disagrees.
How is a globally-scoped closure not a global variable?
Because there is no intent, need or reason to vary it.
Nearly everything in RAM is technically a global variable to someone who is keen enough. Programming depends on a sort-of honour system not to treat things like variables if it is hinted that they shouldn't and it doesn't make sense to. Otherwise we can all start making calls to scanmem & friends.
A function isn't a variable.
A function can absolutely be a variable. This isn't even an anonymous closure.
Could you tell me where this was posted? I thought no one would see this after I got no comments the first day
No one I showed this to complained about the first example but online many people did. I wrote a completely different article which I think is much better that uses examples I would have used in the follow up. I'll post that article next week
Second chance pool. This post is, per your submission history, from 2 days ago. HN has a second chance pool that lets articles continue collecting upvotes (more easily than digging through the full history). Some of those articles will get their timestamp updated to artificially inflate their ranking. This brings them to the front page again and gives them their "second chance". After a few hours or whatever time, the timestamp is reverted and they'll start falling back into their natural place in the rankings.
https://news.ycombinator.com/submitted?id=levodelellis
https://news.ycombinator.com/lists
https://news.ycombinator.com/pool
The counter should be declared as static inside the function, thus limiting its scope and avoiding pollution of the global namespace.
In this case, yes. Its scope should be the lowest necessary scope. Does JS provide static variables in functions? If not, then that forces you to lift it to some other scope like file or module scope or the surrounding function or class if that's viable.
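JavaScript has no function-level `static`, so the usual substitute is to lift the variable into a surrounding function scope. A minimal sketch, assuming the counter only needs to be visible to one function (`simple` here is a stand-in for the article's function, not its actual code):

```javascript
// `calls` lives in the closure created by the immediately invoked function.
// It persists across calls to `simple` but is not reachable from outside.
const simple = (() => {
  let calls = 0;
  return function simple() {
    calls += 1;
    return calls;
  };
})();

simple();
simple();
console.log(simple()); // 3
```

The count has the lifetime of `simple` itself and the scope of a local, which is roughly what a C function-static gives you.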
> Does JS provide static variables
:-)
I think this article broadens the definition of global variable and then says "Look, the things I added to the definition aren't bad, so global variables aren't always bad."
If you just look at what people normally mean by global variable, then I don't think the article changes minds on that.
From the article:
> "Static Function Variable: In C inside a function, you can declare a static variable. You could consider this as a local variable but I don't since it's on the heap, and can be returned and modified outside of the functions. I absolutely avoid this except as a counter whose address I never return."
These variables are not on the heap. They are statically allocated. Their visibility is the only thing that differentiates them from global variables and static variables defined outside of functions.
I think such variables can be useful, if you need a simple way of keeping some persistent state within a function. Of course it's more state you are carrying around, so it's hard to defend in a code review as best practice.
Amusingly, you can modify such variables from outside the function, simply by getting your function to provide the modifying code with a pointer to the variable, eg by returning the variable's address. If you do that though you're probably creating a monster. In contrast I think returning the value of the static variable (which the author casts shade on in the quote above) seems absolutely fine.
Edit: I should have stated that the number one problem with this technique is that it's absolutely not thread safe for obvious reasons.
> These variables are not on the heap. They are statically allocated. Their visibility is the only thing that differentiates them from global variables and static variables defined outside of functions.
In C++, there is a difference: function-static variables get initialized when control first passes into the function. That difference matters in cases where the initial value (partly) comes from a function call or from another global. For example, in C++ one can do:
That’s fine as long as the code calls quux before ever calling foo.
They added that feature not to support such monstrosities, but because they wanted to support having static variables of class type.
If they were to design that today, I think they would require the initialization expression to be constexpr.
> If they were to design that today, I think they would require the initialization expression to be constexpr.
Why would they?
This (non-constexpr) semantic would be useful as lazy initialization... and C++ loves tiny little features like these.
Just in case anyone still doesn't understand what it means to be statically allocated: it means that they are allocated in the Data Segment, which is a separate area of virtual memory from the stack and the heap.
static variable addresses are an extremely important tool in static analysis, for proving certain program invariants hold. In C, a const static often will be able to tell you more about the structure of your code at runtime than a #define macro ever could!
though unless you're programming extremely defensively (eg trying to thwart a nation state), I see no reason why you would use them at runtime
Could you expand on this? I don't understand the point you're making, or how it's useful.
Global variables (in languages where they otherwise make sense and don't have footguns at initialization and whatnot) have two main problems:
1. They work against local reasoning as you analyze the code
2. The semantic lifetime for a bundle of data is rarely actually the lifetime of the program
The second of those is easy to guard against. Just give the bundle of data a name associated with its desired lifetime. If you really only need one of those lifetimes then globally allocate one of them (in most languages this is as cheap as independently handling a bunch of globals, baked into the binary in a low-level language). If necessary, give it a `reset()` method.
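A rough sketch of that idea, with a hypothetical `RequestStats` bundle whose name states the lifetime the data is meant to have (the class, its fields, and `handleRequest` are all made up for illustration):

```javascript
// The bundle is named after its semantic lifetime: one request.
class RequestStats {
  constructor() {
    this.reset();
  }
  // Returns the bundle to its start-of-lifetime state.
  reset() {
    this.queries = 0;
    this.cacheHits = 0;
  }
}

// Only one is ever needed at a time, so a single instance is allocated globally.
const requestStats = new RequestStats();

function handleRequest(work) {
  requestStats.reset(); // a new request lifetime begins here
  work();
  return { queries: requestStats.queries, cacheHits: requestStats.cacheHits };
}

const first = handleRequest(() => { requestStats.queries += 2; });
const second = handleRequest(() => { requestStats.cacheHits += 1; });
console.log(first, second);
```

The second request does not see the first request's query count, even though the storage is global.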
The first is a more interesting problem. Even if you bundle data into some sort of `HTTPRequest` lifetime or whatever, the fact that it's bundled still works against local reasoning as you try to use your various counters and loggers and what have you. It's the same battle between implicit and explicit parameters we've argued about for decades. I don't have any concrete advice, but anecdotally I see more bugs from biggish collections of heterogeneous data types than I do from passing everything around manually (just the subsets people actually need).
I don't think #1 is necessarily true. Take a common case for a global variable, a metric your app is exposing via prometheus. You have some global variable representing its value. Libraries like to hide the global variable sometimes with cuteness like MetricsRegistry.get_metric("foo") but they're globally readable and writable state. And in your function you do a little metric.observe_event() to increment your counter. I think having this value global helps reasoning because the alternative is going to be a really clunky plumbing of the variable down the stack.
Of course #1 is not necessarily true, it depends on one's coding style, and using globals for application-scoped services like logging/metrics is tentatively fine... although I also think that if we're going to dedicate globals almost exclusively to this use, they probably should have dynamic scoping.
On the other hand, I have seen quite a lot of parsing/compilers' code from the seventies and eighties and let me tell you: for some reason, having interfaces between lexer and parser, or lexer and buffered reader, or whatever else to be "call void NextToken(void), it updates global variables TokenKind, TokenNumber, TokenText to store the info about the freshly produced token" was immensely popular. This has gotten slightly less popular but even today e.g. Golang's scanner has method next() that updates scanner's "current token"-related fields and returns nothing. I don't know why: I've written several LL(1) recursive-descent parsers that explicitly pass the current token around the parsing functions and it works perfectly fine.
It helps with reasoning in some sense, and the net balance might be positive in how much reasoning it enables, but it definitely hurts local reasoning. You need broader context to be able to analyze whether the function is correct (or even what it's supposed to be doing if you don't actively work to prevent entropic decay as the codebase changes). You can't test that function without bringing in that outer context. It (often) doesn't naturally compose well with other functions.
I find the concept of a context structure passed as the first parameter to all your functions with all your "globals" to be very compelling for this sort of stuff.
https://en.wikipedia.org/wiki/Dependency_injection
This is very similar to dependency injection. Separating state and construction from function or method implementation makes things a lot easier to test. In my opinion it's also easier to comprehend what the code actually does.
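A minimal sketch of the context-as-first-parameter style, assuming a hypothetical `makeContext` that holds what would otherwise be globals (the function names mirror the article's `simple` and `complex`, but the bodies are invented):

```javascript
// Everything that would have been a global goes into one explicit object.
function makeContext() {
  return { log: [], simpleCalls: 0 };
}

function simple(ctx) {
  ctx.simpleCalls += 1;
  ctx.log.push(`simple call #${ctx.simpleCalls}`);
}

function complex(ctx) {
  simple(ctx);
  simple(ctx);
}

const ctx = makeContext();
simple(ctx);
complex(ctx);
console.log(ctx.simpleCalls); // 3
```

A test can build its own context with `makeContext()` and never touch the one the application uses.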
That's only needed if you use multiple threads. In a single thread global vars are just fine, and much simpler than passing around ctx
And do you always know beyond any reasonable doubt that your code will be single-threaded for all time? Because the moment this changes, you're in for a world of pain.
Wrapping the globals into a struct context under #ifdef MULTI-THREADED and adding this ctx to each call as the first argument is a matter of minutes. I've done this multiple times.
Much worse is protecting concurrent writes, eg to an object or hash table
If JavaScript ever gets real shared memory multithreading, it will be opt in so that the entire web doesn’t break. So, yes.
If the discussion here is meant to be solely about JavaScript, then I'll happily consider all my comments in this thread to be obsolete, since I don't have particularly strong opinions about that language since I don't use it a lot.
I was under the impression that many people here were discussing the usage of global variables more generally, though.
That's exactly why I used this specific example. I've seen many code bases that use clone to avoid mutation problems so I wrote this specifically to show it can become a problem too.
I wrote a better article on globals. I plan on posting it next week
This seems more an issue with not understanding structuredClone, than one of understanding globals or lack thereof. There’s nothing wrong with the example, it does exactly what the code says it should — if you want counter to be “global” then structuredClone isn’t the function you want to call. The bug isn’t in how counter was in obj, the bug is in calling structuredClone when its behaviour wasn’t wanted.
With that said, it seems obvious that if you want to globally count the calls, then that count shouldn’t live in an argument where you (the function) don’t control its lifetime or how global it actually is. Simple has no say over what object obj.counter points to, it could trivially be a value type passed into that particular call, so if you know you want a global count then of course storing it in the argument is the wrong choice.
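The article's code is not reproduced in this thread, so the following is only a hypothetical reconstruction of the kind of bug being discussed: a call count stored on an argument that gets deep-copied along the way.

```javascript
// Hypothetical stand-ins for the article's functions.
function simple(obj) {
  obj.counter += 1;
  return obj.counter;
}

function complex(obj) {
  // structuredClone makes a deep copy, so copy.counter is a separate number.
  const copy = structuredClone(obj);
  simple(copy); // increments the copy only
  return copy;
}

const obj = { counter: 0 };
simple(obj);               // obj.counter is 1
const copy = complex(obj); // obj.counter is still 1, copy.counter is 2
simple(obj);               // obj.counter is 2
console.log(obj.counter, copy.counter); // 2 2
```

`simple` ran three times, but no single object recorded all three calls, because the count lived somewhere whose lifetime `simple` did not control.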
Global has two conflated meanings: global lifetime (ie lifetime of the whole program) and global access (which the article states). Simple needs global lifetime but not global access.
You rarely ”need” global access, although for things like a logger it can be convenient. Often you do need global lifetime.
The "god object"
The "environment".
couple dudes pooh-poohing the reader monad
That just seems like globals with extra steps. Suddenly if your context structure has a weird value in it, you’ll have to check every function to see who messed it up.
First, that's true for globals as well. Second, with "context structure" pattern, the modifications to it are usually done by copying this structure, modifying some fields in the copy and passing the copy downwards, which severely limits the impact radius and simplifies tracking down which function messed it up: it's either something above you in the call stack or one of the very few (hopefully) functions that changes this context by-reference, with intent to apply such changes globally.
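A small sketch of the copy-and-pass-down variant, assuming a plain object as the context (the field names are invented for illustration):

```javascript
function handle(ctx) {
  return `${ctx.tenant}:${ctx.verbose}`;
}

function withVerbose(ctx) {
  // Copy the context, change one field, and pass the copy downwards.
  const child = { ...ctx, verbose: true };
  return handle(child);
}

// Freezing the root makes accidental by-reference changes fail
// (silently in sloppy mode, with a TypeError in strict mode).
const root = Object.freeze({ tenant: "acme", verbose: false });

const inner = withVerbose(root);
const outer = handle(root);
console.log(inner, outer); // acme:true acme:false
```

The change is visible only below `withVerbose` in the call stack; the caller's context is untouched.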
That's 2 parts: 1. Global variable (mutable) 2. Local function with context argument (mutations)
You have clear tracking of when and how functions change the global variable
Hard disagree.
If I have 500 functions, I don't want to extrapolate out the overhead of passing a state object around to all of them. That's a waste of effort, and frankly makes me think you want to code using an FP paradigm even in imperative languages.
Module-level and thread-level "globals" are fine. You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation.
You get functions that are easily testable in isolation with all state provided in parameters.
You also get explicit dependencies and scoping controlled by caller.
I don't mind globals but saying you get nothing for avoiding them is :/
I tend to use getter and setter functions to access globals and manage state.
Advantage only the function that depends on the global needs to bring in the dependency.
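One way this might look, assuming a hypothetical `locale` setting kept in module-private state (in a real module, `setLocale` and `getLocale` would be the only exports):

```javascript
// The variable itself is never exported; only the accessors are.
let currentLocale = null;

function setLocale(name) {
  // The setter is the single place where the value is validated.
  if (typeof name !== "string" || name === "") {
    throw new TypeError("locale must be a non-empty string");
  }
  currentLocale = name;
}

function getLocale() {
  // Reading before initialization is reported instead of returning null.
  if (currentLocale === null) {
    throw new Error("locale read before it was set");
  }
  return currentLocale;
}

setLocale("en-GB");
console.log(getLocale()); // en-GB
```

Every write goes through `setLocale`, so a breakpoint or log line there shows who changed the value.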
If that’s so useful, make your language support the concept of lexical environments instead. Otherwise it’s just manual sunsetting every day of week. Our craft is full of this “let’s pretend we’re in a lisp with good syntax” where half of it is missing, but fine, we’ll simulate it by hand. Dirt and sticks engineering.
(To be clear, I’m just tangentially ranting about the state of things in general, might as well post this under somewhere else.)
So, making it implicit again? No.
You can have it either way, it’s not for you but for people who disagree with what they deem a preference that is the only option when there’s no alternative.
I got into this argument with my former coworkers. Huge legacy codebase. Important information (such as the current tenant of our multi-tenant app) was hidden away in thread-local vars. This made code really hard to understand for newcomers because you just had to know that you'd have to set certain variables before calling certain other functions. Writing tests was also much more difficult and verbose. None of these preconditions were of course documented. We started getting into more trouble once we started using Kotlin coroutines which share threads between each other. You can solve this (by setting the correct coroutine context), but it made the code even harder to understand and more error-prone.
I said we should either abolish the thread-local variables or not use coroutines, but they said "we don't want to pass so many parameters around" and "coroutines are the modern paradigm in Kotlin", so no dice.
You know what helps manage all this complexity and keep the state internally and externally consistent?
Encapsulation. Provide methods for state manipulation that keep the application state in a known good configuration. App level, module level or thread level.
Use your test harness to control this state.
If you take a step back I think you’ll realize it’s six of one, half dozen of the other. Except this way doesn’t require manually passing an object into every function in your codebase.
These methods existed. The problem was that when you added some code somewhere deep down in layers of layers of business code, you never knew whether the code you'd call would need to access that information or whether it had already previously been set.
Hiding state like that is IMHO just a recipe for disaster. Sure, if you just use global state for metrics or something, it may not be a big deal, but to use it for important business-critical code... no, please pass it around, so I can see at a glance (and with help from my compiler) which parts of the code need what kind of information.
Yes, you gain testability.
If your global state contains something that runs in prod but should not run in a testing environment (e.g. a database connection), your global variable based code is now untestable.
Dependency Injection is popular for a very good reason.
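As a sketch of the testability argument: the function below receives its database as a parameter, so a test can hand it a fake. The `query(sql)` interface and the table name are assumptions made for this example, not any real driver's API.

```javascript
// `db` is any object with a query(sql) method that returns an array of rows.
function countUsers(db) {
  const rows = db.query("SELECT id FROM users");
  return rows.length;
}

// In a test, a fake stands in for the production connection
// and records what it was asked.
const fakeDb = {
  queries: [],
  query(sql) {
    this.queries.push(sql);
    return [{ id: 1 }, { id: 2 }];
  },
};

console.log(countUsers(fakeDb)); // 2
```

Nothing in `countUsers` needs to change between production and test; only the argument does.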
This sounds like a design deficiency.
If you have something that should only run in testing, perhaps your test harness should set the global variable appropriately, no?
Just need to make sure your module doesn't get too big or unwieldy. I work in a codebase with some "module" C files with a litany of global statics and it's very difficult to understand the possible states it can be in or test it.
I agree that so long as overall complexity is limited these things can be OK. As soon as you're reading and writing a global in multiple locations though I would be extremely, extremely wary.
Hard to take your comment seriously when you go out of your way to degrade a discussion opponent, FYI.
My comment was not intended to be personally degrading to OP. Apologies if it was taken that way.
I did not say you are targeting OP. I meant that you are degrading your parent commenter.
This:
"You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation."
...is neither productive nor actually true. But I'll save the latter part for your other reply.
I'll take my medicine. :)
I'm not above being put in my place and being shown the light.
(And when I said OP, I did mean the parent poster)
You should also understand that the "you" in my comment you quoted is the colloquial you and not necessarily the parent poster.
So lay it on me.
OK, I will.
But I also got a notification about a big comment you posted but it is now deleted. Did you want to rewrite it or did you give up on it?
I could not initially reply to you. Your comment rubbed me the wrong way, because I had no intention of trying to degrade anyone, and frankly, I was offended. But I thought better of my hasty and emotional response. I would rather take a deep breath, re-focus, re-engage, and be educated in a thoughtful dialog than get into a mud slinging contest. I am always willing to be enlightened.
A tip, in your profile you can set a delay which is a number of minutes before your comments will become visible to other people. Mine is set to 2 right now. This gives you time to edit your comment (helpful for some longer ones) but also to write some garbage response and then think better and delete it before anyone's the wiser.
It's also helpful to give you time to re-read a mostly good, but maybe not polite, response and tone down your response.
That is a good tip.
Much appreciated :)
Commendable, still let's not forget that the part I quoted was the beginning of a mud-slinging contest that you seemed willing to start. ;)
So indeed, let's re-focus and re-engage.
To that end, I will post you a longer reply Later™.
> If you run the code you'll see 1 2 3 4 3 printed instead of 5.
I'm really confused, as this behaviour appears to be completely obvious to me.
Yeah in the example it was obvious.
Of course in a real life program, it may be lost in other code logic and, most importantly, the function performing the clone may not be so explicit about it (e.g. an "update" function that returns a different copy of the object).
The one where the author almost rediscovers the singleton pattern.
It can also be an almost rediscovery of closures.
Please. Just don't.
Moving state into the global namespace and accessing it directly from that space makes it much more difficult to test, instrument and integrate.
Sure, if you're building disposable toy software, do whatever is easiest. But if you're building software for others to use, at least provide a context struct and pass that around when you can.
For those cases where this is challenging or impossible, please sequence your application or library initialization so that these globals are at least fungible/assignable at runtime.
Please no.
Singletons if you must. At least you can wrap a mutex around access if you're trying to make it thread safe.
For every example of a bug caused by not using a global variable I’m sure I could find 10 caused by a global variable
Why are singletons better? That's just a global object.
> wrap a mutex
What if my program has one thread? Or the threads have clearly defined responsibilities?
Global objects need to be initialized. And if two are ever initialized you run into problems like the above.
Singleton is a pattern to ensure that a global object is only ever initialized once.
Most programming languages written after 1990 let you initialize global variables lazily. The main problem is that the initialization order might be unexpected or you may run into conflicts. Singletons make the order slightly more predictable (based on first variable access), although it is still implicit (lazy).
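For reference, a bare-bones sketch of what "initialized once, lazily, on first access" means here (`getConfig` and `initCount` are illustrative names only):

```javascript
let instance = null;
let initCount = 0; // only here to make the single initialization observable

function getConfig() {
  if (instance === null) {
    // Runs on first access only; every later call reuses the same object.
    initCount += 1;
    instance = { createdAt: Date.now() };
  }
  return instance;
}

const a = getConfig();
const b = getConfig();
console.log(a === b, initCount); // true 1
```

Initialization order is decided by whichever caller reaches `getConfig` first, which is more predictable than load order but still implicit.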
But singletons are still a terrible idea. The issue with global variables is not just initialization. I would argue it's one of the more minor issues.
The major issues with global variables are:
1. Action at distance (if the variable is mutable)
2. Tight coupling (global dependencies are hard-coded, you cannot use dependency injection)
3. Hidden dependency. The dependency is not only tightly-coupled, it is also hidden from the interface. Client code that calls your function doesn't know that you rely on a global variable and that you can run into conflict with other code using that variable or that you may suddenly start accessing some database it didn't even know about.
Singleton does not solve any of the above. The only thing it ensures is lazy initialization on access.
how do you know that? just about every serious bug i have ever written was when i thought i understood multi-threaded code, but didn't.
Because as a programmer I have responsibility for the technical soundness of the program, and I don't create threads haphazardly.
> when i thought i understood multi-threaded code, but didn't.
All the more reason to carefully plan and limit shared state among threads. It's hard enough to get right when you know where the problems are and impossible if you spray and pray with mutexes.
Yeah but just because you do the right thing doesn’t mean that others will. That one person that creates threads haphazardly will wreak havoc and at scale this will happen. It’s an unstable equilibrium to put the onus of program soundness on the contributors to a program.
> That one person that creates threads haphazardly will wreak havoc
What if someone comes along and starts adding threads and doesn't check what they are accessing? And doesn't read the documented invariants of the design?
Well I don't think any project can succeed if that's the level of disorganization.
Are there certain kinds of bugs that are easy to regress last minute? Yes. A brand new thread without limited state is not one of them.
> put the onus of program soundness on the contributors to a program.
Who then is responsible for making sure the program is correct?
> Who then is responsible for making sure the program is correct?
I’d say this is mostly a function of the language or framework.
After that, it’s up to the tech leads to provide access patterns/examples that align with the needs of a particular project.
My point is not so much that you shouldn’t ever think about the implications of your code, just that contributors are humans and any project with enough contributors will have a mix of contributors with different knowledge, experience and skill sets. If you are leading one of those, it would behoove you to make doing the right thing easy for later contributors.
> I’d say this is mostly a function of the language or framework.
frameworks cannot ensure your program does what it's supposed to. People are responsible, not tools.
> contributors are humans
Yes - which is why I would discourage haphazard thread usage, and document existing thread architecture.
That's safer than "hey I threw on a mutex because who knows where this is accessed from".
You know it is /currently/ accessed on one thread. These are little landmines that add up over time.
The burden is on the programmer adding a new thread to know what they can safely access.
The conclusion of your argument looks like 2000s Java - throw a mutex on every property because you never know when it will need to be accessed on a thread.
Designs that spread complexity rather than encapsulate it are rarely a good idea.
> Designs that spread complexity rather than encapsulate it are rarely a good idea.
Exactly, globals spread complexity.
You need to look at the implementation of each function to know if you can call it. That’s the landmine.
I agree that dependent sequences of events, coordinated through a global are bad. But there are other usages which are not error prone. For example an allocation, logger, readonly settings, or a cache.
That sounds more like a weakness of the language: that preventing data races is not automatically tracked for you, no?
If you care about performance, many locks across many pieces of shared state is always a bad idea.
If you don’t care about performance and want safety you should be using processes. Explicit shared memory is safer than implicit.
and yes. Better thread management tools are always welcome when we can get them.
You may hate my article next week, it's meant to replace this article. If you want you can email me for early access and tell me how I can improve the article. Let's say you can guess my email if you're emailing the right domain
You say "thread safe", I say "dead lock".
Any program that uses a database has a very similar problem to global variables.
As Gilad Bracha has pointed out, types are antimodular, and your database schema can be considered one giant type that pervades your program, just like globals can be.
I don't think we have tools to compositionally solve this, across different programming languages.
Your program is also one giant type.
Yeah but I don't use that type in 100s of places.
Migrations? Generate bindings for your queries automatically with query+schema definitions? (sqlc)
I think the author forgot the most useful use case for globals, and that is variables that have to do with the context the program is running under such as command line arguments and environment variables (properly validated and if needed escaped).
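A sketch of that use case under Node.js, assuming a made-up `APP_PORT` environment variable and `--verbose` flag: the values are validated once at startup and then frozen, so the global is effectively a constant.

```javascript
// Builds the program-wide configuration from its launch context.
function loadConfig(argv, env) {
  const port = Number(env.APP_PORT ?? "8080");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid APP_PORT: ${env.APP_PORT}`);
  }
  // Frozen, so later code can read it but cannot change it.
  return Object.freeze({
    port,
    verbose: argv.includes("--verbose"),
  });
}

// Read once at startup; everything else only reads `config`.
const config = loadConfig(process.argv.slice(2), process.env);
console.log(config);
```

Keeping `loadConfig` a function of its inputs also means it can be exercised in tests without touching the real environment.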
Do those change during program runtime though? I don't think many people have problems with global constants.
mutable global variables are intrinsically incompatible with a multithreaded environment. Having mutable shared state is never the right solution, unless you basically can live with data races. And that's before taking maintainability in consideration too
There are also things like translation, environment details and logger, which I wish were always present, like a super global.
Global variables are fine when you read many times, but you write to them sparingly and when they are updated but the old value is read/used you have mechanisms to handle errors and retry
> The problem is data access. Nothing more, nothing less. There's a term for this that has nothing to do with global variables: "action at a distance."
I mean yes, using global variables is just one of the ways to cause action-at-a-distance and that is... apparently a big reveal?
Otherwise sure, there is no pattern that cannot be utilized 100% correctly and without introducing bugs. Theoretically. Now let's look at the practical aspects and how often indiscriminately using such tempting patterns like global variables -- and mutexes-when-we-are-not-sure and I-will-remember-not-to-mutate-through-this-pointer -- lead to bugs down the road.
The answer is: fairly often.
IMO the article would be better titled as "There is no pattern that a lazy or careless programmer cannot use to introduce a bug".
Globals are essential for tracking the state and values of variables in intermediate steps. For debugging, I create one unique object on the global scope and store intermediate variables in it, so I can inspect values when something goes wrong. Because there is only one unique object to maintain, it will not override any existing vars and so does not affect existing programs.
Surely both of the examples in the article are using global variables? Nobody ever said global variables are ok if they are an object? (Or did they?)
This assertion made in the article invalidates the premise of same:
What do you suppose is the recommended way to use the encapsulation? ;) This is partly why there's a "defining global variables" section, I know people will consider some usage as not using a global
> What do you suppose is the recommended way to use the encapsulation? ;) This is partly why there's a "defining global variables" section, I know people will consider some usage as not using a global
What the post discusses can be distilled into the concept of Referential Transparency[0]:
Any construct which is not referentially transparent can be classified as potentially effectual, which is not a Bad Thing(TM) as if there were no observable effects of a program, it would be useless.
What makes programs easier to prove correct and reason about is when effects are localized to well-defined areas often "at the edges" of the architecture.
What makes programs impossible to prove correct and very difficult to reason about is when effectual logic is pervasive throughout, such as most of the "Global Variable Use Cases" the post advocates.
0 - https://en.wikipedia.org/wiki/Referential_transparency
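To make the distinction concrete, a tiny illustration with two made-up functions: one whose call can be replaced by its result, and one whose result depends on hidden mutable state.

```javascript
// Referentially transparent: same arguments, same result, no observable effects.
function add(a, b) {
  return a + b;
}

// Not referentially transparent: reads and writes state outside its parameters.
let total = 0;
function addToTotal(n) {
  total += n;
  return total;
}

console.log(add(2, 3) === add(2, 3));         // true
console.log(addToTotal(2) === addToTotal(2)); // false (2, then 4)
```

Replacing `add(2, 3)` with `5` anywhere in a program changes nothing; doing the same with `addToTotal(2)` changes behaviour.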
I've recently found it helpful to think of problems at the scope of an individual process, rather than a set of functions in a library or framework. This makes it much clearer when a global variable makes sense or not, and generally where to initialize things, and place state.
Instance variables are to instance methods what global variables are to free functions and have exactly the same problems.
> The problem is data access. Nothing more, nothing less.
I agree with this, but the problem with global variables is precisely that they make bad data access patterns look easy and natural. Speaking from experience, it’s a lot easier to enforce a “no global variables” rule than explain to a new graduate why you won’t allow them to assign a variable in module X even though it’s OK in module Y.
You might like the article I wrote for next week. Could you tell me where this post is linked from? I didn't think anyone would see this when no one commented the first day
> Global Variables Are Not the Problem
It should have been "Mutability is the Problem, Globalness is not".
"Guns aren't the problem, it's all the people pulling triggers."
Agreed, global variables are fine up to a certain scale, especially if they're only used in the main file. They only really become a problem if you start modifying them from inside different files.
The real underlying problem is 'Spooky action at a distance'; it's not an issue that is specific to global variables. If you pass an instance between components by-reference and its properties get modified by multiple components, it can become difficult to track where the state changes originated and that can create very nasty, difficult-to-reproduce bugs. So this can happen even if your code is fully modularized; the issue is that passing instances by reference means that the properties of that instance behave similarly to global variables as they can be modified by multiple different components/files (without a single component being responsible for it).
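A compact illustration of that effect with no global variable in sight, using two hypothetical components that were each handed the same object by reference:

```javascript
const settings = { retries: 3 };

// Both components keep the reference they were given.
function makeUploader(s) {
  return { run: () => s.retries };
}
function makeTuner(s) {
  return { tune: () => { s.retries = 0; } };
}

const uploader = makeUploader(settings);
const tuner = makeTuner(settings);

const before = uploader.run();
tuner.tune(); // changes state the uploader also depends on
const after = uploader.run();
console.log(before, after); // 3 0
```

Nothing in the uploader's code changed between the two calls, which is what makes this kind of bug hard to trace.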
That's partly where the motivation for functional programming comes from; it forces pass-by-value all the time to avoid all possibility of mutations. The core value is not unique to FP though; it comes from designing components such that they have a simple interface which requires mostly primitive types as parameters. Passing objects is OK too, so long as these objects only represent structured information and their references aren't being held onto for future transformation.
So for example, you can let components fully encapsulate all the 'instances' which they manage and only give those parent components INFORMATION about what they have to do (without trying to micromanage their child instances); I avoid passing instances or modules to each other as it generally indicates a leaky abstraction.
Sometimes it takes some creativity to find a solution which doesn't require instance-passing, but when you find such a solution, the benefits are usually significant and lasting. The focus should be on message-passing. Take logging: the code will be easier to follow if all the errors from all the components bubble up to the main file (e.g. via events, streams, callbacks...) and are logged inside the main file, because then any developer debugging the code can find the log call inside the main file and then trace it down to its originating component.
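Roughly what the logging case could look like in Node, using the built-in `EventEmitter`. The `Downloader` component, the `failure` event name and the shape of the message are all assumptions for the sake of the sketch:

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical component: it reports problems as events and never logs by itself.
class Downloader extends EventEmitter {
  fetch(url) {
    if (!url.startsWith("https://")) {
      this.emit("failure", {
        component: "Downloader",
        message: `insecure url: ${url}`,
      });
      return null;
    }
    return `contents of ${url}`;
  }
}

// "Main file": the only place where anything gets logged.
const logLines = [];
const downloader = new Downloader();
downloader.on("failure", (info) => {
  const line = `${info.component}: ${info.message}`;
  logLines.push(line);
  console.error(line);
});

downloader.fetch("http://example.com/file");
```

The component never receives a logger instance. It only emits information, and whoever reads the main file can see every place a log line can come from.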
Methods should be given information about what to do, they should not be given the tools to do their job... Like if you catch a taxi in real life, you don't bring a jerrycan of petrol and a steering wheel with you to give to the taxi driver. You just provide them with information; the address of your desired destination. You trust that the Taxi driver has all the tools they need to do the job.
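The taxi analogy as a small sketch (everything here is invented for illustration): the taxi keeps its tools private inside a closure, and the caller only supplies an address.

```javascript
// Hypothetical taxi: it owns its engine and position; callers never see them.
function createTaxi() {
  const engine = { running: false }; // private tool, never handed out
  let location = "depot";

  return {
    // The caller passes information (an address), not tools.
    driveTo(address) {
      engine.running = true;
      location = address;
      engine.running = false;
      return `arrived at ${location}`;
    },
  };
}

const taxi = createTaxi();
const result = taxi.driveTo("12 Main St");
console.log(result); // arrived at 12 Main St
```

Because the argument is a plain string, nothing the taxi does can reach back and change the caller's state.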
If you do really want to pass an instance to another instance to manage, then the single-responsibility principle helps limit the complexity and possibility for spooky action. It should only be passed once to initialize and then the receiving component needs to have full control/responsibility for that child. I try to avoid it as much as possible though.
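One way to sketch the "pass it once, then hand over full responsibility" idea, assuming a runtime with private class fields. `JobWorker` and `createWorker` are hypothetical names:

```javascript
// Hypothetical worker that receives its queue exactly once, at construction.
class JobWorker {
  #queue; // private field: nothing outside JobWorker can reach it afterwards

  constructor(queue) {
    this.#queue = queue;
  }

  submit(job) {
    this.#queue.push(job);
    return this.#queue.length;
  }

  pending() {
    return this.#queue.length;
  }
}

function createWorker() {
  // The queue is created here and handed over immediately,
  // so no other reference to it survives this function.
  return new JobWorker([]);
}

const worker = createWorker();
worker.submit("resize image");
worker.submit("send email");
const pendingJobs = worker.pending();
console.log(pendingJobs); // 2
```

After construction the only way to change the queue is through the worker's own methods, so there is exactly one place to look when its state is wrong.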
There are only a few use-cases where global variables make sense:
1. Thread container registry with mutex lock for garbage collection and inter-process communication (children know their thread ID)
2. Volatile system registers and memory DMA use in low-level CPU/MCU code (the compiler and/or linker could pooch the hardware memory layout)
3. Performance optimized pre-cached shared-memory state-machines with non-blocking magic
4. OS Unikernels
Not sure I have seen many other valid use-cases that splatter language scopes with bad/naive designs. YMMV =3
SQL databases are global variables
What does the ideal look like then? Data storage encapsulation in the app? Perhaps different DB users with granular access, accessed from different non-global parts of the program. Chuck in some views later when you need performance pragmatism!
And man, the bugs they cause.
Man, the bugs they prevent. Compare that with everyone rolling their own multithreaded file reader/writer code in C. How many programmers would think to journal transactions and ship them for backup, for example?
SQL or 15000 lines of C++ or Go or Rust doing whatever with files.
Yes, and code that mutates them in an unmanaged way should be banned.
What is the managed way in this context?
Well personally I favour event sourcing and so in my systems the SQL database (if there is one) is only written to by the event rollup component. But even if you're not that extreme you probably want to have some clear structure to where your database writes happen rather than randomly writing it from any arbitrary line of code. The important thing is to have a structure that everyone working on the code understands so that you all know where any write to the database must have come from, not the specifics of what that structure is.
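A toy version of that structure, with an array and a `Map` standing in for the event store and the SQL table. All names are hypothetical and this is nowhere near a real event-sourcing setup. The point is only that `rollup` is the single function that ever writes to the "table":

```javascript
// Hypothetical in-memory stand-ins for an event log and a SQL table.
const eventLog = [];
const accountsTable = new Map();

// Any component may record an event...
function recordEvent(event) {
  eventLog.push(Object.freeze({ ...event }));
}

// ...but only the rollup component writes to the "table".
function rollup() {
  accountsTable.clear();
  for (const event of eventLog) {
    const balance = accountsTable.get(event.account) ?? 0;
    if (event.type === "deposited") {
      accountsTable.set(event.account, balance + event.amount);
    } else if (event.type === "withdrawn") {
      accountsTable.set(event.account, balance - event.amount);
    }
  }
}

recordEvent({ type: "deposited", account: "alice", amount: 100 });
recordEvent({ type: "withdrawn", account: "alice", amount: 30 });
recordEvent({ type: "deposited", account: "bob", amount: 5 });
rollup();

console.log(accountsTable.get("alice"), accountsTable.get("bob")); // 70 5
```

If a balance looks wrong, the write must have come from `rollup`, and the events that produced it are still sitting in the log.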
Or caches!
I have gotten shit before for using global variables, and sometimes that is justified, but I almost never see anyone given shit over treating Redis as a big ol’ global map.
When it comes to "patterns" and "anti-patterns", double standards are common.