It's sure a corny stance to hold if you're navigating an infrastructure nightmare daily, but in my opinion, much of the complexity addresses not technical, but organisational issues: You want straightforward, self-contained deployments, for one, instead of uploading files onto your single server. If the process crashes or your hard disk dies, you want redundancy so even those twelve customers can still access the application. You want a CI pipeline, so the junior developer can't just break prod because they forgot to run the tests before pushing. You want proper secret management, so the database credentials aren't just accessible to everyone. You want a caching layer, so you're not surprised by a rogue SQL query that takes way too long, or a surge of users that exhausts the database connections because you never bothered to add proper pooling.
Adding guardrails to protect your team from itself mandates some complexity, but just hand-waving that away as unnecessary is a bad answer. At least if you're working as part of a team.
>It's sure a corny stance to hold if you're navigating an infrastructure nightmare daily, but in my opinion, much of the complexity addresses not technical, but organisational issues: You want straightforward, self-contained deployments for one, instead of uploading files onto your single server ...
You can get all that with a monolith server and a Postgres backend.
With time, I discovered something interesting: for us techies, using container orchestration is about reliability, zero-downtime deployments, limiting blast radius, etc.
But for management, it's completely different. It's all about managing complexity on an organizational level. It's so much easier to think in terms of "Team 1 is in charge of microservice A". And I know from experience that it works decently enough, at least in some orgs with competent management.
It’s not a management thing. I’m an engineer and I think it’s THE main advantage microservices actually provide: they split your code hard and allow a team to actually get ownership of the domain. No crossing domain boundaries, no in-between shared code, etc.
I know: it’s ridiculous to have an architectural barrier for an organizational reason, and the cost of a bad slice multiplies. I still think that, in some situations, it is better than the gas-station-bathroom effect of shared codebases.
I don't see why it's ridiculous to have an architectural barrier for org reasons. Requiring every component to be behind a network call seems like overkill in nearly all cases, but encapsulating complexity into a library where domain experts can maintain it is how most software gets built. You've got to lock those demons away where they can't affect the rest of the users.
The problem is that a library usually does not provide good enough boundaries. A C library can just shit all over your process memory. A Java library can wreak all kinds of havoc on your objects with reflection, or just call System.exit(LOL). The minimal boundary that keeps demons at bay is the process boundary, and then you need some way for processes to talk to each other. If you're separating components into processes, it's very natural to put them on different machines, so you need your IPC to be network calls. One more step and you're implementing REST, because infra people love HTTP.
> it's very natural to put them to different machines, so you need your IPC to be network calls
But why is this natural? I’m not saying we shouldn’t have network RPC, but it’s not obvious to me that we should have only network RPC when there are cheap local IPC mechanisms.
>Requiring every component to be behind a network call seems like overkill in nearly all cases
That’s what I was referring to, sorry for the inaccurate adjective.
Most people try to split a monolith into domains, move code into libraries, or stuff like that - but IMO you rarely avoid a shared space importing the subdomains, with blurry/leaky boundaries, and with ownership falling between the cracks.
Microservices predispose you better to avoid that shared space, as there is less expectation of an orchestrating common space. But as you say, the cost is ridiculous.
I think there’s an unfilled space for an architectural design that somehow enforces boundaries and avoids common spaces as strongly as microservices do, without the physical separation.
How about old fashioned interprocess communication? You can have separate codebases, written in different languages, with different responsibilities, running on the same computer. Way fewer moving parts than RPC over a network.
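To make that concrete, here is a minimal sketch of local IPC over a Unix domain socket in Python. The socket path and the "billing" framing are invented for illustration; the point is only that two separately deployed processes on the same box can talk without any HTTP or network hop.

```python
# Minimal sketch: two local processes talking over a Unix domain socket.
# The path and the "billing" naming are illustrative, not a real API.
import os
import socket

SOCKET_PATH = "/tmp/billing.sock"

def serve():
    """The 'billing' process: answers requests from other local processes."""
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(SOCKET_PATH)
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                request = conn.recv(1024).decode()
                # ... real domain logic would go here ...
                conn.sendall(f"handled: {request}".encode())

def call(request: str) -> str:
    """Any other local process calling it; no network stack involved."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(SOCKET_PATH)
        cli.sendall(request.encode())
        return cli.recv(1024).decode()
```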
And then you have some other group of people that sees all the redundancy and decides to implement a single unified platform on which all the microservices shall be deployed.
As soon as there is more than one container to organise, it becomes a management task for said techies.
Then suddenly one realises that techies can also be bad at management.
Management of a container environment not only requires deployment skills but also documentation and communication skills. Suddenly it’s not management but rather the techie who can't manage their tech stack.
This pointing of fingers at management is rather repetitive and simplistic but also very common.
> using container orchestration is about reliability, zero-downtime deployments
I think that's the first time I've heard any "techie" say we use containers because of reliability or zero-downtime deployments. Those feel like they have nothing to do with each other, and we've been building reliable server-side software with zero-downtime deployments long before containers became the "go-to"; if anything, it was easier before containers.
It would be interesting to hear your story, mine is that containers in general start an order of magnitude faster than vms (in general! we can easily find edge cases) and hence e.g. horizontal scaling is faster. You say it was easier before containers, I say k8s in spite of its complexity is a huge blessing as teams can upgrade their own parts independently and do things like canary releases easily with automated rollbacks etc. It's so much faster than VMs or bare metal (which I still use a lot and don't plan to abandon anytime soon but I understand their limitations).
You don't. When your server crashes, your availability is zero. It might crash for a myriad of reasons; sometimes you might need to update the kernel to patch a security issue, for example, and are forced to take your app down yourself.
If your business can afford irregular downtime, by all means, go for it. Otherwise, you'll need to take precautions, and that will invariably make the system more complex than that.
>You don't. When your server crashes, your availability is zero.
As your business needs grow, you can start layering complexity on top. The point is you don't start at 11 with an overly complex architecture.
In your example, if your server crashes, just make sure you have some sort of automatic restart. In practice that may mean a downtime of seconds for your 12 users. Is that more complexity? Sure - but not much. If you need to take your service down for maintenance, you notify your 12 users and schedule it for 2am ... etc.
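For the "some sort of automatic restart" part, a plain process supervisor is usually enough. A sketch of a systemd unit, with paths and names invented for illustration:

```ini
# /etc/systemd/system/myapp.service  (illustrative; adjust paths and users)
[Unit]
Description=My monolith
After=network.target postgresql.service

[Service]
ExecStart=/opt/myapp/bin/server
# Restart on any crash, a couple of seconds later (the "downtime of seconds" above)
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
```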
Later you could create a secondary cluster and stick a load-balancer in-front. You could also add a secondary replicated PostgreSQL instance. So the monolith/postgres architecture can actually take you far as your business grows.
Changing/layering architecture adds risk. If you've got a standard way of working that you can easily throw in on day one, whose fundamentals then don't need to change for years, that's way lower risk, easier, faster.
It is common for founding engineers to start with a preexisting way of working that they import from their previous more-scaled company, and that approach is refined and compounded over time
It does mean starting with more than is necessary at the start, but that doesn't mean it has to be particularly complex. It means you start with heaps of already-solved problems that you simply never have to deal with, allowing focus on the product goals and deep technical investments that need to be specific to the new company
Yeah, theoretically that sounds good. But I had more downtime from cloud outages and Kubernetes updates than I ever had using a simple Linux server with nginx on hardware; most outages I had on Linux were on my VPS, due to Digital Ocean's own hardware failures. AWS was down not so long ago.
And if certain servers do get very important, you just run a backup server on a VPS and switch over DNS (even if you keep a high TTL, most resolvers update within minutes nowadays), or if you want to be fancy, throw a load balancer in front of it.
If you solve issues in a few minutes people are always thankful, and most dont notice. With complicated setups it tends to take much longer before figuring out what the issue is in the first place.
You can have redundancy with a monolithic architecture. Just have two different web servers behind a proxy, and use postgres with a hot standby (or use a managed postgres instance which already has that).
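A sketch of what the proxy part could look like with nginx (addresses and ports are made up; the same idea works with haproxy or a managed load balancer):

```nginx
# Illustrative nginx config: one proxy in front of two copies of the monolith.
upstream app {
    server 10.0.0.11:8000;
    server 10.0.0.12:8000;
    # nginx marks a backend as unavailable when it fails, so one crashed
    # process or rebooting host doesn't take the whole site down.
}

server {
    listen 80;
    location / {
        proxy_pass http://app;
    }
}
```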
They are, but now you've expanded the definition of "a single monolith with postgres" to multiple replicas that need to be updated in sync. You've suddenly got shared state across multiple, fully isolated processes (in the best case) or running on multiple nodes (in the worst case), and a myriad of other subtle gotchas you need to account for, which raises the overall complexity considerably.
I don't see how you solve this with microservices. You'll have to take down your services in these situations too, a monolith vs microservices soup has the exact same problem.
Also, in 5 years of working on both microservicy systems and monoliths, not once have the things you describe been a problem for me. Everything I've hosted in Azure has been perfectly available pretty much all the time unless a developer messed up or Azure itself has downtime that would have taken down either kind of app anyway.
But sure let's make our app 100 times more complicated because maybe some time in the next 10 years the complexity might save us an hour of downtime. I'd say it's more likely the added complexity will cause more downtime than it saves.
> I don't see how you solve this with microservices.
I don't think I implied that microservices are the solution, really. You can have a replicated monolith, but that absolutely adds complexity of its own.
> But sure let's make our app 100 times more complicated because maybe some time in the next 10 years the complexity might save us an hour of downtime.
Adding replicas and load balancing doesn't have to be a hundred times more complex.
> I'd say it's more likely the added complexity will cause more downtime than it saves.
As I said before, this is an assessment you will need to make for your use case, and balance uptime requirements against your complexity budget; either answer is valid, as long as you feel confident with it. Only a Sith believes in absolutes.
You're being sarcastic, but heavens above, have I had some cringe interviews in my last round, and most of the absurdity came from smaller start-ups too.
If you don't make it clear people will think you're serious.
Sarcasm doesn't work online, If I write something like "Donald Trump is the best president ever" you don't have any way of knowing whether I'm being sarcastic or I'm just really really stupid. Only people who know me can make that judgement, and basically nobody on here knows me. So I either have to avoid sarcasm or make it clear that I'm being sarcastic.
Most times it isn't the complexity that bites, it's the brittleness. It's much easier to work with a bad but well-documented solution (e.g. GitHub Actions), where all the issues have already been hit by other users and the workarounds are documented by the community, than to roll your own (e.g. simple script-based CI/CD).
I'm not sure why your architecture needs to be complex to support CI pipelines and proper workflow for change management.
And some of these guidelines have grown into status quo common recipes. Take your starting database for example: the guideline is always "sqlite only for testing, but for production you want Postgres" - it's misleading and absolutely unnecessary. These defaults have also become embedded into PaaS services, e.g. the likes of Fly or Scaleway - having a disk attached to a VM instance where you can write data is never a default and is usually complicated or expensive to set up. All while there is nothing wrong with a disk that gets backed up - it can support most modern mid-sized apps out there before you need block storage and what not.
I've been involved in bootstrapping the infrastructure for several companies. You always start small, and add more components over time. I dare say, on the projects I was involved in, we were fairly successful in balancing complexity, but some things really just make sense. Using a container orchestration tool spares you from tending to actual Linux servers, for example, that need updates and firewalls and IP addresses and properly managed SSH keys. The complexity is still there, but it shifts somewhere else. Looking at the big picture, that might mean your knowledge requirements ease on the systems administration side and tighten on the cloud provider/IaC end; that might be a good trade-off if you're working with a team of younger software engineers who don't have a strong Linux background, for example, which I assume is pretty common these days.
Or, consider redundancy: Your customers likely expect your service to not have an outage. That's a simple requirement, but very hard to get right, especially if you're using a single server that provides your application. Just introducing multiple copies of the app running in parallel comes with changes required in the app (you can't assume replica #1 will handle the first and second request—except if you jump through sticky session hoops, which is a rabbit hole on its own), in your networking (HTTP requests to the domain must be sent to multiple destinations), and your deployment process (artefacts must go to multiple places, restarts need to be choreographed).
Many teams (in my experience) that have a disdain for complex solutions will choose their own, bespoke way of solving these issues one by one, only to end up in a corner of their own making.
I guess what I'm saying is pretty mundane actually—solve the right problem at the right time, but no later.
Having recently built a Django app, I feel like I need to highlight the issues that come with using sqlite. Once you get into many-to-many relationships in your model, suddenly all kinds of things are not supported by sqlite that are when you use postgres. This also shows that you actually cannot (!) use sqlite for testing, because it behaves significantly differently from postgres.
So I think now: Unless you have a really really simple model and app, you are just better off simply starting postgres or a postgres container.
My comment is that this is a choice that should be made for each project depending on what you’re building - does your model require features not supported by SQLite or Postgres etc.
> Unless you have a really really simple model and app
And this is the wrong conclusion. I have a really really complex model that works just fine with SQlite. So it’s not about how complex the model is, it’s about what you need. In the same way in the original post there were so many storage types, no doubt because of such “common knowledge guidelines”
OK, well, you don't always know all requirements ahead of time. When I do find out about them later on, I don't want to have to switch database backend then. For example initially I thought I would avoid those many to many relationships all together ... But turned out to be the most fitting way to do what I needed to do in Django.
I guess you could say "use sqlite as long as it lends itself well to what you are doing", sure. But when do you switch? At the first inconvenience? Or do you wait a while, until N inconveniences have been put into the codebase? And not to forget the organizational resistance to things like changing the database. People not in the know (management, usually) might question your plan to switch the database, because the workaround for this small little inconvenience _right now_ seems much less work and less risky for production ... Before you know it, you will have 10 workarounds in there, and sunk cost fallacy.
I may be exaggerating a little bit, but it's not like this is a crazy to imagine picture I am painting here.
Years ago we had someone who wanted to make sure that two deployments were mutually exclusive. Can’t recall why now, but something with a test environment and bootstrapping so no redundancy.
I just set one build agent up with a tag that both plans required. The simplest thing that could possibly work.
I think that's a slightly different set of things to what OP is complaining about though. They're much more reasonable, but also "outside" of the application. Having secret management or CI (pretty much mandatory!) does not dictate the architecture of the application at all.
(except the caching layer. Remember the three hard problems of computer science, of which cache invalidation is one.)
Still hoping for a good "steelman" demonstration of microservices for something that isn't FAANG-sized.
> Having secret management or CI (pretty much mandatory!) does not dictate the architecture of the application at all.
Oh, it absolutely does. You need some way to get your secrets into the application, at build- or at runtime, for one, without compromising security. There's a lot of subtle catches here that can be avoided by picking standard tooling instead of making it yourself, but doing so definitely shapes your architecture.
It really shouldn't. Getting the secrets in place should be done by otherwise unrelated tooling. Your apps or services should rely on the secrets being in place at start time. Often it is a matter of rendering a file at deployment time, and putting the secrets there is the job of the CI and CI-invoked tools, not the job of the service itself.
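As a sketch of that separation (the file path and env var name are made up for illustration), the service itself only needs something like:

```python
# Sketch: the service assumes the secret is already in place at start time.
# Whatever put it there (CI, a deploy script, an init container) is not its concern.
import os
from pathlib import Path

def database_url() -> str:
    # Option 1: a file rendered by the deployment tooling (path is illustrative)
    secret_file = Path("/run/secrets/database_url")
    if secret_file.exists():
        return secret_file.read_text().strip()
    # Option 2: an environment variable injected by the runtime
    return os.environ["DATABASE_URL"]
```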
Cache invalidation is replacing one logical thing with a new version of the same logical thing. So technically that’s also naming things. Doubly so when you put them in a kv store.
That angle seems potentially insightful, and I'm going to have to think about it, but to me, cache invalidation seems more like replacing one logical thing with nothing. It may or may not get replaced with a new version of the same logical thing later if that's required.
To me, cache invalidation is not strictly about either replacing or removing cache entries.
Rather, cache invalidation is the process of determining which cache entries are stale and need to be replaced/removed.
It gets hairy when determining that depends on users, user group memberships AND per-user permissions, access TTL, multiple types of timestamps and/or revision numbering, and especially when the cache entries are composite as in contain data from multiple database entities, where some are e.g. representing a hierarchy and may not even have direct entity relationships with the cached data.
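One common way to keep that manageable, sketched in Python with an invented data model: build the key from everything the entry depends on plus a version counter that gets bumped on writes, and let TTLs age out whatever the bump orphans. It obviously doesn't cover per-user permissions or composite entries, but it shows the basic shape.

```python
# Sketch: versioned cache keys plus a TTL, so "invalidation" is mostly
# just bumping a counter. Names and the data model are illustrative.
import time

cache = {}                  # stand-in for memcached/redis/whatever
TTL = 300                   # seconds
versions = {"orders": 1}    # bumped whenever the orders data changes

def cache_key(user_id: int, dataset: str) -> str:
    return f"{dataset}:v{versions[dataset]}:user:{user_id}"

def build_report(user_id: int) -> dict:
    return {"user": user_id}    # placeholder for the expensive query

def get_report(user_id: int) -> dict:
    key = cache_key(user_id, "orders")
    hit = cache.get(key)
    if hit and hit[0] > time.time():
        return hit[1]
    value = build_report(user_id)
    cache[key] = (time.time() + TTL, value)
    return value

def on_orders_changed() -> None:
    versions["orders"] += 1     # every derived entry goes stale at once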
> You want a CI pipeline, so the junior developer can't just break prod because they forgot to run the tests before pushing.
Make them part of your build first. Tagging a release? Have a documented process (checklist) that says 'run this, do that'. Like how in a Java Maven build you would execute `mvn release:prepare` and `mvn release:perform`, which will execute all tests as well as do the git tagging and anything else that needs doing.
Scale up to a CI pipeline once that works. It is step one for doing that anyway.
Why not do a CI pipeline from the beginning, instead of relying on trust that no one ever forgets to run a check, considering adding CI is trivial with GitLab or GitHub?
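For a sense of scale, "trivial" here can mean a workflow about this size (a sketch assuming a Python project with pytest; adjust for your stack):

```yaml
# .github/workflows/ci.yml - about as small as a useful pipeline gets
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
```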
Because it adds friction, and whoever introduces that CI pipeline will be the one getting messages from annoyed developers, saying "your pipeline isn't working again". It's definitely a source of complexity on its own, so something you want to consider first.
I'm aware of how much overhead CI pipelines can be, especially for multiple platforms and architectures, but at the same time, developing with N>1 developers without some sort of CI feels like developing without version control: it's like coming to work without your trousers on.
Yeah, that was my entire point really—there's some complexity that's just warranted. It's similar to a proper risk assessment analysis: The goal isn't to avoid all possible risks, but accepting some risk factors as long as you can justify them properly.
As long as you're pragmatic and honest with what you need from your CI setup, it's okay that it makes your system more complex—you're getting something in return after all.
I agree it adds a bit of complexity, but all code adds complexity.
Maybe I've interacted with CIs too much and it's Stockholm syndrome, but they are there to help tame and offload complexity, not just complexity for complexity's sake.
> they are there to help tame and offload complexity, not just complexity for complexity's sake
Theoretically. Practically, you're hunting for the reason why your GitHub token doesn't allow you to install a private package from another repository in your org during the build, then you learn you need a classic personal access token tied to an individual user account to interact with GitHub's own package registry, you decide that that sounds brittle and after some pondering, you figure that you can just create a GitHub app that you install in your org and write a small action that uses the GitHub API to create an on-demand token with the correct scopes, and you just need to bundle that so you can use it in your pipeline, but that requires a node_modules folder in your repository, and…
Oh! Could it be that you just added complexity for complexity's sake?
They wouldn't have time to hear it because they'd be trying to fix their local dev environment.
I worked for a company that had done pretty much that - not fun at all (for extra fun, half the microservices were in a language only half the dev team had even passing familiarity with).
You need someone in charge with enough "taste" to not allow that to happen, or it will happen.
Probably =), or Conway's law was always about the lower ends of the communication nodes in a company graph. I think it's time we also always include the upper ends of our cognitive limits of multitasking when we design systems in relation to organization structures.
That does imply that the people in the business with the authority to do that know how to do that and they in my experience don't - they can't solve a problem they don't understand and are unwilling to delegate it to someone who can understand it.
The same pattern repeats across multiple companies - it comes down to trust and delegation, if the people with the power are unwilling to delegate bad things happen.
> If the process crashes or your harddisk dies, you want redundancy so even those twelve customers can still access the application.
That's fine, 6 of them are test accounts :-)
> It's sure a corny stance to hold if you're navigating an infrastructure nightmare daily, but in my opinion, much of the complexity addresses not technical, but organisational issues
If you have an entire organisation dedicated to 6 users, those users had better be ultra profitable.
> If the process crashes or your harddisk dies, you want redundancy so even those twelve customers can still access the application
Can be done simply by a sole company owner; no need for tools that make sense in an organisation (K8s, etc)
> You want a CI pipeline, so the junior developer can't just break prod because they forgot to run the tests before pushing.
A deployment script that includes test runners is fine for a focused product. You can even do it using a green/blue strategy if you can afford the extra $5-$10/m for an extra VPS.
> You want proper secret management, so the database credentials aren't just accessible to everyone.
Sure, but you don't need to deploy a full-on secrets-manager product for this.
> You want a caching layer, so you're not surprised by a rogue SQL query that takes way too long, or a surge of users that exhaust the database connections because you never bothered to add proper pooling.
Meh. The caching layer is not to protect you against rogue SQL queries taking too long; that's not what a cache is for, after all. As for proper pooling, what's wrong with using the pool that came with your tech stack? Do you really need to spend time setting up a different product for pooling?
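On the pooling point: most stacks already ship a pool next to the driver. A sketch with psycopg2's bundled pool (the DSN and limits are illustrative):

```python
# Sketch: connection pooling with what comes alongside the driver (psycopg2),
# instead of standing up a separate pooling product.
from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(
    minconn=1,
    maxconn=10,  # cap DB connections instead of opening one per request
    dsn="dbname=app user=app host=127.0.0.1",  # illustrative DSN
)

def fetch_user(user_id: int):
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
    finally:
        pool.putconn(conn)
```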
> Adding guardrails to protect your team from itself mandates some complexity, but just hand-waving that away as unnecessary is a bad answer.
I agree with that; the key is knowing when those things are needed, and TBH unless you're doing a B2C product, or have an extremely large B2B client, those things are unnecessary.
Sure, but most of that doesn't make it into the final production thing on the server. CI? Nope. Tests? Nope. The management of the secrets (not the secrets themselves)? Nope. Caching? OK that one does. Rate limits? Maybe, but could be another layer outside the normal services' implementation.
I feel like sometimes it’s a form of procrastination.
There are things we don’t want to do (talk to customers, investors, legal, etc.), so instead we do the fun things (fun for engineers).
It’s a convenient arrangement because we can easily convince ourselves and others that we’re actually being productive (we’re not, we’re just spinning wheels).
My first 5 years or so of solo bootstrapping were this. Then you learn that if you want to make money you have to prioritise the right things and not the fun things.
I'm at this stage. We have a good product with a solid architecture but only a few paying clients due to a complete lack of marketing. So I'm now doing the unfun things!
It's the natural evolution to becoming a fun addict.
Unless you actively push yourself to do the uncomfortable work every day, you will always slowly deteriorate and you will run into huge issues in the future that could've been avoided.
I see your point. But accidental complexity is the most uncomfortable work there is to me. Do programmers really find so much fun in creating accidental complexity?
Removing it, no matter whether I created it myself, sure, that can be a hard problem.
I've certainly been guilty of creating accidental complexity as a form of procrastination, I guess. But building a microservices architecture is not one of these cases.
FWIW, the alternative stack presented here for small web sites/apps seems infinitely more fun.
Immediate feedback, easy to create something visible and change things, etc.
Ironically, it could also lead to complexity when in reality, there is (for example) an actual need for a message queue.
But setting up such stuff without a need sounds easier to avoid to me than, for example, overgeneralizing some code to handle more cases than the relevant ones.
When I feel there are customer or company requirements that I can't fulfill properly, but I should, that's a hard problem for me. Or when I feel unable to clarify achievable goals and communicate productively.
But procrastination via accidental complexity is mostly the opposite of fun to me.
It all comes back when trying to solve real problems and spending work time solving these problems is more fun than working on homemade problems.
Doing work that I am able to complete and achieving tangible results is more fun than getting tangled in a mess of unneeded complexity. I don't see how this is fun for engineers, maybe I'm not an engineer then.
Over-generalization, setting wrong priorities, that I can understand.
But setting up complex infra or a microservices architecture where it's unneeded, that doesn't seem fun to me at all :)
Normally the impetus to overcomplicate ends before devs become experienced enough to be able to even do such complex infra by themselves. It often manifests as complex code only.
Overengineered infra doesn't happen in a vacuum. There is always support from the entire company.
"Do programmers really find so much fun in creating accidental complexity?"
I certainly did for a number of years - I just had the luck that the cool things I happened to pick on in the early/mid 1990s turned out to be quite important (Web '92, Java '94).
Now my views have flipped almost completely the other way - technology as a means of delivering value.
Edit: Other cool technology that I loved like Common Lisp & CLOS, NeWS and PostScript turned out to be less useful...
I see what you mean, sometimes "accidental complexity" can also be a form of getting to know a technology really well and that can be useful and still fun. Kudos for that :)
I like your idea of doing some amount of uncomfortable work every day, internalizing it until it becomes second nature. Any tips on how to start? (other than just do it) :)
Or is it to satisfy the ideals of some CTO/VPE disconnected from the real world that wants architecture to be done a certain way?
I still remember doing systems design interviews a few years ago when microservices were in vogue, and my routine was probing if they were ok with a simpler monolith or if they wanted to go crazy on cloud-native, serverless and microservices shizzle.
It did backfire once on a cloud infrastructure company that had "microservices" plastered in their marketing, even though the people interviewing me actually hated it. They offered me an IC position (which I told them to fuck off), because they really hated how I did the exercise with microservices.
Before that, it almost backfired when I initially offered a monolith to a (unbeknownst to me) microservice-heavy company. Luckily I managed to read the room and pivot to microservices during the 1h systems design exercise.
EDIT: Point is, people in positions of power have very clear expectations/preferences of what they want, and it's not fun burning political capital to go against those preferences.
I don't quite follow. I understand mono vs microservices, and in the last 3 weeks I had to study for system design and do the interviews to get offers.
It's a tradeoff, and the system design interview is meant to see if one understands how systems can scale to hypothetical (maybe unrealistic) high loads. In this context the only reason for a microservice is independent scaling, and with that also fault tolerance if an unimportant service goes down. But it's really the independent scaling. One would clearly say that a monolith is good for the start because it offers simplicity or low complexity, but it doesn't scale well to the hypothetical of mega scale.
In practice, it seems not to be a tradeoff but an ideology. Largely because you can't measure the counter-factual of building the app the other way.
It's been a long time since I've done "normal" web development, but I've done a number of high-performance or high-reliability non-web applications, and I think people really underestimate vertical scaling. Even back in the early 2000s when it was slightly hard to get a machine with 128GB of RAM to run some chip design software, doing so was much easier than trying to design a distributed system to handle the problem.
(we had a distributed system of ccache/distcc to handle building the thing instead)
Do people have a good example of microservices they can point us to the source of? By definition it's not one of those things that makes much sense with toy-sized examples. Things like Amazon and Twitter have "micro" services that are very much not micro.
I don't disagree, but you can horizontally scale a monolith too, no? So scaling vert vs horiz is independent of microservices, it's just that separating services allows you to be precise with your scaling. I.e. you can scale up a compute-heavy microservice by 100x and the upload service by 10x but keep the user service at low scale. I agree that one can vert scale, why not. And I also agree that there are probably big microservices. At my last workplace, we also had people very bullish on microservices but for bad reasons, and it didn't make sense, i.e. ideology.
Can you elaborate on the fault tolerance advantage of micro services?
For context, my current project is a monolith web app with services being part of the monolith and called with try/catch. I can understand perhaps faster, independent, less risky recovery in the micro services case but don’t quite understand the fault tolerance gain.
I'm no world-leading expert, but as far as I understand, coupled with events, if an unimportant service goes offline for 5 min (due to some crash, i.e. a "fault"), it's possible to have graceful degradation, meaning the rest of the system still works, maybe with reduced ability. With events, other systems simply stop receiving events from the dead service. I agree you can achieve a lot of this also in a monolith with try/catch and error handling, but I guess there is an inherent decoupling in having different services run on separate nodes.
I fully agree on workplace politics, but for system design interviews, are you not also just supposed to ask your interviewer, i.e. give them your premises and see if they like your conclusions? I also understand that some companies and their interviews are weird, but that's okay too, no? You just reject them and move on.
If there's a big enough bias, the questions become entirely about finding that bias. And in 90% of cases the systems design questions are about something they designed in-house, and they often don't have a lot of experience either.
Also: if there's limited knowledge on the interviewer side, an incorrect answer to a question might throw off a more experienced candidate.
It's no big deal but it becomes more about reading the room and knowing the company/interviewers than being honest in what you would do. People don't want to hear that their pet solution is not the best. Of course you still need to know the tech and explain it all.
I worked for a company once where the CEO said I need to start using Kubernetes. Why? We didn't really have any pressing use cases / issues that were shouting out for Kubernetes at all.
His reasoning was all the big players use it, so we should be too...
It was literally a solution looking for a problem. Which is completely arse backwards.
An improved CV. Let's be honest, most stuff is boring projects that could even be built with 1990's technology; distributed systems are not something that was invented yesterday.
However having in the CV any of those items from left side in the deployment strategy is way cooler than mentioning n-tier architecture, RPC (regardless how they are in the wire), any 1990's programming language, and so forth.
A side effect of how badly hiring works in our industry: it isn't enough to know how to master a knife to be a chef, it must be a specific brand of knife, otherwise the chef is not good enough for the kitchen.
This is also how you can identify decent places to work at: look for job postings that emphasize you aren't expected to already know the language.
For example, in the recent "who's hiring" thread, I saw at least two places where they did that: Duckduckgo (they mention only algorithms and data structures and say "in case you're curious, we use Perl") and Stream (they offer a 10-week intro course to Go if you're not already familiar with it). If I remember correctly, Jane Street also doesn't require prior OCaml experience.
The place where I work (bevuta IT GmbH) also allowed me to learn Clojure on the job (but it certainly helped that I was already an expert in another Lisp dialect).
These hiring practices are a far cry from those old style job postings like "must have 10+ years of experience with Ruby on Rails" when the framework was only 5 years old.
To do that you need a mixture of elements: working in a somewhat "exotic" language [1], and the company being able to afford to pay a top-talent salary [2].
[1] all those examples check that box, but please let's not start a language war over this statement.
[2] for Jane Street I hear they do; DDG pays pretty well, especially because it pays the same rate regardless of where you are in the world, so it's a top-talent salary for many places outside SV.
Sounds like the best type of place to work for me. Instead of being a replaceable cog in a meat grinder that doesn't even pay well, working with boring tech, you get to work with talented people in an actually interesting language and get decently paid.
And best of all, you don't feel the need to keep chasing after the latest hype just to keep your CV relevant.
This comment sums up my view as well, but I must confess that I’ve designed architectures more complex than necessary more than once, just to try new things and compare them with what I already knew. I just had to know!
Any minute you spend in a job interview defending your application server + Postgres solution is a minute you won't have for the follow-up questions about the distributed system the interviewer was expecting.
Yes, it’s nonsense, stirring up a turbulent slurry of eventually consistent components for the sake of supporting hundreds of users per second, it’s also the nonsense that you’re expected to say, just do it.
Really that's going way too far - you do NOT need Redis for caching. Just put it in Postgres. Why go to this much trouble to put people in their place for over engineering then concede "maybe Redis for caching" when this is absolutely something you can do in Postgres. The author clearly cannot stop their own inner desire for overengineering.
A cache can help even for small stuff if there's something time-consuming to do on a small server.
Redis/valkey is definitely overkill though. A slightly modified memcached config (only so it accepts larger keys; server responses larger than 1MB aren't always avoidable) is a far simpler solution that provides 99% of what you need in practice. Unlike redis/valkey, it's also explicitly a volatile cache that can't do persistence, meaning you are disincentivized from bad software design patterns where the cache becomes state your application assumes any level of consistency of (including its existence). If you aren't serving millions of users, a stateful cache is a pattern best avoided.
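For reference, the "slightly modified memcached config" can be as small as raising the default 1MB item limit on the command line (the exact values below are illustrative):

```sh
# Illustrative: bump the max item size above memcached's 1MB default (-I),
# cap memory use (-m, in MB), and bind to localhost only.
memcached -m 256 -I 4m -l 127.0.0.1 -p 11211
```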
DB caches aren't very good mostly because of speed; they have to read from the filesystem (and have network overhead), while a cache reads from memory and can often just live on the same server as the rest of the service.
I personally wouldn't like to put caching in Postgres, even though it would work at lower scales. But at that scale I don't really need caching anyway. Having the ephemeral data in a different system is more appealing to me as well.
The caching abstractions your frameworks have are also likely designed with something like Redis in mind and work with it out of the box. And often you can just start with an in-memory cache and add Redis later, if you need it.
>I personally wouldn't like to put caching in Postgres, even though it would work at lower scales.
Probably should stop after this line - that was the point of the article. It will work at lower scales. Optimize later when you actually know what to optimize.
My point is more that at that scale I'd try to avoid caching entirely. Unless you're doing analytical queries over large tables, Postgres is plenty fast without caching if you're not doing anything stupid.
Redis is the filler you shove in there when Postgres itself starts slowing down. Writing database queries that work and writing database queries that work efficiently are very different things.
It'll give you time to redesign and rebuild so Postgres is fast enough again. Then you can take Redis out, but once you've set it up you may as well keep it running just in case.
Ha ha nice one - when your startup is Facebook you'll need that, not for your 12 users.
The reason startups get to their super kubernetes 6 layers mega AWS powered ultra cached hyper pipelined ultra optimised web queued application with no users is because "but technology X has support for an eventually consistent in-memory caching layer!!"
What about when we launch and hit the front page of HN how will the site stay up without "an eventually consistent in-memory caching layer"?
The sentiment here is right, but redis does make a difference at scale. I built a web app this year on AWS lambda that had up to 1000/requests/second and at that scale, you can have trouble with Postgres, but redis handles it like it’s nothing.
I think that redis is a reasonable exception to the rule of ”don’t complicate things” because it’s so simple. Even if you have never used it before, it takes a few minutes to setup and it’s very easy to reason about, unlike mongodb or Kafka or k8s.
Postgres itself has no issue with 1000 simple requests per second. On normal notebook hardware you'll get easily several thousand requests per second if you're just looking up small amounts of data by the primary key. And that is without any optimization and with non-DB overhead included. The actual performance you can get is probably quite a bit higher, but I've seen ~4-6k requests per second on naive, unoptimized endpoints that just look up some data in Postgres.
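If you'd rather measure this on your own hardware than take the numbers on faith, pgbench ships with Postgres and has a select-only mode that roughly matches this kind of workload; a sketch:

```sh
# Initialize a small test database, then run primary-key SELECTs only (-S)
# with 10 concurrent clients (-c) for 30 seconds (-T).
pgbench -i -s 10 mydb
pgbench -S -c 10 -T 30 mydb
```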
> Postgres itself has no issue with 1000 simple requests per second
Postgres in isolation has no problem with 1000 RPS. But does your Postgres server have that ability? Your server is also handling more complex requests and maybe some writes and concurrent re-indexing.
Because they’re meeting the patients at their own level. Plus while using PG for everything is a currently popular meme on HN (and I am all for it), it’s not something you see all that often. An app server, a database and a cache is a pretty sensible and simple starting point.
Until you get to 100 test users. Then you need Kafka and k8.
Sometimes you have to pick your poison when those with other agendas or just inexperience want to complicate things. Conceding that they can use Redis somehow might be enough to get them to stop blaming you for the 'out of date' architecture?
I love the fact that the author "wrote" this page with a massive CSS framework (Tailwind) and some sort of JavaScript framework, with a bundler and obfuscator - instead of a plain, simple HTML page. Well played! :-)
Fair, the author's point would have been stronger if the page was made using just static HTML/CSS.
But I have to defend Tailwind, it's not a massive CSS framework, it just generates CSS utility classes. Only the utility classes you use end up in the output CSS.
good point, got rid of google font. for the rest, it's just that the DX is unmatched to get something quick and dirty started up. (I _could_ have done a simple .html though).
and I'll concede that react is overkill for this, I just wanted components + typescript to not waste time on silly vanilla js bugs
Thinking is scary. No one (among non-thinking colleagues) is going to criticize you for using de-facto standard services like kafka, mongo, redis, etc... regardless of the nonsensical architecture you come up with.
Yes, I also put Redis in that list. You can cache and serve data structures in many other ways, for example by replicating the individual features you need in your application instead of going the lazy route and adding another service to the mix. And don't get me started on Kafka... money thrown down the drain when a stupid grpc/whatever service would do.
Part of being an engineer is also selecting the minimum number of components for your architecture and not being afraid of implementing something on your own if you only need 1 of the 100s of features that an existing product offers.
> Add complexity only when you have proof you need it.
This does assume that said complexity can be added ad hoc later. Often earlier architecture choices make additions complex too, or even prevent them entirely without a complete rewrite.
So while the overall message is true there is some liberal use of simplification at play here too
In some cases a compromise can make sense. Eg use k8s but keep it simple within that - as vanilla as you can make it
Useful, but 10 years ago without JSONB in PG it wasn't really the answer to everything. But as of today, I am recommending PG to anyone that does not have a good reason or use case to NOT use it.
There is an argument I rarely ever see in discussions like this, which is about reducing the need for working memory in humans. I'm just in my mid-thirties, but my ability to keep things in working memory is vastly reduced compared to my twenties. Might just be me who's not cut out for programming or system architecture, but in my experience what is hard for me is often what is hard for others; they just either don't think about it or ignore it and push through, keeping hidden costs alive.
My argument is this: even if the system itself becomes more complex, it might be worth it to make it better partitioned for human reasoning. I tend to quickly get overwhelmed and my memory is getting worse by the minute. It's a blessing for me to have smaller services that I can reason about, predict consequences from, deeply understand. I can ignore everything else. When I have to deal with the infrastructure, I can focus on that alone. We also have better and more declarative tools for handling infrastructure compared to code. It's a blessing when 18 services don't use the same database, and it's a blessing when 17 services aren't colocated in the same repository having dependencies that most people don't even identify as dependencies. Think law of leaky abstractions.
This is a good point - having your code broken up into standalone units that can fit into working memory has real benefits to the coder. I think especially with the rise of coding agents (which, like it or not, are here to stay and are likely going to increase in use over time), sections of code that can fit in a context window cleanly will be much more amenable to manipulation by LLMs and require less human oversight to modify, which may be super useful for companies that want to move faster than the speed of human programming will allow.
Oh my word Riak - I haven't seen that DB mentioned for years!
I totally get the point it makes. I remember many years ago we announced SocketStream at a HackerNews meet-up and it went straight to #1. The traffic was incredible but none of us were DevOps pros so I ended up restarting the Node.js process manually via SSH from a pub in London every time the Node.js process crashed.
If only I'd known about upstart on Ubuntu then I'd have saved some trouble for that night at least.
I think the other thing is worrying about SPOF and knowing how to respond if services go down for any reason (e.g. server runs out of disk space - perhaps log rotation hasn't been setup, or has a hardware failure of some kind, or the data center has an outage - I remember Linode would have a few in their London datacenter that just happened to occur at the worst possible time).
If you're building a side project I can see the appeal of not going overboard and setting up a Kubernetes cluster from the get-go, but when it is things that are more serious and critical (like digital infrastructure for supporting car services like remotely turning on climate controls in a car), then you design the system like your life depends on it.
I think remote climate controls in a car are an ideal use-case for a simpler architecture.
Consider WhatsApp could do 2M TCP connections on a single server 13 years ago, and Ford sells about 2M cars per year. Basic controls like changing the climate can definitely fit in one TCP packet, and aren't sent frequently, so with some hand-waving, it would be reasonable to expect a single server to handle all remote controls for a manufacturer for all cars from some year model.
Or maybe you could use wifi-direct and bypass the need for a server.
Or a button on the key fob. Perhaps the app can talk to the key fob over NFC or Bluetooth? Local/non-internet controls will probably be more reliable off-grid... can't have a server outage if there are no servers.
I guess my point is if you take a step back, there are often simple, good solutions possible.
The fact that we have lambdas/serverless functions and people are still over-engineering k8s clusters for their "startup project" is genuinely hilarious. You can literally validate your idea with some janky Python code and like 20 bucks a month.
The problem is that people don't like hearing their ideas suck. I do this too, to be fair. So, yes, we spend endless hours architecting what we'd desperately hope will be the next Facebook because hearing "you are definitely not the next Facebook" sucks. But alas, that's what doing startups is: mostly building 1000 not-Facebooks.
The lesson here is that the faster you fail, the faster you can succeed.
The alternative to CI/CD pipelines is to rely on human beings to perform the same repetitive actions the exact same way every single time without any mistakes. You would never convince me to accept that for any non-trivial project.
Especially in an age where you can basically click a menu in GitHub and say "Hey, can I have a CI pipeline please?"
This kind of complexity is unfortunately also embedded into model training data.
Left unchecked, Claude is very happy to propose "robust, scalable and production ready" solutions - you can try it for yourself. Tell it you want to handle new signups and perform some work like send an email or something (outside the lifecycle of the web request).
That is, imply that you need some kind of background workload, and watch it bring in redis, workflow engines, multiple layouts for docker deployment so you can run with and without jobs, an obscene number of environment variables to configure all that, "fallbacks" and retries and all kinds of things that you will never spend time on during an MVP and will even later resist adding just because of the complexity and maintenance they require.
All that while (as in the diagram of the post), there is an Erlang/Elixir app capable of doing all that in memory :).
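The comment's point is about Erlang/Elixir keeping that work in memory; purely for illustration, the same in-process idea sketched in Python (the email helper is a made-up stand-in) is roughly:

```python
# In-process background work for an MVP: a plain queue plus one worker thread.
# No broker, no extra container; send_welcome_email is a placeholder.
import queue
import threading

jobs: "queue.Queue[str]" = queue.Queue()

def send_welcome_email(address: str) -> None:
    print(f"sending welcome email to {address}")  # placeholder for real work

def worker() -> None:
    while True:
        address = jobs.get()
        try:
            send_welcome_email(address)
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(address: str) -> None:
    # ... create the user inside the web request ...
    jobs.put(address)  # the email goes out outside the request lifecycle
```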
Or at least it's not engaging with the obvious counterargument at all: "You may not need the scale now, but you may need it later". For a startup, being a unicorn with a bajillion users is the only outcome that actually counts as success. It's the outcome they sell to their investors.
So sure, you can make an unscalable solution that works for the current moment. Most likely you won't need more. But that's only true b/c most startups don't end up unicorns. Most likely you burn through the VC funding and fold.
Okay, stack overflow allegedly runs on a toaster, but most products don't fit that mold - and now that they're tied to their toaster, it probably severely constrains what SO can do in terms of evolving their service.
>So sure, you can make a unscalable solution that works for the current moment.
You're making two assumptions - both wrong:
1) That this is an unscalable solution - A monolith app server backed by Postgres can take you very very far. You can vertically scale by throwing more hardware at it, and you can horizontally scale, by just duplicating your monolith server behind a load-balancer.
2) That you actually know where your bottlenecks will be when you actually hit your target scale. When (if) you go from 1000 users to 10,000,000 users, you WILL be re-designing and re-architecting your solution regardless what you started with because at that point, you're going to have a different team, different use-cases, and therefore a different business.
Ironic that clicking those big buttons only causes a JS error to be logged to the console with nothing else happening. That doesn't particularly lend to the author's credibility, although the advice of using simple architecture where possible is correct.
Postgres is simpler. Get your cloud to manage it. Click to create instance, get failover with zero setup. Click button 2 to get guaranteed backups and snapshot point in time.
I would argue it is not resilient enough for a web app.
No one wrote the rules in stone, but I assume server side you want the host to manage data recovery and availability. Client side it is the laptop owners problem. On a laptop, availability is almost entirely correlated with "has power source" and "works" and data recovery "made a backup somehow".
I was especially thinking about a program running on a server. I wouldn't statically link everything for a desktop program, because it just causes incompatibilities left and right. But for a server, you don't have external vendors, so this is a good option, and if you use SQLite as the database, then deploying is not more complicated than uploading a single executable, and this is also something that can be done atomically.
I don't see how this has worse effects on recovery and availability. The data is still in a separate file, that you can backup and the modification still happens through a database layer which handles atomic transactions and file system interaction. The availability is also not worse, unless you would have hot code reloading without SQLite, which seems like an orthogonal issue.
You always get a comment like this. I don't particularly agree. There are pros and cons to either approach
The tooling in a lot of languages and frameworks expects you to use an ORM, so a lot of the time you will have to put in a fair bit of upfront effort just to use raw SQL (especially in .NET land).
On top of that, an ORM makes a lot of things that are super tedious, like mapping to models, extremely easy. The performance gain of writing SQL is very minor if the ORM is good.
Don't agree. Getting managed postgres from one of the myriad providers is not much harder than using sqlite, but postgres is more flexible and future-proof.
I use Postgres for pretty much everything once I get beyond "text in a database with a foreign key to a couple of things".
Why?
Because in 1999 when I started using PHP3 to write websites, I couldn't get MySQL to work properly and Postgres was harder but had better documentation.
It's ridiculous spinning up something as "industrial strength" as Postgres for a daft wee blog, just as ridiculous as using a 500bhp Scania V8 for your lawnmower.
Now if you'll excuse me, I have to go and spend ten seconds cutting my lawn.
This. So much this. Of course, at one point you start wanting to do queues, and concurrent jobs, and not even WAL mode and a single writer approach can cut it, but if you've reached that point then usually you a) are in that "this is a good problem to have" scalability curve, and b) you can just switch to Postgres.
I've built pretty scalable things using nothing but Python, Celery and Postgres (that usually started as asyncio queues and sqlite).
Yeah, we run some fairly busy systems on sqlite + litestream. It's not a big deal if they are down for a bit (never happened though), so they don't need failover, and we never had issues (after some sqlite pragma and BUSY code tweaking). Vastly simpler than running + maintaining postgres/mysql. Of course, everything has its place and we run those too, but just saying that not many people/companies really need them. (Also considering that we see systems which DO have postgres/mysql/oracle/mssql set up in HA and still go down for hours to a day per year anyway, so what's it all good for.)
Recently, with the AWS outage, our stack of loads of different cloud providers ended up working pretty well! It might be a bit complex running distributed nodes and updating state via API, but it's cheap and clearly resilient.
Or build your microservices as a monolith using a "local" async service mesh (no libs or boilerplate needed, it's just an async interface for each service) and service-namespaced tables in your DB, then just swap in a distributed transport on a per-case basis if you ever need to scale.
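A minimal sketch of that idea in Python, with invented names: each service is just an async interface, the default implementation runs in-process against its own namespaced tables, and a remote implementation can be swapped in behind the same interface later.

```python
# Sketch: a "local service mesh" is just an async interface per service.
# BillingService, LocalBilling, and RemoteBilling are illustrative names.
from typing import Protocol

class BillingService(Protocol):
    async def charge(self, user_id: int, cents: int) -> bool: ...

class LocalBilling:
    """In-process implementation, talking to its own namespaced tables."""
    async def charge(self, user_id: int, cents: int) -> bool:
        # ... UPDATE billing_accounts ... (same DB, its own schema)
        return True

class RemoteBilling:
    """Drop-in replacement if billing ever becomes its own deployment."""
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
    async def charge(self, user_id: int, cents: int) -> bool:
        # ... POST to f"{self.base_url}/charge" with an HTTP client ...
        raise NotImplementedError

async def checkout(billing: BillingService, user_id: int) -> None:
    # Callers only know the interface; the transport is an injection detail.
    await billing.charge(user_id, 999)
```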
It's like debugging the code of that guy who wrote most of the project, drank all the coffee in the office, outtalked everyone at the meetings, and then relocated to a new job in Zurich or London.
12 years on, and a lot of Postgres-based services built since the OP site first went live, I now actually may recommend MongoDB as the sensible option...
The problem is job interviews, where you are expected to know how to scale everything reliably, so it wouldn't be a satisfactory answer to say you'd just have a monolith against a postgres instance.
I had a call the other day with a consultancy to potentially pick up some infrastructure work/project type stuff. Asked about timezones involved and they said a lot of their clientele are US based startups. "So it's mainly Kubernetes work" they said.
I personally would suggest the vast majority of those startups do not need Kubernetes and certainly don't need to be paying a consultancy to then pay me to fix their issues for them.
The problem with kubernetes is that containers just aren't quite enough.
You have an app which runs, now you want to put it in a container somewhere. Great. how do you build that container? Github actions. Great. How does that deploy your app to wherever it's running? Err... docker tag + docker push + ssh + docker pull + docker restart?
You've hit scale. You want redis now. How do you deploy that? Do you want redis, your machine, and your db in three separate datacenters and to pay egress between all the services? Probably not, so you just want a little redis sidecar container... How does the app get the connection string for it?
When you're into home grown shim scripts which _are_ brittle and error prone, it's messy. K8s is a sledgehammer, but it's a sledgehammer that works. ECS is aws-only, and has its own fair share of warts. Cloud Run/Azure Container Apps are _great_ but there's nothing like those to run on DigitalOcean/Hetzner/whatever. So your choices are to use a big cloud with a simpler orchestration, or use some sort of non-standard orchestration that you have to manage yourself, or just use k8s...
While I agree with most of this rant, I have a problem with the common "just use postgres" trope, often repeated here.
I recently had to work with SQL again after many years, and was appalled at the incidental complexity and ridiculous limitations. Who in this century would still voluntarily do in-band database commands mixed with user-supplied data? Also, the restrictions on column naming mean that you pretty much have to use some kind of ORM mapping; you can't just store your data. That means an entire layer of code that your application doesn't really need, just to adapt to a non-standard from the '70s.
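For context on the in-band-commands complaint: every mainstream driver supports sending parameters out of band, so user data never becomes SQL syntax. A minimal sketch with Python's built-in sqlite3 driver (the same placeholder pattern exists in the Postgres drivers, with %s instead of ?):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

user_supplied = "Robert'); DROP TABLE users;--"
# The statement and the value travel separately; no string concatenation,
# so the hostile "name" is stored as data, not executed as SQL.
conn.execute("INSERT INTO users (name) VALUES (?)", (user_supplied,))
print(conn.execute("SELECT id, name FROM users WHERE name = ?",
                   (user_supplied,)).fetchone())
```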
"just use postgres" is an excellent advice. How about incidental complexity and ridiculous limitations of an ORM?
Time spent learning how to use an ORM can better be spent 'refreshing' your SQL knowledge.
Also, when you learn how an ORM works, you still don't know proper SQL nor how databases work, so when you switch languages, what now, you quickly take a course on another ORM?
SQL is a language, an ORM is not; it's just 'an entire layer of code that your application doesn't really need', and in some applications you could never ever use an ORM.
It is rather clear that I am the only one in this discussion who recently had to write code that synchronizes data that I do not control with an SQL database. Everything is easy in toy applications, where you have your USERS table with First_Name and Last_Name.
Now every system design interview expects you to build some monstrous stack with layers of caching and databases for a hypothetical 1M DAU (daily active users) app.
As practitioners we subconsciously optimise for "beauty", in maths, physics or dev. Most hackers are self-motivated by that beauty, not by 40k Ork-style functional design.
Is this targeted at startup bros with an MVP and a dream ?
In almost any other scenario I feel the author is being intentionally obtuse about much of the reality surrounding technology decisions. An engineer operating a linux box running postgres & redis (or working in an environment with this approach) would become increasingly irrelevant & would certainly earn far less than the engineer operating the other. An engineering department following "complexity is not a virtue" would either struggle to hire or employ engineers considered up-to-date in 2006.
Management & EXCO would also have different incentives, in my limited observations I would say that middle and upper management are incentivised to increase the importance of thier respective verticals either in terms of headcount, budget or tech stack.
Both examples achieve a similar outcome except one is : scalable, fault tolerant, automated and the other is at best a VM at Hetzner that would be swiftly replaced should it have any importance to the org, the main argument here (and in the wild) seems to be "but its haaaard" or "I dont want to keep up with the tech"
KISS has a place and I certainly appreciate it in the software I use and operating systems I prefer but lets take a moment to consider the other folks in the industry who aren't happy to babysit a VM until they retire (or become redundant) before dispensing blanket advice like we are all at a 2018 ted talk . Thanks for coming to my ted talk
While you're making good points, this shows that engineers and industry intentionally make work more complex than necessary in order to justify higher prices for labor. This is not so uncommon in today's economy, especially white collar and regulated work that most people don't understand, but worth thinking about regardless.
To be fair, it's hard to imagine economy and civilization crashing hard enough to force us to be more efficient. But who knows.
Job offers require experience in technologies that you won't ever need building solo project. I'm not surprised when those big scale technologies get shoehorned into small project for the sake of learning, showcasing "look I know that one" etc.
Only totally missing this point could explain why someone would make this hyperbolic rant page.
Funny, from the title I was expecting a productivity-adjacent "What have you even built?" article.
Except it's really a "What over-engineered monstrosity have you built?" in the theme of "choose boring technology"
p.s. MariaDB (MySQL fork) is technically older and more boring than PostgreSQL so they're both equally valid choices. Best choice is ultimately whatever you're most familiar with.
PostgreSQL from 1996, based on Postgres95 from 1995, based on POSTGRES from 1989, based on INGRES from 1974(?).
I wonder if any lines of 1970s or at least 1980s code still survive in some corner of the PostgreSQL code base or if everything has been rewritten at least once by now? Must have started out in K&R C, if it was even C?
We don't have the complete version history of postgres, so that's not easy to know. There definitely are still lines from Postgres95 that haven't been changed since the initial import into our repository.
Somewhere there's a CVS repository with some history from before the import into the current repository, but unfortunately there's a few years missing between that repository and the initial import. I've not done the work to analyze whether any lines from that historical repo still survive.
Once you have a service that has users and costs actual money, while you don’t need to make it a spaghetti of 100 software products, you need a bit of redundancy at each layer — backend, frontend, databases, background jobs — so that you don’t end up in a catastrophic failure mode each time some piece of software decides to barf.
Yes, but if your web server goes down for whatever reason, you’d rather have some more for your load balancer to round robin. Things like physical host dying are not exactly unheard of. Same with DB, once you take money, you want that replication and quick failover and offsite backup.
The problem with doing things the sensible way (eschewing microservices and k8s when you work on projects that aren't hyperscale) is that you end up missing opportunities later on because recruiters will filter you because you can't meaningfully respond to the question about “how experienced you are with micro service architecture”. Granted I may have dodged a bullet by not joining a company with 50 engineers that claim to replicate Google's practices (most of which are here to make sure tens of thousands of engineers can work efficiently together), but still someone gets to pay the bill at the end of the month…
Are you doing software for money? Because not having Kubernetes in the project will stop you from receiving money. Someone please use one of these smart AI tools to create the ultimate killer app: Kubernetes+crypto+AI+blockchain+Angular+Redux+Azure (working only in the Chrome browser).
That's already a preset in claude - use the /reddit-recommends-stack command. It doesn't bother to understand and modify your existing code, just completely rewrites it every time for speed and ease of vibe.
it's a web page on a subdomain of his existing personal website, so it probably didn't take him much time at all, about as fast as it would take to write up the text in a Word document and then fart around with the styling and javascript a bit.
What the hell have you built? Turns out a pretty straightforward service.
That diagram is just aws, programming language, database. For some reason hadoop I guess. And riak/openstack as redundant.
It just seems like pretty standard stuff with some seemingly small extra parts that make me think someone on the team was familiar with something like ruby, so they used that instead of java.
"Why is Redis talking to MongoDB" It isn't.
"Why do you even use MongoDB" Because that's the only database there, and nosql schemaless solutions are faster to get started... because you don't have to specify a schema. It's not something I would ever choose, but there is a reason for it.
"Let's talk about scale" Let's not, because other than hadoop, these are all valid solutions for projects that don't prioritize scale. Things like a distributed system aren't just about technology, but also data design that aren't that difficult to do and are useful for reasons other thant performance.
"Your deployment strategy" Honestly, even 15 microservices and 8 databases (assuming that it's really 2 databases across multiple envs) aren't that bad. If they are small and can be put on one single server, they can be reproduced for dev/testing purposes without all the networking cruft that devops can spend their time dealing with.
This comment makes this thread a great time capsule. Given that the website is now over 10 years old, it perfectly illustrates how much 'best practices' and architectural complexity (and cloud bills) have changed since then.
I was there before 10 years ago. I remember the pain in the ass that was hosting your own web server and own hardware, dealing with networking issues with cisco switches and thinking about getting a CCNA. I remember the days of trying to figure out php and random-ass modules, or how python and wsgi fit together on a slow-ass windows machine, instead of just spinning up an app and doing network calls using a SPA.
Have you guys just forgotten all the enterprise crap that existed? Have you guys forgotten before that how things like compilers (ones you had to pay exorbitant amounts of money for) and different architectures were the headaches?
It's been two steps forward, one step back, but we're still way better off.
Yes, people bring in k8s because they want to pad their resume and it goes poorly, but I've also used k8s in my personal setup, and it was much easier than the poor man's version of it I had before.
All of this is just rose-tinted glasses, and people throwing the baby out with the bathwater. Just because some people have bad experiences with microservices because people don't often do them right, people just write them off completely.
I know people who can wrangle k8s and set up rules in whatever it's called to spin up and down the whole kaboodle of services effortlessly. It's like they know a whole level of programming I'm not familiar with, at all. I know the dotnet stuff pretty well, after many years fiddling with it. I do stuff in dotnet now I didn't have even terminology to talk about before. What they do in k8s and friends reminds me of that.
I personally don't care for it and if I design something I make it so it avoids that stuff if I can at all help it. But I've come to see that it can have real value.
The thing is though, that then you really need someone to be that very competent ops person. If you're a grug like me, you don't get many shots to be good at something. I probably don't have the years in me to be good at both ops and "pure" programming.
So if you are a startup and you're not the kind of person who is not only very smart but also fast and has taste, maybe pick your battles.
If you are great at the ops side, ok, maybe design it from that perspective and hire a bunch of not-made-of-unobtainium regular middle-of-the-road coders to fill in what the microservices and stuff should contain and manage those. This requires money for a regular hiring budget. (Or you are supersmart and productive and "play pretend enterprise" with all roles yourself. But I have never seen such a person.)
Or focus on a tight design which can run without any of that, if you come more from the "I'm making a single program" part of the world.
Tinkering syndrome can strike in any kind of design, so you need personal maturity whatever path you choose.
No it doesn't. The fact that the article had to say "Maybe Redis for caching" because Postgres can't handle caching at scale shows that Postgres is not a perfect solution. Choosing an alternative database that can do everything you need means simplifying your architecture in the spirit of the article (not to say that MariaDB specifically is the right choice here, I'm not familiar enough with it to comment on that).
Which is the exact point the article is making. You don't have scale. You don't need to optimize for scale. Just use Postgres on its own, and it'll handle the scale you need fine.
It's sure a corny stance to hold if you're navigating an infrastructure nightmare daily, but in my opinion, much of the complexity addresses not technical, but organisational issues: You want straightforward, self-contained deployments for one, instead of uploading files onto your single server. If the process crashes or your harddisk dies, you want redundancy so even those twelve customers can still access the application. You want a CI pipeline, so the junior developer can't just break prod because they forgot to run the tests before pushing. You want proper secret management, so the database credentials aren't just accessible to everyone. You want a caching layer, so you're not surprised by a rogue SQL query that takes way too long, or a surge of users that exhaust the database connections because you never bothered to add proper pooling.
Adding guardrails to protect your team from itself mandates some complexity, but just hand-waving that away as unnecessary is a bad answer. At least if you're not working as part of a team.
>It's sure a corny stance to hold if you're navigating an infrastructure nightmare daily, but in my opinion, much of the complexity addresses not technical, but organisational issues: You want straightforward, self-contained deployments for one, instead of uploading files onto your single server ...
You can get all that with a monolith server and a Postgres backend.
With time, I discovered something interesting: for us, techies, using container orchestration is about reliability, zero-downtime deployments, limiting blast radius etc.
But for management, it's completely different. It's all about managing complexity on an organizational level. It's so much easier to think in terms "Team 1 is in charge of microservice A". And I know from experience that it works decently enough, at least in some orgs with competent management.
It’s not a management thing. I’m an engineer and I think it’s THE main advantage micro services actually provide: they split your code hard and allow a team to actually get ownership of the domain. No crossing domain boundaries, no in between shared code, etc.
I know: it’s ridiculous to have an architectural barrier for an organizational reason, and the cost of a bad slice multiplies. I still think in some situations, that is better to the gas-station-bathroom effect of shared codebases.
I don't see why it's ridiculous to have an architectural barrier for org reasons. Requiring every component to be behind a network call seems like overkill in nearly all cases, but encapsulating complexity into a library where domain experts can maintain it is how most software gets built. You've got to lock those demons away where they can't affect the rest of the users.
The problem is, that library usually does not provide good enough boundaries. C library can just shit over your process memory. Java library can cause all the hell over your objects with reflection, can just call System.exit(LOL). Minimal boundary to keep demons at bay is process boundary and you need some way for processes to talk to each other. If you're separating components into processes, it's very natural to put them to different machines, so you need your IPC to be network calls. One more step and you're implementing REST, because infra people love HTTP.
> it's very natural to put them to different machines, so you need your IPC to be network calls
But why is this natural? I’m not saying we shouldn’t have network RPC, but it’s not obvious to me that we should have only network RPC when there are cheap local IPC mechanisms.
And then we're back to 1980's UNIX process model before wide adoption of dynamic loading, but because we need to be cool we call them microservices.
>Requiring every component to be behind a network call seems like overkill in nearly all cases
That’s what I was referring to, sorry for the inaccurate adjective.
Most people try to split a monolith in domains, move code as libraries, or stuff like that - but IMO you rarely avoid a shared space importing the subdomains, with blurry/leaky boundaries, and with ownership falling between the cracks.
Micro services predispose better to avoid that shared space, as there is less expectation of an orchestrating common space. But as you say the cost is ridiculous.
I think there’s an unfilled space for an architectural design that somehow enforces boundaries and avoids common spaces as strongly as microservices do, without the physical separation.
How about old fashioned interprocess communication? You can have separate codebases, written in different languages, with different responsibilities, running on the same computer. Way fewer moving parts than RPC over a network.
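As a concrete illustration of that kind of same-machine IPC, here is a minimal Unix-domain-socket sketch in Python (the socket path and message format are made up for the example, and it assumes a Unix-like OS):

```python
import os
import socket
import threading

SOCK = "/tmp/demo-ipc.sock"  # hypothetical path, just for this example
if os.path.exists(SOCK):
    os.remove(SOCK)

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK)
server.listen(1)

def serve_one_request():
    conn, _ = server.accept()
    with conn:
        msg = conn.recv(1024).decode()
        conn.sendall(f"handled:{msg}".encode())

t = threading.Thread(target=serve_one_request)
t.start()

# The "client" could just as well be a separate process in another language:
# same-machine IPC, no TCP stack, no service discovery, no load balancer.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
    client.connect(SOCK)
    client.sendall(b"create-invoice")
    print(client.recv(1024).decode())

t.join()
server.close()
```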
That was the original Amazon motivation, and it makes sense. Conway's law. A hundred developers on a single codebase needs significant discipline.
But that doesn't warrant its use in smaller organizations, or for smaller deployments.
Libraries do exist, unfortunely too many developers apparently never learn about code modularity.
And then you have some other group of people that sees all the redundancy and decides to implement a single unified platform on which all the microservices shall be deployed.
As soon as there is more than one container to organise, it becomes a management task for said techies.
Then suddenly one realises that techies can also be bad at management.
Management of a container environment not only requires deployment skills but also documentation and communication skills. Suddenly it's not management but rather the techie that can't manage their tech stack.
This pointing of fingers at management is rather repetitive and simplistic but also very common.
> using container orchestration is about reliability, zero-downtime deployments
I think that's the first time I've heard any "techie" say we use containers because of reliability or zero-downtime deployments, those feel like they have nothing to do with each other, and we've been building reliable server-side software with zero-downtime deployments long before containers became the "go-to", and if anything it was easier before containers.
It would be interesting to hear your story, mine is that containers in general start an order of magnitude faster than vms (in general! we can easily find edge cases) and hence e.g. horizontal scaling is faster. You say it was easier before containers, I say k8s in spite of its complexity is a huge blessing as teams can upgrade their own parts independently and do things like canary releases easily with automated rollbacks etc. It's so much faster than VMs or bare metal (which I still use a lot and don't plan to abandon anytime soon but I understand their limitations).
You don't. When your server crashes, your availability is zero. It might crash because of a myriad of reasons; at some times, you might need to update the kernel to patch a security issue for example, and are forced to take your app down yourself.
If your business can afford irregular downtime, by all means, go for it. Otherwise, you'll need to take precautions, and that will invariably make the system more complex than that.
>You don't. When your server crashes, your availability is zero.
As your business needs grow, you can start layering complexity on top. The point is you don't start at 11 with an overly complex architecture.
In your example, if your server crashes, just make sure you have some sort of automatic restart. In practice that may mean a downtime of seconds for your 12 users. Is that more complexity? Sure - but not much. If you need to take your service down for maintenance, you notify your 12 users and schedule it for 2am ... etc.
Later you could create a secondary cluster and stick a load-balancer in front. You could also add a secondary replicated PostgreSQL instance. So the monolith/postgres architecture can actually take you far as your business grows.
Changing/layering architecture adds risk. If you've got a standard way of working you can easily throw in on day one whose fundamentals then don't need to be changed for years, that's way lower risk, easier, faster
It is common for founding engineers to start with a preexisting way of working that they import from their previous more-scaled company, and that approach is refined and compounded over time
It does mean starting with more than is necessary at the start, but that doesn't mean it has to be particularly complex. It means you start with heaps of already-solved problems that you simply never have to deal with, allowing focus on the product goals and deep technical investments that need to be specific to the new company
Yeah, theoretically that sounds good. But I had more downtime through cloud outages and Kubernetes updates than I ever had using a simple Linux server with nginx on hardware; most outages I had on Linux were with my VPS, due to Digital Ocean's own hardware failures. AWS was down not so long ago.
And if certain servers do get very important, you just run a backup server on a VPS and switch over DNS (even if you keep a high TTL, most servers update within minutes nowadays), or if you want to be fancy, throw a load balancer in front of it.
If you solve issues in a few minutes people are always thankful, and most don't notice. With complicated setups it tends to take much longer before figuring out what the issue is in the first place.
You can have redundancy with a monolithic architecture. Just have two different web servers behind a proxy, and use postgres with a hot standby (or use a managed postgres instance which already has that).
Well, load balancers are an option.
They are: But now you've expanded the definition of "a single monolith with postgres" to multiple replicas that need to be updated in sync, you've suddenly got shared state across multiple, fully isolated processes (in the best case) or running on multiple nodes (in the worst case), and a myriad of other subtle gotchas you need to account for, which raises the overall complexity considerably.
Postgres.
I don't see how you solve this with microservices. You'll have to take down your services in these situations too, a monolith vs microservices soup has the exact same problem.
Also in 5 years of working on both microservicy systems and monoliths, not once have these things you describe been a problem for me. Everything I've hosted in Azure has been perfectly available pretty much all the time unless a developer messed up or Azure itself has downtime that would have taken down either kind of app anyway.
But sure let's make our app 100 times more complicated because maybe some time in the next 10 years the complexity might save us an hour of downtime. I'd say it's more likely the added complexity will cause more downtime than it saves.
> I don't see how you solve this with microservices.
I don't think I implied that microservices are the solution, really. You can have a replicated monolith, but that absolutely adds complexity of its own.
> But sure let's make our app 100 times more complicated because maybe some time in the next 10 years the complexity might save us an hour of downtime.
Adding replicas and load balancing doesn't have to be a hundred times more complex.
> I'd say it's more likely the added complexity will cause more downtime than it saves.
As I said before, this is an assessment you will need to make for your use case, and balance uptime requirements against your complexity budget; either answer is valid, as long as you feel confident with it. Only a Sith believes in absolutes.
In this job market, how am I supposed to get hired without the latest buzzwords on my resume? I can’t just have monolithic server and Postgres!
(Sarcasm)
You're sarcastic, but heavens above, have I had some cringe interviews in my last round of interviews, and most of the absurdity came from smaller start-ups too
Indicating sarcasm ruins the sarcasm
Sadly, it is missed on a lot of people. Without the disclaimer, I would then have a bunch of serious replies “educating” me about my life choices.
Squint or pretend it’s not there. This crowd is hit or miss on picking it up o’ naturál.
If you don't make it clear people will think you're serious.
Sarcasm doesn't work online, If I write something like "Donald Trump is the best president ever" you don't have any way of knowing whether I'm being sarcastic or I'm just really really stupid. Only people who know me can make that judgement, and basically nobody on here knows me. So I either have to avoid sarcasm or make it clear that I'm being sarcastic.
Context of the comment would tell
Most times it isn't complexity that bites, it is the brittleness. It's much easier to work with a bad but well-documented solution (e.g. GitHub Actions), where all the issues have been faced by users and the workarounds are documented by the community, rather than rolling your own (e.g. a simple script-based CI/CD).
I'm not sure why your architecture needs to be complex to support CI pipelines and proper workflow for change management.
And some of these guidelines have grown into status quo common recipes. Take your starting database for example, the guideline is always "sqlite only for testing, but for production you want Postgres" - it's misleading and absolutely unnecessary. These defaults have also become embedded into PaaS services e.g. the likes of Fly or Scaleway - having a disk attached to a VM instance where you can write data is never a default and usually complicated or expensive to set up. All while there is nothing wrong with a disk that gets backed up - it can support most modern mid-sized apps out there before you need block storage and what not.
I've been involved in bootstrapping the infrastructure for several companies. You always start small, and add more components over time. I dare say, on the projects I was involved, we were fairly successful in balancing complexity, but some things really just make sense. Using a container orchestration tool spares you from tending to actual Linux servers, for example, that need updates and firewalls and IP addresses and managing SSH keys properly. The complexity is still there, but it shifts somewhere else. Looking at the big picture, that might mean your knowledge requirements ease on the systems administration stuff, and tighten on the cloud provider/IaC end; that might be a good trade off if you're working with a team of younger software engineers that don't have a strong Linux background, for example, which I assume is pretty common these days.
Or, consider redundancy: Your customers likely expect your service to not have an outage. That's a simple requirement, but very hard to get right, especially if you're using a single server that provides your application. Just introducing multiple copies of the app running in parallel comes with changes required in the app (you can't assume replica #1 will handle the first and second request—except if you jump through sticky session hoops, which is a rabbit hole on its own), in your networking (HTTP requests to the domain must be sent to multiple destinations), and your deployment process (artefacts must go to multiple places, restarts need to be choreographed).
Many teams (in my experience) that have a disdain for complex solutions will choose their own, bespoke way of solving these issues one by one, only to end up in a corner of their own making.
I guess what I'm saying is pretty mundane actually—solve the right problem at the right time, but no later.
Having recently built a Django app, I feel like I need to highlight the issues that come with using sqlite. Once you get into many-to-many relationships in your model, suddenly all kinds of things are not supported by sqlite, while they are when you use postgres. This also shows that you actually cannot (!) use sqlite for testing, because it behaves significantly differently from postgres.
So I think now: Unless you have a really really simple model and app, you are just better off simply starting postgres or a postgres container.
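If you do take that advice and start on Postgres from day one, the Django side is just a settings change; a sketch assuming a local Postgres and the psycopg driver installed (names and credentials are placeholders):

```python
# settings.py (excerpt) -- placeholder names and credentials
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}

# The SQLite equivalent, for comparison:
# DATABASES = {
#     "default": {
#         "ENGINE": "django.db.backends.sqlite3",
#         "NAME": BASE_DIR / "db.sqlite3",
#     }
# }
```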
My comment is that this is a choice that should be made for each project depending on what you’re building - does your model require features not supported by SQLite or Postgres etc.
> Unless you have a really really simple model and app
And this is the wrong conclusion. I have a really really complex model that works just fine with SQLite. So it's not about how complex the model is, it's about what you need. In the same way, in the original post there were so many storage types, no doubt because of such "common knowledge guidelines".
OK, well, you don't always know all requirements ahead of time. When I do find out about them later on, I don't want to have to switch database backend then. For example initially I thought I would avoid those many to many relationships all together ... But turned out to be the most fitting way to do what I needed to do in Django.
I guess you could say "use sqlite as long as it lends itself well to what you are doing", sure. But when do you switch? At the first inconvenience? Or do you wait a while, until N inconveniences have been put into the codebase? And not to forget, the organizational resistance to things like changing the database. People not in the know (management usually) might question your plan to switch the database, because this workaround for this small little inconvenience _right now_ seems much less work and less risky for production ... Before you know it, you will have 10 workarounds in there, and sunk cost fallacy.
I may be exaggerating a little bit, but it's not like this is a crazy to imagine picture I am painting here.
Years ago we had someone who wanted to make sure that two deployments were mutually exclusive. Can’t recall why now, but something with a test environment and bootstrapping so no redundancy.
I just set one build agent up with a tag that both plans required. The simplest thing that could possibly work.
I think that's a slightly different set of things to what OP is complaining about though. They're much more reasonable, but also "outside" of the application. Having secret management or CI (pretty much mandatory!) does not dictate the architecture of the application at all.
(except the caching layer. Remember the three hard problems of computer science, of which cache invalidation is one.)
Still hoping for a good "steelman" demonstration of microservices for something that isn't FAANG-sized.
> Having secret management or CI (pretty much mandatory!) does not dictate the architecture of the application at all.
Oh, it absolutely does. You need some way to get your secrets into the application, at build- or at runtime, for one, without compromising security. There's a lot of subtle catches here that can be avoided by picking standard tooling instead of making it yourself, but doing so definitely shapes your architecture.
It really shouldn't. Getting the secrets in place should be done by otherwise unrelated tooling. Your apps or services should rely on the secrets being in place at start time. Often it is a matter of rendering a file at deployment time, and putting the secrets there is the job of the CI and CI-invoked tools, not the job of the service itself.
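To make that concrete: the application's half of "secret management" can stay this small when the deployment tooling is responsible for putting the values in place. A minimal sketch; the variable and path names are made up:

```python
import os
from pathlib import Path

def read_secret(name: str, file_env: str | None = None) -> str:
    """Prefer a secret file rendered by CI/deploy tooling, fall back to a
    plain environment variable, and fail loudly if neither is present."""
    if file_env and os.environ.get(file_env):
        return Path(os.environ[file_env]).read_text().strip()
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing required secret: {name}")
    return value

# Illustrative names only:
# database_url = read_secret("DATABASE_URL", file_env="DATABASE_URL_FILE")
```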
Cache invalidation is replacing one logical thing with a new version of the same logical thing. So technically that’s also naming things. Doubly so when you put them in a kv store.
That angle seems potentially insightful, and I'm going to have to think about it, but to me, cache invalidation seems more like replacing one logical thing with nothing. It may or may not get replaced with a new version of the same logical thing later if that's required.
To me, cache invalidation is not strictly about either replacing or removing cache entries.
Rather, cache invalidation is the process of determining which cache entries are stale and need to be replaced/removed.
It gets hairy when determining that depends on users, user group memberships AND per-user permissions, access TTL, multiple types of timestamps and/or revision numbering, and especially when the cache entries are composite as in contain data from multiple database entities, where some are e.g. representing a hierarchy and may not even have direct entity relationships with the cached data.
Yes—and, in many cases, ensuring that you don't use entries which become outdated during your computation.
A bit of TOCTOU sprinkled in the cache integration ensures a fun day at the races!
> You want a CI pipeline, so the junior developer can't just break prod because they forgot to run the tests before pushing.
Make them part of your build first. Tagging a release? Have a documented process (checklist) that says 'run this, do that'. Like how in a Java Maven build you would execute `mvn release:prepare` and `mvn release:perform`, which will execute all tests as well as do the git tagging and anything else that needs doing.
Scale up to a CI pipeline once that works. It is step one for doing that anyway.
Why not do a CI pipeline from the beginning instead of relying on trust that no one ever forgets to run a check, considering adding CI is trivial with gitlab or github.
Because it adds friction, and whoever introduces that CI pipeline will be the one getting messages from annoyed developers, saying "your pipeline isn't working again". It's definitely a source of complexity on its own, so something you want to consider first.
I'm aware of how much overhead CI pipelines can be, especially for multiple platforms and architectures, but at the same time developing for N>1 developers without some sort of CI feels like developing without version control: it's like coming to work without your trousers on.
Yeah, that was my entire point really—there's some complexity that's just warranted. It's similar to a proper risk assessment analysis: The goal isn't to avoid all possible risks, but accepting some risk factors as long as you can justify them properly.
As long as you're pragmatic and honest with what you need from your CI setup, it's okay that it makes your system more complex—you're getting something in return after all.
I agree it adds a bit of complexity, but all code adds complexity.
Maybe I've interacted with CIs too much and it's Stockholm syndrome, but they are there to help tame and offload complexity, not just complexity for complexity's sake.
> they are there to help tame and offload complexity, not just complexity for complexity's sake
Theoretically. Practically, you're hunting for the reason why your GitHub token doesn't allow you to install a private package from another repository in your org during the build, then you learn you need a classic personal access token tied to an individual user account to interact with GitHub's own package registry, you decide that that sounds brittle and after some pondering, you figure that you can just create a GitHub app that you install in your org and write a small action that uses the GitHub API to create an on-demand token with the correct scopes, and you just need to bundle that so you can use it in your pipeline, but that requires a node_modules folder in your repository, and…
Oh! Could it be that you just added complexity for complexity's sake?
Conway's Law:
> Organizations which design systems... are constrained to produce designs which are copies of the communication structures of these organizations.
Tell this to a company of 4 engineers that created a system with 40 microservices, deployed as one VM image, to be running on 1 machine.
They wouldn't have time to hear it because they'd be trying to fix their local dev environment.
I worked for a company that had done pretty much that - not fun at all (for extra fun half the microservices were in a language only half the dev team had even passing familiarity with).
You need someone in charge with enough "taste" to not allow that to happen, or it will happen.
LOL, perhaps the communication structure there was "silent, internalised turmoil".
Probably =), or Conway's law was always about the lower ends of the communication nodes in a company graph. I think it's time we also always include the upper ends of our cognitive limits of multitasking when we design systems in relation to organization structures.
Lesser known trick: reorganize your teams so the code isn’t batshit.
That does imply that the people in the business with the authority to do that know how to do that, and in my experience they don't - they can't solve a problem they don't understand and are unwilling to delegate it to someone who can understand it.
The same pattern repeats across multiple companies - it comes down to trust and delegation, if the people with the power are unwilling to delegate bad things happen.
> If the process crashes or your harddisk dies, you want redundancy so even those twelve customers can still access the application.
That's fine, 6 of them are test accounts :-)
> It's sure a corny stance to hold if you're navigating an infrastructure nightmare daily, but in my opinion, much of the complexity addresses not technical, but organisational issues
If you have an entire organisation dedicated to 6 users, those users had better be ultra profitable.
> If the process crashes or your harddisk dies, you want redundancy so even those twelve customers can still access the application
Can be done simply by a sole company owner; no need for tools that make sense in an organisation (K8s, etc)
> You want a CI pipeline, so the junior developer can't just break prod because they forgot to run the tests before pushing.
A deployment script that includes test runners is fine for focused product. You can even do it using a green/blue strategy if you can afford the extra $5-$10/m for an extra VPS.
> You want proper secret management, so the database credentials aren't just accessible to everyone.
Sure, but you don't need to deploy a full-on secrets-manager product for this.
> You want a caching layer, so you're not surprised by a rogue SQL query that takes way too long, or a surge of users that exhaust the database connections because you never bothered to add proper pooling.
Meh. The caching layer is not to protect you against rogue SQL queries taking too long; that's not what a cache is for, after all. As for proper pooling, what's wrong with using the pool that came with your tech stack? Do you really need to spend time setting up a different product for pooling?
> Adding guardrails to protect your team from itself mandates some complexity, but just hand-waving that away as unnecessary is a bad answer.
I agree with that; the key is knowing when those things are needed, and TBH unless you're doing a B2C product, or have an extremely large B2B client, those things are unnecessary.
Whatever happened to "profile, then optimise"?
Sure, but most of that doesn't make it into the final production thing on the server. CI? Nope. Tests? Nope. The management of the secrets (not the secrets themselves)? Nope. Caching? OK that one does. Rate limits? Maybe, but could be another layer outside the normal services' implementation.
I feel like sometimes it’s a form of procrastination.
There are things we don’t want to do (talk to customers, investors, legal, etc.), so instead we do the fun things (fun for engineers).
It’s a convenient arrangement because we can easily convince ourselves and others that we’re actually being productive (we’re not, we’re just spinning wheels).
My first 5 years or so of solo bootstrapping were this. Then you learn that if you want to make money you have to prioritise the right things and not the fun things.
I'm at this stage. We have a good product with a solid architecture but only a few paying clients due to a complete lack of marketing. So I'm now doing the unfun things!
It's the natural evolution to becoming a fun addict.
Unless you actively push yourself to do the uncomfortable work every day, you will always slowly deteriorate and you will run into huge issues in the future that could've been avoided.
And that doesn't just apply to software.
I see your point. But accidental complexity is the most uncomfortable work there is to me. Do programmers really find so much fun in creating accidental complexity?
Removing it, no matter whether I created it myself, sure, that can be a hard problem.
I've certainly been guilty of creating accidental complexity as a form of procrastination, I guess. But building a microservices architecture is not one of these cases.
FWIW, the alternative stack presented here for small web sites/apps seems infinitely more fun. Immediate feedback, easy to create something visible and change things, etc.
Ironically, it could also lead to complexity when in reality, there is (for example) an actual need for a message queue.
But setting up such stuff without a need sounds easier to avoid to me than, for example, overgeneralizing some code to handle more cases than the relevant ones.
When I feel there are customer or company requirements that I can't fulfill properly, but I should, that's a hard problem for me. Or when I feel unable to clarify achievable goals and communicate productively.
But procrastination via accidental complexity is mostly the opposite of fun to me.
It all comes back when trying to solve real problems and spending work time solving these problems is more fun than working on homemade problems.
Doing work that I am able to complete and achieving tangible results is more fun than getting tangled in a mess of unneeded complexity. I don't see how this is fun for engineers, maybe I'm not an engineer then.
Over-generalization, setting wrong priorities, that I can understand.
But setting up complex infra or a microservices architecture where it's unneeded, that doesn't seem fun to me at all :)
I 100% agree.
Normally the impetus to overcomplicate ends before devs become experienced enough to be able to even do such complex infra by themselves. It often manifests as complex code only.
Overengineered infra doesn't happen in a vacuum. There is always support from the entire company.
> Do programmers really find so much fun in creating accidental complexity?
I believe only bad (inexperienced) programmers do.
"Do programmers really find so much fun in creating accidental complexity?"
I certainly did for a number of years - I just had the luck that the cool things I happened to pick on in the early/mid 1990s turned out to be quite important (Web '92, Java '94).
Now my views have flipped almost completely the other way - technology as a means of delivering value.
Edit: Other cool technology that I loved like Common Lisp & CLOS, NeWS and PostScript turned out to be less useful...
I see what you mean, sometimes "accidental complexity" can also be a form of getting to know a technology really well and that can be useful and still fun. Kudos for that :)
Oh yes I loved building stuff with all these technologies mostly for my own entertainment - fortunately I was in academia so could indulge myself. ;-)
> a fun addict
Interesting term. Probably pretty on-point.
I’ve been shipping (as opposed to just “writing”) software for almost my entire adult life.
In my experience, there’s a lot of “not fun” stuff involved in shipping.
I like your idea of doing some amount of uncomfortable work every day, internalizing it until it becomes second nature. Any tips on how to start? (other than just do it) :)
You know what, you're right.
I should get off HN, close the editor where I'm dicking about with HTMX, and actually close some fucking tickets today.
Right after I make another pot of coffee.
...
No. Now. Two tickets, then coffee.
Thank you for the kick up the arse.
It's also virtue signaling of what a great engineer they are. Have you wired together ABC with XYZ? No? Well I did... blah blah blah
Is it really for "fun"?
Or is it to satisfy the ideals of some CTO/VPE disconnected from the real world that wants architecture to be done a certain way?
I still remember doing systems design interviews a few years ago when microservices were in vogue, and my routine was probing if they were ok with a simpler monolith or if they wanted to go crazy on cloud-native, serverless and microservices shizzle.
It did backfire once on a cloud infrastructure company that had "microservices" plastered in their marketing, even though the people interviewing me actually hated it. They offered me an IC position (to which I told them to fuck off), because they really hated how I did the exercise with microservices.
Before that, it almost backfired when I initially offered a monolith for a (unbeknownst to me) microservice-heavy company. Luckily I managed to read the room and pivot to microservice during the 1h systems design exercise.
EDIT: Point is, people in positions of power have very clear expectations/preferences of what they want, and it's not fun burning political capital to go against those preferences.
I don't quite follow. I understand mono vs micro services, and in the last 3 weeks I had to study for system design and do the interviews to get offers. It's a tradeoff, and the system design interview is meant to see if one understands how systems can scale to hypothetical (maybe unrealistic) high loads. In this context the only reason for a microservice is independent scaling, and with that also fault tolerance if an unimportant service goes down. But it's really the independent scaling. One would clearly say that a monolith is good for the start because it offers simplicity or low complexity, but it doesn't scale well to the hypothetical of mega scale.
In practice, it seems not to be a tradeoff but an ideology. Largely because you can't measure the counter-factual of building the app the other way.
It's been a long time since I've done "normal" web development, but I've done a number of high-performance or high-reliability non-web applications, and I think people really underestimate vertical scaling. Even back in the early 2000s when it was slightly hard to get a machine with 128GB of RAM to run some chip design software, doing so was much easier than trying to design a distributed system to handle the problem.
(we had a distributed system of ccache/distcc to handle building the thing instead)
Do people have a good example of microservices they can point us to the source of? By definition it's not one of those things that makes much sense with toy-sized examples. Things like Amazon and Twitter have "micro" services that are very much not micro.
I don't disagree, but you can horizontally scale a monolith too, no? So scaling vert vs horiz is independent of microservices, it's just that separating services allows you to be precise with your scaling. I.e. you can scale up a compute-heavy microservice by 100x, the upload service by 10x, but keep the user service at low scale. I agree that one can vert scale, why not. And I also agree that there are probably big microservices. At my last workplace, we also had people very bullish on microservices but for bad reasons and it didn't make sense, i.e. ideology.
Can you elaborate on the fault tolerance advantage of micro services?
For context, my current project is a monolith web app with services being part of the monolith and called with try/catch. I can understand perhaps faster, independent, less risky recovery in the micro services case but don’t quite understand the fault tolerance gain.
I'm no world-leading expert, but as far as I understand, coupled with events, if an unimportant service goes offline for 5 min (due to some crash, i.e. a "fault"), it's possible to have graceful degradation, meaning the rest of the system still works, maybe with reduced ability. With events, other systems simply stop receiving events from the dead service. I agree you can achieve a lot of this also in a monolith with try/catch and error handling, but I guess there is an inherent decoupling in having different services run on separate nodes.
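The same degradation can be written explicitly in a monolith as well; the microservice version mostly moves the failure boundary out of process. A sketch, with the recommendation call standing in for the "unimportant" dependency (the names are made up):

```python
import logging

def fetch_recommendations(user_id: int) -> list[str]:
    # Stand-in for the "unimportant" dependency (HTTP call, queue consumer,
    # or in-process module); any of them can fail or time out.
    raise TimeoutError("recommendation backend unavailable")

def render_homepage(user_id: int) -> dict:
    try:
        recs = fetch_recommendations(user_id)
    except Exception:
        # Graceful degradation: log it and serve the page without the widget.
        logging.warning("recommendations unavailable, degrading")
        recs = []
    return {"user": user_id, "recommendations": recs}

print(render_homepage(42))
```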
The point is that it doesn't matter which is better or worse for the case, or if you know the pros/cons of each:
In those interviews (and in real work too) people still want you skewing towards certain answers. They wanna see you draw their pet architecture.
And it's the same thing in the workplace.
I fully agree on workplace politics, but for system design interviews, are you not also just supposed to ask your interviewer, i.e. give them your premises and see if they like your conclusions? I also understand that some companies and their interviews are weird, but that's okay too, no? You just reject them and move on.
If there's a big enough bias, the questions become entirely about finding that bias. And in 90% of cases the systems design questions are about something they designed in-house, and they often don't have a lot of experience either.
Also: if there's limited knowledge on the interviewer side, an incorrect answer to a question might throw off a more experienced candidate.
It's no big deal but it becomes more about reading the room and knowing the company/interviewers than being honest in what you would do. People don't want to hear that their pet solution is not the best. Of course you still need to know the tech and explain it all.
I worked for a company once where the CEO said I need to start using Kubernetes. Why? We didn't really have any pressing use cases / issues that were shouting out for Kubernetes at all.
His reasoning was all the big players use it, so we should be too...
It was literally a solution looking for a problem. Which is completely arse backwards.
An improved CV; let's be honest, most stuff is boring projects that could even be built with 1990s technology, and distributed systems are not something that was invented yesterday.
However having in the CV any of those items from left side in the deployment strategy is way cooler than mentioning n-tier architecture, RPC (regardless how they are in the wire), any 1990's programming language, and so forth.
A side effect of how badly hiring works in our industry: it isn't enough to know how to master a knife to be a chef, it must be a specific brand of knife, otherwise the chef is not good enough for the kitchen.
This is also how you can identify decent places to work at: look for job postings that emphasize you aren't expected to already know the language.
For example, in the recent "who's hiring" thread, I saw at least two places where they did that: Duckduckgo (they mention only algorithms and data structures and say "in case you're curious, we use Perl") and Stream (they offer a 10-week intro course to Go if you're not already familiar with it). If I remember correctly, Jane Street also doesn't require prior OCaml experience.
The place where I work (bevuta IT GmbH) also allowed me to learn Clojure on the job (but it certainly helped that I was already an expert in another Lisp dialect).
These hiring practices are a far cry from those old style job postings like "must have 10+ years of experience with Ruby on Rails" when the framework was only 5 years old.
To do that you need a mixture of elements: work in a somewhat "exotic" language [1] and a company that can afford to pay a top-talent salary [2].
[1] all those examples check that box, but please let's not start a language war over this statement.
[2] for Jane Street I hear they do, DDG pays pretty well especially because it pays the same rate regardless of where you are in the world, so it's a top-talent salary for many places outside SV.
Sounds like the best type of place to work for me. Instead of being a replaceable cog in a meat grinder that doesn't even pay well, working with boring tech, you get to work with talented people in an actually interesting language and get decently paid.
And best of all, you don't feel the need to keep chasing after the latest hype just to keep your CV relevant.
This comment sums up my view as well, but I must confess that I’ve designed architectures more complex than necessary more than once, just to try new things and compare them with what I already knew. I just had to know!
Any minute you spend in a job interview defending your application server + Postgres solution, is a minute that you will lack to talk of follow up questions about the distributed system that interviewer was expecting.
Yes, it’s nonsense, stirring up a turbulent slurry of eventually consistent components for the sake of supporting hundreds of users per second, it’s also the nonsense that you’re expected to say, just do it.
"Maybe Redis for caching".
Really that's going way too far - you do NOT need Redis for caching. Just put it in Postgres. Why go to this much trouble to put people in their place for over engineering then concede "maybe Redis for caching" when this is absolutely something you can do in Postgres. The author clearly cannot stop their own inner desire for overengineering.
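One common way to "just put it in Postgres" is an UNLOGGED table with a TTL column; a sketch using psycopg 3 (it assumes a reachable Postgres instance, and the table and column names are made up):

```python
import json
import psycopg  # assumes psycopg 3 and a reachable Postgres

DDL = """
CREATE UNLOGGED TABLE IF NOT EXISTS cache (
    key        text PRIMARY KEY,
    value      jsonb NOT NULL,
    expires_at timestamptz NOT NULL
)
"""

def cache_set(conn, key, value, ttl_seconds=60):
    conn.execute(
        """INSERT INTO cache (key, value, expires_at)
           VALUES (%s, %s::jsonb, now() + %s * interval '1 second')
           ON CONFLICT (key) DO UPDATE
           SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at""",
        (key, json.dumps(value), ttl_seconds),
    )

def cache_get(conn, key):
    row = conn.execute(
        "SELECT value FROM cache WHERE key = %s AND expires_at > now()",
        (key,),
    ).fetchone()
    return row[0] if row else None

# Usage (connection string is a placeholder):
# with psycopg.connect("dbname=app", autocommit=True) as conn:
#     conn.execute(DDL)
#     cache_set(conn, "expensive:42", {"result": 123}, ttl_seconds=30)
#     print(cache_get(conn, "expensive:42"))
```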
A cache can help even for small stuff if there's something time-consuming to do on a small server.
Redis/valkey is definitely overkill though. A slightly modified memcached config (only so it accepts larger values; server responses larger than 1MB aren't always avoidable) is a far simpler solution that provides 99% of what you need in practice. Unlike redis/valkey, it's also explicitly a volatile cache that can't do persistence, meaning you are disincentivized from bad software design patterns where the cache becomes state your application assumes any level of consistency of (including its existence). If you aren't serving millions of users, stateful cache is a pattern best avoided.
DB caches aren't very good mostly because of speed; they have to read from the filesystem (and have network overhead), while a cache reads from memory and can often just live on the same server as the rest of the service.
I personally wouldn't like to put caching in Postgres, even though it would work at lower scales. But at that scale I don't really need caching anyway. Having the ephemeral data in a different system is more appealing to me as well.
The caching abstractions your frameworks have are also likely designed with something like Redis in mind and work with it out of the box. And often you can just start with an in-memory cache and add Redis later, if you need it.
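A sketch of that "start with an in-memory cache" idea; the two-method interface is the part that matters, since a Redis-backed version can slot in behind it later (TTL handling kept deliberately naive):

```python
import time
from typing import Any

class InMemoryCache:
    """Per-process cache; enough until you actually need shared state."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float = 60.0) -> None:
        self._data[key] = (time.monotonic() + ttl, value)

    def get(self, key: str) -> Any | None:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:
            del self._data[key]
            return None
        return value

cache = InMemoryCache()
cache.set("user:1:profile", {"name": "Ada"}, ttl=30)
print(cache.get("user:1:profile"))
```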
>I personally wouldn't like to put caching in Postgres, even though it would work at lower scales.
Probably should stop after this line - that was the point of the article. It will work at lower scales. Optimize later when you actually know what to optimize.
My point is more that at that scale I'd try to avoid caching entirely. Unless you're doing analytical queries over large tables, Postgres is plenty fast without caching if you're not doing anything stupid.
Redis is the filler you shove in there when Postgres itself starts slowing down. Writing database queries that work and writing database queries that work efficiently are very different things.
It'll give you time to redesign and rebuild so Postgres is fast enough again. Then you can take Redis out, but once you've set it up you may as well keep it running just in case.
Postgres has support for an eventually consistent in-memory caching layer?
Ha ha nice one - when your startup is Facebook you'll need that, not for your 12 users.
The reason startups get to their super kubernetes 6 layers mega AWS powered ultra cached hyper pipelined ultra optimised web queued application with no users is because "but technology X has support for an eventually consistent in-memory caching layer!!"
What about when we launch and hit the front page of HN how will the site stay up without "an eventually consistent in-memory caching layer"?
The sentiment here is right, but Redis does make a difference at scale. I built a web app this year on AWS Lambda that had up to 1,000 requests/second, and at that scale you can have trouble with Postgres, but Redis handles it like it's nothing.
I think Redis is a reasonable exception to the rule of "don't complicate things" because it's so simple. Even if you have never used it before, it takes a few minutes to set up and it's very easy to reason about, unlike MongoDB or Kafka or k8s.
Postgres itself has no issue with 1000 simple requests per second. On normal notebook hardware you'll get easily several thousand requests per second if you're just looking up small amounts of data by the primary key. And that is without any optimization and with non-DB overhead included. The actual performance you can get is probably quite a bit higher, but I've seen ~4-6k requests per second on naive, unoptimized endpoints that just look up some data in Postgres.
> Postgres itself has no issue with 1000 simple requests per second
Postgres in isolation has no problem with 1000 RPS. But does your Postgres server have that ability? Your server is also handling more complex requests and maybe some writes and concurrent re-indexing.
Because they’re meeting the patients at their own level. Plus while using PG for everything is a currently popular meme on HN (and I am all for it), it’s not something you see all that often. An app server, a database and a cache is a pretty sensible and simple starting point.
Until you get to 100 test users. Then you need Kafka and k8s.
That seemed odd to me too, they're talking about single server, which to me would mean running postgres on the application server itself.
In that scenario, the last thing you need is another layer between application and database.
Even in a distributed environment, you can scale pretty far with direct-to-database as you say.
Imho, if you can, use a fixed chunk of server memory directly as the cache. That scales out with your instances if/when you ever scale out.
Sometimes you have to pick your poison when those with other agendas or just inexperience want to complicate things. Conceding that they can use Redis somehow might be enough to get them to stop blaming you for the 'out of date' architecture?
Or do not use caching at all until you need it
"Why is Redis talking to MongoDB?"
lol, in the diagram Redis is not even talking to MongoDB
oh it is, look again.
just use the filesystem, it's superfast and reliable.
I love the fact that the author "wrote" this page with massive CSS framework (tailwind) and some sort of Javascript framework, with a bundler and obfuscator - instead of a plain, simple HTML page. Well played! :-)
Fair, the author's point would have been stronger if the page was made using just static HTML/CSS.
But I have to defend Tailwind, it's not a massive CSS framework, it just generates CSS utility classes. Only the utility classes you use end up in the output CSS.
React + Tailwind + bundler + googlefont + ... Yeah, humans are paradoxical
good point, got rid of google font. for the rest, it's just that the DX is unmatched to get something quick and dirty started up. (I _could_ have done a simple .html though). and I'll concede that react is overkill for this, I just wanted components + typescript to not waste time on silly vanilla js bugs
From the titular tweet (12 years already!): https://x.com/codinghorror/status/347070841059692545
Need to add (2013) to the title of this article
Thinking is scary. No one (among non-thinking colleagues) is going to criticize you for using de facto standard services like Kafka, Mongo, Redis, etc., regardless of the nonsensical architecture you come up with.
Yes, I also put Redis in that list. You can cache and serve data structures in many other ways - for example, replicate the individual features you need in your application instead of going the lazy route and adding another service to the mix. And don't get me started on Kafka... money thrown down the drain when a simple gRPC/whatever service would do.
Part of being an engineer is also selecting the minimum number of components for your architecture, and not being afraid of implementing something on your own if you only need 1 of the 100s of features an existing product provides.
> No one (among non-thinking colleagues) is going to criticize you for using de-facto standard services
Well put!
The cloud-native version of "Nobody gets fired for buying IBM"
EXACTLY.
> Add complexity only when you have proof you need it.
This does assume that said complexity can be added ad hoc later. Often, earlier architecture choices make additions complicated too, or even prevent them entirely without a complete rewrite.
So while the overall message is true, there is some liberal use of simplification at play here too.
In some cases a compromise can make sense, e.g. use k8s but keep it simple within that - as vanilla as you can make it.
I built a small simple page that I send to people when they start proposing crazy db architectures that people might like if they like this page:
https://nocommasql.com/
Just a nit: it pollutes back-button history when I expand content. Took 9 presses of the back button to return to HN.
Useful, but 10 years ago without JSONB in PG it wasn't really the answer to everything. But as of today, I am recommending PG to anyone that does not have a good reason or use case to NOT use it.
There is an argument I rarely ever see in discussions like this, which is about reducing the need for working memory in humans. I'm only in my mid-thirties, but my ability to keep things in working memory is vastly reduced compared to my twenties. Might just be me who's not cut out for programming or system architecture, but in my experience what is hard for me is often what is hard for others; they just either don't think about it, or ignore it and push through, keeping the hidden costs alive.
My argument is this: even if the system itself becomes more complex, it might be worth it to make it better partitioned for human reasoning. I tend to get overwhelmed quickly and my memory is getting worse by the minute. It's a blessing for me to have smaller services that I can reason about, predict consequences from, and deeply understand. I can ignore everything else. When I have to deal with the infrastructure, I can focus on that alone. We also have better and more declarative tools for handling infrastructure compared to code. It's a blessing when 18 services don't use the same database, and it's a blessing when 17 services aren't colocated in the same repository with dependencies that most people don't even identify as dependencies. Think law of leaky abstractions.
This is a good point - having your code broken up into standalone units that can fit into working memory has real benefits to the coder. I think especially with the rise of coding agents (which, like it or not, are here to stay and are likely going to increase in use over time), sections of code that can fit in a context window cleanly will be much more amenable to manipulation by LLMs and require less human oversight to modify, which may be super useful for companies that want to move faster than the speed of human programming will allow.
Oh my word Riak - I haven't seen that DB mentioned for years!
I totally get the point it makes. I remember many years ago we announced SocketStream at a Hacker News meet-up and it went straight to #1. The traffic was incredible, but none of us were DevOps pros, so I ended up manually restarting the Node.js process via SSH from a pub in London every time it crashed.
If only I'd known about upstart on Ubuntu back then, I'd have saved myself some trouble that night at least.
I think the other thing is worrying about SPOFs and knowing how to respond if services go down for any reason (e.g. the server runs out of disk space because log rotation was never set up, or has a hardware failure of some kind, or the data center has an outage - I remember Linode would have a few in their London datacenter that just happened to occur at the worst possible time).
If you're building a side project I can see the appeal of not going overboard and setting up a Kubernetes cluster from the get-go, but when it's something more serious and critical (like digital infrastructure supporting car services, such as remotely turning on the climate controls in a car), then you design the system like your life depends on it.
I think remote climate controls in a car are an ideal use-case for a simpler architecture.
Consider WhatsApp could do 2M TCP connections on a single server 13 years ago, and Ford sells about 2M cars per year. Basic controls like changing the climate can definitely fit in one TCP packet, and aren't sent frequently, so with some hand-waving, it would be reasonable to expect a single server to handle all remote controls for a manufacturer for all cars from some year model.
Or maybe you could use wifi-direct and bypass the need for a server.
Or a button on the key fob. Perhaps the app can talk to the key fob over NFC or Bluetooth? Local/non-internet controls will probably be more reliable off-grid... can't have a server outage if there are no servers.
I guess my point is if you take a step back, there are often simple, good solutions possible.
The fact that we have lambdas/serverless functions and people are still over-engineering k8s clusters for their "startup project" is genuinely hilarious. You can literally validate your idea with some janky Python code and like 20 bucks a month.
The problem is that people don't like hearing their ideas suck. I do this too, to be fair. So, yes, we spend endless hours architecting what we'd desperately hope will be the next Facebook because hearing "you are definitely not the next Facebook" sucks. But alas, that's what doing startups is: mostly building 1000 not-Facebooks.
The lesson here is that the faster you fail, the faster you can succeed.
The alternative to CI/CD pipelines is to rely on human beings to perform the same repetitive actions the exact same way every single time without any mistakes. You would never convince me to accept that for any non-trivial project.
Especially in an age where you can basically click a menu in GitHub and say "Hey, can I have a CI pipeline please?"
No, the alternative is/was something like "make test" or "build_deploy_and_test.sh".
I think the 2 hours bit was the important part
CDD, or CV-driven development, as I like to call it.
This kind of complexity is unfortunately also embedded into model training data.
Left unchecked, Claude is very happy to propose "robust, scalable and production ready" solutions - you can try it for yourself. Tell it you want to handle new signups and perform some work like send an email or something (outside the lifecycle of the web request).
That is, imply you need some kind of background workload, and watch it bring in Redis, workflow engines, multiple Docker deployment layouts so you can run with and without jobs, an obscene number of environment variables to configure all of it, plus "fallbacks" and retries and all kinds of things that you will never spend time on during an MVP - and will even resist adding later, just because of the complexity and maintenance they require.
All that while (as in the diagram of the post), there is an Erlang/Elixir app capable of doing all that in memory :).
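For what it's worth, the send-an-email-after-signup case doesn't need Redis or a workflow engine at MVP stage in Python either; a rough sketch with an in-process queue and a worker thread (send_welcome_email is a hypothetical stand-in):

    # Sketch: post-signup work outside the request, without Redis or a job server.
    # A plain in-process queue plus a worker thread; function names are hypothetical.
    import queue
    import threading

    jobs = queue.Queue()

    def send_welcome_email(job):
        # Assumption: stand-in for whatever your actual mail-sending code is.
        print(f"sending welcome email to {job['email']}")

    def worker():
        while True:
            job = jobs.get()
            try:
                send_welcome_email(job)
            except Exception as exc:  # keep the worker alive if one job blows up
                print(f"background job failed: {exc}")
            finally:
                jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()

    def handle_signup(email):
        # ... create the user row in Postgres here ...
        jobs.put({"email": email})  # returns immediately; the email goes out off-request

    handle_signup("ada@example.com")
    jobs.join()  # demo only: wait for the "send" before the script exits

Jobs are lost if the process dies, which is usually acceptable for an MVP; the moment it isn't is the moment to reach for a persistent queue, not before.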
I don't really get this line of argument.
Or at least it's not engaging with the obvious counterargument at all: "You may not need the scale now, but you may need it later." For a startup, being a unicorn with a bajillion users is the only outcome that actually counts as success. It's the outcome they sell to their investors.
So sure, you can make an unscalable solution that works for the current moment. Most likely you won't need more. But that's only true because most startups don't end up unicorns. Most likely you burn through your VC funding and fold.
Okay, Stack Overflow allegedly runs on a toaster, but most products don't fit that mold - and now that they're tied to their toaster, it probably severely constrains what SO can do in terms of evolving their service.
>So sure, you can make a unscalable solution that works for the current moment.
You're making two assumptions - both wrong:
1) That this is an unscalable solution - A monolith app server backed by Postgres can take you very very far. You can vertically scale by throwing more hardware at it, and you can horizontally scale, by just duplicating your monolith server behind a load-balancer.
2) That you actually know where your bottlenecks will be when you actually hit your target scale. When (if) you go from 1,000 users to 10,000,000 users, you WILL be re-designing and re-architecting your solution regardless of what you started with, because at that point you're going to have a different team, different use-cases, and therefore a different business.
Do you have actual examples of this?
Your solution is basically to do a rewrite when scale becomes a problem. Which is the textbook example of something that sounds good but never works.
On the other hand I can't think of a business that failed b/c it failed to scale :)
Ironic that clicking those big buttons only causes a JS error to be logged to the console, with nothing else happening. That doesn't particularly lend to the author's credibility, although the advice of using simple architecture where possible is correct.
s/postgres/sqlite/g
Postgres is simpler. Get your cloud provider to manage it. Click to create an instance and get failover with zero setup. Click button 2 to get guaranteed backups and point-in-time snapshots.
When you use sqlite, you can distribute your program by distributing a single executable file. That's what I call simple.
Yes that is simple, especially for a desktop app.
I would argue it is not resilient enough for a web app.
No one wrote the rules in stone, but I assume that server-side you want the host to manage data recovery and availability. Client-side it's the laptop owner's problem. On a laptop, availability is almost entirely correlated with "has power source" and "works", and data recovery with "made a backup somehow".
So I think we are both right?
I was especially thinking about a program running on a server. I wouldn't statically link everything for a desktop program, because it just causes incompatibilities left and right. But for a server you don't have external vendors, so this is a good option, and if you use SQLite as the database, deploying is no more complicated than uploading a single executable - something that can even be done atomically.
I don't see how this has worse effects on recovery and availability. The data is still in a separate file that you can back up, and modifications still happen through a database layer that handles atomic transactions and file system interaction. The availability is also not worse, unless you would have hot code reloading without SQLite, which seems like an orthogonal issue.
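As a minimal illustration of the "separate file you can back up" point, here's a sketch using the stdlib sqlite3 online backup API; the file names are hypothetical:

    # Sketch: back up a live SQLite database without stopping the app, using the
    # stdlib sqlite3 online backup API (Python 3.7+). File names are hypothetical.
    import sqlite3

    def backup_database(live_path="app.db", backup_path="app-backup.db"):
        src = sqlite3.connect(live_path)
        dst = sqlite3.connect(backup_path)
        try:
            # Copies pages consistently even while other connections keep writing.
            src.backup(dst)
        finally:
            dst.close()
            src.close()

    if __name__ == "__main__":
        backup_database()

The backup API copies pages consistently while other connections keep writing, so there's no need to stop the app or risk copying a half-written file.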
In my experience, ORMs (at least in .NET and Go) have better support for Postgres. Especially around date types, UUIDs and JSON fields, IIRC.
You don’t need an ORM either. It’s just another level of complexity for very little to no gain in almost all cases. Just write SQL.
You always get a comment like this. I don't particularly agree; there are pros and cons to either approach.
The tooling in a lot of languages and frameworks expects you to use an ORM, so a lot of the time you will have to put in a fair bit of upfront effort just to use raw SQL (especially in .NET land).
On top of that, an ORM makes a lot of things that are super tedious, like mapping to models, extremely easy. The performance gain of writing raw SQL is very minor if the ORM is good.
Don't agree. Getting managed Postgres from one of the myriad providers is not much harder than using SQLite, but Postgres is more flexible and future-proof.
Isn't the entire point of this post that many companies opt for flexible+future proof far too prematurely?
I use Postgres for pretty much everything once I get beyond "text in a database with a foreign key to a couple of things".
Why?
Because in 1999 when I started using PHP3 to write websites, I couldn't get MySQL to work properly and Postgres was harder but had better documentation.
It's ridiculous spinning up something as "industrial strength" as Postgres for a daft wee blog, just as ridiculous as using a 500bhp Scania V8 for your lawnmower.
Now if you'll excuse me, I have to go and spend ten seconds cutting my lawn.
This. So much this. Of course, at some point you start wanting to do queues, and concurrent jobs, and not even WAL mode and a single-writer approach can cut it, but if you've reached that point then usually you a) are on that "this is a good problem to have" scalability curve, and b) can just switch to Postgres.
I've built pretty scalable things using nothing but Python, Celery and Postgres (that usually started as asyncio queues and sqlite).
A queue and a database are more shots on the architecture golf though.
Yeah, we run some fairly busy systems on SQLite + Litestream. It's not a big deal if they are down for a bit (never happened though), so they don't need failover, and we never had issues (after some SQLite pragma and BUSY-handling tweaks). Vastly simpler than running + maintaining Postgres/MySQL. Of course, everything has its place and we run those too, but just saying that not many people/companies really need them. (Also considering that we see systems which DO have Postgres/MySQL/Oracle/MSSQL set up in HA and still go down for hours to a day per year anyway, so what's it all good for?)
Back in the day the hype was all around Postgres, but I agree.
Recently, with the AWS outage, our stack of loads of different cloud providers ended up working pretty well! It might be a bit complex running distributed nodes and updating state via API, but it's cheap and clearly resilient.
Or build your microservices as a monolith using a "local" async service mesh (no libs or boilerplate needed, it's just an async interface for each service) and service-namespaced tables in your DB, then just swap in a distributed transport on a per-case basis if you ever need to scale.
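A rough sketch of that shape, with hypothetical service names: each "service" is just an async interface, the default implementation is an in-process call, and a remote transport can be slotted in per service later if one ever needs to move out of process.

    # Sketch: "microservices as a monolith". Each service is just an async
    # interface; the default implementation is an in-process call, and a remote
    # transport can be slotted in per service later. All names are hypothetical.
    import asyncio
    from typing import Protocol

    class BillingService(Protocol):
        async def charge(self, user_id: int, cents: int) -> bool: ...

    class LocalBilling:
        """In-process implementation: a plain async method call, no network."""
        async def charge(self, user_id: int, cents: int) -> bool:
            # ... write to the billing-namespaced tables in the shared Postgres ...
            return True

    # Later, if billing ever has to move out of process, only the wiring changes:
    # class HttpBilling:
    #     def __init__(self, base_url):          # would use an HTTP client such as httpx
    #         self.base_url = base_url
    #     async def charge(self, user_id, cents):
    #         ...                                 # POST to the remote billing service

    async def signup_flow(billing: BillingService):
        ok = await billing.charge(user_id=42, cents=999)
        print("charged" if ok else "declined")

    asyncio.run(signup_flow(LocalBilling()))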
I agree 100%. "Complexity is not a virtue. Start simple. Add complexity only when you have proof you need it."
you guys are going to miss the days of over-engineered microservice solutions when you are debugging ai workflows :)
It's like debugging the code of that guy who wrote most of the project, drank all the office coffee, outtalked everyone in meetings, and then relocated to a new job in Zurich or London.
12 years on, and a lot of Postgres-based services built since the OP site first went live, I now actually may recommend MongoDB as the sensible option...
The problem is job interviews, where you are expected to know how to scale everything reliably, so it wouldn't be a satisfactory answer to say you'd just have a monolith against a Postgres instance.
I had a call the other day with a consultancy to potentially pick up some infrastructure work/project type stuff. Asked about timezones involved and they said a lot of their clientele are US based startups. "So it's mainly Kubernetes work" they said.
I personally would suggest the vast majority of those startups do not need Kubernetes and certainly don't need to be paying a consultancy to then pay me to fix their issues for them.
The problem with kubernetes is that containers just aren't quite enough.
You have an app which runs, and now you want to put it in a container somewhere. Great. How do you build that container? GitHub Actions. Great. How does that deploy your app to wherever it's running? Err... docker tag + docker push + ssh + docker pull + docker restart?
You've hit scale. You want Redis now. How do you deploy that? Do you want Redis, your machine, and your DB in three separate datacenters, paying egress between all the services? Probably not, so you just want a little Redis sidecar container... How does the app get the connection string for it?
Once you're into home-grown shim scripts, which _are_ brittle and error-prone, it's messy. K8s is a sledgehammer, but it's a sledgehammer that works. ECS is AWS-only and has its own fair share of warts. Cloud Run/Azure Container Apps are _great_, but there's nothing like them to run on DigitalOcean/Hetzner/whatever. So your choices are to use a big cloud with a simpler orchestration, or use some sort of non-standard orchestration that you have to manage yourself, or just use k8s...
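The home-grown shim stage usually looks something like this - a sketch only, with hypothetical image, registry and host names - and it works right up until it doesn't:

    # Sketch of the home-grown deploy shim described above: build, push, then
    # ssh to the box and restart the container. Image, registry and host names
    # are hypothetical; error handling is deliberately minimal.
    import subprocess

    IMAGE = "registry.example.com/myapp:latest"
    HOST = "deploy@app-server-1"

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def deploy():
        run(["docker", "build", "-t", IMAGE, "."])
        run(["docker", "push", IMAGE])
        run(["ssh", HOST, f"docker pull {IMAGE}"])
        run(["ssh", HOST, "docker rm -f myapp || true"])  # stop the old container, if any
        run(["ssh", HOST,
             f"docker run -d --name myapp --restart unless-stopped -p 80:8000 {IMAGE}"])

    if __name__ == "__main__":
        deploy()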
While I agree with most of this rant, I have a problem with the common "just use postgres" trope, often repeated here.
I recently had to work with SQL again after many years, and was appalled at the incidental complexity and ridiculous limitations. Who in this century would still voluntarily mix in-band database commands with user-supplied data? Also, the restrictions on column naming mean that you pretty much have to use some kind of ORM mapping; you can't just store your data. That means an entire layer of code that your application doesn't really need, just to adapt to a non-standard from the 70s.
"just use postgres" is not good advice.
"just use postgres" is an excellent advice. How about incidental complexity and ridiculous limitations of an ORM? Time spent learning how to use an ORM can better be spent 'refreshing' your SQL knowledge. Also, when you learn how an ORM works, you still don't know proper SQL nor how do databases works, so when you switch language now what, you quickly take a course on another ORM? SQL is a language, ORM is not,it's just ' an entire layer of code that your application doesn't really need' and in some applications you could never ever use an ORM.
It is rather clear that I am the only one in this discussion who recently had to write code that synchronizes data that I do not control with an SQL database. Everything is easy in toy applications, where you have your USERS table with First_Name and Last_Name.
Blame the C-suite who approve embedding AWS solution designers into teams.
Totally agree.
Now every system design interview expects you to build some monstrous stack with layers of caching and databases for a hypothetical 1M DAU (daily active users) app.
It messes with your head.
As practitioners we subconsciously optimise for "beauty", in maths, physics or dev. Most hackers are self-motivated by that beauty, not by 40k-ork-style functional design.
I love the unnecessary buttons that do nothing :)
Job security-driven development. It explains why some projects are unnecessary complex.
But that’s half the fun (and knowledge about these systems got me my current job)
Is this targeted at startup bros with an MVP and a dream?
In almost any other scenario I feel the author is being intentionally obtuse about much of the reality surrounding technology decisions. An engineer operating a Linux box running Postgres & Redis (or working in an environment with this approach) would become increasingly irrelevant & would certainly earn far less than the engineer operating the other stack. An engineering department following "complexity is not a virtue" would either struggle to hire, or end up employing engineers considered up-to-date in 2006.
Management & EXCO would also have different incentives; in my limited observations I would say that middle and upper management are incentivised to increase the importance of their respective verticals, whether in terms of headcount, budget or tech stack.
Both examples achieve a similar outcome, except one is scalable, fault-tolerant and automated, and the other is at best a VM at Hetzner that would be swiftly replaced should it have any importance to the org. The main argument here (and in the wild) seems to be "but it's haaaard" or "I don't want to keep up with the tech".
KISS has a place, and I certainly appreciate it in the software I use and the operating systems I prefer, but let's take a moment to consider the other folks in the industry who aren't happy to babysit a VM until they retire (or become redundant) before dispensing blanket advice like we are all at a 2018 TED talk. Thanks for coming to my TED talk.
While you're making good points, this shows that engineers and the industry intentionally make work more complex than necessary in order to justify higher prices for labor. This is not so uncommon in today's economy, especially in white-collar and regulated work that most people don't understand, but it's worth thinking about regardless.
To be fair, it's hard to imagine the economy and civilization crashing hard enough to force us to be more efficient. But who knows.
Job offers require experience in technologies that you won't ever need while building a solo project. I'm not surprised when those big-scale technologies get shoehorned into small projects for the sake of learning, showcasing "look, I know that one", etc. Only totally missing this point could explain why someone would make this hyperbolic rant of a page.
Funny, from the title I was expecting a productivity-adjacent "What have you even built?" article.
Except it's really a "What over-engineered monstrosity have you built?" in the theme of "choose boring technology"
p.s. MariaDB (MySQL fork) is technically older and more boring than PostgreSQL so they're both equally valid choices. Best choice is ultimately whatever you're most familiar with.
Not that oldness is much of a metric for quality, but Postgres does go quite a lot further back: https://en.wikipedia.org/wiki/PostgreSQL#Ingres_and_Universi...
MariaDB from 2009, based on MySQL from 1995.
PostgreSQL from 1996, based on Postgres95 from 1995, based on POSTGRES from 1989, based on INGRES from 1974(?).
I wonder if any lines of 1970s, or at least 1980s, code still survive in some corner of the PostgreSQL code base, or if everything has been rewritten at least once by now? It must have started out in K&R C, if it was even C?
We don't have the complete version history of postgres, so that's not easy to know. There definitely are still lines from Postgres95 that haven't been changed since the initial import into our repository.
Somewhere there's a CVS repository with some history from before the import into the current repository, but unfortunately there's a few years missing between that repository and the initial import. I've not done the work to analyze whether any lines from that historical repo still survive.
12 years later and Postgres is (still) Enough and getting better by the day: https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
Heh
Once you have a service that has users and costs actual money, while you don’t need to make it a spaghetti of 100 software products, you need a bit of redundancy at each layer — backend, frontend, databases, background jobs — so that you don’t end up in a catastrophic failure mode each time some piece of software decides to barf.
uh, maybe you only have the issue that you need redundancies because you have so many pieces of software that can barf?
I mean it will happen regardless just from the side effects of complexity. With a simpler system you can at least save on maintenance and overhead.
Yes, but if your web server goes down for whatever reason, you’d rather have some more for your load balancer to round robin. Things like physical host dying are not exactly unheard of. Same with DB, once you take money, you want that replication and quick failover and offsite backup.
Is there any good reason to switch from mysql to postgres though?
The problem with doing things the sensible way (eschewing microservices and k8s when you work on projects that aren't hyperscale) is that you end up missing opportunities later on, because recruiters will filter you out when you can't meaningfully respond to the question about "how experienced you are with microservice architecture". Granted, I may have dodged a bullet by not joining a company with 50 engineers that claims to replicate Google's practices (most of which exist to make sure tens of thousands of engineers can work efficiently together), but still, someone gets to pay the bill at the end of the month…
Are you doing software for money? Because not having Kubernetes in the project will stop you from receiving money. Someone please use one of these smart AI tools to create the ultimate killer app: Kubernetes+crypto+AI+blockchain+Angular+Redux+Azure (working only in the Chrome browser).
That's already a preset in claude - use the /reddit-recommends-stack command. It doesn't bother to understand and modify your existing code, just completely rewrites it every time for speed and ease of vibe.
Yeah, because Kubernetes isn't actually difficult for most people, and its complexity is overblown unless you are at FAANG scale.
Redux catching strays. RTK is fine.
Yet the author spent a whole afternoon (hopefully not more!) writing a website to tell some people (who exactly?) that they’re doing it wrong.
It's a web page on a subdomain of his existing personal website, so it probably didn't take him much time at all - about as long as writing up the text in a Word document and then farting around with the styling and JavaScript a bit.
> Yet the author spent a whole afternoon
As opposed to what? Not doing anything at all and participating in this insanity of complexity?
Yeah, but the building phase of an overly complex system is rarely the big time suck: maintaining and modifying it are.
What the hell have you built? Turns out a pretty straightforward service.
That diagram is just AWS, a programming language, and a database. Plus Hadoop, for some reason, I guess. And Riak/OpenStack as redundant extras.
It just seems like pretty standard stuff, with some seemingly small extra parts that make me think someone on the team was familiar with something like Ruby, so they used that instead of Java.
"Why is Redis talking to MongoDB" It isn't.
"Why do you even use MongoDB" Because that's the only database there, and nosql schemaless solutions are faster to get started... because you don't have to specify a schema. It's not something I would ever choose, but there is a reason for it.
"Let's talk about scale" Let's not, because other than hadoop, these are all valid solutions for projects that don't prioritize scale. Things like a distributed system aren't just about technology, but also data design that aren't that difficult to do and are useful for reasons other thant performance.
"Your deployment strategy" Honestly, even 15 microservices and 8 databases (assuming that it's really 2 databases across multiple envs) aren't that bad. If they are small and can be put on one single server, they can be reproduced for dev/testing purposes without all the networking cruft that devops can spend their time dealing with.
This comment makes this thread a great time capsule. Given that the website is now over 10 years old, it perfectly illustrates how much 'best practices' and architectural complexity (and cloud bills) have changed since then.
No no, don't give me this.
I was there before 10 years ago. I remember the pain in the ass that was hosting your own web server and own hardware, dealing with networking issues on Cisco switches and thinking about getting a CCNA. I remember the days of trying to figure out PHP and random-ass modules, or how Python and WSGI fit together on a slow-ass Windows machine, instead of just spinning up an app and doing network calls from a SPA.
Have you guys just forgotten all the enterprise crap that existed? Have you forgotten how, before that, things like compilers (ones you had to pay exorbitant amounts of money for) and different architectures were the headaches?
It's been two steps forward, one step back, but we're still way better off.
Yes, people bring in k8s because they want to résumé-build and it goes poorly, but I've also used k8s in my personal setup and it was much easier than the poor man's version of it I had before.
All of this is just rose-tinted glasses, and throwing the baby out with the bathwater. Just because some people have had bad experiences with microservices - often because they weren't done right - they write them off completely.
I know people who can wrangle k8s and set up rules in whatever it's called to spin the whole kaboodle of services up and down effortlessly. It's like they know a whole level of programming I'm not familiar with at all. I know the dotnet stuff pretty well, after many years of fiddling with it. I do stuff in dotnet now that I didn't even have the terminology to talk about before. What they do in k8s and friends reminds me of that.
I personally don't care for it and if I design something I make it so it avoids that stuff if I can at all help it. But I've come to see that it can have real value.
The thing is, though, that you then really need someone to be that very competent ops person. If you're a grug like me, you don't get many shots at being good at something. I probably don't have the years in me to be good at both ops and "pure" programming.
So if you are a startup, and you're not the kind of person who is not only very smart but also fast and has taste, maybe pick your battles.
If you are great at the ops side, ok, maybe design it from that perspective and hire a bunch of not-made-of-unobtainium regular middle-of-the-road coders to fill in what the microservices and stuff should contain and manage those. This requires money for a regular hiring budget. (Or you are supersmart and productive and "play pretend enterprise" with all roles yourself. But I have never seen such a person.)
Or focus on a tight design which can run without any of that, if you come more from the "I'm making a single program" part of the world.
Tinkering syndrome can strike in any kind of design, so you need personal maturity whatever path you choose.
> Honestly, even 15 microservices and 8 databases (assuming that it's really 2 databases across multiple envs) aren't that bad
Sure, they aren't bad. They're horrible.
Whoosh
> Honestly, even 15 microservices and 8 databases (assuming that it's really 2 databases across multiple envs) aren't that bad.
Clown fiesta.
Nah, Postgres is overhyped; MariaDB is enough, and recently: <https://mariadb.org/mariadb-vs-postgresql-understanding-the-...>
I love this comment because it misses the point in exactly the way the article talks about.
No it doesn't. The fact that the article had to say "Maybe Redis for caching" because Postgres can't handle caching at scale shows that Postgres is not a perfect solution. Choosing an alternative database that can do everything you need means simplifying your architecture in the spirit of the article (not to say that MariaDB specifically is the right choice here, I'm not familiar enough with it to comment on that).
> because Postgres can't handle caching at scale
Which is the exact point the article is making. You don't have scale. You don't need to optimize for scale. Just use Postgres on its own, and it'll handle the scale you need fine.
But what if all 12 users log in 100 times during the same second of the day?
It's a reasonable worry!
Love the downvotes ;P